CN107480549B - Sensitive information desensitization method and system for data sharing - Google Patents

Sensitive information desensitization method and system for data sharing

Info

Publication number
CN107480549B
Authority
CN
China
Prior art keywords
data
desensitization
sensitive
sensitive information
attributes
Legal status
Active
Application number
CN201710506066.1A
Other languages
Chinese (zh)
Other versions
CN107480549A (en
Inventor
张云云
王开红
于海龙
吴培文
陈涛
Current Assignee
Yinjiang Technology Co.,Ltd.
Original Assignee
Enjoyor Co Ltd
Application filed by Enjoyor Co Ltd
Priority to CN201710506066.1A
Publication of CN107480549A
Application granted
Publication of CN107480549B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a sensitive information desensitization method and system for data sharing. Using statistics, natural language processing and machine learning, the invention protects sensitive data throughout the whole process from data publication to data use. It does the following:
- It automatically identifies sensitive information such as named entities and addresses, based on a constructed sensitive-information keyword database.
- It computes the degree of association between sensitive attributes with a Sigmoid function.
- It applies desensitization strategies built from a sensitive-attribute generation rule library, named-entity desensitization rules and core desensitization algorithms.
- It combines the desensitization depths of numeric and categorical sensitive attributes to obtain the desensitization degree of the whole data set.
- It controls data output by hashing the download link address.

The resulting processing strategy keeps sensitive information secure while meeting analysis and mining requirements as far as possible. It gives a good desensitization effect and high reliability.

Description

Sensitive information desensitization method and system for data sharing
Technical field
The present invention relates to the interdisciplinary field of information technology and data security. In particular, it relates to a sensitive information desensitization method and system for data sharing.
Background art
In recent years, the deep integration of information technology with the economy and society has caused data to grow rapidly, and data has become an important strategic resource. In 2016 the government strongly promoted the interconnection, opening and sharing of information systems and public data. It also accelerated the integration of government information platforms to eliminate information silos, and pushed for data resources to be opened to society to guide social development and better serve the public.

In the big data context, however, data opening and sharing also raise challenging problems. Data leakage incidents of all kinds occur frequently. Examples include the leak of information on nearly six thousand newborns in Anhui, and the targeted fraud calls that follow the annual leaks of college entrance examination information. As a result, society's attention has shifted from data opening and sharing alone toward data security protection.

Many countries have therefore enacted a series of information-security laws and regulations, such as the Privacy Act and China's Regulations on the Disclosure of Government Information. These require data to satisfy specific conditions during opening and sharing. An open data set must not contain data that identifies an individual's identity, so that its users cannot easily infer private information about individuals. At the same time, the data must still reasonably meet the public's diverse needs, so that data resources can generate new value. Achieving data security protection while maximizing the utility of data resources is therefore a challenging problem in information security processing.
Much research has been done on protecting sensitive data in recent years:
- Patent CN201511026582.1 approaches the problem from the perspective of a data desensitization system. It describes the protection of sensitive data in a big data environment across every stage, including circulation, exchange and sharing, and trading, and applies a different protection method at each stage. It also proposes sensitive data discovery based on an expert system and natural language processing. Finally, it adds a desensitization-measurement stage that verifies the correctness and authenticity of the desensitization results.
- Patent CN201610338383.2 proposes, in a network environment, encrypting data and then storing the encryption key physically separate from the encrypted, desensitized data. Strict access rights are set on both the key and the desensitized data to ensure that encryption and decryption are secure.
- Patent CN201510303954.4 receives Structured Query Language (SQL) instructions sent by users and determines whether the accessed data contains sensitive data. It then rewrites each SQL instruction according to the user's access rights and preset desensitization conversion rules, so that the rewritten instruction accesses desensitized data.
- Patent CN201510755773.5 discloses a format-preserving desensitization method for different types of private data, which stores the data in ciphertext form. It avoids ciphertext that exceeds the defined field length, which would make data loading fail. It also avoids a mismatch between the type of an encrypted numeric field and the source data type, which would cause data-loading errors.
However, all of the above desensitization systems and methods have limitations. The main ones are:
(1) Most target structured data in databases and do not address how to handle unstructured data such as text.
(2) They do not consider whether desensitization is complete. If sensitive data is not desensitized deeply enough, non-sensitive data could be used to reconstruct it.
(3) They cannot guarantee that identifiers remain unique and keep a consistent format after desensitization. For example, hospital data generally identifies and locates individuals by ID card number. Applying an ordinary desensitization or encryption algorithm makes the ID card information lose both the uniqueness of the identifier and the consistency of its format.
Summary of the invention
The object of the present invention is to overcome the above shortcomings by providing a sensitive information desensitization method and system for data sharing. Using statistics, natural language processing and machine learning, the invention protects sensitive data throughout the whole process from data publication to data use. It does the following:
- It automatically identifies sensitive information such as named entities and addresses, based on a constructed sensitive-information keyword database.
- It computes the degree of association between sensitive attributes with a Sigmoid function.
- It applies desensitization strategies composed of a sensitive-attribute generation rule library, named-entity desensitization rules and core desensitization algorithms.
- It combines the desensitization depths of numeric and categorical sensitive attributes to obtain the desensitization degree of the whole data set.
- It controls data output by hashing the download link address.

This keeps sensitive information secure while meeting analysis and mining requirements as far as possible.
To achieve the above object, the present invention adopts the following technical solution: a sensitive information desensitization method for data sharing, comprising the following steps:
(1) Preset rules for automatically identifying sensitive information and rules for processing it.
The identification rules include:
- building a keyword database for each category of sensitive information;
- automatically identifying the sensitive information listed in those keyword databases;
- automatically identifying number-type and numeric sensitive information;
- automatically identifying named-entity sensitive information;
- accurately identifying address sensitive information.
The processing rules include sensitive-attribute generation rules, configured desensitization algorithms, named-entity desensitization and address desensitization.
A data consumer then requests to view data published by a data provider.
(2) Preprocess the data, then perform word segmentation and part-of-speech tagging on the text data.
(3) Automatically identify sensitive information according to the preset identification rules.
(4) Compute the degree of association between sensitive attributes, and retain the sensitive information whose association degree exceeds a preset threshold.
(5) Desensitize the sensitive information according to the preset processing rules.
(6) Compute the desensitization depth of the sensitive information and check whether it meets the preset requirement. If it does not, return to step (5) and desensitize again. Otherwise, output the desensitized data set for the data consumer to view.
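Steps (5) and (6) form a loop: desensitize, measure the depth, and repeat until the requirement is met. Below is a minimal sketch of that control flow. The callbacks `desensitize` and `depth`, the integer `strength` knob and the round limit are hypothetical, because the patent does not say how successive passes differ. The toy age-bucketing rule and the mean-relative-change depth are stand-ins chosen only for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]

def desensitize_until_deep_enough(
    records: List[Record],
    desensitize: Callable[[List[Record], int], List[Record]],
    depth: Callable[[List[Record], List[Record]], float],
    threshold: float,
    max_rounds: int = 10,
) -> List[Record]:
    """Repeat step (5) with increasing strength until the step (6) depth check passes."""
    for strength in range(1, max_rounds + 1):
        masked = desensitize(records, strength)   # step (5)
        if depth(records, masked) >= threshold:    # step (6)
            return masked                          # release to the data consumer
    raise RuntimeError("desensitization depth threshold not reached")

# Toy rule (hypothetical): generalize ages into buckets of width 10 * strength.
def coarsen_age(rs: List[Record], strength: int) -> List[Record]:
    width = 10 * strength
    return [{"age": r["age"] // width * width} for r in rs]

# Toy depth (hypothetical): mean relative change of the age values.
def mean_relative_change(orig: List[Record], masked: List[Record]) -> float:
    return sum(abs(o["age"] - m["age"]) / o["age"] for o, m in zip(orig, masked)) / len(orig)

result = desensitize_until_deep_enough(
    [{"age": 34}, {"age": 41}], coarsen_age, mean_relative_change, threshold=0.2
)
print(result)  # [{'age': 20}, {'age': 40}]
```

With strength 1 the depth is about 0.07, which is below the threshold. Strength 2 gives about 0.22, so the second pass is released.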
Preferably, the preprocessing in step (2) is as follows. The published data is classified by data type: structured database data, table data, data warehouse data, and unstructured document data. Preprocessing checks attribute values for integrity, consistency and correctness. It also uses a parsing tool to parse unstructured documents into text data.
Preferably, named-entity sensitive information is identified automatically by combining two things: part-of-speech tagging that applies the Viterbi algorithm to a hidden Markov model (HMM), and a constructed named-entity knowledge base. Address sensitive information is identified accurately by examining sequences of adjacent words that carry address information.
Preferably, the degree of association between sensitive attributes is calculated as follows:
(a) The association degree of categorical sensitive attributes is normalized with the Sigmoid function, defined as:

$$f(x) = \frac{1}{1 + e^{-x}}$$

The function is continuous, smooth and monotonically increasing, and its values lie in the interval (0, 1).
(b) Suppose each record in data set T has p attributes $\{u_1, u_2, \ldots, u_p\}$, and attribute $u_k$ can take $q_k$ values. In a record, the position for an attribute value of a sensitive attribute is marked 1 if that value occurs and 0 otherwise. Each record can therefore be represented as a binary row vector of dimension $q_1 + q_2 + \cdots + q_p$. If T contains n records $\{t_1, t_2, \ldots, t_n\}$, there are n such $(q_1 + q_2 + \cdots + q_p)$-dimensional row vectors.
(c) XNOR and XOR operations are applied to the values at corresponding positions of these row vectors. One quantity counts the positions where the XNOR result comes from both attribute values being marked 1. Another counts the positions where both are marked 0. The degree of association $S(I_1, I_2)$ between two attributes is then calculated as follows:
[Formula for $S(I_1, I_2)$ not reproduced in the source text.]
In the calculation, the parameters $\lambda_1, \lambda_2, \lambda_3$ are set to 0.5, 0.25 and 0.25, and the result satisfies $0 \le S(I_1, I_2) \le 1$.
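The sketch below makes steps (a) to (c) concrete under stated assumptions. It encodes records as binary row vectors. It then counts, for two vector positions across all records, the both-1 and both-0 (XNOR) cases and the XOR cases. It also provides a numerically stable standard logistic Sigmoid. Comparing two positions column-wise across records is our reading of step (c). The formula for $S(I_1, I_2)$ is not available, so the sketch stops at these ingredients and does not guess how $\lambda_1, \lambda_2, \lambda_3$ weight them. The attribute names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def sigmoid(x: float) -> float:
    """Standard logistic function 1 / (1 + e^-x), computed stably for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def encode_records(records: Sequence[Dict[str, str]],
                   domains: Dict[str, List[str]]
                   ) -> Tuple[List[Tuple[str, str]], List[List[int]]]:
    """Step (b): one 0/1 position per (attribute, value) pair, q1 + ... + qp in total."""
    columns = [(attr, value) for attr, values in domains.items() for value in values]
    rows = [[1 if rec.get(attr) == value else 0 for attr, value in columns]
            for rec in records]
    return columns, rows

def match_counts(rows: List[List[int]], i: int, j: int) -> Tuple[int, int, int]:
    """Step (c): across records, count positions i, j that are both 1, both 0, or differ (XOR)."""
    both_one = sum(1 for r in rows if r[i] == 1 and r[j] == 1)
    both_zero = sum(1 for r in rows if r[i] == 0 and r[j] == 0)
    differ = sum(1 for r in rows if r[i] != r[j])
    return both_one, both_zero, differ

columns, rows = encode_records(
    [{"gender": "F", "disease": "flu"}, {"gender": "M", "disease": "diabetes"},
     {"gender": "F", "disease": "flu"}],
    {"gender": ["M", "F"], "disease": ["flu", "diabetes"]},
)
print(match_counts(rows, columns.index(("gender", "F")), columns.index(("disease", "flu"))))  # (2, 1, 0)
```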
Preferably, number-type and numeric sensitive information is desensitized as follows. Rules for generating sensitive attributes are formulated and stored in the sensitive-attribute generation rule library. Preset desensitization algorithms based on data distortion and encryption are then called to transform the newly generated sensitive attribute values according to the desensitization task, which finally produces the desensitized data.
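Below is a minimal sketch of a rule library, assuming it can be modeled as a mapping from attribute name to transform. The rule names, the ±10% distortion factor and the keep-first-3/last-4 phone masking are hypothetical. The patent only names data distortion and encryption in general terms.

```python
import random
from typing import Callable, Dict

# Hypothetical sensitive-attribute generation-rule library: attribute name -> transform.
Rule = Callable[[str, random.Random], str]

def perturb_numeric(value: str, rng: random.Random) -> str:
    """Data distortion: scale by a random factor in [0.9, 1.1] and round to an integer."""
    return str(round(float(value) * rng.uniform(0.9, 1.1)))

def mask_phone(value: str, rng: random.Random) -> str:
    """Format-preserving: keep the first 3 and last 4 digits, draw the middle digits at random."""
    middle = "".join(rng.choice("0123456789") for _ in range(len(value) - 7))
    return value[:3] + middle + value[-4:]

RULE_LIBRARY: Dict[str, Rule] = {"income": perturb_numeric, "phone": mask_phone}

def apply_rules(record: Dict[str, str], rules: Dict[str, Rule], seed: int = 0) -> Dict[str, str]:
    """Transform every attribute that has a rule; leave the others unchanged."""
    rng = random.Random(seed)
    return {k: rules[k](v, rng) if k in rules else v for k, v in record.items()}

masked = apply_rules({"case_id": "C001", "phone": "19821210912", "income": "5000"}, RULE_LIBRARY)
```

Seeding the generator makes a run reproducible. A real deployment would more likely draw from a secure source.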
Preferably, named-entity sensitive information is desensitized with a code table of common Chinese named entities. The table stores organization names and Chinese personal names on the order of a million entries. Each original named entity is hashed and looked up in the table, and it is replaced with the entry found, which completes the desensitization.

Address sensitive information is desensitized according to the level of detail of the address, by first converting the address to longitude and latitude:
- If the original sensitive address cannot be parsed, it needs no desensitization, because it is already a vague address.
- If the latitude and longitude can be resolved, the coordinates are transformed within the city or county of the original address to generate a new address. The address is then further blurred down to street or township level according to the user's access rights.
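Below is a minimal sketch of the hash-table-lookup replacement, assuming a salted SHA-256 digest is reduced modulo the table size to pick a substitute. The salt, the five-entry `NAME_TABLE` and the function name are hypothetical. The patent does not specify the hash function. Because the mapping is deterministic, the same person gets the same substitute in every document. Two different names may still collide on one substitute, and this sketch does not resolve such collisions.

```python
import hashlib
from typing import Sequence

def replace_entity(name: str, code_table: Sequence[str], salt: str = "per-dataset-secret") -> str:
    """Map an entity to a code-table entry via a salted SHA-256 hash (same input -> same output)."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(code_table)
    return code_table[index]

# Tiny hypothetical stand-in for the million-entry code table of names and organizations.
NAME_TABLE = ["Wang Fang", "Liu Yang", "Chen Jing", "Zhao Lei", "Sun Li"]

text = "Zhang San contacted the owner Li Si."
for person in ("Zhang San", "Li Si"):
    text = text.replace(person, replace_entity(person, NAME_TABLE))
print(text)
```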
Preferably, the desensitization depth measures how much the desensitized data set differs from the original data set: the greater the difference, the greater the depth. It is calculated as follows.
(I) Desensitization depth of numeric attributes:
Let m be the value of a numeric attribute before desensitization, taken from that attribute's value range, and let $m^*$ be the value after desensitization. The numeric desensitization depth $D_{sz}(m, m^*)$ is then:
[Formula for $D_{sz}(m, m^*)$ not reproduced in the source text.]
(II) Desensitization depth of categorical attributes:
The desensitization depth of a categorical attribute is obtained by building a generalization tree, and is calculated with the following formula:
$$D_{fl}(r, r^*) = \frac{(N_h - 1) \times \mathrm{step}(r, r^*)}{(N - 1) \times \mathrm{step}(r, e)}$$
where:
- $r$ and $r^*$ are the attribute values before and after desensitization;
- $N_h$ is the number of child nodes of the parent of the pre-desensitization value $r$, that is, $r$ and its siblings;
- $N$ is the number of leaf nodes in the generalization tree;
- $e$ is the root node;
- $\mathrm{step}(x, y)$ is the number of steps from attribute node $x$ to the post-desensitization attribute node $y$.
(III) Combining steps (I) and (II) gives the desensitization depth of the data set, $D(T, T^*)$:
[Formula for $D(T, T^*)$ not reproduced in the source text.]
where n is the number of records in the data set, and $c_1$ and $c_2$ are the numbers of numeric and categorical attributes, respectively.
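The sketch below computes $D_{fl}(r, r^*)$ exactly as defined in (II), on a small hypothetical generalization tree stored as a child-to-parent map. The formulas for $D_{sz}$ and $D(T, T^*)$ are not available, so the last two functions are loudly labeled assumptions and not the patent's method. `numeric_depth` uses the absolute change divided by the original range. `dataset_depth` averages per-cell depths over the $n \times (c_1 + c_2)$ attribute values.

```python
from typing import Dict, List

def _path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path  # node, its parent, ..., root

def step(x: str, y: str, parent: Dict[str, str]) -> int:
    """Number of edges on the tree path between nodes x and y."""
    px, py = _path_to_root(x, parent), _path_to_root(y, parent)
    lca = next(n for n in px if n in set(py))  # lowest common ancestor
    return px.index(lca) + py.index(lca)

def categorical_depth(r: str, r_star: str, parent: Dict[str, str]) -> float:
    """D_fl(r, r*) = ((N_h - 1) * step(r, r*)) / ((N - 1) * step(r, e)); r must be a leaf."""
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    leaves = [n for n in parent if n not in children]
    root = _path_to_root(r, parent)[-1]
    n_h = len(children[parent[r]])  # r and its siblings
    return ((n_h - 1) * step(r, r_star, parent)) / ((len(leaves) - 1) * step(r, root, parent))

def numeric_depth(m: float, m_star: float, lo: float, hi: float) -> float:
    """ASSUMPTION, not the patent's D_sz: |m - m*| over the original range, capped at 1."""
    return min(abs(m - m_star) / (hi - lo), 1.0) if hi > lo else 0.0

def dataset_depth(numeric: List[List[float]], categorical: List[List[float]]) -> float:
    """ASSUMPTION, not the patent's D(T, T*): mean of the n x (c1 + c2) per-cell depths."""
    n = len(numeric) or len(categorical)
    c1 = len(numeric[0]) if numeric else 0
    c2 = len(categorical[0]) if categorical else 0
    total = sum(map(sum, numeric)) + sum(map(sum, categorical))
    return total / (n * (c1 + c2)) if n and (c1 + c2) else 0.0

# Hypothetical generalization tree for a "disease" attribute (child -> parent).
PARENT = {"Respiratory": "Any disease", "Chronic": "Any disease",
          "flu": "Respiratory", "cold": "Respiratory",
          "diabetes": "Chronic", "hypertension": "Chronic", "asthma": "Chronic"}
print(categorical_depth("diabetes", "Chronic", PARENT))  # 0.25
```

Generalizing "diabetes" one level up gives (3 - 1) × 1 / ((5 - 1) × 2) = 0.25. Generalizing it to the root gives 0.5.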
Preferably, the desensitized data set is output in a controlled way: the original storage link is hashed to generate a new download link address.
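Below is a minimal sketch of hashing the storage link into a new download address. It swaps in a keyed hash (HMAC-SHA256) for the unspecified hash, so that outsiders cannot recompute links from guessed paths. It also mixes in the user and issue time, so each consumer gets a distinct link. The base URL, the secret and the in-memory `registry` (standing in for server-side storage) are hypothetical.

```python
import hashlib
import hmac
from typing import Dict

def issue_download_link(storage_path: str, user_id: str, issued_at: int,
                        secret: bytes, registry: Dict[str, str],
                        base_url: str = "https://data.example.invalid/download/") -> str:
    """Derive an opaque link from the original storage path; the path itself is never exposed."""
    message = f"{storage_path}|{user_id}|{issued_at}".encode("utf-8")
    token = hmac.new(secret, message, hashlib.sha256).hexdigest()
    registry[token] = storage_path  # server-side lookup used when the link is redeemed
    return base_url + token

registry: Dict[str, str] = {}
link = issue_download_link("/warehouse/masked/mediation_2016.csv", "consumer-42",
                           1700000000, b"server-key", registry)
```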
A sensitive information desensitization system for data sharing comprises five units connected in sequence: a system management unit, a data source management unit, a sensitive information identification unit, a sensitive information processing unit and a data output unit.
- The system management unit builds the user accounts and access control of the desensitization system. It identifies user roles and permissions, and allows only legitimate, authorized users to operate on the corresponding data.
- The data source management unit stores data source information.
- The sensitive information identification unit automatically identifies sensitive information in all types of data sources, and computes the association between the sensitive attributes in each data set.
- The sensitive information processing unit automatically creates desensitization tasks, and matches desensitization strategies and algorithms to them.
- The data output unit securely and effectively controls the output of data that uses sensitive information.
Preferably:
- The data source management unit extracts and manages the type, IP address, storage address and data structure of each data source.
- The sensitive information identification unit segments text data using natural language processing. It builds sensitive-information knowledge bases for each category and marks sensitivity levels manually. On that basis, it identifies sensitive information automatically through rules and pattern matching, and uses a Sigmoid-based method to compute the association degree of sensitive attributes.
- The sensitive information processing unit uses natural language processing and a review of data-use requests to create the corresponding desensitization tasks automatically. It desensitizes each category of sensitive information using, as appropriate, the sensitive-attribute generation rule library, hash-table lookup, coordinate transformation within the longitude and latitude range of the address, and various desensitization algorithms.
- The data output unit replaces the original sensitive attribute values with the desensitized values. It then transforms the original data storage address with a hash algorithm to generate a new storage address for the output data.
The beneficial effects of the present invention are:
(1) It avoids duplicate values of unique-identifier attributes in the desensitized data set.
(2) It computes the association degree of sensitive attributes with the Sigmoid function and groups highly associated attributes together. This prevents desensitized data from being reconstructed, and also allows weakly associated attributes to be deleted, which improves computational efficiency.
(3) It combines the desensitization depths of numeric and categorical attributes to compute the desensitization degree of the whole data set, and controls the desensitization effect by setting a threshold.
(4) It hashes the original storage link to generate a new download link address. This achieves controlled output of data and protects sensitive information.
(5) It applies to sensitive information in both structured database data and unstructured documents, with a good desensitization effect and high reliability.
Description of the drawings
Fig. 1 is an architecture diagram of the system of the present invention;
Fig. 2 is a flow chart of the method of the present invention;
Fig. 3 is a schematic diagram of the input data source format in an embodiment of the present invention;
Fig. 4 is a block diagram of named entity recognition in an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: As shown in Fig. 1, a sensitive information desensitization system for data sharing comprises the following units:
- a system management unit that sets up and manages system user account information and configures roles and permissions;
- a data source management unit that stores data source information;
- a sensitive information identification unit that automatically identifies sensitive information in all types of data sources and computes the association between the sensitive attributes in each data set;
- a sensitive information processing unit that automatically creates desensitization tasks and matches desensitization strategies and algorithms to them;
- a data output unit that securely and effectively controls the use of sensitive data.

The system management unit builds the user accounts and access control of the desensitization system. It identifies user roles and permissions, and allows only legitimate, authorized users to operate on the corresponding data.

The data source management unit stores data source information, including information about both original and target data sources. A data source may be database data, document data, data warehouse data and so on, or a combination of these. Unified data source management provides global control over sensitive data sources, including each source's IP address, storage address, name, data type, database type, username and password. The unit also preprocesses all types of data sources and regenerates address links for the preprocessed sources, for use by the sensitive information identification unit and the sensitive information processing unit.

The sensitive information identification unit automatically identifies sensitive information in all types of data sources. It draws on the constructed sensitive-information knowledge base, preset sensitive-information discovery rules, user-defined discovery rules and so on. Based on a priori sensitivity levels, it analyzes the association between sensitive attributes to determine the level of each sensitive attribute and the relationships among them. This prevents insufficiently desensitized data from being reconstructed and leaked a second time.

The sensitive information processing unit sets desensitization strategies, rules and algorithms for each level of sensitive attribute based on user permissions and access control. It also supports user-defined desensitization methods.

The data output unit protects data while it is used and downloaded. It replaces the original sensitive attribute values with the desensitized values and generates a new storage address, without changing the storage address or content of the source data. The storage address of the desensitized data is generated by transforming the original storage address with a hash algorithm. To reduce the storage load on the big data platform, the desensitized data is destroyed promptly after use.
The data set in this embodiment comes from part of the people's mediation records of a certain city. In each mediation record, the case details and the mediation agreement are document data, such as PDF or Word documents. All other attributes are stored as structured data in database tables.
As shown in Fig. 2, a sensitive information desensitization method for data sharing is carried out as follows:
Step 1: Data acquisition and preprocessing
Step 1.1: data acquisition
The data provider publishes information using the account and permissions obtained from the system management unit, and the acquired data is stored in the data source management unit. Table 1 shows the field structure of a people's mediation case.
Table 1
The format of the input data source in the system is shown in Fig. 3. Because the input involves personal privacy, it has already been desensitized by replacing digits with letters, but the present invention treats it as real data. A data consumer who wants to obtain data submits an application. Once the application is approved, the system desensitizes the data according to it.
Step 1.2: Preprocessing of structured data and parsing of document data
Preprocessing of structured data mainly marks four kinds of problem:
- noisy attribute values, including errors and outliers that deviate from expectations;
- inconsistent values, where the same information is represented differently within the data set, such as a date of birth that does not match the birth date encoded in the ID card number;
- unique-identifier attributes that contain duplicates, such as repeated ID card numbers;
- missing values.
Non-standard representations are also normalized. For example, if the date of a case is recorded as 16-06-12, it is converted to 2016-06-12.
Document parsing uses a suitable parsing tool to extract the text content of each document, for example POI to parse Word documents and PDFBox to process PDF files. Other formats, such as HTML, Word, XML, PDF, Excel and TXT, can be parsed as well.
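Below is a minimal sketch of three of the structured-data checks, assuming dates arrive as `YYYY-MM-DD` or two-digit-year `YY-MM-DD` and that ID card numbers are 18 characters long. It uses only the Python standard library, and the function names are hypothetical. Note that Python's `%y` maps 00 to 68 onto 2000 to 2068, so this sketch would misread pre-1969 two-digit years.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List

def normalize_date(text: str) -> str:
    """Rewrite 'YYYY-MM-DD' or two-digit-year 'YY-MM-DD' dates as 'YYYY-MM-DD'."""
    for fmt in ("%Y-%m-%d", "%y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def birthdate_matches_id(birth_date: str, id_number: str) -> bool:
    """Consistency check: characters 7-14 of an 18-digit ID card number encode YYYYMMDD."""
    return normalize_date(birth_date).replace("-", "") == id_number[6:14]

def duplicated_ids(ids: Iterable[str]) -> List[str]:
    """Unique-identifier check: return ID numbers that occur more than once."""
    return sorted(k for k, c in Counter(ids).items() if c > 1)

print(normalize_date("16-06-12"))  # 2016-06-12
```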
Step 1.3: Word segmentation and part-of-speech tagging of text data
(1) The "case details" of a mediation case read as follows:
Party A and Party B are upstairs and downstairs neighbors. The tap in Room 203, No. A, Lane ABC, Center Road, Shanghai was not turned off properly, and water seeped into the cabinet of Zhang San's Room 103 downstairs and soaked the clothes. Zhang San went upstairs to resolve the matter but found no one at home. He then contacted the property management to negotiate, and learned that the owner was Li Si, contact number 19821210912. He immediately contacted Li Si and asked for prompt handling. Three days later the upstairs owner still had not dealt with it, causing the downstairs resident serious losses. He therefore applied to the people's mediation committee of Yi Village for mediation, requesting compensation for the losses of the owner of Room 103.
(2) Segment the text using a custom dictionary and a stop-word list
Custom dictionaries are built for organization-name suffixes, place names, new words, special terms and so on. For example, "mediation committee", "upstairs and downstairs", "Belt and Road" and "construction project" are added. Segmentation gives priority to dictionary entries, so "promote the Belt and Road construction project" is segmented as promote / Belt and Road / construction project. The various stop-word lists available online are merged, deduplicated and supplemented into one fairly comprehensive list. It contains words such as "Party A", "Party B", "both parties", "carry out" and "even if", as well as punctuation marks.
(3) Segmentation and part-of-speech tagging result for the "case details"
upstairs/n neighbor/n live/v Shanghai/ns Center Road/ns A/m No./m 203/m Room/n tap water/l faucet/n fastened/v downstairs/s Zhang San/nr 103/m Room/n cabinet/n seep/v clothes/n soaked/n go/v upstairs/n resolve/v find/v indoors/s no/v person/n look for/v property management/n negotiation/n learn/v owner/n name/n Li Si/nr contact method/n 19821210912/m contact/n request/v prompt handling/i 3/m day/q upstairs/n handle/n downstairs/n user/n serious loss/l Yi Village/n people's mediation committee/n apply/v mediate/v 103/m Room/n owner/n compensation for losses/n
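The patent does not name its segmenter. To show what "giving priority to the dictionary" means, the sketch below plainly swaps in forward maximum matching, a simple greedy dictionary-first method, followed by stop-word filtering. The dictionary, stop words and `max_len` are hypothetical. The Chinese strings mean "promote", "Belt and Road" and "construction project" (and, in the stop-word test, "Party A", "and", "Party B", "is", "neighbor").

```python
from typing import List, Set

def forward_max_match(text: str, dictionary: Set[str], max_len: int = 6) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens

def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    return [t for t in tokens if t not in stopwords]

# 推进 = promote, 一带一路 = Belt and Road, 建设项目 = construction project
USER_DICT = {"推进", "一带一路", "建设项目", "建设", "项目"}
print(forward_max_match("推进一带一路建设项目", USER_DICT))  # ['推进', '一带一路', '建设项目']
```

Because `建设项目` is tried before the shorter `建设`, the longer dictionary entry wins, which matches the segmentation described in (2).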
Step 1.1 falls within the functions of the system management unit, and steps 1.2 and 1.3 within those of the data source management unit.
Step 2: Building the sensitive-information keyword database
A keyword database is built manually for each category of sensitive information, and each is marked with a sensitivity level. For example, among number and numeric information, the keyword database for contact information covers many expressions: telephone number, contact method, mobile phone number, means of communication, home phone, mobile number, China Unicom number, China Telecom number, and so on.

Sensitive information is divided into four levels by sensitivity:
- The first level is identifying attributes, which can definitely locate a person, such as ID card number, name and address.
- The second level is quasi-identifying attributes. A single column cannot locate a person, but several columns together can potentially identify someone.
- The third level is sensitive attributes, such as disease, income and education level.
- The fourth level is non-sensitive attributes.

The levels are shown in Table 2. The sensitive information discussed in the present invention covers the first three levels, as shown in Table 3.
Table 2
Table 3
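The four-level scheme and a keyword database can be sketched as a lookup table. This is a minimal sketch: the categories, keyword lists, and the substring-matching rule in `classify_field` are hypothetical illustrations, not the contents of Tables 2 and 3.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    IDENTIFYING = 1        # level 1: locates a person on its own (ID number, name, address)
    QUASI_IDENTIFYING = 2  # level 2: identifies a person only in combination with other columns
    SENSITIVE = 3          # level 3: e.g. disease, income, education level
    NON_SENSITIVE = 4      # level 4

# Hypothetical excerpt of a manually built keyword database:
# category -> (keyword variants, sensitivity level)
KEYWORD_DB = {
    "contact": (["telephone number", "contact method", "mobile phone number",
                 "home telephone", "mobile number"], Sensitivity.IDENTIFYING),
    "id_card": (["id card number", "identity card"], Sensitivity.IDENTIFYING),
    "demographic": (["date of birth", "gender", "postcode"], Sensitivity.QUASI_IDENTIFYING),
    "education": (["education level", "schooling"], Sensitivity.SENSITIVE),
    "income": (["income", "wage", "salary"], Sensitivity.SENSITIVE),
}

def classify_field(label: str):
    """Return (category, level) for a field label; (None, NON_SENSITIVE) if nothing matches."""
    text = label.lower()
    for category, (keywords, level) in KEYWORD_DB.items():
        if any(k in text for k in keywords):
            return category, level
    return None, Sensitivity.NON_SENSITIVE
```

For example, `classify_field("Owner mobile number")` returns `("contact", Sensitivity.IDENTIFYING)`. Because the levels are an `IntEnum`, "first three levels" is simply `level <= Sensitivity.SENSITIVE`.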
Step 3: Automatic identification of number and numeric sensitive information
Number and numeric sensitive information includes ID card numbers, card account numbers and passwords of various kinds, contact information, virtual accounts and passwords, license plate numbers, social security numbers, and similar information. Such information follows known generation rules and can therefore be found with regular expressions. Because each of these values can unambiguously identify a person, such attributes are labeled as identifying sensitive attributes.
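Rule-based identification of two common numeric types can be sketched with regular expressions. This sketch assumes mainland-China 11-digit mobile numbers (`1[3-9]` followed by nine digits) and the 18-digit resident ID layout with its ISO 7064 MOD 11-2 check digit; the patent does not publish its exact rules, so these patterns are illustrative.

```python
import re

# Assumed patterns; lookarounds stop matches inside longer digit runs.
MOBILE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
ID_RE = re.compile(r"(?<![0-9Xx])\d{17}[0-9Xx](?![0-9Xx])")

# Weights and check characters of the 18-digit resident ID (MOD 11-2)
ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
ID_CHECK = "10X98765432"

def valid_id_checksum(id18: str) -> bool:
    """Verify the final check character of an 18-character ID number."""
    total = sum(int(d) * w for d, w in zip(id18[:17], ID_WEIGHTS))
    return ID_CHECK[total % 11] == id18[17].upper()

def find_numeric_sensitive(text: str):
    """Return (type, value) pairs for mobile numbers and checksum-valid ID numbers."""
    hits = [("mobile", m.group()) for m in MOBILE_RE.finditer(text)]
    for m in ID_RE.finditer(text):
        if valid_id_checksum(m.group()):  # discard 18-digit runs that fail the checksum
            hits.append(("id_card", m.group()))
    return hits
```

The checksum filter matters in practice: an arbitrary 18-digit order number matches the pattern but fails the check roughly ten times out of eleven.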
Step 4: Automatic identification of named entities
Person names and organization names are identified by combining part-of-speech tagging, performed with the Viterbi algorithm over a hidden Markov model (HMM) from natural language processing, with a purpose-built named-entity knowledge base.
Building the named-entity knowledge base includes building the sensitive-information keyword database, the named-entity patterns of each kind, prefix/suffix rules, and context templates. The named-entity patterns, prefix/suffix rules, and their positions can be discovered from a training corpus: the characteristic vocabulary of named entities, the prefix/suffix rule vocabulary, and the corresponding position vocabulary are obtained first, and the entity span is then extracted by combining the parts of speech already tagged by the segmentation tool with the surrounding context, as shown in Figure 4.
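The HMM tagging step can be illustrated with a standard Viterbi decoder. This is a minimal sketch: the three tags, the probability tables, and the 1e-6 floor for unseen events are hypothetical toy values rather than trained parameters; a real tagger estimates them from a tagged corpus.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `tokens` under an HMM."""
    def lp(p):  # log-probability, flooring unseen events
        return math.log(max(p, floor))

    # best[t][s] = (log-prob of the best path ending in s at t, previous state)
    best = [{s: (lp(start_p[s]) + lp(emit_p[s].get(tokens[0], 0.0)), None) for s in states}]
    for tok in tokens[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            score, back = max((prev[r][0] + lp(trans_p[r][s]), r) for r in states)
            layer[s] = (score + lp(emit_p[s].get(tok, 0.0)), back)
        best.append(layer)

    # Backtrack from the highest-scoring final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return list(reversed(path))

# Hypothetical toy model: nr = person name, v = verb, n = noun
STATES = ("nr", "v", "n")
START = {"nr": 0.5, "v": 0.1, "n": 0.4}
TRANS = {"nr": {"nr": 0.1, "v": 0.6, "n": 0.3},
         "v": {"nr": 0.4, "v": 0.1, "n": 0.5},
         "n": {"nr": 0.2, "v": 0.5, "n": 0.3}}
EMIT = {"nr": {"Zhang San": 0.5, "Li Si": 0.5},
        "v": {"contacts": 0.9},
        "n": {"contacts": 0.1, "Zhang San": 0.05, "Li Si": 0.05}}
TOKENS = ["Zhang San", "contacts", "Li Si"]
```

With this toy model, `viterbi(TOKENS, STATES, START, TRANS, EMIT)` decodes to `["nr", "v", "nr"]`, and the tokens tagged `nr` are the person-name candidates that are then checked against the knowledge base.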
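A suffix rule of the kind described can be sketched over already-tagged tokens. The sketch assumes a hypothetical suffix vocabulary and the rule "an organization name is a word ending in an organization suffix, plus the run of nouns or place names immediately before it"; English tokens stand in for segmented Chinese words.

```python
# Hypothetical suffix vocabulary learned from a corpus
ORG_SUFFIXES = {"committee", "company", "hospital", "bureau"}
# Tags allowed to extend an organization name to the left
MODIFIER_TAGS = {"n", "ns", "nz"}

def extract_orgs(tagged):
    """Return organization names built from suffix words and preceding modifiers."""
    orgs = []
    for i, (word, _tag) in enumerate(tagged):
        if any(word.endswith(suf) for suf in ORG_SUFFIXES):
            j = i
            # Extend leftwards over nouns / place names (the context window)
            while j > 0 and tagged[j - 1][1] in MODIFIER_TAGS:
                j -= 1
            orgs.append(" ".join(w for w, _ in tagged[j:i + 1]))
    return orgs
```

Applied to the tail of the case text, `[("user", "n"), ("applied", "v"), ("village", "n"), ("people's mediation committee", "n")]`, it yields `["village people's mediation committee"]`; the verb stops the leftward extension.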
Step 5: Precise identification of address information
A more complete address is obtained by examining the words adjacent to an address fragment after segmentation. If the adjacent words (the 2-3 consecutive words of context) also express address information or satisfy an address-matching rule, they are combined and identified again, and the combined address is converted to longitude and latitude. If latitude and longitude can be computed, the address is an identifying attribute value. For example, in "Shanghai City/ns Zhongshan Road/ns No. A/m", both "Shanghai City" and "Zhongshan Road" are detected as location sensitive attributes, and address pattern matching shows that the following "No. A" also belongs to the address. Combining these adjacent words yields the more complete address "No. A, Zhongshan Road, Shanghai City", whose latitude and longitude are then computed.
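The merging of adjacent address words can be sketched as a scan over tagged tokens. The house-number regex and the window of three following words are assumptions standing in for the patent's address-matching rules; geocoding the merged address needs an external service and is not shown.

```python
import re

# Hypothetical address-matching rule: a house-number token after a place name
HOUSE_NO = re.compile(r"^(No\. ?\w+|\d+)$")

def merge_addresses(tagged, window=3):
    """Merge runs of place names (ns) plus up to `window` house-number words (m)."""
    addresses, i = [], 0
    while i < len(tagged):
        if tagged[i][1] != "ns":
            i += 1
            continue
        j = i
        while j < len(tagged) and tagged[j][1] == "ns":  # consecutive place names
            j += 1
        k = j
        while (k < len(tagged) and k - j < window
               and tagged[k][1] == "m" and HOUSE_NO.match(tagged[k][0])):
            k += 1
        addresses.append(" ".join(w for w, _ in tagged[i:k]))
        i = k
    return addresses
```

On the tagged case text, the words "Shanghai City/ns Zhongshan Road/ns No. A/m 203/m" are merged into a single address string, which would then be passed to the longitude/latitude conversion.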
Step 6: Computing the degree of association between sensitive attributes
Association analysis finds the correlations between sensitive attributes in the data set: the larger the degree of association, the stronger the correlation. Once the associations are known, closely associated sensitive attributes can be grouped together and very weakly associated ones deleted. This reduces the size of the data set to be desensitized and the computation required, and so improves the efficiency of the algorithms. In addition, starting from the identifying and quasi-identifying sensitive attributes determined by prior knowledge, further sensitive attributes can be mined in this way, which strengthens desensitization and prevents leakage through recombination of sensitive attributes.
In the present invention, the degree of association of categorical sensitive attributes is normalized with a Sigmoid function, defined by formula (1):
Here the function's values lie in [0, 1], and it is continuous, smooth, and monotonically increasing; at x = 0 its value is 0.5.
Assume each record in data set T has p attributes $\{u_1, u_2, \ldots, u_p\}$, and that these attributes have $q_1, q_2, \ldots, q_p$ attribute values respectively. Within a record, an attribute value of a sensitive attribute is marked 1 if it occurs and 0 otherwise, so each record can be represented as a 0/1 row vector of dimension $q_1 + q_2 + \cdots + q_p$. When data set T has n records, denoted $\{t_1, t_2, \ldots, t_n\}$, there are n such row vectors.
The values at corresponding positions of these row vectors are combined with XNOR (same-or) and XOR operations: one count records the positions where, under XNOR, both attribute values are 1, and another records the positions where both are 0. The degree of association $S(I_1, I_2)$ of two attributes is then calculated by formula (2):
Here the parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ are set to 0.5, 0.25, and 0.25 respectively, and $0 \le S(I_1, I_2) \le 1$.
In this embodiment, the degree of association between sensitive attributes is measured by constructing a Sigmoid function: formulas (1) and (2) give the correlation coefficient of two sensitive attributes, and the larger the coefficient, the higher the correlation.
For example, the education attribute takes the values {university, senior high school, junior high school, primary school} and the wage attribute takes the values {above 10K, 10K-8K, 8K-6K, 6K-4K, 4K-2K, below 2K}. Encoding against the combined value list {university, senior high school, junior high school, primary school, above 10K, 10K-8K, 8K-6K, 6K-4K, 4K-2K, below 2K}, records 1, 2, and 3 give the row vectors
{1,0,0,0,1,0,0,0,0,0}, {0,0,1,0,0,0,1,0,0,0}, {1,0,0,0,1,0,0,0,0,0}.
Computing XNOR and XOR pairwise over these three records gives θ(x) = 0.4, and formula (1) then yields a correlation of 0.95.
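The encoding and the pairwise counts that feed formula (2) can be sketched as follows. Formula (2) itself, which combines these counts with $\lambda_1, \lambda_2, \lambda_3$, is not reproduced in the text, so this sketch deliberately stops at the counts; the function names are hypothetical.

```python
def encode(records, domains):
    """One-hot encode each record into a 0/1 row vector of length q1 + ... + qp."""
    vectors = []
    for rec in records:
        row = []
        for value, domain in zip(rec, domains):
            row.extend(1 if value == v else 0 for v in domain)
        vectors.append(row)
    return vectors

def pair_counts(a, b):
    """Return (both 1, both 0, differing): the XNOR-1, XNOR-0 and XOR position counts."""
    both_one = sum(1 for x, y in zip(a, b) if x == y == 1)
    both_zero = sum(1 for x, y in zip(a, b) if x == y == 0)
    differ = sum(1 for x, y in zip(a, b) if x != y)
    return both_one, both_zero, differ

EDUCATION = ["university", "senior high school", "junior high school", "primary school"]
WAGE = ["above 10K", "10K-8K", "8K-6K", "6K-4K", "4K-2K", "below 2K"]
records = [("university", "above 10K"),
           ("junior high school", "8K-6K"),
           ("university", "above 10K")]
vecs = encode(records, [EDUCATION, WAGE])
```

`vecs` reproduces the three row vectors of the example. Records 1 and 2 share no 1-positions, agree on six 0-positions, and differ in four, so `pair_counts(vecs[0], vecs[1])` is `(0, 6, 4)`; records 1 and 3 are identical, giving `(2, 8, 0)`.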
Other methods may also be used to compute the degree of association between sensitive attributes, and these fall within the scope of protection of the present invention. For example, the Apriori algorithm, based on frequent itemsets of association rules, finds the frequent itemsets of sensitive attributes that satisfy given conditions iteratively. Alternatively, the mean-square contingency coefficient can be used: assuming two sensitive attributes $I_1$ and $I_2$ with value domains $\{v_{11}, v_{12}, \ldots, v_{1p}\}$ and $\{v_{21}, v_{22}, \ldots, v_{2q}\}$, the mean-square contingency coefficient of $I_1$ and $I_2$ is:
Here the frequencies with which the sensitive attribute values $v_{1i}$ and $v_{2j}$ occur in the original data set are denoted $f_{i\cdot}$ and $f_{\cdot j}$, and $f_{ij}$ denotes the frequency with which $v_{1i}$ and $v_{2j}$ occur in the same record. Hence $f_{i\cdot}$ and $f_{\cdot j}$ satisfy $\sum_{j=1}^{q} f_{ij} = f_{i\cdot}$ and $\sum_{i=1}^{p} f_{ij} = f_{\cdot j}$, and $0 \le \Phi^2(I_1, I_2) \le 1$.
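The coefficient's formula is not reproduced in the text above. Its definitions match the standard mean-square contingency coefficient, $\Phi^2(I_1, I_2) = \frac{1}{\min(p, q) - 1} \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{(f_{ij} - f_{i\cdot} f_{\cdot j})^2}{f_{i\cdot} f_{\cdot j}}$ with frequencies taken as fractions of the record count, so the sketch below implements that standard form; treat it as an assumption rather than the patent's exact expression.

```python
from collections import Counter

def mean_square_contingency(col1, col2):
    """Standard mean-square contingency coefficient of two categorical columns, in [0, 1]."""
    n = len(col1)
    joint = Counter(zip(col1, col2))   # counts behind f_ij
    f1, f2 = Counter(col1), Counter(col2)  # counts behind f_i. and f_.j
    d = min(len(f1), len(f2))
    if d < 2:
        return 0.0  # a constant column carries no association (and the formula divides by zero)
    total = 0.0
    for a, ca in f1.items():
        for b, cb in f2.items():
            fi, fj = ca / n, cb / n
            fij = joint.get((a, b), 0) / n
            total += (fij - fi * fj) ** 2 / (fi * fj)
    return total / (d - 1)
```

Two columns that determine each other score 1.0, and two independent columns score 0.0, which matches the stated range $0 \le \Phi^2 \le 1$.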
Steps 2 to 6 fall within the functional scope of the sensitive information recognition unit.
Steps 7 to 11 below fall within the functional scope of the sensitive information processing unit. Using natural language processing, the system automatically reviews desensitization requests submitted by the data provider, or the application content (including application scenario, application purpose, and so on) filled in by the data consumer. Once a request is approved, the system automatically creates the corresponding desensitization task, identifies the sensitive information in the requested data, and desensitizes it according to that task.
Step 7: Configure desensitization algorithms
Desensitization algorithms based on data distortion and encryption are configured in the system to transform the original data, such as random-number replacement, custom swap substitution, hashing, and encryption algorithms. Depending on the requirements of the desensitization task, certain characters of the data can also be truncated, the data can be generalized, and so on.
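A few of the listed transformations can be sketched with the standard library. The function names, the keep-3/keep-4 masking layout, the use of HMAC-SHA256 as the keyed hash, and the 10-year generalization width are illustrative choices, not the system's configured algorithms.

```python
import hashlib
import hmac
import random

def mask(value: str, keep_head: int = 3, keep_tail: int = 4, ch: str = "*") -> str:
    """Truncation/masking: keep the first and last characters, hide the middle."""
    if len(value) <= keep_head + keep_tail:
        return ch * len(value)
    return value[:keep_head] + ch * (len(value) - keep_head - keep_tail) + value[-keep_tail:]

def keyed_hash(value: str, key: bytes) -> str:
    """Hashing: deterministic, irreversible without the key (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def random_digits(value: str, rng: random.Random) -> str:
    """Random-number replacement: swap every digit for a random one, keep other characters."""
    return "".join(str(rng.randrange(10)) if c.isdigit() else c for c in value)

def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age by its band."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"
```

For instance, `mask("19821210912")` gives `"198****0912"` and `generalize_age(37)` gives `"30-39"`. A keyed hash rather than a bare hash is used so that short, guessable values such as phone numbers cannot be recovered by hashing every candidate.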
Step 8: Desensitization based on the sensitive-attribute generation rule library
For number and numeric sensitive data, generation rules for each sensitive attribute are formulated and stored in the sensitive-field generation rule library. The generation rule of a sensitive field can be exactly the same as the rule that generated that field in the original data. The desensitization algorithm preset in step 7 is then called, according to the desensitization task, to transform the newly generated sensitive attribute values, finally forming the desensitized data. For example, following the generation rules of ID card numbers and dates, characters at sensitive positions are replaced or obfuscated according to certain rules, while characters with statistical meaning, such as administrative region, age bracket, and gender, are retained. This achieves high-fidelity simulation, keeps ID numbers unique, and supports statistical analysis, while making it impossible to tell whether a value is authentic.
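A generation-rule surrogate for ID numbers can be sketched as follows, assuming the standard 18-digit resident ID layout: a 6-digit region code, an 8-digit birth date, a 3-digit sequence whose last digit's parity encodes gender, and a MOD 11-2 check digit. Which fields to keep is an assumption here. The sketch keeps region, birth year, and gender parity, randomizes month/day and the free sequence digits, and recomputes the check digit. The `used` set enforces uniqueness within one run only.

```python
import random
from datetime import date, timedelta

ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
ID_CHECK = "10X98765432"

def check_digit(first17: str) -> str:
    """MOD 11-2 check character for the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(first17, ID_WEIGHTS))
    return ID_CHECK[total % 11]

def surrogate_id(real_id: str, rng: random.Random, used: set) -> str:
    """Generate a valid-looking ID that keeps region, birth year and gender parity."""
    region = real_id[:6]
    year = int(real_id[6:10])
    gender_parity = int(real_id[16]) % 2  # 17th digit: odd = male, even = female
    start = date(year, 1, 1)
    span = (date(year, 12, 31) - start).days
    while True:
        birth = start + timedelta(days=rng.randint(0, span))  # random day, same year
        seq = rng.randrange(100)                           # two free sequence digits
        g = rng.randrange(gender_parity, 10, 2)            # same parity as the original
        body = f"{region}{birth:%Y%m%d}{seq:02d}{g}"
        candidate = body + check_digit(body)
        if candidate != real_id and candidate not in used:
            used.add(candidate)
            return candidate
```

The surrogate passes the same checksum validation as a real number, which is what lets downstream systems accept it while the person behind it stays hidden.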
Step 9: Desensitization of named entities
Person names and organization names are desensitized with a code table of common Chinese named entities that stores organization names and Chinese personal names on the order of a million entries: each original named entity is hashed, the hash is used to look up the table, and the entity is replaced with the entry found.
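Hash-indexed table lookup can be sketched as below. The five-entry `SURROGATE_NAMES` list is a hypothetical stand-in for the million-entry code table, and reducing the first 8 bytes of a SHA-256 digest modulo the table size is one plausible indexing choice, not necessarily the patent's.

```python
import hashlib

# Hypothetical stand-in for the million-entry code table of common names
SURROGATE_NAMES = ["Wang Wei", "Liu Yang", "Chen Jing", "Zhao Lei", "Sun Li"]

def replace_entity(name: str, table, salt: bytes = b"") -> str:
    """Map a name to a table entry via its hash; the same input always maps to the same entry."""
    digest = hashlib.sha256(salt + name.encode("utf-8")).digest()
    idx = int.from_bytes(digest[:8], "big") % len(table)
    return table[idx]
```

Determinism keeps a person consistent across records and tables, so joins on the desensitized column still work. Distinct names can collide on one entry; a larger table makes this rarer but does not rule it out.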
Step 10: Desensitization of address information
Address sensitive data is desensitized according to how detailed the address is. The address is first converted to longitude and latitude. If the original sensitive address cannot be resolved, it is a relatively vague address and needs no desensitization. If latitude and longitude can be resolved, they are transformed within the range of the original district/county to generate a new address, which is then blurred down to street/township level according to the user's access permissions.
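The two address operations can be sketched without a real geocoder. Assumptions: the district/county range is approximated by a hypothetical bounding box and perturbed points are clamped into it (a real boundary is a polygon), and the address is already split into named components. Geocoding and reverse geocoding require an external service and are omitted.

```python
import random

def perturb_within_bbox(lat, lon, bbox, rng, max_shift_deg=0.01):
    """Shift a coordinate randomly, clamped to (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    new_lat = min(max(lat + rng.uniform(-max_shift_deg, max_shift_deg), min_lat), max_lat)
    new_lon = min(max(lon + rng.uniform(-max_shift_deg, max_shift_deg), min_lon), max_lon)
    return new_lat, new_lon

ADDRESS_LEVELS = ("province", "city", "district", "street", "detail")

def blur_address(components: dict, visible_level: str) -> str:
    """Keep address components down to `visible_level`, e.g. 'street' for a given role."""
    cutoff = ADDRESS_LEVELS.index(visible_level) + 1
    return " ".join(components[lvl] for lvl in ADDRESS_LEVELS[:cutoff] if lvl in components)
```

With components `{"city": "Shanghai", "district": "District X", "street": "Zhongshan Road", "detail": "No. A, Room 203"}` (hypothetical values), a user cleared to street level sees `"Shanghai District X Zhongshan Road"`, and one cleared only to district level sees `"Shanghai District X"`.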
Step 11: Computing the desensitization depth of sensitive information
Desensitization depth measures the degree of difference between the desensitized data set and the original data set. The greater the difference, the greater the desensitization depth and the higher the data security; the smaller the difference, the lower the security. The desensitization depth is computed as follows:
11.1) Desensitization depth of numeric attributes
Assume a numeric attribute takes the value m, within its value range, before desensitization and the value $m^*$ after desensitization. The numeric attribute desensitization depth $D_{sz}(m, m^*)$ is then given by formula (4):
11.2) Desensitization depth of categorical attributes
In the present invention, the desensitization depth of a categorical attribute is obtained by constructing a generalization tree, and the categorical attribute desensitization depth $D_{fl}(r, r^*)$ is calculated with the following formula:
$$D_{fl}(r, r^*) = \frac{(N_h - 1) \times \mathrm{Step}(r, r^*)}{(N - 1) \times \mathrm{Step}(r, e)} \qquad (5)$$
where r and $r^*$ denote the attribute value before and after desensitization; $N_h$ denotes the number of child nodes under the parent of the pre-desensitization categorical value (that value together with its siblings); N denotes the number of leaf nodes of the generalization tree; e denotes the root node; and $\mathrm{Step}(x, y)$ denotes the number of steps from attribute value node x up to the desensitized attribute node y.
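Formula (5) can be computed directly from a generalization tree stored as a child-to-parent map. The education tree below ("Higher", "Basic", root "*") is a hypothetical example; the function follows the formula as stated, with Step counted as edges walked upward.

```python
def steps_up(parent: dict, x: str, y: str) -> int:
    """Step(x, y): number of edges from node x up to its ancestor y."""
    n = 0
    while x != y:
        if x not in parent:
            raise ValueError(f"{y!r} is not an ancestor of the start node")
        x = parent[x]
        n += 1
    return n

def categorical_depth(parent: dict, r: str, r_star: str, root: str) -> float:
    """D_fl(r, r*) = ((N_h - 1) * Step(r, r*)) / ((N - 1) * Step(r, e))."""
    n_h = sum(1 for p in parent.values() if p == parent[r])  # r and its siblings
    leaves = set(parent) - set(parent.values())               # nodes with no children
    return ((n_h - 1) * steps_up(parent, r, r_star)) / ((len(leaves) - 1) * steps_up(parent, r, root))

# Hypothetical generalization tree for education (child -> parent)
TREE = {"Higher": "*", "Basic": "*",
        "university": "Higher", "senior high": "Higher",
        "junior high": "Basic", "primary": "Basic"}
```

Here N = 4 leaves and $N_h$ = 2 for "university", so generalizing it one level to "Higher" gives (1 × 1)/(3 × 2) = 1/6, and generalizing it to the root gives (1 × 2)/(3 × 2) = 1/3; leaving it unchanged gives 0.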
11.3) Desensitization depth of the data set
Combining 11.1) and 11.2), the data set desensitization depth $D(T, T^*)$ is given by formula (6):
where n is the number of records in the data set, and $c_1$ and $c_2$ are the numbers of numeric and categorical attributes, respectively.
In this embodiment, the desensitization depths of the numeric and categorical sensitive attributes are computed separately with formulas (4) and (5), and the desensitization depth of the whole data set is then computed with formula (6).
The computation of data set desensitization depth is not limited to the method of step 11; other methods may also be used. For example, the desensitization depth can be expressed as the information loss measured by entropy:
where $R_m$ denotes the set of records in the data set containing m, $R_n$ denotes the set of records containing n after desensitization, and $H(R_n)$ and $H(R_m)$ denote the information entropy of $R_n$ and $R_m$.
The general expression for $H(R_n)$ and $H(R_m)$ is:
where $\mathrm{freq}(R_x, s)$ denotes the number of records in data set $R_x$ that have value s.
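The entropy expression is not reproduced above; the sketch assumes the standard Shannon entropy, $H(R_x) = -\sum_s p_s \log_2 p_s$ with $p_s = \mathrm{freq}(R_x, s) / |R_x|$, computed from the same record counts $\mathrm{freq}(R_x, s)$.

```python
import math
from collections import Counter

def entropy(values) -> float:
    """Shannon entropy (bits) of a column, from counts freq(R_x, s) of each value s."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Generalizing four distinct education values into two groups illustrates the loss: the original column has 2 bits of entropy, the generalized column `["Higher", "Higher", "Basic", "Basic"]` has 1 bit, so the entropy drops by 1 bit.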
Step 12: Output of desensitized data
The data consumer obtains the desensitized data according to user permissions. The output protection method replaces the original sensitive attribute values with the desensitized values and writes them to a new storage address, without changing the storage address or content of the source data. The storage address of the desensitized data is generated by transforming the original data storage address with a hash algorithm. To reduce the storage load on the big data platform, desensitized data is destroyed promptly.
The above describes specific embodiments of the present invention and the technical principles used. Any change made according to the concept of the present invention whose resulting functions do not go beyond the spirit covered by the specification and drawings shall fall within the protection scope of the present invention.
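Deriving the output address can be sketched with SHA-256. Beyond the text, the sketch assumes a task identifier is mixed into the hash, so that separate desensitization tasks on the same source get distinct addresses; the `/desensitized/` prefix, the path format, and the function name are hypothetical.

```python
import hashlib

def desensitized_address(original_address: str, task_id: str, base: str = "/desensitized/") -> str:
    """Derive a new storage address from the source address; the source is left untouched."""
    digest = hashlib.sha256(f"{task_id}:{original_address}".encode("utf-8")).hexdigest()
    return base + digest
```

Because the hash is one-way, a consumer holding the new address cannot read the source location from it, while the platform can recompute the address from the source address and task identifier when it needs to locate or destroy the copy.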

Claims (9)

1. A sensitive information desensitization method for data sharing, characterized by comprising the following steps:
(1) presetting automatic identification rules and processing rules for sensitive information, wherein the automatic identification rules comprise building keyword databases for each category of sensitive information, automatically identifying sensitive information listed in the keyword databases, automatically identifying number and numeric sensitive information, automatically identifying named-entity sensitive information, and precisely identifying address sensitive information; the processing rules comprise sensitive-attribute generation rules, configuring desensitization algorithms, named-entity desensitization, and address desensitization; and a data consumer requests to view data published by a data provider;
(2) preprocessing the data and, after preprocessing, performing word segmentation and part-of-speech tagging on the text data;
(3) automatically identifying sensitive information according to the preset automatic identification rules;
(4) analyzing the sensitive information by computing the degree of association between sensitive attributes, and retaining the sensitive information whose degree of association exceeds a threshold, the threshold being preset; wherein the degree of association between sensitive attributes is computed as follows:
(a) the degree of association of categorical sensitive attributes is normalized with a Sigmoid function, defined as follows:
wherein the function's values lie in [0, 1], and it is continuous, smooth, and monotonically increasing;
(b) assume each record in data set T has p attributes $\{u_1, u_2, \ldots, u_p\}$, and that these attributes have $q_1, q_2, \ldots, q_p$ attribute values respectively; within a record, an attribute value of a sensitive attribute is marked 1 if it occurs and 0 otherwise, so each record can be represented as a 0/1 row vector of dimension $q_1 + q_2 + \cdots + q_p$; when data set T has n records, denoted $\{t_1, t_2, \ldots, t_n\}$, there are n such row vectors;
(c) the values at corresponding positions of these row vectors are combined with XNOR and XOR operations, one count recording the positions where, under XNOR, both attribute values are 1 and another the positions where both are 0; the degree of association $S(I_1, I_2)$ between two attributes is then calculated as follows:
wherein the parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ are set to 0.5, 0.25, and 0.25, and $0 \le S(I_1, I_2) \le 1$;
(5) desensitizing the sensitive information according to the preset processing rules;
(6) computing the desensitization depth of the sensitive information and judging whether it meets a preset requirement; if not, returning to step (5) to desensitize again; otherwise, outputting the desensitized data set for the data consumer to view.
2. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the preprocessing in step (2) is as follows: the published data are classified by data type, the data types including structured database data, tabular data, data warehouse data, and unstructured document data; during preprocessing, the completeness, consistency, and correctness of attribute values are checked, and unstructured document data are parsed into text data with a parsing tool.
3. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the automatic identification of named-entity sensitive information is realized by combining part-of-speech tagging, based on the Viterbi algorithm of a hidden Markov model (HMM), with a built named-entity knowledge base; and the precise identification of address sensitive information is realized by examining the sequence of words adjacent to the address information.
4. The sensitive information desensitization method for data sharing according to claim 1, characterized in that number and numeric sensitive information is desensitized as follows: generation rules for the sensitive attributes are formulated and stored in a sensitive-attribute generation rule library, and the preset desensitization algorithms based on data distortion and encryption are called to transform the newly generated sensitive attribute values according to the desensitization task, finally forming the desensitized data.
5. The sensitive information desensitization method for data sharing according to claim 1, characterized in that named-entity sensitive information is desensitized with a code table of common Chinese named entities storing organization names and Chinese personal names on the order of a million entries, each original named entity being hashed, looked up in the table, and replaced; and address sensitive information is desensitized according to the level of detail of the address: the address is converted to longitude and latitude; if the original sensitive address cannot be resolved, no desensitization is needed, since it is a relatively vague address; if latitude and longitude can be resolved, they are transformed within the range of the original district/county to generate a new address, which is blurred to street/township level according to the user's access permissions.
6. The sensitive information desensitization method for data sharing according to claim 1, characterized in that the desensitization depth measures the degree of difference between the desensitized data set and the original data set, the desensitization depth being proportional to the degree of difference, and is computed as follows:
(I) Computing the desensitization depth of numeric attributes:
Assume a numeric attribute takes the value m, within its value range, before desensitization and the value $m^*$ after desensitization; the numeric attribute desensitization depth $D_{sz}(m, m^*)$ is then:
(II) Computing the desensitization depth of categorical attributes:
The desensitization depth of a categorical attribute is obtained by constructing a generalization tree, and the categorical attribute desensitization depth $D_{fl}(r, r^*)$ is calculated with the following formula:
$$D_{fl}(r, r^*) = \frac{(N_h - 1) \times \mathrm{Step}(r, r^*)}{(N - 1) \times \mathrm{Step}(r, e)}$$
where r and $r^*$ denote the attribute value before and after desensitization; $N_h$ denotes the number of child nodes under the parent of the pre-desensitization categorical value (that value together with its siblings); N denotes the number of leaf nodes of the generalization tree; e denotes the root node; and $\mathrm{Step}(x, y)$ denotes the number of steps from attribute value node x up to the desensitized attribute node y;
(III) combining steps (I) and (II) gives the data set desensitization depth $D(T, T^*)$ as follows:
where n is the number of records in the data set, and $c_1$ and $c_2$ are the numbers of numeric and categorical attributes, respectively.
7. The sensitive information desensitization method for data sharing according to claim 1, characterized in that controlled output of the desensitized data set is achieved by transforming the original storage link with a hash method to generate a new download link address.
8. A sensitive information desensitization system for data sharing, characterized by comprising a system management unit, a data source management unit, a sensitive information recognition unit, a sensitive information processing unit, and a data output unit; the system management unit builds user accounts and access control for the desensitization system and identifies users' roles and permissions, allowing only legitimate, authorized users to operate on the corresponding data; the data source management unit stores data source information; the sensitive information recognition unit automatically identifies sensitive information in all types of data sources and computes the association between the sensitive attributes in each data source; wherein the degree of association between sensitive attributes is computed as follows:
(a) the degree of association of categorical sensitive attributes is normalized with a Sigmoid function, defined as follows:
wherein the function's values lie in [0, 1], and it is continuous, smooth, and monotonically increasing;
(b) assume each record in data set T has p attributes $\{u_1, u_2, \ldots, u_p\}$, and that these attributes have $q_1, q_2, \ldots, q_p$ attribute values respectively; within a record, an attribute value of a sensitive attribute is marked 1 if it occurs and 0 otherwise, so each record can be represented as a 0/1 row vector of dimension $q_1 + q_2 + \cdots + q_p$; when data set T has n records, denoted $\{t_1, t_2, \ldots, t_n\}$, there are n such row vectors;
(c) the values at corresponding positions of these row vectors are combined with XNOR and XOR operations, one count recording the positions where, under XNOR, both attribute values are 1 and another the positions where both are 0; the degree of association $S(I_1, I_2)$ between two attributes is then calculated as follows:
wherein the parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ are set to 0.5, 0.25, and 0.25, and $0 \le S(I_1, I_2) \le 1$;
the sensitive information processing unit automatically creates desensitization tasks and matches desensitization strategies and desensitization algorithms; the data output unit safely and effectively controls the output of sensitive data; and the system management unit, data source management unit, sensitive information recognition unit, sensitive information processing unit, and data output unit are connected in sequence.
9. The sensitive information desensitization system for data sharing according to claim 8, characterized in that: the data source management unit extracts and manages data source types, IP addresses, storage addresses, and data source data structures; the sensitive information recognition unit performs word segmentation on text data using natural language processing, labels sensitivity levels on the basis of manually built knowledge bases for each category of sensitive information, automatically identifies sensitive information by rules and pattern matching, and introduces a Sigmoid-function-based method to compute the association between sensitive attributes; the sensitive information processing unit uses natural language processing to review data use requests and automatically create the corresponding desensitization tasks, and desensitizes each category of sensitive information using the sensitive-attribute generation rule library, hash-table lookup, longitude/latitude transformation within the address's area, and the various desensitization algorithms; and the data output unit replaces the original sensitive attribute values with the desensitized values and outputs the data to a new storage address generated by transforming the original data storage address with a hash algorithm.
CN201710506066.1A 2017-06-28 2017-06-28 A kind of sensitive information desensitization method and system that data-oriented is shared Active CN107480549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710506066.1A CN107480549B (en) 2017-06-28 2017-06-28 A kind of sensitive information desensitization method and system that data-oriented is shared

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710506066.1A CN107480549B (en) 2017-06-28 2017-06-28 A kind of sensitive information desensitization method and system that data-oriented is shared

Publications (2)

Publication Number Publication Date
CN107480549A CN107480549A (en) 2017-12-15
CN107480549B true CN107480549B (en) 2019-08-02

Family

ID=60596029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710506066.1A Active CN107480549B (en) 2017-06-28 2017-06-28 A kind of sensitive information desensitization method and system that data-oriented is shared

Country Status (1)

Country Link
CN (1) CN107480549B (en)

CN112765673A (en) * 2021-03-16 2021-05-07 杭州数梦工场科技有限公司 Sensitive data statistical method and related device
CN112989414B (en) * 2021-03-21 2024-03-19 贵州大学 Mobile service data desensitization rule generation method based on width learning
CN115221544A (en) * 2021-04-16 2022-10-21 华为云计算技术有限公司 Data desensitization method and device
CN113127929B (en) * 2021-04-30 2024-03-01 天翼安全科技有限公司 Data desensitizing method, desensitizing rule processing method, device, equipment and storage medium
CN113626865A (en) * 2021-08-11 2021-11-09 南京莱斯网信技术研究院有限公司 Data sharing opening method and system for preventing sensitive information from being leaked
CN113761576A (en) * 2021-09-03 2021-12-07 国网山东省电力公司电力科学研究院 Privacy protection method and device, storage medium and electronic equipment
CN113792323A (en) * 2021-11-15 2021-12-14 聊城高新生物技术有限公司 Sensitive data encryption method and device based on agricultural products and electronic equipment
CN116070205B (en) * 2023-03-07 2023-06-13 北京和升达信息安全技术有限公司 Data clearing method and device, electronic equipment and storage medium
CN116205236B (en) * 2023-05-06 2023-08-18 四川三合力通科技发展集团有限公司 Data rapid desensitization system and method based on entity naming identification
CN116776351A (en) * 2023-06-21 2023-09-19 中国民用航空总局第二研究所 Preserving format encryption method and system for personal information to resist statistical analysis attack
CN117010019B (en) * 2023-08-04 2024-04-16 北京泰策科技有限公司 Data desensitization method and system based on NLP language model
CN116910817B (en) * 2023-09-13 2023-12-29 北京国药新创科技发展有限公司 Desensitization processing method and device for medical data and electronic equipment
CN117201206B (en) * 2023-11-08 2024-01-09 河北翎贺计算机信息技术有限公司 Network safety supervision system for preventing network data leakage
CN117290888B (en) * 2023-11-23 2024-02-09 江苏风云科技服务有限公司 Information desensitization method for big data, storage medium and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102480481A (en) * 2010-11-26 2012-05-30 腾讯科技(深圳)有限公司 Method and device for improving security of product user data
CN103778380A (en) * 2013-12-31 2014-05-07 网秦(北京)科技有限公司 Data desensitization method and device and data anti-desensitization method and device
CN104123504A (en) * 2014-06-27 2014-10-29 武汉理工大学 Cloud platform privacy protection method based on frequent item retrieval
CN104156668A (en) * 2014-08-04 2014-11-19 江苏大学 Privacy protection reissuing method for multiple sensitive attribute data
CN106650487A (en) * 2016-09-29 2017-05-10 广西师范大学 Multi-partite graph privacy protection method published based on multi-dimension sensitive data

Non-Patent Citations (1)

Title
Research on Privacy-Preserving Data Mining Algorithms; Wu Jue; China Doctoral Dissertations Full-text Database; 2013-02-15 (Issue 2); full text

Also Published As

Publication number Publication date
CN107480549A (en) 2017-12-15

Similar Documents

Publication Publication Date Title
CN107480549B (en) A kind of sensitive information desensitization method and system that data-oriented is shared
Gottschalk et al. EventKG–the hub of event knowledge on the web–and biographical timeline generation
Van Keulen et al. A probabilistic XML approach to data integration
US11132610B2 (en) Extracting structured knowledge from unstructured text
US8666928B2 (en) Knowledge repository
US9519681B2 (en) Enhanced knowledge repository
del Valle-Cano et al. SocialHaterBERT: A dichotomous approach for automatically detecting hate speech on Twitter through textual analysis and user profiles
CN109564578A (en) It is inputted based on natural language user interface and generates natural language output
Hamzei et al. Translating place-related questions to GeoSPARQL queries
EP1999692A2 (en) Knowledge repository
Valencia‐García et al. A knowledge acquisition methodology to ontology construction for information retrieval from medical documents
Braun et al. Consumer protection in the digital era: The potential of customer-centered legaltech
Prentice et al. Tractor: A framework for soft information fusion
Sathyendra et al. Helping users understand privacy notices with automated query answering functionality: An exploratory study
Zavarella et al. An Ontology-Based Approach to Social Media Mining for Crisis Management.
Pedersen et al. Open‐endedness, Schemas and Ontological Commitment
CN111339252B (en) Searching method, searching device and storage medium
Segev Adaptive ontology use for crisis knowledge representation
Purohit et al. An information filtering and management model for twitter traffic to assist crises response coordination
Purohit et al. Transactional Knowledge Graph Generation To Model Adversarial Activities
Cortis et al. Techniques for the identification of semantically-equivalent online identities
Delanaux Privacy-Preserving Linked Data Integration
Shrivastava Understanding the Importance of Entities and Roles in Natural Language Inference: A Model and Datasets
Xiaohan et al. On Constructing a Knowledge Base of Chinese Criminal Cases
Zhang et al. Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee after: Yinjiang Technology Co.,Ltd.

Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee before: ENJOYOR Co.,Ltd.