CN115422594A - Method for realizing data desensitization by using matrix replacement - Google Patents

Method for realizing data desensitization by using matrix replacement

Info

Publication number
CN115422594A
Authority
CN
China
Prior art keywords
data
desensitization
desensitized
coefficient
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211143576.4A
Other languages
Chinese (zh)
Other versions
CN115422594B (en)
Inventor
吴鸿钟
李世亮
汪广锐
张桂银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Bite Xin'an Technology Co ltd
Original Assignee
Chengdu Bite Xin'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Bite Xin'an Technology Co ltd
Priority to CN202211143576.4A
Publication of CN115422594A
Application granted
Publication of CN115422594B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a method for realizing data desensitization by using matrix replacement, comprising the following steps. Step 1: acquire the data to be desensitized and determine its data type combination, so as to invoke a data coding strategy from a combined database. Step 2: encode the data to be desensitized according to the data coding strategy to construct a matrix to be desensitized. Step 3: analyze the desensitization coefficient and the desensitization attribute of each element to be desensitized in the matrix, and replace elements according to the desensitization coefficient and the desensitization attribute to obtain desensitized data. Because the coding strategy used to construct the matrix is chosen according to the data types of the data to be desensitized, and elements are replaced according to their own coefficients and attributes, the data can be effectively protected.

Description

Method for realizing data desensitization by using matrix replacement
Technical Field
The invention relates to the technical field of data desensitization, in particular to a method for realizing data desensitization by using matrix replacement.
Background
Data desensitization transforms sensitive information according to desensitization rules so that sensitive private data is reliably protected. When customer security data or business-sensitive data is involved, the real data must be modified, without violating system rules, before it is provided for testing; personal information such as identity card numbers, mobile phone numbers, card numbers and customer numbers must therefore be desensitized.
Data desensitization can use various modes, such as fixed replacement and random replacement, to remove private information. In practice, however, the replacement mode is usually fixed in advance rather than chosen according to the data type, so the desensitization is overly uniform and privacy is not well protected.
The present invention therefore proposes a method for achieving data desensitization using matrix replacement.
Disclosure of Invention
The invention provides a method for realizing data desensitization by using matrix replacement. The data type of the data to be desensitized is determined in order to invoke a coding strategy that constructs a matrix, and elements of the matrix are replaced according to their coefficients and attributes to obtain desensitized data, so that the data is effectively protected.
The invention provides a method for realizing data desensitization by using matrix replacement, which comprises the following steps:
step 1: acquiring data to be desensitized, and determining a data type combination of the data to be desensitized to invoke a data coding strategy from a combined database;
step 2: carrying out data coding on the data to be desensitized according to the data coding strategy to construct a matrix to be desensitized;
step 3: analyzing the desensitization coefficient and the desensitization attribute of each element to be desensitized in the matrix to be desensitized, and replacing elements according to the desensitization coefficient and the desensitization attribute to obtain desensitized data.
Preferably, before the data to be desensitized is acquired, the method includes:
performing a first extraction on an input interface and, at the same time, a second extraction on the information entered in the input interface;
determining the privacy input items of the input interface from the first extraction result, and determining the preset privacy of each privacy input item;
determining the fill accuracy of each privacy input item from the second extraction result:
[Formula image not reproduced: fill accuracy of a privacy input item]
wherein the formula uses the following quantities (their symbols were lost with the image):
the input information of the corresponding privacy input item, determined from the second extraction result;
the standard information of the corresponding privacy input item;
the effective information of the corresponding privacy input item, determined from the second extraction result, whose information amount is smaller than that of the corresponding input information;
a first reference coefficient for the input information of the corresponding privacy input item;
a second reference coefficient for the effective information of the corresponding privacy input item;
the fill accuracy of the corresponding privacy input item;
acquiring the allowed desensitization weight matched with the preset privacy from an input item-privacy-weight mapping table;
acquiring the allowed desensitization weight matched with the fill accuracy from an input item-accuracy-weight mapping table;
determining the allowed desensitization value of the corresponding privacy input item from the preset privacy and the fill accuracy:
[Formula image not reproduced: allowed desensitization value of a privacy input item]
wherein the formula uses the following quantities (their symbols were lost with the image):
the allowed desensitization weight of the corresponding privacy input item associated with the preset privacy;
the allowed desensitization weight of the corresponding privacy input item associated with the fill accuracy;
when the allowed desensitization value is larger than a preset value, taking the input information of the corresponding privacy input item as pending data;
and combining all the pending data into the data to be desensitized.
Preferably, acquiring the data to be desensitized and determining its data type combination includes:
performing cluster analysis on the data to be desensitized to obtain a plurality of pieces of subdata;
inputting each piece of subdata into a data discrimination model to obtain the probability of each candidate data type for that subdata;
selecting the type with the maximum probability as the main data type of that subdata;
and constructing a data type combination of the data to be desensitized based on all the main data types.
Preferably, constructing the data type combination of the data to be desensitized based on all the main data types includes:
sorting the subdata in descending order of the probability of its main data type to obtain a data set, each element of which is referred to as second data;
performing a first calibration (marking) of the second data of the same type in the data set, determining the positions at which that type occurs, and summing the probability values of that type to obtain its first probability value;
summing the probability values of all the second data in the data set to obtain the total probability value;
determining the first ratio of each type as its first probability value divided by the total probability value, and the third ratio of each second data as its probability value divided by the total probability value;
determining the second ratio of each second data as its probability value divided by the first probability value of its type;
constructing a first array for each second data from its first ratio, second ratio and third ratio;
and setting a reference label for each second data according to its first array, thereby constructing the data type combination.
Preferably, invoking the data coding strategy from the combined database comprises:
determining the reference label and the main data type of each subdata in the combined database;
meanwhile, determining the total privacy of each subdata from the privacy input items it involves;
determining the invocation factor of each subdata from its reference label, main data type and total privacy;
obtaining an invocation combination command from all the invocation factors;
and invoking the data coding strategy from the combined database according to the invocation combination command.
Preferably, encoding the data to be desensitized according to the data coding strategy to construct the matrix to be desensitized includes:
acquiring the coding flows of the data coding strategy and the data to be coded that matches each coding flow;
encoding the corresponding data to be coded according to each coding flow, and analyzing the row and column display positions of the coding result of each coding flow;
and determining the upper, lower, left and right boundaries from all the row and column display positions, and completing the codes within those boundaries to construct the matrix to be desensitized.
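The boundary-and-completion step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: each coding flow is assumed to have already produced its code symbols with their (row, column) display positions, the boundaries are taken as the smallest and largest occupied row and column, and the helper names `place_row` and `build_matrix` and the padding symbol `"0"` are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) display position of one code symbol


def place_row(codes: str, row: int, start_col: int) -> Dict[Position, str]:
    """Hypothetical coding-flow output: lay codes left to right from a start cell."""
    return {(row, start_col + i): symbol for i, symbol in enumerate(codes)}


def build_matrix(placed: Dict[Position, str], fill: str = "0") -> List[List[str]]:
    """Build the matrix to be desensitized from positioned code symbols.

    Upper/lower boundaries are the smallest/largest occupied rows, and
    left/right boundaries the smallest/largest occupied columns. Cells inside
    the boundaries that received no code are completed with `fill`.
    """
    if not placed:
        return []
    rows = [r for r, _ in placed]
    cols = [c for _, c in placed]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [
        [placed.get((r, c), fill) for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


if __name__ == "__main__":
    cells: Dict[Position, str] = {}
    cells.update(place_row("AB", row=0, start_col=0))  # output of one flow
    cells.update(place_row("7", row=0, start_col=3))   # column 2 left empty
    cells.update(place_row("XYZ", row=1, start_col=0))
    for line in build_matrix(cells):
        print(line)
```

Completing the gaps keeps every row the same length, so later steps can address each element by its row and column.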
Preferably, analyzing the desensitization coefficient and desensitization attribute of each element to be desensitized in the matrix to be desensitized includes:
acquiring coding information of each element to be desensitized in the matrix to be desensitized;
inputting the coding information into an information analysis model, and acquiring a coding protection index of the coding information;
determining a desensitization coefficient corresponding to an element to be desensitized based on the coding protection index;
meanwhile, dividing the coding protection indexes by protection type and calculating the total type weight of each resulting type;
and selecting, as final division types, the types whose total type weight is larger than a preset weight, and retrieving the desensitization attribute from a type-attribute database based on them.
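The patent does not specify the information analysis model or how the desensitization coefficient is derived from the coding protection index, so the Python sketch below fills those gaps with labeled assumptions: the model's output for one element is taken to be a list of (protection type, index value) pairs, the coefficient is assumed to be the mean index value, and the type-attribute database is a plain dictionary. Only the thresholding of total type weights against a preset weight follows the text directly.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical output of the unspecified information analysis model for one
# element: (protection type, coding protection index value) pairs.
ProtectionIndexes = List[Tuple[str, float]]


def desensitization_coefficient(indexes: ProtectionIndexes) -> float:
    """Assumption: the coefficient is the mean coding protection index."""
    return mean(value for _, value in indexes) if indexes else 0.0


def desensitization_attributes(
    indexes: ProtectionIndexes,
    type_attribute_db: Dict[str, str],
    preset_weight: float,
) -> List[str]:
    """Divide indexes by protection type, keep types whose total weight exceeds
    the preset weight (the final division types), and look up their attributes."""
    totals: Dict[str, float] = defaultdict(float)
    for protection_type, value in indexes:
        totals[protection_type] += value
    final_types = [t for t, total in totals.items() if total > preset_weight]
    return [type_attribute_db[t] for t in final_types if t in type_attribute_db]


if __name__ == "__main__":
    element = [("identity", 0.5), ("identity", 0.3), ("contact", 0.2)]
    db = {"identity": "full mask", "contact": "partial mask"}  # hypothetical entries
    print(desensitization_coefficient(element))
    print(desensitization_attributes(element, db, preset_weight=0.5))
```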
Preferably, replacing elements according to the desensitization coefficient and the desensitization attribute to obtain desensitized data includes:
determining the desensitization level of each element to be desensitized based on its desensitization coefficient and desensitization attribute;
selecting each element whose desensitization level is greater than a preset level as a first element, and locking its position;
and matching replacement information to each first element according to its desensitization level, and replacing the element with it to obtain desensitized data.
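Selecting and replacing the first elements can be sketched as follows, assuming the desensitization level of every element has already been determined. The mapping from level to replacement symbol, the default symbol `"*"`, and the function name `replace_by_level` are hypothetical; the text only says that replacement information is matched to each level.

```python
from typing import Dict, List, Tuple


def replace_by_level(
    matrix: List[List[str]],
    levels: List[List[int]],
    preset_level: int,
    replacement_by_level: Dict[int, str],
) -> Tuple[List[List[str]], List[Tuple[int, int]]]:
    """Lock every element whose level exceeds `preset_level` and replace it.

    Returns a new matrix (the input is not modified) and the locked positions.
    """
    locked = [
        (r, c)
        for r, row in enumerate(levels)
        for c, level in enumerate(row)
        if level > preset_level
    ]
    result = [row[:] for row in matrix]
    for r, c in locked:
        # Assumption: levels missing from the mapping fall back to "*".
        result[r][c] = replacement_by_level.get(levels[r][c], "*")
    return result, locked


if __name__ == "__main__":
    m = [["A", "B"], ["C", "D"]]
    lv = [[1, 3], [2, 0]]
    print(replace_by_level(m, lv, preset_level=1, replacement_by_level={2: "x", 3: "#"}))
```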
Preferably, determining a desensitization level corresponding to an element to be desensitized based on the desensitization coefficient and the desensitization attribute comprises:
determining the overlap index between the coding protection indexes corresponding to the desensitization coefficient and the coding protection indexes corresponding to the desensitization attribute;
determining a reference coefficient corresponding to an element to be desensitized based on the overlap index;
when the ratio of the reference coefficient to the desensitization coefficient is larger than 1/2, determining the desensitization level corresponding to the element to be desensitized according to the desensitization coefficient and the desensitization attribute;
otherwise, determining a first difference coefficient between the reference coefficient and the desensitization coefficient, and a second difference coefficient between the reference coefficient and a preset coefficient;
determining the allowable adjustment range of the reference coefficient from the first difference coefficient and the second difference coefficient;
selecting from the coding protection indexes the first indexes, whose first adjustment coefficient equals the minimum value of the allowable adjustment range;
selecting from the coding protection indexes the second indexes, whose second adjustment coefficient equals the maximum value of the allowable adjustment range;
taking the indexes shared by the first indexes and the second indexes, together with half of the remaining second indexes selected at random, as adjustment indexes;
selecting, as final indexes, the adjustment indexes that match a final division type whose total type weight is larger than the preset weight, and acquiring a new desensitization attribute based on the final indexes;
and obtaining the desensitization level corresponding to the element to be desensitized based on the desensitization coefficient and the new desensitization attribute.
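Two steps of this procedure are concrete enough to sketch: the 1/2 ratio test that decides whether the reference coefficient must be adjusted, and the selection of adjustment indexes. The Python below covers those two steps only; how the difference coefficients and the allowable adjustment range are computed is not specified, so it is omitted. Rounding half of an odd number of remaining indexes down, and drawing them with `random.Random.sample`, are assumptions.

```python
import random
from typing import List, Optional, Sequence


def needs_adjustment(reference: float, coefficient: float) -> bool:
    """True when reference / coefficient is not larger than 1/2.

    Assumes a positive desensitization coefficient.
    """
    if coefficient <= 0:
        raise ValueError("desensitization coefficient must be positive")
    return not (reference / coefficient > 0.5)


def select_adjustment_indexes(
    first: Sequence[str],
    second: Sequence[str],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Indexes shared by the first and second indexes, plus a random half
    (rounded down) of the remaining second indexes."""
    rng = rng or random.Random()
    first_set = set(first)
    overlap = [i for i in second if i in first_set]
    remaining = [i for i in second if i not in first_set]
    return overlap + rng.sample(remaining, len(remaining) // 2)


if __name__ == "__main__":
    print(needs_adjustment(0.4, 1.0))  # True: 0.4 is not larger than 1/2
    chosen = select_adjustment_indexes(
        ["a", "b", "c"], ["b", "c", "d", "e", "f", "g"], random.Random(0)
    )
    print(chosen)
```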
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a flow chart of a method for implementing data desensitization using matrix replacement according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The invention provides a method for realizing data desensitization by using matrix replacement, comprising the following steps:
step 1: acquiring data to be desensitized, and determining a data type combination of the data to be desensitized to invoke a data coding strategy from a combined database;
step 2: encoding the data to be desensitized according to the data coding strategy to construct a matrix to be desensitized;
step 3: analyzing the desensitization coefficient and the desensitization attribute of each element to be desensitized in the matrix to be desensitized, and replacing elements according to the desensitization coefficient and the desensitization attribute to obtain desensitized data.
In this embodiment, the data to be desensitized is data that requires privacy protection, such as user information. Its fields, for example name, identity card information, work information, age, home address and work address, can each be regarded as a different data type, and together they make up the data type combination of the data to be desensitized.
In this embodiment, the combined database stores combinations of different data types together with the strategy matched with each combination, so the matching strategy for a given combination can be retrieved.
In this embodiment, the data coding strategy specifies how each piece of data to be desensitized is encoded according to the type to which it belongs; that is, the data to be desensitized is replaced by its codes.
In this embodiment, the data encoding may be numeric encoding, alphabetic encoding, character encoding or the like. Its main purpose is to encode the data to be desensitized so that the matrix to be desensitized can be constructed by determining the position of each piece of encoded data.
In this embodiment, the strategy used for encoding actually determines where each type of data is placed. For example, the encoded identity card information occupies the second row, the encoded work information occupies the third row, the encoded name occupies the first two columns of the first row, and the encoded age occupies the third column of the first row, and so on. Placing all the encoded data in this way constructs the matrix to be desensitized, which therefore contains the information represented by the corresponding data to be desensitized.
In this embodiment, each element of the resulting matrix has a desensitization coefficient and a desensitization attribute, from which the elements that need to be replaced are determined; for example, the element in the second row and first column is replaced by an x. Once all the elements that need replacing have been replaced, the desensitized data is obtained.
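The layout example above can be reproduced with a small Python sketch. The 0-based cell coordinates, the three-column width, the field names and the placeholder code symbols are illustrative assumptions; real codes would come from the invoked data coding strategy, and identity card or work information would normally encode to many more cells.

```python
from typing import Dict, Iterable, List, Tuple

# Layout from the example, 0-based: field -> (row, first column, number of cells).
LAYOUT: Dict[str, Tuple[int, int, int]] = {
    "name": (0, 0, 2),     # first two columns of the first row
    "age": (0, 2, 1),      # third column of the first row
    "id_card": (1, 0, 3),  # the whole second row
    "work": (2, 0, 3),     # the whole third row
}
WIDTH = 3  # assumed matrix width


def layout_matrix(codes: Dict[str, List[str]]) -> List[List[str]]:
    """Place each field's code symbols at the position the strategy assigns."""
    n_rows = max(row for row, _, _ in LAYOUT.values()) + 1
    matrix = [["" for _ in range(WIDTH)] for _ in range(n_rows)]
    for field, (row, col, span) in LAYOUT.items():
        cells = codes[field]
        if len(cells) != span:
            raise ValueError(f"{field} must encode to exactly {span} cells")
        matrix[row][col:col + span] = cells
    return matrix


def desensitize(
    matrix: List[List[str]], positions: Iterable[Tuple[int, int]], symbol: str = "x"
) -> List[List[str]]:
    """Return a copy with the selected elements replaced by `symbol`."""
    result = [row[:] for row in matrix]
    for r, c in positions:
        result[r][c] = symbol
    return result


if __name__ == "__main__":
    codes = {
        "name": ["N1", "N2"],
        "age": ["A1"],
        "id_card": ["I1", "I2", "I3"],
        "work": ["W1", "W2", "W3"],
    }
    m = layout_matrix(codes)
    # Replace the element in the second row, first column (0-based (1, 0)).
    print(desensitize(m, [(1, 0)]))
```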
The beneficial effects of the above technical scheme are: determining the data type of the data to be desensitized allows a suitable coding strategy to be invoked to construct the matrix, and replacing elements according to their coefficients and attributes yields desensitized data, so that the data is effectively protected.
In the method for realizing data desensitization by using matrix replacement provided by the invention, before the data to be desensitized is acquired, the method further comprises the following steps:
performing a first extraction on an input interface and, at the same time, a second extraction on the information entered in the input interface;
determining the privacy input items of the input interface from the first extraction result, and determining the preset privacy of each privacy input item;
determining the fill accuracy of each privacy input item from the second extraction result:
[Formula image not reproduced: fill accuracy of a privacy input item]
wherein the formula uses the following quantities (their symbols were lost with the image):
the input information of the corresponding privacy input item, determined from the second extraction result;
the standard information of the corresponding privacy input item;
the effective information of the corresponding privacy input item, determined from the second extraction result, whose information amount is smaller than that of the corresponding input information;
a first reference coefficient for the input information of the corresponding privacy input item;
a second reference coefficient for the effective information of the corresponding privacy input item;
the fill accuracy of the corresponding privacy input item;
acquiring the allowed desensitization weight matched with the preset privacy from an input item-privacy-weight mapping table;
acquiring the allowed desensitization weight matched with the fill accuracy from an input item-accuracy-weight mapping table;
determining the allowed desensitization value of the corresponding privacy input item from the preset privacy and the fill accuracy:
[Formula image not reproduced: allowed desensitization value of a privacy input item]
wherein the formula uses the following quantities (their symbols were lost with the image):
the allowed desensitization weight of the corresponding privacy input item associated with the preset privacy;
the allowed desensitization weight of the corresponding privacy input item associated with the fill accuracy;
when the allowed desensitization value is larger than a preset value, taking the input information of the corresponding privacy input item as pending data;
and combining all the pending data into the data to be desensitized.
In this embodiment, the allowed desensitization value must be determined before the data to be desensitized is determined, so as to ensure that the element analysis is effective when the desensitization matrix is constructed.
In this embodiment, the input interface is an interface through which information can be entered. The first extraction on the interface mainly acquires the privacy input items that need to be filled in, such as name, gender, age and phone number, and the preset privacy of each privacy input item is set in advance.
In this embodiment, the fill accuracy of each privacy input item is judged against the filling rule of that item, for example that the name must be given in pinyin or that the phone number must have 11 digits, and against the constraint rules between privacy input items, for example consistency between the phone number and the name. The first reference coefficient is generally 0.6 and the second reference coefficient is generally 0.4.
In this embodiment, the input item-privacy-weight mapping table is built from the different fill-in items, their matching privacy and the corresponding allowed desensitization weights, and the input item-accuracy-weight mapping table is built from the different fill-in items, their matching fill accuracy and the corresponding allowed desensitization weights.
In this embodiment, the allowed desensitization value is determined mainly from the weights associated with privacy and with accuracy, and the preset value is set in advance. For example, if the allowed desensitization value of data 1 is greater than the preset value, data 1 is taken as pending data.
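Because the fill-accuracy and allowed-desensitization-value formulas survive only as lost images, the Python sketch below is a stand-in under explicit assumptions rather than the patented formulas: fill accuracy is modelled as 0.6 times the similarity between the input information and the standard information plus 0.4 times the similarity between the effective information and the standard information (using the typical coefficient values given above), similarity is measured with `difflib.SequenceMatcher.ratio`, and the allowed desensitization value is assumed to be the sum of the two allowed desensitization weights already read from the mapping tables.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

FIRST_REFERENCE = 0.6   # typical first reference coefficient per the description
SECOND_REFERENCE = 0.4  # typical second reference coefficient per the description


def similarity(a: str, b: str) -> float:
    """Assumed comparison term: difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def fill_accuracy(input_info: str, effective_info: str, standard_info: str) -> float:
    """Assumed form: weighted similarity of input and effective info to the standard."""
    return (FIRST_REFERENCE * similarity(input_info, standard_info)
            + SECOND_REFERENCE * similarity(effective_info, standard_info))


def allowed_desensitization_value(privacy_weight: float, accuracy_weight: float) -> float:
    """Assumption: the two allowed desensitization weights are simply added."""
    return privacy_weight + accuracy_weight


def pending_data(
    items: Iterable[Tuple[str, str, float, float]], preset_value: float
) -> List[str]:
    """items: (privacy input item, input information, privacy weight, accuracy weight).

    Input information whose allowed value exceeds the preset value becomes
    pending data; together the pending data form the data to be desensitized.
    """
    return [
        info
        for _item, info, w_privacy, w_accuracy in items
        if allowed_desensitization_value(w_privacy, w_accuracy) > preset_value
    ]


if __name__ == "__main__":
    print(fill_accuracy("13800000000", "1380000000", "13800000000"))
    print(pending_data([("phone", "13800000000", 0.5, 0.3), ("gender", "F", 0.1, 0.1)], 0.5))
```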
The beneficial effects of the above technical scheme are: the allowed desensitization value of each privacy input item is determined by considering both its preset privacy and its fill accuracy, so that pending data is screened out and finally combined into the data to be desensitized, which provides a basis for subsequently constructing the matrix.
In the method for realizing data desensitization by using matrix replacement provided by the invention, acquiring the data to be desensitized and determining its data type combination comprises the following steps:
performing cluster analysis on the data to be desensitized to obtain a plurality of pieces of subdata;
inputting each piece of subdata into a data discrimination model to obtain the probability of each candidate data type for that subdata;
selecting the type with the maximum probability as the main data type of that subdata;
and constructing the data type combination of the data to be desensitized based on all the main data types.
In this embodiment, for example, the data to be desensitized includes data 1, data 2 and data 3; clustering them yields subdata 12 (data 1 and data 2) and subdata 3 (data 3).
In this embodiment, the data discrimination model is trained in advance on different data samples together with the types matched with each sample and the matching probabilities, so that it can output the type probabilities matched with each subdata.
In this embodiment, since each subdata has a probability for each candidate type, the type with the maximum probability is selected as its main data type, and the data type combination is then constructed.
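Selecting the main data type reduces to taking the most probable type for each subdata. In the sketch below the data discrimination model is not implemented: its output is assumed to be a dictionary of type probabilities per subdata, and the type names and probabilities in the example are invented for illustration.

```python
from typing import Dict, Tuple


def main_data_types(
    type_probs: Dict[str, Dict[str, float]]
) -> Dict[str, Tuple[str, float]]:
    """Map each subdata to (main data type, its probability).

    Ties go to the type listed first, because max() keeps the first maximum.
    """
    return {
        subdata: max(probs.items(), key=lambda kv: kv[1])
        for subdata, probs in type_probs.items()
    }


if __name__ == "__main__":
    model_output = {  # hypothetical discrimination-model output
        "subdata 12": {"name": 0.8, "address": 0.2},
        "subdata 3": {"age": 0.7, "phone": 0.3},
    }
    print(main_data_types(model_output))
    # {'subdata 12': ('name', 0.8), 'subdata 3': ('age', 0.7)}
```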
The beneficial effects of the above technical scheme are: by carrying out cluster analysis on the data and obtaining the class probability of each subdata, the data type combination is conveniently constructed and obtained, and a basis is provided for constructing a matrix subsequently.
In the method for realizing data desensitization by using matrix replacement provided by the invention, constructing the data type combination of the data to be desensitized based on all the main data types comprises the following steps:
sorting the subdata in descending order of the probability of its main data type to obtain a data set, each element of which is referred to as second data;
performing a first calibration (marking) of the second data of the same type in the data set, determining the positions at which that type occurs, and summing the probability values of that type to obtain its first probability value;
summing the probability values of all the second data in the data set to obtain the total probability value;
determining the first ratio of each type as its first probability value divided by the total probability value, and the third ratio of each second data as its probability value divided by the total probability value;
determining the second ratio of each second data as its probability value divided by the first probability value of its type;
constructing a first array for each second data from its first ratio, second ratio and third ratio;
and setting a reference label for each second data according to its first array, thereby constructing the data type combination.
In this embodiment, for example, subdata 1, 2 and 3 have probabilities of 0.8, 0.9 and 0.7 respectively, so the data set is ordered by probability as 0.9, 0.8, 0.7, that is, subdata 2, subdata 1, subdata 3.
In this embodiment, suppose subdata 1 and 3 are of the same type and subdata 2 is of another type. The first probability value of the type shared by subdata 1 and 3 is then 0.8 + 0.7 = 1.5, the total probability value is 2.4, the first ratio of that type is 1.5/2.4, and the third ratios are 0.9/2.4, 0.8/2.4 and 0.7/2.4 for subdata 2, 1 and 3 respectively.
In this embodiment, the second ratios of subdata 1 and 3 are 0.8/1.5 and 0.7/1.5 respectively.
The second ratio of subdata 2 is 0.9/0.9.
In this embodiment, the first array of subdata 1 is (1.5/2.4, 0.8/1.5, 0.8/2.4).
The first array of subdata 2 is (0.9/2.4, 0.9/0.9, 0.9/2.4); because subdata 2 is the only member of its type, its first and third ratios coincide.
The first array of subdata 3 is (1.5/2.4, 0.7/1.5, 0.7/2.4).
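The worked example can be checked with exact fractions. The Python sketch below sorts the subdata by probability, sums the probability values per type and overall, and forms each first array as (first ratio, second ratio, third ratio). The type labels "A" and "B" are placeholders, and passing probabilities as decimal strings is simply a way to keep `fractions.Fraction` exact; how the reference label is then derived from the array is not specified and is not sketched.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Tuple

FirstArray = Tuple[Fraction, Fraction, Fraction]  # (first, second, third ratio)


def first_arrays(subdata: Dict[str, Tuple[str, str]]) -> Dict[str, FirstArray]:
    """subdata maps a name to (main data type, probability as a decimal string).

    Returns the first array of every subdata, in data-set order
    (descending probability).
    """
    probs = {name: Fraction(p) for name, (_, p) in subdata.items()}
    ordered = sorted(subdata, key=lambda name: probs[name], reverse=True)
    total = sum(probs.values())  # total probability value of the data set
    type_sum: Dict[str, Fraction] = defaultdict(Fraction)  # first probability values
    for name, (data_type, _) in subdata.items():
        type_sum[data_type] += probs[name]
    arrays: Dict[str, FirstArray] = {}
    for name in ordered:
        data_type = subdata[name][0]
        first = type_sum[data_type] / total
        second = probs[name] / type_sum[data_type]
        third = probs[name] / total
        arrays[name] = (first, second, third)
    return arrays


if __name__ == "__main__":
    example = {
        "subdata 1": ("A", "0.8"),
        "subdata 2": ("B", "0.9"),
        "subdata 3": ("A", "0.7"),
    }
    for name, array in first_arrays(example).items():
        print(name, [str(x) for x in array])
```

For subdata 1 this gives 5/8, 8/15 and 1/3, which are exactly 1.5/2.4, 0.8/1.5 and 0.8/2.4; for subdata 2 the first and third ratios are both 3/8, that is, 0.9/2.4.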
In this embodiment, a reference label is set for each second data to construct the data type combination; the reference label mainly determines the invocation command matched with the corresponding data, which ensures that the strategy is subsequently acquired accurately.
In this embodiment, the reference label is obtained mainly from the first ratio, the second ratio and the third ratio.
The beneficial effects of the above technical scheme are: reference labels are set by constructing a data set, calibrating data of the same type and calculating the three corresponding ratios, which provides a basis for subsequently invoking the strategy.
In the method for realizing data desensitization by using matrix replacement provided by the invention, invoking the data coding strategy from the combined database comprises the following steps:
determining the reference label and the main data type of each subdata in the combined database;
meanwhile, determining the total privacy of each subdata from the privacy input items it involves;
determining the invocation factor of each subdata from its reference label, main data type and total privacy;
obtaining an invocation combination command from all the invocation factors;
and invoking the data coding strategy from the combined database according to the invocation combination command.
In this embodiment, because the subdata are obtained by clustering, one subdata may involve several privacy input items; its total privacy is therefore calculated from the privacy of all the privacy input items it involves.
In this embodiment, the reference label, the main data type and the total privacy are combined, and the invocation factor matching this combination is retrieved from a database.
In this embodiment, each invocation factor is specific to its subdata and corresponds to an invocation number, so collecting the invocation factors of all subdata yields the invocation combination command.
In this embodiment, the combined database contains the commands and the strategies matched with them.
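A minimal sketch of the invocation steps follows. The text does not specify how the total privacy is aggregated, how a factor is matched, or how the databases are laid out, so the following choices are all assumptions made for illustration:

- The total privacy is a plain sum of the privacy values of the involved privacy input items.
- The total privacy is bucketed into a hypothetical "high"/"low" band so that it can serve as a lookup key.
- The factor table, the invocation numbers and the strategy name "strategy-X" are invented.
- The invocation combination command is the tuple of invocation numbers in subdata order.

```python
def total_privacy(privacy_of_item, items):
    # ASSUMPTION: total privacy is the sum over the involved privacy input items.
    return sum(privacy_of_item[item] for item in items)


def invocation_factor(label, main_type, privacy, factor_table, threshold=1.0):
    # ASSUMPTION: bucket the total privacy with a hypothetical threshold so the
    # (reference label, main data type, privacy band) combination can be matched.
    band = "high" if privacy >= threshold else "low"
    return factor_table[(label, main_type, band)]


def invoke_strategy(subdata, privacy_of_item, factor_table, combined_db):
    # One invocation factor (here: an invocation number) per subdata.
    command = tuple(
        invocation_factor(sd["label"], sd["type"],
                          total_privacy(privacy_of_item, sd["items"]),
                          factor_table)
        for sd in subdata
    )
    # The combined database maps each combination command to its coding strategy.
    return command, combined_db[command]


# Hypothetical example data.
privacy_of_item = {"name": 0.6, "id_number": 0.9, "phone": 0.7}
subdata = [
    {"label": "L1", "type": "text", "items": ["name"]},
    {"label": "L2", "type": "digits", "items": ["id_number", "phone"]},
]
factor_table = {("L1", "text", "low"): 3, ("L2", "digits", "high"): 7}
combined_db = {(3, 7): "strategy-X"}

command, strategy = invoke_strategy(subdata, privacy_of_item, factor_table, combined_db)
print(command, strategy)  # (3, 7) strategy-X
```

A tuple is used for the command because it is hashable and can therefore serve directly as a dictionary key in the stand-in combined database.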
The beneficial effects of the above technical scheme are: determining the reference label, the main data type and the total privacy of each subdata makes it straightforward to obtain the invocation factors, from them the command, and from the command the strategy, which provides an effective basis for subsequent data desensitization.
In the method for realizing data desensitization by using matrix replacement provided by the invention, performing data coding on the data to be desensitized according to the data coding strategy to construct the matrix to be desensitized comprises the following steps:
acquiring the coding flows of the data coding strategy and the data to be coded matched with each coding flow;
performing data coding on the corresponding data to be coded according to the coding flows, and analyzing the row and column display positions of the data coding results corresponding to each coding flow;
and determining an upper boundary, a lower boundary, a left boundary and a right boundary based on all row and column display positions, and performing coding completion processing to construct a matrix to be desensitized.
In this embodiment, the data coding strategy is preset: once the data have been analyzed, the relevant strategy is invoked, and subsequent processing follows that strategy.
In this embodiment, data coding converts the data to be coded into coding results that represent them.
In this embodiment, the row and column display positions are determined by the coding strategy, and different data to be coded may occupy different row and column positions.
In this embodiment, because different rows may occupy different columns, the left and right boundaries are determined across all rows and the upper and lower boundaries across all columns. Coding completion then fills the positions inside these boundaries that hold no coding result, producing the matrix.
The beneficial effects of the above technical scheme are: the data to be coded are coded according to the coding strategy, and the row and column display positions of the results and the outermost boundaries are analyzed, so that the matrix can be constructed conveniently.
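The boundary and completion steps can be sketched in Python as follows. Two parts of the sketch are hypothetical stand-ins, because the text specifies neither:

- The coding flow, which turns each character into its Unicode code point on a single row.
- The padding value used for coding completion.

```python
def code_flow(text, row, start_col=0):
    # Hypothetical coding flow: each character becomes its Unicode code point,
    # laid out left to right on one row. Returns (row, col, code) positions.
    return [(row, start_col + i, ord(ch)) for i, ch in enumerate(text)]


def build_matrix(placed, filler="#"):
    """Assemble coding results into a rectangular matrix to be desensitized.

    placed: (row, col, code) display positions from all coding flows.
    filler: ASSUMPTION - hypothetical padding used for coding completion.
    """
    rows = [r for r, _, _ in placed]
    cols = [c for _, c, _ in placed]
    top, bottom = min(rows), max(rows)   # upper and lower boundaries
    left, right = min(cols), max(cols)   # left and right boundaries
    height, width = bottom - top + 1, right - left + 1
    # Coding completion: start fully padded, then write each coding result.
    matrix = [[filler] * width for _ in range(height)]
    for r, c, code in placed:
        matrix[r - top][c - left] = code
    return matrix


placed = code_flow("ab", row=0) + code_flow("xyz", row=1)
for line in build_matrix(placed, filler=0):
    print(line)
# [97, 98, 0]
# [120, 121, 122]
```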
In the method for realizing data desensitization by using matrix replacement provided by the invention, analyzing the desensitization coefficient and the desensitization attribute of each element to be desensitized in the matrix to be desensitized comprises the following steps:
acquiring coding information of each element to be desensitized in the matrix to be desensitized;
inputting the coding information into an information analysis model, and acquiring a coding protection index of the coding information;
determining a desensitization coefficient corresponding to an element to be desensitized based on the coding protection index;
meanwhile, dividing the coding protection indexes by protection type and calculating the total type weight of each division result;
and screening out, as final division types, the division results whose total type weight is greater than the preset weight, and retrieving the desensitization attribute from the type-attribute database.
In this embodiment, each row-column position in the matrix corresponds to one element, which is represented by its coding. The information analysis model is trained in advance on different coding information and the indexes matched with it, so that it can output the coding protection indexes of given coding information.
In this embodiment, the desensitization coefficient is obtained by accumulating, over all coding protection indexes, the product of each index's weight and its protection level.
In this embodiment, the indexes are divided by protection type so that a total type weight can be determined for each type: when the model outputs the coding protection indexes, it assigns a weight to each index, and the weights of the indexes in each division result add up to that result's total type weight.
In this embodiment, the preset weight is typically set to 0.4. The type-attribute database stores the different division types and the attributes matched with them, from which a desensitization attribute, such as hiding a certain symbol, is obtained.
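The following sketch shows how the desensitization coefficient and desensitization attribute might be derived once the information analysis model has produced its coding protection indexes. The model itself is not reproduced: the index weights, protection levels and types are hypothetical inputs, as are the entries of the type-attribute table. The coefficient is taken as the weighted sum of protection levels, and the 0.4 default follows the typical preset weight given above.

```python
from collections import defaultdict


def analyse_element(indexes, type_attribute_db, preset_weight=0.4):
    """Return (desensitization coefficient, final division types, attributes).

    indexes: coding protection indexes of one element, each with a
    model-assigned 'weight', a protection 'level' and a protection 'type'.
    """
    # Desensitization coefficient: accumulate weight * protection level.
    coefficient = sum(ix["weight"] * ix["level"] for ix in indexes)

    # Protection type division: total type weight of each division result.
    type_weight = defaultdict(float)
    for ix in indexes:
        type_weight[ix["type"]] += ix["weight"]

    # Keep division types whose total weight exceeds the preset weight,
    # then look up their desensitization attributes.
    final_types = [t for t, w in type_weight.items() if w > preset_weight]
    attributes = [type_attribute_db[t] for t in final_types]
    return coefficient, final_types, attributes


# Hypothetical model output and type-attribute table.
indexes = [
    {"weight": 0.3, "level": 3, "type": "identity"},
    {"weight": 0.2, "level": 2, "type": "identity"},
    {"weight": 0.3, "level": 2, "type": "contact"},
    {"weight": 0.2, "level": 1, "type": "location"},
]
type_attribute_db = {
    "identity": "hide middle characters",
    "contact": "hide last four digits",
    "location": "keep city only",
}
coef, types, attrs = analyse_element(indexes, type_attribute_db)
print(round(coef, 2), types, attrs)  # 2.1 ['identity'] ['hide middle characters']
```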
The beneficial effects of the above technical scheme are: the coding information is analyzed by the model, and the desensitization coefficient and the corresponding desensitization attribute are determined from the resulting indexes, which provides an effective basis for the subsequent replacement.
In the method for realizing data desensitization by using matrix replacement provided by the invention, replacing the elements to be replaced according to the desensitization coefficient and the desensitization attribute to obtain desensitization data comprises the following steps:
determining a desensitization level corresponding to an element to be desensitized based on the desensitization coefficient and the desensitization attribute;
screening out the first elements whose desensitization level is greater than the preset level, and locking their positions;
and replacing each first element with replacement information matched to its desensitization level, to obtain the desensitization data.
The beneficial effects of the above technical scheme are: by determining desensitization levels and screening elements, information can be effectively replaced to obtain desensitization data.
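The three replacement steps can be sketched as follows, assuming the desensitization level of every element has already been computed. The level matrix, the preset level and the table mapping a level to its replacement information are hypothetical, because the text does not define their values.

```python
def replace_elements(matrix, levels, replacement_by_level, preset_level):
    """Replace elements whose desensitization level exceeds the preset level.

    levels: desensitization level of each element, same shape as matrix.
    replacement_by_level: ASSUMPTION - hypothetical level -> replacement table.
    Returns (desensitized copy of matrix, locked positions).
    """
    # Lock the positions of the first elements (level above the preset level).
    locked = [
        (r, c)
        for r, row in enumerate(levels)
        for c, level in enumerate(row)
        if level > preset_level
    ]
    result = [row[:] for row in matrix]  # leave the input matrix untouched
    for r, c in locked:
        result[r][c] = replacement_by_level[levels[r][c]]
    return result, locked


matrix = [["1", "3", "8", "0"], ["5", "6", "7", "8"]]
levels = [[0, 1, 2, 2], [0, 0, 3, 1]]
masked, locked = replace_elements(matrix, levels, {2: "*", 3: "#"}, preset_level=1)
print(locked)  # [(0, 2), (0, 3), (1, 2)]
print(masked)  # [['1', '3', '*', '*'], ['5', '6', '#', '8']]
```

Copying the rows keeps the matrix to be desensitized intact, so the original elements remain available if the levels are later re-determined.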
In the method for realizing data desensitization by using matrix replacement provided by the invention, determining the desensitization level corresponding to an element to be desensitized based on the desensitization coefficient and the desensitization attribute comprises the following steps:
determining an overlapping index of the coding protection index corresponding to the desensitization coefficient and the coding protection index corresponding to the desensitization attribute;
determining a reference coefficient corresponding to an element to be desensitized based on the overlapping index;
when the ratio of the reference coefficient to the desensitization coefficient is greater than 1/2, determining the desensitization level of the element to be desensitized according to the desensitization coefficient and the desensitization attribute;
otherwise, determining a first difference coefficient between the reference coefficient and the desensitization coefficient, and a second difference coefficient between the reference coefficient and a preset coefficient;
determining an allowable adjustment range of the reference coefficient according to the first difference coefficient and the second difference coefficient;
screening a first index from the coding protection indexes according to the minimum value of the allowable adjustment range, wherein a first adjustment coefficient corresponding to the first index is consistent with the minimum value;
screening a second index from the coding protection index according to the maximum value of the allowable adjustment range, wherein a second adjustment coefficient corresponding to the second index is consistent with the maximum value;
taking the indexes common to the first indexes and the second indexes, together with half of the remaining second indexes selected at random, as adjustment indexes;
taking, as final indexes, the indexes matched with the final division types whose total type weight after screening is greater than the preset weight, and acquiring a new desensitization attribute based on them;
and obtaining the desensitization level corresponding to the element to be desensitized based on the desensitization coefficient and the new desensitization attribute.
In this embodiment, the desensitization coefficient is calculated from all the coding protection indexes, whereas the desensitization attribute is derived from only part of them. The indexes shared by the two are the overlapping indexes, from which the corresponding reference coefficient is determined; the reference coefficient mainly provides a basis for subsequently determining the desensitization level.
In this embodiment, the reference coefficient and the desensitization coefficient are calculated in the same way, but the reference coefficient covers only the overlapping indexes, so it is smaller than the desensitization coefficient. Their ratio is therefore compared with 1/2 to determine the desensitization level effectively.
In this embodiment, the first difference coefficient is the desensitization coefficient minus the reference coefficient.
In this embodiment, the second difference coefficient is the preset coefficient minus the reference coefficient.
In this embodiment, the allowable adjustment range is [preset coefficient − reference coefficient, desensitization coefficient − reference coefficient].
In this embodiment, the minimum value is the preset coefficient minus the reference coefficient, and the maximum value is the desensitization coefficient minus the reference coefficient.
In this embodiment, when screening the first indexes, the overlapping indexes are removed from all coding protection indexes and the first indexes are selected from the remaining ones; the second indexes are screened on the same basis.
In this embodiment, the coefficient calculated from the selected first indexes matches the minimum value, and the coefficient calculated from the selected second indexes matches the maximum value.
In this embodiment, for example, if 10 indexes remain, 5 of them are randomly selected as adjustment indexes.
In this embodiment, the new desensitization attribute is obtained by re-determining the division over the screened indexes together with the originally divided ones.
In this embodiment, the reference coefficient may be calculated by multiplying each index value by its index weight and summing the products.
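For reference, the relations in this embodiment can be written compactly. The symbols are introduced here for illustration only:

- D is the desensitization coefficient and R the reference coefficient.
- P is the preset coefficient.
- I is the set of all coding protection indexes, and O the set of overlapping indexes.
- w_i and v_i are the weight and value (protection level) of index i.

D is written as the same weighted sum over I, reflecting the statement that the two coefficients are calculated in the same way.

```latex
\begin{aligned}
D &= \sum_{i \in I} w_i\, v_i, \qquad R = \sum_{i \in O} w_i\, v_i, \qquad O \subseteq I, \\
\Delta_1 &= D - R \quad \text{(first difference coefficient)}, \qquad
\Delta_2 = P - R \quad \text{(second difference coefficient)}, \\
\frac{R}{D} &> \tfrac{1}{2} \;\Rightarrow\; \text{level from } D \text{ and the desensitization attribute}, \\
\frac{R}{D} &\le \tfrac{1}{2} \;\Rightarrow\; \text{allowable adjustment range } [\Delta_2,\ \Delta_1] = [P - R,\ D - R].
\end{aligned}
```

Because P − R is the lower end of the interval, the range is well formed only when the preset coefficient does not exceed the desensitization coefficient.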
The beneficial effects of the above technical scheme are: the overlapping indexes of the desensitization coefficient and the desensitization attribute yield a reference coefficient, and comparing it with the desensitization coefficient makes it straightforward to determine the desensitization level. When adjustment is needed, indexes are screened according to the minimum and maximum values to obtain a new attribute and hence the desensitization level, which provides an effective basis for data desensitization.
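The level-determination procedure can be sketched end to end in Python. This is one possible reading, not the patent's implementation, and several details are assumptions:

- The overlapping indexes are taken to be the indexes of the attribute's final division types.
- The text does not say how indexes whose coefficient "matches" a target are found. An exhaustive closest-subset search is swapped in for this, which is practical only for a handful of indexes.
- "Randomly selecting half" uses a seeded random generator.
- The new attribute comes from re-dividing the overlapping and adjustment indexes together.
- The rule mapping a coefficient and an attribute to a level is not described, so it is passed in as a caller-supplied `level_fn`.

All function names, the preset coefficient 1.2 and the index data are hypothetical. Fractions avoid floating-point noise in the comparisons.

```python
import random
from fractions import Fraction as F
from itertools import combinations


def score(names, idx):
    # Weighted sum of index values (used for both D and the reference coefficient).
    return sum((idx[n]["weight"] * idx[n]["value"] for n in names), F(0))


def final_types(names, idx, preset_weight):
    # Division types whose total type weight exceeds the preset weight.
    totals = {}
    for n in names:
        t = idx[n]["type"]
        totals[t] = totals.get(t, F(0)) + idx[n]["weight"]
    return sorted(t for t, w in totals.items() if w > preset_weight)


def closest_subset(names, idx, target):
    # ASSUMPTION: exhaustive search for the subset whose score is closest to
    # the target; ties keep the first subset found.
    best, best_gap = (), None
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            gap = abs(score(combo, idx) - target)
            if best_gap is None or gap < best_gap:
                best, best_gap = combo, gap
    return best


def desensitization_level(idx, preset_coef, level_fn,
                          preset_weight=F("0.4"), seed=0):
    names = sorted(idx)
    attr_types = final_types(names, idx, preset_weight)
    overlap = [n for n in names if idx[n]["type"] in attr_types]
    d = score(names, idx)      # desensitization coefficient (assumed nonzero)
    r = score(overlap, idx)    # reference coefficient
    if r / d > F(1, 2):
        return {"branch": "direct", "level": level_fn(d, attr_types)}

    lo, hi = preset_coef - r, d - r            # allowable adjustment range
    rest = [n for n in names if n not in overlap]
    first = closest_subset(rest, idx, lo)      # first indexes -> minimum
    second = closest_subset(rest, idx, hi)     # second indexes -> maximum
    shared = [n for n in second if n in first]
    others = [n for n in second if n not in first]
    adjustment = shared + random.Random(seed).sample(others, len(others) // 2)
    # ASSUMPTION: re-divide the overlapping and adjustment indexes together.
    new_types = final_types(overlap + adjustment, idx, preset_weight)
    return {"branch": "adjusted", "range": (lo, hi), "first": first,
            "second": second, "adjustment": adjustment,
            "new_types": new_types, "level": level_fn(d, new_types)}


idx = {
    "i1": {"weight": F("0.3"), "value": 2, "type": "identity"},
    "i2": {"weight": F("0.2"), "value": 1, "type": "identity"},
    "i3": {"weight": F("0.2"), "value": 3, "type": "contact"},
    "i4": {"weight": F("0.2"), "value": 2, "type": "contact"},
    "i5": {"weight": F("0.1"), "value": 4, "type": "location"},
}
# Hypothetical level rule: integer part of D plus the number of attribute types.
result = desensitization_level(idx, F("1.2"),
                               level_fn=lambda d, types: int(d) + len(types))
print(result["branch"], result["first"], result["second"])
print(result["new_types"], result["level"])  # ['identity'] 3
```

In this example R/D = 4/11, which is below 1/2, so the adjustment branch runs with the range [2/5, 7/5]. Because the re-division here only removes indexes, the new attribute can only keep or drop division types. Any rule that lets new types appear would need additional assumptions.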
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A method for implementing data desensitization using matrix replacement, comprising:
step 1: acquiring data to be desensitized, and determining a data type combination of the data to be desensitized to invoke a data coding strategy from a combined database;
step 2: carrying out data coding on the data to be desensitized according to the data coding strategy to construct a matrix to be desensitized;
step 3: analyzing the desensitization coefficient and the desensitization attribute of each element to be desensitized in the matrix to be desensitized, and replacing the elements to be replaced according to the desensitization coefficient and the desensitization attribute to obtain desensitization data.
2. The method for achieving data desensitization using matrix replacement according to claim 1, wherein, before the data to be desensitized are acquired, the method comprises:
performing first extraction on an input interface, and simultaneously performing second extraction on input information on the input interface;
according to the first extraction result, determining privacy input items of the input interface, and respectively determining the preset privacy of each privacy input item;
determining the filling accuracy of each privacy input item according to the second extraction result:
[Formula image not reproduced: filling accuracy of the corresponding privacy input item]
wherein the symbols of the formula, shown only as images in the source, denote:
the input information of the corresponding privacy input item, determined based on the second extraction result;
the standard information of the corresponding privacy input item;
the effective information of the corresponding privacy input item, determined based on the second extraction result, whose information amount is smaller than that of the corresponding input information;
a first reference coefficient of the input information of the corresponding privacy input item;
a second reference coefficient of the effective information of the corresponding privacy input item;
and the filling accuracy of the corresponding privacy input item;
acquiring an allowable desensitization weight matched with the preset privacy based on an input item-privacy-weight mapping table;
obtaining an allowable desensitization weight matched with the filling accuracy based on an input item-accuracy-weight mapping table;
determining an allowable desensitization value of the corresponding privacy input item according to the preset privacy and the filling accuracy:
[Formula image not reproduced: allowable desensitization value of the corresponding privacy input item]
wherein the symbols of the formula, shown only as images in the source, denote:
the allowable desensitization weight of the corresponding privacy input item associated with the preset privacy;
and the allowable desensitization weight of the corresponding privacy input item associated with the filling accuracy;
when the allowable desensitization value is greater than a preset value, taking the input information of the corresponding privacy input item as pending data;
and combining all the pending data into the data to be desensitized.
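The closing steps of claim 2 can be sketched as follows. The formulas for the filling accuracy and the allowable desensitization value survive only as missing images, so this sketch does not attempt to reproduce them. Instead, the allowable-value rule is a caller-supplied function, and the weighted sum used in the example is a placeholder, not the claimed formula. The mapping-table contents, item names, preset value and input values are likewise hypothetical.

```python
def select_pending_data(entries, privacy_weight_table, accuracy_weight_table,
                        allowable_value_fn, preset_value):
    """Collect the input information of privacy input items whose allowable
    desensitization value exceeds the preset value.

    entries: dicts with 'item', 'privacy' (preset privacy), 'accuracy'
    (filling accuracy) and 'input' (the entered information).
    """
    pending = []
    for e in entries:
        # Input item-privacy-weight and input item-accuracy-weight lookups.
        w_privacy = privacy_weight_table[(e["item"], e["privacy"])]
        w_accuracy = accuracy_weight_table[(e["item"], e["accuracy"])]
        value = allowable_value_fn(e["privacy"], e["accuracy"], w_privacy, w_accuracy)
        if value > preset_value:
            pending.append(e["input"])
    # The pending data together form the data to be desensitized.
    return pending


entries = [
    {"item": "name", "privacy": 2, "accuracy": 1.0, "input": "Zhang San"},
    {"item": "nickname", "privacy": 1, "accuracy": 0.5, "input": "zs"},
]
privacy_weight_table = {("name", 2): 0.6, ("nickname", 1): 0.2}
accuracy_weight_table = {("name", 1.0): 0.4, ("nickname", 0.5): 0.1}


def placeholder(privacy, accuracy, w_privacy, w_accuracy):
    # PLACEHOLDER ONLY: not the patent's formula, which is not recoverable here.
    return w_privacy * privacy + w_accuracy * accuracy


print(select_pending_data(entries, privacy_weight_table, accuracy_weight_table,
                          placeholder, preset_value=1.0))  # ['Zhang San']
```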
3. The method for achieving data desensitization using matrix replacement according to claim 1, wherein obtaining data to be desensitized and determining a data type combination of the data to be desensitized comprises:
performing data clustering analysis on the data to be desensitized to obtain a plurality of subdata;
inputting each subdata into a data discrimination model respectively, and obtaining the type class probability matched with the corresponding subdata;
screening the type corresponding to the maximum probability as the main data type of the corresponding subdata;
and constructing a data type combination of the data to be desensitized based on all the main data types.
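Selecting the main data type and assembling the data type combination reduces to an argmax over the model's class probabilities. In the sketch below, the data discrimination model is not reproduced: its output is given directly as a hypothetical probability table. The clustering step is assumed to have already produced the subdata. The probabilities are chosen to match the worked example of subdata 1 to 3 in the description (0.8, 0.9 and 0.7).

```python
def data_type_combination(class_probs):
    """Map each subdata to (main data type, probability).

    class_probs: subdata name -> {type: probability}, standing in for the
    output of the data discrimination model.
    """
    combination = {}
    for name, probs in class_probs.items():
        main_type = max(probs, key=probs.get)   # type with the maximum probability
        combination[name] = (main_type, probs[main_type])
    return combination


class_probs = {
    "subdata 1": {"type A": 0.8, "type B": 0.2},
    "subdata 2": {"type A": 0.1, "type B": 0.9},
    "subdata 3": {"type A": 0.7, "type B": 0.3},
}
print(data_type_combination(class_probs))
# {'subdata 1': ('type A', 0.8), 'subdata 2': ('type B', 0.9), 'subdata 3': ('type A', 0.7)}
```

`max` returns the first type with the highest probability, so ties favour the type listed first. The text does not address ties.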
4. The method for achieving data desensitization through matrix replacement according to claim 3, wherein constructing the data type combination of the data to be desensitized based on all main data types comprises:
arranging the subdata in descending order of the probability value of their main data types to obtain a data set;
first calibrating the second data of the same type in the data set, determining the distribution positions of each type, and counting the first probability value corresponding to each type;
counting the total probability value of all second data in the data set;
determining a first ratio corresponding to the same type based on the first probability value and the total probability value, and simultaneously determining a third ratio of the probability value of each second data based on the total probability value;
determining a second ratio of the probability value of each second data in the same type to the first probability value corresponding to the same type;
constructing a first array corresponding to each second data according to the first ratio, the second ratio and the third ratio;
and setting a reference label for each second data according to its first array, to construct the data type combination.
5. The method for implementing data desensitization using matrix replacement according to claim 4, wherein invoking a data coding strategy from a combined database comprises:
determining the reference label and the main data type of each subdata existing in the combined database;
meanwhile, determining the total privacy of each subdata based on its privacy input items;
determining an invocation factor of each subdata according to its reference label, main data type and total privacy;
acquiring an invocation combination command based on all invocation factors;
and invoking the data coding strategy from the combined database according to the invocation combination command.
6. The method for achieving data desensitization by matrix replacement according to claim 1, wherein the data encoding of the data to be desensitized according to the data encoding strategy to construct a matrix to be desensitized comprises:
acquiring the coding flows of the data coding strategy and the data to be coded matched with each coding flow;
carrying out data coding on the corresponding data to be coded according to the coding flows, and analyzing the row and column display positions of the data coding results corresponding to each coding flow;
and determining an upper boundary, a lower boundary, a left boundary and a right boundary based on all row and column display positions, and performing coding completion processing to construct a matrix to be desensitized.
7. The method for realizing data desensitization by matrix replacement according to claim 1, wherein analyzing desensitization coefficients and desensitization attributes of each element to be desensitized in the matrix to be desensitized comprises:
acquiring coding information of each element to be desensitized in the matrix to be desensitized;
inputting the coding information into an information analysis model, and acquiring a coding protection index of the coding information;
determining a desensitization coefficient corresponding to an element to be desensitized based on the coding protection index;
meanwhile, dividing the coding protection indexes by protection type and calculating the total type weight of each division result;
and screening out, as final division types, the division results whose total type weight is greater than the preset weight, and retrieving the desensitization attribute from the type-attribute database.
8. The method for achieving data desensitization by matrix replacement according to claim 7, wherein replacing the elements to be replaced according to the desensitization coefficient and the desensitization attribute to obtain desensitization data comprises:
determining a desensitization level corresponding to an element to be desensitized based on the desensitization coefficient and the desensitization attribute;
screening out the first elements whose desensitization level is greater than a preset level, and locking their positions;
and replacing each first element with replacement information matched to its desensitization level, to obtain the desensitization data.
9. The method for implementing data desensitization using matrix replacement according to claim 8, wherein determining a desensitization level corresponding to an element to be desensitized based on the desensitization coefficient and the desensitization attribute comprises:
determining an overlapping index of the coding protection index corresponding to the desensitization coefficient and the coding protection index corresponding to the desensitization attribute;
determining a reference coefficient corresponding to the element to be desensitized based on the overlapping index;
when the ratio of the reference coefficient to the desensitization coefficient is greater than 1/2, determining the desensitization level of the element to be desensitized according to the desensitization coefficient and the desensitization attribute;
otherwise, determining a first difference coefficient between the reference coefficient and the desensitization coefficient, and a second difference coefficient between the reference coefficient and a preset coefficient;
determining an allowable adjustment range of the reference coefficient according to the first difference coefficient and the second difference coefficient;
screening a first index from the coding protection indexes according to the minimum value of the allowable adjustment range, wherein a first adjustment coefficient corresponding to the first index is consistent with the minimum value;
screening a second index from the coding protection index according to the maximum value of the allowable adjustment range, wherein a second adjustment coefficient corresponding to the second index is consistent with the maximum value;
taking the indexes common to the first indexes and the second indexes, together with half of the remaining second indexes selected at random, as adjustment indexes;
taking, as final indexes, the indexes matched with the final division types whose total type weight after screening is greater than the preset weight, and acquiring a new desensitization attribute based on them;
and obtaining a desensitization level corresponding to the element to be desensitized based on the desensitization coefficient and the new desensitization attribute.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211143576.4A CN115422594B (en) 2022-09-20 2022-09-20 Method for realizing data desensitization by matrix replacement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211143576.4A CN115422594B (en) 2022-09-20 2022-09-20 Method for realizing data desensitization by matrix replacement

Publications (2)

Publication Number Publication Date
CN115422594A true CN115422594A (en) 2022-12-02
CN115422594B CN115422594B (en) 2023-06-30

Family

ID=84204333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211143576.4A Active CN115422594B (en) 2022-09-20 2022-09-20 Method for realizing data desensitization by matrix replacement

Country Status (1)

Country Link
CN (1) CN115422594B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951562A (en) * 2017-04-01 2017-07-14 北京数聚世界信息技术有限公司 A kind of desensitization method and device of Chinese Name data
CN110188571A (en) * 2019-06-05 2019-08-30 深圳市优网科技有限公司 Desensitization method and system based on sensitive data
CN110502924A (en) * 2019-08-23 2019-11-26 恩亿科(北京)数据科技有限公司 A kind of data desensitization method, data desensitization device and readable storage medium storing program for executing
CN110598442A (en) * 2019-09-11 2019-12-20 国网浙江省电力有限公司信息通信分公司 Sensitive data self-adaptive desensitization method and system
CN110851874A (en) * 2019-11-20 2020-02-28 成都比特信安科技有限公司 Method for realizing data desensitization by using matrix replacement
CN110851860A (en) * 2019-10-23 2020-02-28 国网天津市电力公司电力科学研究院 Power consumption data desensitization algorithm model construction method based on anonymization privacy technology
CN113420332A (en) * 2021-07-13 2021-09-21 国家电网有限公司客户服务中心 Desensitization method of client information
CN114117525A (en) * 2021-11-24 2022-03-01 贵州大学 Data publishing method and system
CN114186275A (en) * 2021-12-13 2022-03-15 平安国际融资租赁有限公司 Privacy protection method and device, computer equipment and storage medium
CN114861218A (en) * 2022-04-19 2022-08-05 胜斗士(上海)科技技术发展有限公司 Data desensitization method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
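Wu Hongzhong (吴鸿钟); Zhou Hailing (周海灵): "Design of a Confidentiality Inspection System for Structured Databases Based on Big Data Technology" (基于大数据技术的结构化数据库保密检查系统设计), 保密科学技术 (Secrecy Science and Technology), no. 01, pages 34-37 *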

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant