CN112835903A - Sensitive data identification method and equipment - Google Patents

Sensitive data identification method and equipment

Info

Publication number
CN112835903A
Authority
CN
China
Prior art keywords
sensitive data
identified
database
data
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110138550.XA
Other languages
Chinese (zh)
Inventor
徐岩
郭义兰
王倪彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suninfo Technology Co ltd
Original Assignee
Shanghai Suninfo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suninfo Technology Co ltd
Priority to CN202110138550.XA
Publication of CN112835903A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The method comprises: selecting a database table to be identified according to database information; performing sensitive data identification on the database table to be identified to obtain an initial sensitive data type; and re-identifying the initial sensitive data type according to a preset mismatching data type to obtain a target sensitive data identification result. Unexpected sensitive information types are thereby excluded from the result, and the accuracy of sensitive information identification is improved.

Description

Sensitive data identification method and equipment
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for sensitive data identification.
Background
At present, big data is widely applied, and personal information protection faces unprecedented challenges. People enjoy the services that data analysis brings to their lives, but are also troubled by personal information leakage and even harassment. Personal information, together with other information whose real values should not be exposed, is collectively referred to as sensitive information, so identifying and processing the sensitive information in a data source is very important. Sensitive data identification refers to scanning the information in a data source and identifying different types of sensitive information with different sensitive information identification algorithms. After the sensitive information is identified, a corresponding desensitization algorithm is configured for each type of sensitive information. Sometimes, however, the same data is identified as several types of sensitive data: for example, when sensitive information is identified for a table in a database, one column of the table may be identified simultaneously as a taxpayer identification number, a passport number and so on. When the sensitive information is then processed, it is ambiguous which sensitive information type the column actually belongs to, and some fields appear to belong to several sensitive information types at once, for example both an identity card number and a taxpayer identification number. The sensitive information types therefore need further processing so that the type a field actually belongs to can be identified accurately.
Disclosure of Invention
An object of the present application is to provide a method and an apparatus for sensitive data identification that solve the prior-art problem of sensitive information identification reporting unexpected sensitive information types.
According to one aspect of the present application, there is provided a method of sensitive data identification, the method comprising:
selecting a database table to be identified according to the database information;
carrying out sensitive data identification on the database table to be identified to obtain an initial sensitive data type;
and re-identifying the initial sensitive data type according to the preset mismatching data type to obtain a target sensitive data identification result.
Further, selecting a database table to be identified according to the database information includes:
and selecting the database table to be identified according to the database type, the database name and the database table name of the database.
Further, performing sensitive data identification on the database table to be identified to obtain an initial sensitive data type, including:
and identifying sensitive data for each field in the database table to be identified to obtain an initial sensitive data type contained in each field.
Further, the method comprises:
and matching the preset mismatching data types according to the actual service scene.
Further, after re-identifying the initial sensitive data type according to the preset mismatching data type, the method includes:
checking the target sensitive data identification result, and determining the fields for which the check result is inconsistent;
and revoking the mismatching setting for the data in the inconsistent fields.
Further, revoking the mismatching setting for the data in the inconsistent fields includes:
and deleting the data in the inconsistent fields, setting the data which is actually contained as mismatching, and then re-identifying the sensitive data.
According to yet another aspect of the present application, there is also provided an apparatus for sensitive data identification, the apparatus comprising:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method as described above.
Compared with the prior art, the present application selects a database table to be identified according to the database information; performs sensitive data identification on the database table to be identified to obtain an initial sensitive data type; and re-identifies the initial sensitive data type according to the preset mismatching data type to obtain a target sensitive data identification result. Unexpected sensitive information types are thereby excluded from the result, and the accuracy of sensitive information identification is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a schematic flow chart of a method of sensitive data identification provided in accordance with an aspect of the present application;
FIG. 2 is a flow chart illustrating a method for sensitive data identification based on a mismatch setting according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a device for sensitive data identification provided in another aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in the form of computer-readable media, such as random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-Change RAM (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
FIG. 1 shows a schematic flow chart of a method for sensitive data identification, which includes steps S11 to S13: step S11, selecting a database table to be identified according to the database information; step S12, performing sensitive data identification on the database table to be identified to obtain an initial sensitive data type; and step S13, re-identifying the initial sensitive data type according to the preset mismatching data type to obtain the target sensitive data identification result. The accuracy of identifying the sensitive information type is thereby improved.
In step S11, a database table to be identified is selected according to the database information. Because identification operates on the data in database tables, selecting the table according to the database information fixes the identification target.
In step S12, sensitive data identification is performed on the database table to be identified to obtain an initial sensitive data type. Here, the data in the selected table that belongs to a sensitive data type is identified, giving an initial identification result. The initial sensitive data types in this result may include unexpected sensitive data types, so further processing is required.
In step S13, the initial sensitive data type is re-identified according to a preset mismatching data type to obtain a target sensitive data identification result. Here, the mismatching data types are set in advance; a mismatching data type is an unexpected sensitive data type, and setting it filters that type out of the result. The initial sensitive data type obtained in the previous step is identified again using the preset mismatching data types, which yields the target sensitive data identification result and improves the identification accuracy.
In one embodiment of the present application, in step S11, the database table to be identified is selected according to the database type, the database name, and the database table name of the database. Here, the database type, the database name, and the name of the table to be identified are selected, and the table on which sensitive information identification is to be performed is determined from the selected information; for example, the database type MySQL, the database name school, and the table name employee are selected.
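The selection in step S11 can be pictured as a lookup keyed by the three attributes. The sketch below is a minimal illustration, not the application's implementation: `TableSelector`, `CATALOG`, `select_table` and the column names are hypothetical, and an in-memory dictionary stands in for a real database connection.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TableSelector:
    """Identifies one table by the three attributes named in the text."""
    db_type: str     # e.g. "mysql"
    db_name: str     # e.g. "school"
    table_name: str  # e.g. "employee"


# Hypothetical catalog: (db_type, db_name) -> {table name -> column names}.
CATALOG: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("mysql", "school"): {
        "employee": ["id", "name", "user_info"],
        "course": ["id", "title"],
    },
}


def select_table(selector: TableSelector) -> List[str]:
    """Return the columns of the selected table; raise KeyError if it is unknown."""
    tables = CATALOG[(selector.db_type, selector.db_name)]
    return tables[selector.table_name]


columns = select_table(TableSelector("mysql", "school", "employee"))
print(columns)  # ['id', 'name', 'user_info']
```

Keeping the selector as an immutable value makes it usable as a key when an identification task later needs to be tied to a specific table.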
Then, in step S12, sensitive data identification is performed on each field in the database table to be identified to obtain the initial sensitive data types contained in each field. That is, when sensitive data identification is performed on the selected database table, the sensitive data types contained in each field of the table are identified. After the sensitive information identification task has run, the sensitive information types identified in the employee table can be viewed; for example, the user_info field is identified as containing four sensitive information types: identity card number, WeChat account number, taxpayer identification number, and website account number.
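To see how one field can end up with four types, consider a hedged sketch of per-field identification. The regular expressions below are deliberately simplified assumptions chosen for illustration; they are not the identification algorithms of the application, and the names `DETECTORS` and `identify_fields` are hypothetical. A field is assigned every type whose pattern matches all of its sampled values.

```python
import re
from typing import Dict, List, Set

# Simplified, illustrative patterns (assumptions, not the patent's algorithms).
DETECTORS: Dict[str, re.Pattern] = {
    "id_card_number": re.compile(r"\d{17}[\dX]"),
    "taxpayer_id": re.compile(r"[0-9A-Z]{15}|[0-9A-Z]{18}|[0-9A-Z]{20}"),
    "wechat_account": re.compile(r"[A-Za-z0-9_-]{6,20}"),
    "website_account": re.compile(r"[A-Za-z0-9_]{4,32}"),
}


def identify_fields(table: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return, for each field, every type whose pattern matches all sampled values."""
    result: Dict[str, Set[str]] = {}
    for field, values in table.items():
        result[field] = {
            type_name
            for type_name, pattern in DETECTORS.items()
            if values and all(pattern.fullmatch(v) for v in values)
        }
    return result


# Made-up sample values for the employee table.
employee = {
    "user_info": ["110101199003074610", "31010519851212123X"],
    "name": ["张三"],
}
initial = identify_fields(employee)
print(sorted(initial["user_info"]))
# ['id_card_number', 'taxpayer_id', 'website_account', 'wechat_account']
print(initial["name"])  # set()
```

Because an 18-character string of digits satisfies all four loose patterns, the initial result is ambiguous, which is the situation the mismatching setting is meant to resolve.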
In an embodiment of the present application, the method includes: matching the preset mismatching data types according to the actual business scenario. Sensitive information types provide the classification of sensitive data. The built-in sensitive information types used in data identification include identity card number, e-mail address, telephone number, bank card number, name, address, zip code, organization name, organization code, business license number, taxpayer identification number, bank account number, and the like. The preset mismatching data types can be set according to the requirements of the actual business scenario. For example, if user_info is meant to hold only one sensitive information type, the identity card number, then the WeChat account number, the taxpayer identification number, and the website account number are set as mismatching sensitive information types and sensitive information identification is performed again, after which user_info is identified only as an identity card number. Users can also define custom sensitive information types. A custom type requires a matching rule expression, which is a regular expression, to match the corresponding sensitive data; for example, the expression for matching Chinese characters is [\u4e00-\u9fa5].
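The effect of a mismatching setting, and of a custom type defined by a regular expression, can be sketched as follows. This is an illustration under stated assumptions: re-identification is modelled simply as removing the mismatching types from the initial result (a real system may rerun the scan), and `CUSTOM_TYPES`, `matches_custom` and `apply_mismatch` are hypothetical names.

```python
import re
from typing import Dict, Set

# Custom type from the text: the quoted character class, with "+" added here
# (an assumption) so that a whole value made of Chinese characters matches.
CUSTOM_TYPES = {"chinese_text": re.compile(r"[\u4e00-\u9fa5]+")}


def matches_custom(type_name: str, value: str) -> bool:
    """True when the whole value matches the custom type's regular expression."""
    return CUSTOM_TYPES[type_name].fullmatch(value) is not None


def apply_mismatch(
    initial: Dict[str, Set[str]], mismatch: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Drop, per field, every type that is configured as a mismatch."""
    return {
        field: types - mismatch.get(field, set())
        for field, types in initial.items()
    }


initial = {
    "user_info": {"id_card_number", "wechat_account", "taxpayer_id", "website_account"}
}
mismatch = {"user_info": {"wechat_account", "taxpayer_id", "website_account"}}

print(apply_mismatch(initial, mismatch))       # {'user_info': {'id_card_number'}}
print(matches_custom("chinese_text", "张三"))  # True
print(matches_custom("chinese_text", "abc"))   # False
```

Keying the mismatching types by field keeps a setting made for user_info from affecting other columns that legitimately hold, say, taxpayer identification numbers.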
In an embodiment of the present application, after the initial sensitive data type is re-identified according to the preset mismatching data type, the target sensitive data identification result may be checked and the fields whose check result is inconsistent determined; the mismatching setting is then revoked for the data in the inconsistent fields. Specifically, revoking the mismatching setting includes: deleting the data in the inconsistent fields, setting the data which is actually contained as mismatching, and then re-identifying the sensitive data. For example, suppose the WeChat account number, the taxpayer identification number, and the website account number were set as mismatching sensitive information types in the steps above, but checking shows that the user_info field actually holds taxpayer identification numbers. The corresponding sensitive information discovery task and database table are then selected in the mismatch list, the taxpayer identification number entry that was set as a mismatch is deleted, the identity card number is set as a mismatch, and sensitive information identification is performed again, so that the field is identified only as a taxpayer identification number. When the mismatching data types are preset, a sensitive information identification task is first created and the mismatching sensitive information types are set on that task. The identification task can be executed repeatedly, and whenever the mismatching types are modified, the identification result changes accordingly.
When mismatches are set, an operator may by mistake mark an accurate sensitive information type as a mismatch, or the scanned data may have changed so that the configured mismatching types are no longer accurate. The erroneous setting therefore needs to be corrected: the previous mismatching type setting is revoked and accurate mismatching type information is set again.
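The revoke-and-reset cycle for the user_info example can be sketched as below. This is a minimal illustration rather than the application's implementation: the class `MismatchSettings` and its methods are hypothetical, and re-identification is again modelled as filtering the initial types.

```python
from typing import Dict, Set


class MismatchSettings:
    """Per-field mismatching types attached to one identification task (hypothetical)."""

    def __init__(self) -> None:
        self._by_field: Dict[str, Set[str]] = {}

    def set_mismatch(self, field: str, type_name: str) -> None:
        """Mark a type as a mismatch for the given field."""
        self._by_field.setdefault(field, set()).add(type_name)

    def revoke(self, field: str, type_name: str) -> None:
        """Delete an earlier mismatch entry; do nothing if it was never set."""
        self._by_field.get(field, set()).discard(type_name)

    def re_identify(self, field: str, initial_types: Set[str]) -> Set[str]:
        """Return the initial types minus the current mismatching types."""
        return initial_types - self._by_field.get(field, set())


initial = {"id_card_number", "wechat_account", "taxpayer_id", "website_account"}
settings = MismatchSettings()
for type_name in ("wechat_account", "taxpayer_id", "website_account"):
    settings.set_mismatch("user_info", type_name)
first = settings.re_identify("user_info", initial)

# Checking shows the field actually holds taxpayer identification numbers:
# revoke that entry, mark the identity card number instead, and run again.
settings.revoke("user_info", "taxpayer_id")
settings.set_mismatch("user_info", "id_card_number")
second = settings.re_identify("user_info", initial)

print(first)   # {'id_card_number'}
print(second)  # {'taxpayer_id'}
```

Because the settings live on the task and the initial types are kept, the task can be rerun after every change to the mismatch list, as the text describes.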
In an embodiment of the present application, as shown in FIG. 2, the database table on which sensitive information identification is to be performed is selected and a sensitive information identification task is executed. If a mismatch is set for the sensitive information result, the identification task is executed again; otherwise, the sensitive information identification result is displayed directly. Setting mismatches thus improves the accuracy of the identified sensitive information types, which makes the statistics on the proportion of each sensitive information type in the database more reliable and gives better guidance on which desensitization rule to set for a field. The proportion of sensitive information in the identified data, for example the proportion of identity card numbers, is counted from the sensitive information identification result, and setting the mismatching sensitive information types makes these per-type proportions accurate. A desensitization rule template is preset to define the correspondence between each sensitive information type and a desensitization rule. When a desensitization policy is set for a field that has several sensitive information types, the user cannot easily determine which type is accurate. After mismatches are set, the identification result may contain only one type, so the desensitization policy bound to that type is already filled in when the policy is set; the user can use this default policy directly or modify it manually.
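The two downstream uses of the result, type statistics and default desensitization rules, can be sketched as follows. The text does not say whether proportions are counted per field or per value, so this sketch assumes per field; `RULE_TEMPLATE`, the rule names, `type_proportions` and `default_rule` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set

# Hypothetical desensitization rule template: sensitive type -> rule name.
RULE_TEMPLATE: Dict[str, str] = {
    "id_card_number": "mask_middle_digits",
    "taxpayer_id": "hash_value",
    "phone_number": "mask_last_four",
}


def type_proportions(result: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of scanned fields identified as each type (assumed per-field counting)."""
    total = len(result)
    if total == 0:
        return {}
    counts = Counter(t for types in result.values() for t in types)
    return {t: n / total for t, n in counts.items()}


def default_rule(types: Set[str]) -> Optional[str]:
    """Return the bound rule only when the field has exactly one identified type."""
    if len(types) != 1:
        return None
    (only_type,) = types
    return RULE_TEMPLATE.get(only_type)


result = {
    "user_info": {"id_card_number"},
    "backup_id": {"id_card_number"},
    "phone": {"phone_number"},
    "memo": set(),
}
print(type_proportions(result))  # {'id_card_number': 0.5, 'phone_number': 0.25}
print(default_rule(result["user_info"]))                 # mask_middle_digits
print(default_rule({"id_card_number", "taxpayer_id"}))  # None
```

Returning no default for an ambiguous field mirrors the point made in the text: a rule can be pre-filled only once the mismatching setting has reduced the field to a single type.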
In addition, the embodiment of the present application also provides a computer readable medium, on which computer readable instructions are stored, the computer readable instructions being executable by a processor to implement the aforementioned method for sensitive data identification.
In correspondence with the method described above, the present application also provides a terminal, which includes modules or units capable of executing the method steps described in fig. 1 or fig. 2 or various embodiments, and these modules or units can be implemented by hardware, software or a combination of hardware and software, and the present application is not limited thereto. For example, in an embodiment of the present application, there is also provided an apparatus for sensitive data identification, the apparatus including:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
For example, the computer readable instructions, when executed, cause the one or more processors to:
selecting a database table to be identified according to the database information;
carrying out sensitive data identification on the database table to be identified to obtain an initial sensitive data type;
and re-identifying the initial sensitive data type according to the preset mismatching data type to obtain a target sensitive data identification result.
FIG. 3 is a schematic structural diagram of a device for sensitive data identification provided in another aspect of the present application. The device includes a selection device 11, an identification device 12, and a mismatching setting device 13, wherein the selection device 11 is configured to select a database table to be identified according to database information; the identification device 12 is configured to perform sensitive data identification on the database table to be identified to obtain an initial sensitive data type; and the mismatching setting device 13 is configured to re-identify the initial sensitive data type according to the preset mismatching data type to obtain a target sensitive data identification result.
It should be noted that the content executed by the selection device 11, the identification device 12 and the mismatch setting device 13 is the same as or corresponding to the content executed in the above steps S11, S12 and S13, and for brevity, the description thereof is omitted.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (8)

1. A method of sensitive data identification, wherein the method comprises:
selecting a database table to be identified according to the database information;
carrying out sensitive data identification on the database table to be identified to obtain an initial sensitive data type;
and re-identifying the initial sensitive data type according to the preset mismatching data type to obtain a target sensitive data identification result.
2. The method of claim 1, wherein selecting the database table to be identified based on the database information comprises:
and selecting the database table to be identified according to the database type, the database name and the database table name of the database.
3. The method of claim 1, wherein identifying sensitive data of the database table to be identified to obtain an initial sensitive data type comprises:
and identifying sensitive data for each field in the database table to be identified to obtain an initial sensitive data type contained in each field.
4. The method of claim 1, wherein the method comprises:
and matching the preset mismatching data types according to the actual service scene.
5. The method of claim 1, wherein after re-identifying the initial sensitive data type according to a preset mismatching data type, the method comprises:
checking the target sensitive data identification result, and determining the fields for which the check result is inconsistent;
and revoking the mismatching setting for the data in the inconsistent fields.
6. The method of claim 5, wherein revoking the mismatch setting for the data in the inconsistent field comprises:
and deleting the data in the inconsistent fields, setting the data which is actually contained as mismatching, and then re-identifying the sensitive data.
7. An apparatus for sensitive data recognition, wherein the apparatus comprises:
one or more processors; and
memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 6.
8. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 6.
CN202110138550.XA 2021-02-01 2021-02-01 Sensitive data identification method and equipment Pending CN112835903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110138550.XA CN112835903A (en) 2021-02-01 2021-02-01 Sensitive data identification method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110138550.XA CN112835903A (en) 2021-02-01 2021-02-01 Sensitive data identification method and equipment

Publications (1)

Publication Number Publication Date
CN112835903A (en) 2021-05-25

Family

ID=75931499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110138550.XA Pending CN112835903A (en) 2021-02-01 2021-02-01 Sensitive data identification method and equipment

Country Status (1)

Country Link
CN (1) CN112835903A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704573A (en) * 2021-08-26 2021-11-26 北京中安星云软件技术有限公司 Database sensitive data scanning method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131481A1 (en) * 2010-11-22 2012-05-24 International Business Machines Corporation Dynamic De-Identification of Data
CN105825138A (en) * 2015-01-04 2016-08-03 北京神州泰岳软件股份有限公司 Sensitive data identification method and device
CN111709052A (en) * 2020-06-01 2020-09-25 支付宝(杭州)信息技术有限公司 Private data identification and processing method, device, equipment and readable medium
CN112069540A (en) * 2020-09-04 2020-12-11 中国平安人寿保险股份有限公司 Sensitive information processing method, device and medium

Similar Documents

Publication Publication Date Title
CN110543483A (en) Data auditing method and device and electronic equipment
US20160342501A1 (en) Accelerating Automated Testing
US20160283357A1 (en) Call stack relationship acquiring method and apparatus
CN110675399A (en) Screen appearance flaw detection method and equipment
CN107092535B (en) Method and apparatus for data storage of test interface
CN106909811A (en) The method and apparatus of ID treatment
US20230205755A1 (en) Methods and systems for improved search for data loss prevention
CN112765248A (en) SQL-based data extraction method and equipment
CN116432604A (en) Data verification method and device and electronic equipment
CN109472722B (en) Method and device for obtaining relevant information of approved finding segment of official document to be generated
CN111950380A (en) Bill auditing method and device, electronic equipment and computer-readable storage medium
CN112835903A (en) Sensitive data identification method and equipment
CN113868698A (en) File desensitization method and equipment
US20220067136A1 (en) Verification method and apparatus, and computer readable storage medium
CN110246063B (en) Method and device for guiding case examination and management
US10114951B2 (en) Virus signature matching method and apparatus
CN113129121A (en) E-commerce platform financial reconciliation accounting method and device
US10782942B1 (en) Rapid onboarding of data from diverse data sources into standardized objects with parser and unit test generation
CN110138707B (en) Data interaction method, client, application and electronic equipment
CN111832062A (en) Method and device for desensitizing selected area data in table file
CN111190986B (en) Map data comparison method and device
CN114547675A (en) Data identification method and device
CN114510300A (en) Method and equipment for embedding target object in derived class
CN110018844B (en) Management method and device of decision triggering scheme and electronic equipment
CN110517010A (en) A kind of data processing method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination