CN116401704A - Sensitive data identification method, electronic equipment and storage medium - Google Patents

Publication number
CN116401704A
Authority
CN
China
Prior art keywords
data
sensitive data
industry
identified
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310344068.0A
Other languages
Chinese (zh)
Inventor
于元河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202310344068.0A
Publication of CN116401704A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a sensitive data identification method, an electronic device, and a storage medium. In these embodiments, an industry classification and grading template is used to identify industry-specific sensitive data in the data to be identified, and a general identification model is used to identify general sensitive data in the same data. The data to be identified thus undergoes full sensitive data identification; that is, sensitive data is identified more comprehensively.

Description

Sensitive data identification method, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data security technologies, and in particular, to a sensitive data identification method, an electronic device, and a storage medium.
Background
Industries such as finance, energy, and automotive all need to identify sensitive data. Sensitive data mainly comprises high-value data such as customer data, technical data, and personal information, and its leakage directly affects data security. How to identify the sensitive data of each industry more comprehensively is therefore a technical problem to be solved.
Disclosure of Invention
Aspects of the present application provide a sensitive data identification method, an electronic device, and a storage medium for more comprehensively identifying sensitive data.
An embodiment of the present application provides a sensitive data identification method, comprising: acquiring data to be identified and a target industry classification and grading template of the target industry to which the data to be identified belongs; performing industry-specific sensitive data identification on the data to be identified by using the target industry classification and grading template, to obtain an industry-specific sensitive data identification result; and performing general sensitive data identification on the data to be identified by using a general identification model, to obtain a general sensitive data identification result.
An embodiment of the present application further provides a sensitive data identification method applied to a cloud server, comprising: creating a sensitive data identification task in response to a task creation request triggered by a terminal device; in response to the sensitive data identification task being triggered, determining, based on the sensitive data identification task, the data to be identified and the target industry classification and grading template of the target industry to which the data to be identified belongs; performing industry-specific sensitive data identification on the data to be identified by using the target industry classification and grading template, to obtain an industry-specific sensitive data identification result; performing general sensitive data identification on the data to be identified by using a general identification model, to obtain a general sensitive data identification result; and returning the industry-specific sensitive data identification result and the general sensitive data identification result to the terminal device.
An embodiment of the present application further provides an electronic device, comprising a memory and a processor. The memory is configured to store a computer program; the processor is coupled to the memory and configured to execute the computer program so as to perform the steps of the sensitive data identification method.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the sensitive data identification method.
In the embodiments of the present application, an industry classification and grading template is used to identify industry-specific sensitive data in the data to be identified, and a general identification model is used to identify general sensitive data in the same data. The data to be identified thus undergoes full sensitive data identification; that is, sensitive data is identified more comprehensively.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flowchart of a sensitive data identification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an exemplary industry classification and grading template provided in an embodiment of the present application;
FIG. 3 is an exemplary task creation interface provided by an embodiment of the present application;
FIG. 4 is an exemplary application scenario diagram provided by an embodiment of the present application;
FIG. 5 is a flowchart of another sensitive data identification method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a sensitive data identification apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of the present disclosure.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone, where A and B may each be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the objects before and after it. In addition, "first", "second", "third", and so on are used only to distinguish different objects and have no other special meaning.
Industries such as finance, energy, and automotive all need to identify sensitive data. Sensitive data mainly comprises high-value data such as customer data, technical data, and personal information, and its leakage directly affects data security. How to identify the sensitive data of each industry more comprehensively is therefore a technical problem to be solved.
To this end, the embodiments of the present application provide a sensitive data identification method, an electronic device, and a storage medium. An industry classification and grading template is used to identify industry-specific sensitive data in the data to be identified, and a general identification model is used to identify general sensitive data in the same data. The data to be identified thus undergoes full sensitive data identification; that is, sensitive data is identified more comprehensively.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
FIG. 1 is a flowchart of a sensitive data identification method according to an embodiment of the present application. The method may be performed by a sensitive data identification apparatus, which may be implemented in software and/or hardware and is generally deployed in an electronic device.
Referring to fig. 1, the method may include the steps of:
101. Acquire the data to be identified and the target industry classification and grading template of the target industry to which the data to be identified belongs.
102. Perform industry-specific sensitive data identification on the data to be identified by using the target industry classification and grading template, to obtain an industry-specific sensitive data identification result.
103. Perform general sensitive data identification on the data to be identified by using the general identification model, to obtain a general sensitive data identification result.
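Steps 101 to 103 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the two toy identifiers are hypothetical stand-ins, and the source does not say whether steps 102 and 103 run sequentially or in parallel.

```python
from typing import Callable, Dict, List

Identifier = Callable[[str], List[str]]


def identify_sensitive_data(
    value: str,
    industry_identifier: Identifier,
    general_identifier: Identifier,
) -> Dict[str, List[str]]:
    """Run both identifications on the same data and return both results."""
    return {
        # Step 102: template-driven, industry-specific identification.
        "industry_specific": industry_identifier(value),
        # Step 103: identification by the general identification model.
        "general": general_identifier(value),
    }


# Hypothetical stand-ins for a template-based identifier and a general model.
def toy_industry_identifier(value: str) -> List[str]:
    return ["payment password"] if "pay_pwd=" in value else []


def toy_general_identifier(value: str) -> List[str]:
    return ["name"] if "name=" in value else []


result = identify_sensitive_data(
    "name=Li;pay_pwd=1234", toy_industry_identifier, toy_general_identifier
)
```

Keeping the two results separate mirrors the method, which returns an industry-specific result and a general result rather than a single merged verdict.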
In this embodiment, the data to be identified refers to data derived from different industries having sensitive data identification requirements. The data to be identified may be structured data or unstructured data, which is not limited.
In this embodiment, both industry-specific sensitive data identification and general sensitive data identification are performed on the data to be identified, so that the data undergoes full sensitive data identification; that is, sensitive data is identified more comprehensively.
In this embodiment, industry-specific sensitive data can be understood as sensitive data unique to a particular industry. For example, the industry-specific sensitive data of a certain industry includes passport pictures, fingerprint pictures, business license pictures, payment passwords, private keys, and the like.
In this embodiment, general sensitive data can be understood as sensitive data with general characteristics, that is, sensitive data common to all industries. General sensitive data includes, for example but not limited to: name, age, home address, and employer.
In this embodiment, to identify industry-specific sensitive data, an industry classification and grading template is provided for each industry. The template can identify industry-specific sensitive data and classify and grade it. Classification refers to determining the sensitive classification to which the industry-specific sensitive data belongs. Note that the number of sensitive classifications for a given industry can be set flexibly as required. The sensitive classifications of an industry include, for example but not limited to: key sensitive information, sensitive picture information, location sensitive information, and business sensitive information. Grading refers to determining the risk level of the industry-specific sensitive data; the higher the risk level, the greater the impact on data security. For example, in order from high to low, the risk levels are S1, S2, S3, and S4.
To better identify industry-specific sensitive data, the industry classification and grading template includes at least one sensitive classification associated with the corresponding industry, at least one data type associated with each sensitive classification, and a dedicated identification model associated with each data type. The dedicated identification model indicates the risk level of the data type and at least one sensitive data identification mode associated with the data type.
In practical applications, the data types of an industry can be defined flexibly. The data types of a given industry include, for example but not limited to: payment passwords, private keys, passport pictures, fingerprint pictures, and business license pictures.
Referring to the exemplary industry classification and grading template shown in FIG. 2, the template indicates that the sensitive classifications of the corresponding industry include key sensitive information, sensitive picture information, location sensitive information, and business sensitive information. Each sensitive classification is associated with one or more data types. For example, the data types under the key sensitive information classification include, but are not limited to, payment passwords and private keys; the data types under the sensitive picture information classification include, but are not limited to, passport pictures, fingerprint pictures, and business license pictures. Each data type is associated with a dedicated identification model for identifying sensitive data of that data type. The dedicated identification model indicates the risk level of the data type and at least one sensitive data identification mode associated with the data type. Taking the dedicated identification model for business license pictures as an example, its data type configuration item is set to business license picture. The data type set in this configuration item can also be regarded as the model name of the dedicated identification model, and the semantics of the model name indicate that the model's function is to identify sensitive data of the configured data type. The classification and grading configuration item specifies which industry classification and grading template the data type belongs to, as well as the sensitive classification and risk level corresponding to the data type.
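One way to picture the template hierarchy (sensitive classification, then data type, then dedicated identification model) is the minimal data-structure sketch below. The class and field names are hypothetical, and the risk levels and identification modes assigned to each data type are assumptions for illustration; the source does not state them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DedicatedModel:
    data_type: str                   # also serves as the model name
    risk_level: str                  # e.g. "S1" (highest) to "S4"
    identification_modes: List[str]  # e.g. "regex", "keyword", "image"


@dataclass
class IndustryTemplate:
    industry: str
    # sensitive classification -> data type -> dedicated identification model
    classifications: Dict[str, Dict[str, DedicatedModel]] = field(default_factory=dict)

    def add(self, classification: str, model: DedicatedModel) -> None:
        self.classifications.setdefault(classification, {})[model.data_type] = model

    def find_model(self, data_type: str) -> Optional[Tuple[str, DedicatedModel]]:
        """Return (sensitive classification, model) for a data type, if configured."""
        for classification, models in self.classifications.items():
            if data_type in models:
                return classification, models[data_type]
        return None


template = IndustryTemplate(industry="example industry")
# Risk levels and modes below are assumed values, not taken from FIG. 2.
template.add("key sensitive information", DedicatedModel("payment password", "S1", ["regex"]))
template.add("key sensitive information", DedicatedModel("private key", "S1", ["keyword"]))
template.add("sensitive picture information", DedicatedModel("business license picture", "S2", ["image"]))
```

Looking up a data type returns both its sensitive classification and its model, which is the information an identification result needs later on.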
In this embodiment, the plurality of industry classification and grading templates may include at least one system-level industry classification and grading template and/or at least one custom industry classification and grading template. A system-level template can be understood as a template provided by the service provider, or any third party, that offers the sensitive data identification service. A custom template can be understood as a template customized by a user to meet that user's own needs. Further optionally, to make custom templates easier to create, a custom template may be obtained by modifying a system-level template. As an example, when creating a custom industry classification and grading template, one system-level template is selected from the at least one system-level template, and the selected template is updated to obtain the custom template.
In practical applications, when the selected system-level template is updated, its sensitive classifications can be deleted, added, and modified, and so can the data types under any of its sensitive classifications. When a sensitive classification is added, the data types under it must be added as well. A newly added sensitive classification or data type may require a dedicated identification model to be associated with the data type. The associated model can be selected from the dedicated identification models already created; further optionally, to better meet user-defined requirements, the user may also be allowed to create a new dedicated identification model. Accordingly, if updating the selected system-level template involves creating a new dedicated identification model, a dedicated identification model configuration interface is displayed, and the new model is configured in response to configuration operations on that interface.
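Deriving a custom template from a system-level one can be sketched as copy-then-edit. This is a simplified illustration: the template is reduced to a mapping from sensitive classification to data types, the function name is hypothetical, and "API token" is an invented data type used only for the example.

```python
import copy
from typing import Dict, Iterable, List, Optional

Template = Dict[str, List[str]]  # sensitive classification -> data types

system_template: Template = {
    "key sensitive information": ["payment password", "private key"],
    "sensitive picture information": [
        "passport picture",
        "fingerprint picture",
        "business license picture",
    ],
}


def derive_custom_template(
    base: Template,
    remove_classifications: Iterable[str] = (),
    add_types: Optional[Dict[str, List[str]]] = None,
) -> Template:
    """Copy a system-level template, then delete classifications and add data types."""
    custom = copy.deepcopy(base)  # the system-level template stays untouched
    for name in remove_classifications:
        custom.pop(name, None)
    for classification, types in (add_types or {}).items():
        bucket = custom.setdefault(classification, [])  # may create a new classification
        for data_type in types:
            if data_type not in bucket:
                bucket.append(data_type)
    return custom


custom_template = derive_custom_template(
    system_template,
    remove_classifications=["sensitive picture information"],
    add_types={"key sensitive information": ["API token"]},
)
```

The deep copy matters here: other users still rely on the unmodified system-level template.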
As an example, referring to FIG. 2, the dedicated identification model configuration interface includes at least a data type configuration item, a classification and grading configuration item, and an identification mode configuration item. Configuring the new dedicated identification model in response to configuration operations on this interface is implemented as follows: in response to a configuration operation triggered through the data type configuration item, configuring the data type associated with the new model; in response to a configuration operation triggered through the classification and grading configuration item, configuring the industry classification and grading template, sensitive classification, and risk level associated with the new model; and in response to a configuration operation triggered through the identification mode configuration item, configuring at least one sensitive data identification mode for the data type associated with the new model.
Note that at least one piece of classification and grading information may be associated with the new dedicated identification model through the classification and grading configuration item, each piece indicating an industry classification and grading template, a sensitive classification, and a risk level associated with the new model.
In addition, the sensitive data identification mode can be chosen flexibly as required. For example, sensitive information composed largely of digits and English letters (e.g., mobile phone numbers, instant messaging account numbers, and email addresses) may be identified with regular expressions: if the data to be identified matches the regular expression, it is sensitive data; if it does not match, it is not sensitive data. As another example, sensitive words may be collected in advance into a keyword library and identification performed against that library: if the data to be identified contains a sensitive word from the keyword library, it is sensitive data; otherwise it is not. As a further example, identification may be based on image recognition: the image similarity between the picture to be identified and a predetermined sensitive picture is calculated; if the similarity is high, the picture to be identified is determined to be a sensitive picture, and if the similarity is low, it is not.
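The three identification modes can be sketched as small predicates. This is only a sketch under assumptions: the mobile number pattern (11 digits starting with 1), the keyword library contents, and the 0.9 similarity threshold are all hypothetical, and the image similarity score is taken as an input rather than computed.

```python
import re

# Assumed pattern: an 11-digit mobile number starting with 1.
MOBILE_PATTERN = re.compile(r"1\d{10}")
# Hypothetical keyword library collected in advance.
KEYWORD_LIBRARY = {"password", "private key"}
# Assumed cutoff for "high" similarity; the source gives no number.
SIMILARITY_THRESHOLD = 0.9


def matches_regex(value: str) -> bool:
    """Regular-expression mode: sensitive if the whole value matches."""
    return MOBILE_PATTERN.fullmatch(value) is not None


def contains_keyword(value: str) -> bool:
    """Keyword-library mode: sensitive if any library word occurs in the value."""
    lowered = value.lower()
    return any(word in lowered for word in KEYWORD_LIBRARY)


def is_sensitive_picture(similarity: float) -> bool:
    """Image mode: sensitive if similarity to a known sensitive picture is high."""
    return similarity >= SIMILARITY_THRESHOLD
```

For instance, `matches_regex("13812345678")` holds under the assumed pattern, while `matches_regex("12345")` does not.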
Further optionally, to enrich the sensitive data identification result, referring to FIG. 2, the dedicated identification model configuration interface further includes a data tag configuration item. In response to a configuration operation triggered through the data tag configuration item, the data tag to be applied to sensitive data identified by the new dedicated identification model can be configured.
Data tags can be set flexibly as needed. When the data to be identified is identified as sensitive data, the corresponding data tag is added to it. The data tag is, for example, "personal sensitive data" or "personal data".
Further optionally, for better management of sensitive data identification, referring to FIG. 2, the dedicated identification model configuration interface further includes an identification range configuration item and/or an identification threshold configuration item. The identification range configuration item configures the sensitive data identification range, that is, which data undergoes sensitive data identification. The identification threshold configuration item configures the evaluation standard for the sensitive data identification rate. For example, if 1 to 10 pieces of sensitive information are hit, the identification rate level is low; if 10 to 100 pieces are hit, the level is medium; if more than 100 pieces are hit, the level is high.
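The threshold example can be written as a small grading function. The ranges in the text overlap at 10, so this sketch assumes that exactly 10 hits counts as low and exactly 100 as medium; the "none" level for zero hits and the function name are also assumptions.

```python
def identification_rate_level(hits: int) -> str:
    """Map the number of sensitive-information hits to an identification rate level."""
    if hits < 0:
        raise ValueError("hits must be non-negative")
    if hits == 0:
        return "none"    # assumed: the source does not name a level for zero hits
    if hits <= 10:
        return "low"     # 1 to 10 hits (boundary 10 assumed to be low)
    if hits <= 100:
        return "medium"  # above 10 and up to 100 hits
    return "high"        # more than 100 hits
```

In a real configuration the two boundaries would presumably come from the identification threshold configuration item rather than being hard-coded.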
In this embodiment, performing industry-specific sensitive data identification on the data to be identified requires the industry classification and grading template of the industry to which the data belongs. Further optionally, to better manage sensitive data identification, the data to be identified and the target industry classification and grading template of its target industry may be determined based on a sensitive data identification task, in response to that task being triggered.
Specifically, when a sensitive data identification task is created, it specifies which data requires sensitive data identification and which industry classification and grading template is associated with that data. In this way, when the task is triggered, it can be determined promptly and accurately which template is used to identify which data.
Further optionally, to facilitate task management, a sensitive data identification task may be created through a task creation interface. As one example, in response to a task creation request, a task creation interface is displayed that includes at least a scope configuration item and a template configuration item; the scope of the sensitive data identification task is configured in response to a configuration operation initiated through the scope configuration item; a plurality of industry classification and grading templates are displayed in response to a trigger operation on the template configuration item; and, in response to a selection among those templates, the selected template is configured as the target industry classification and grading template of the task, thereby creating the sensitive data identification task.
In practical applications, the task creation interface may include further configuration items besides the scope configuration item and the template configuration item, without limitation. Referring to FIG. 3, the task creation interface includes, for example but not limited to: a task name configuration item for configuring the task name, a task start time configuration item for configuring the task start time, a scope configuration item for configuring the scope that defines the sensitive data identification range, and a template configuration item for configuring the industry classification and grading template. Thus, a sensitive data identification task created through the task creation interface includes, for example but not limited to: a task name, a task start time, a scope defining the sensitive data identification range, and the configured industry classification and grading template.
If the task start time is configured as immediate scanning, execution of the sensitive data identification task is triggered as soon as the task is received. If the task start time is configured as periodic scanning, execution of the task is triggered periodically after the task is received.
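A task record and its two trigger behaviours might look like the sketch below. The field names, the string values `"immediate"` and `"periodic"`, the one-day default period, and the assumption that the first periodic run happens one period after receipt are all hypothetical; the source only names the configuration items.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class IdentificationTask:
    name: str
    start_time: str  # "immediate" or "periodic" (assumed encoding)
    scope: str       # e.g. "global"
    template: str    # name of the target industry classification and grading template


def trigger_times(
    task: IdentificationTask,
    received_at: datetime,
    period: timedelta = timedelta(days=1),
    count: int = 3,
) -> List[datetime]:
    """Return when the task would be executed, for the first `count` periodic runs."""
    if task.start_time == "immediate":
        return [received_at]  # run once, on receipt
    if task.start_time == "periodic":
        return [received_at + period * k for k in range(1, count + 1)]
    raise ValueError(f"unknown start time setting: {task.start_time}")


received = datetime(2023, 1, 1)
scan_now = IdentificationTask("scan-now", "immediate", "global", "finance template")
scan_daily = IdentificationTask("scan-daily", "periodic", "global", "finance template")
```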
If the scope is configured as global scanning, sensitive data identification is performed on all of the user's data assets. If the scope is configured as a specified data domain, sensitive data identification is performed on the data assets in that data domain. If the scope is configured as a specified asset type, sensitive data identification is performed on the data assets of that type. A data domain can be regarded as a collection of data assets sharing the same characteristics and may be defined, for example, by one or more of business attributes, organizational structure, and data characteristics. Data assets include, for example but not limited to: database tables, instance data, and log data.
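Resolving a scope to concrete data assets can be sketched as a filter. The asset fields, the scope keywords, and the sample inventory are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataAsset:
    name: str
    domain: str      # data domain the asset belongs to
    asset_type: str  # e.g. "database table", "log data"


def select_assets(
    assets: List[DataAsset], scope: str, value: Optional[str] = None
) -> List[DataAsset]:
    """Pick the data assets that a task with the given scope should scan."""
    if scope == "global":
        return list(assets)
    if scope == "data_domain":
        return [a for a in assets if a.domain == value]
    if scope == "asset_type":
        return [a for a in assets if a.asset_type == value]
    raise ValueError(f"unknown scope: {scope}")


inventory = [
    DataAsset("orders", "sales", "database table"),
    DataAsset("customers", "sales", "database table"),
    DataAsset("gateway.log", "operations", "log data"),
]
```

With this inventory, `select_assets(inventory, "data_domain", "sales")` yields the two sales tables, and `select_assets(inventory, "asset_type", "log data")` yields only the log.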
Based on the foregoing, as an optional implementation, determining the data to be identified and the target industry classification and grading template of its target industry based on the sensitive data identification task may include: determining the data to be identified according to the scope, and acquiring the target industry classification and grading template from the sensitive data identification task.
In this embodiment, performing industry-specific sensitive data identification on the data to be identified requires the industry classification and grading template of the industry to which the data belongs. For ease of understanding, that industry is referred to as the target industry, and its template is referred to as the target industry classification and grading template. Once the target template corresponding to the data to be identified has been determined, it is used to perform industry-specific sensitive data identification on the data, yielding an industry-specific sensitive data identification result.
Specifically, the target industry classification and grading template includes at least one sensitive classification associated with the target industry, at least one data type associated with each sensitive classification, and a dedicated identification model associated with each data type, where the dedicated identification model indicates the risk level of the data type and at least one sensitive data identification mode associated with the data type. When the target template is used to perform industry-specific sensitive data identification on the data to be identified, the target dedicated identification model matching the target data type of the data to be identified is first determined in the template. Industry-specific sensitive data identification is then performed on the data using at least one sensitive data identification mode associated with the target data type in that model, yielding the industry-specific sensitive data identification result. The result indicates whether the data to be identified is sensitive data. If it is, the risk level of the data is the risk level of the target data type, and the sensitive classification of the data is the sensitive classification corresponding to the target data type.
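The lookup-then-identify flow just described can be sketched as follows. Several points are assumptions: the template is flattened to a mapping from data type to model, data counts as sensitive if any one configured mode hits (the source does not say how multiple modes are combined), and the six-digit payment password pattern and the S1 level are illustrative values.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DedicatedModel:
    data_type: str
    sensitive_classification: str
    risk_level: str
    modes: List[Callable[[str], bool]]  # sensitive data identification modes


@dataclass
class Result:
    is_sensitive: bool
    risk_level: Optional[str] = None
    sensitive_classification: Optional[str] = None


def identify_industry_specific(
    value: str, data_type: str, template: Dict[str, DedicatedModel]
) -> Result:
    """Find the model for the data type, apply its modes, and build the result."""
    model = template.get(data_type)
    if model is None:
        return Result(False)  # no dedicated model matches this data type
    if any(mode(value) for mode in model.modes):
        # Sensitive data inherits the risk level and classification of its data type.
        return Result(True, model.risk_level, model.sensitive_classification)
    return Result(False)


target_template = {
    "payment password": DedicatedModel(
        "payment password",
        "key sensitive information",
        "S1",                                                  # assumed risk level
        [lambda v: re.fullmatch(r"\d{6}", v) is not None],     # assumed pattern
    )
}
```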
In this embodiment, the universal recognition model is used to perform universal sensitive data recognition on the data to be recognized, so as to obtain a universal sensitive data recognition result. The generic recognition model may be, for example, a machine learning model trained based on massive training data, but is not limited thereto.
According to the technical scheme provided by the embodiment of the application, the industry-specific sensitive data identification is carried out on the data to be identified by utilizing the industry classification template, the general sensitive data identification is carried out on the data to be identified by utilizing the general identification model, and further the full-scale sensitive data identification is carried out on the data to be identified, namely the sensitive data is identified more comprehensively.
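The combination of the two identification passes can be illustrated with a minimal sketch. Both identifiers below are hypothetical stand-ins: the policy number format is invented, and a regular expression takes the place of the trained generic model purely to keep the example self-contained.

```python
import re


def industry_identifier(value):
    """Stand-in for template-based identification (hypothetical rule)."""
    if re.fullmatch(r"POL-\d{8}", value):  # made-up insurance policy number format
        return {"sensitive": True, "source": "industry", "data_type": "policy_number"}
    return {"sensitive": False, "source": "industry"}


def generic_identifier(value):
    """Stand-in for the generic model; a regex replaces the trained model here."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        return {"sensitive": True, "source": "generic", "data_type": "email"}
    return {"sensitive": False, "source": "generic"}


def identify_full_scale(value):
    """Run both identifications on the same value and keep both results."""
    return {
        "industry_specific": industry_identifier(value),
        "generic": generic_identifier(value),
    }


for sample in ["POL-20230001", "alice@example.com", "hello"]:
    outcome = identify_full_scale(sample)
    hit = outcome["industry_specific"]["sensitive"] or outcome["generic"]["sensitive"]
    print(sample, "->", "sensitive" if hit else "not sensitive")
```

Keeping the two results separate, rather than merging them, mirrors the description, in which an industry-specific result and a general result are both obtained for the same data.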
For better understanding, a scene embodiment is described below in connection with fig. 4.
In this embodiment, a data security center system located in the cloud provides a sensitive data identification service to meet the sensitive data identification requirements of users in different industries. Referring to fig. 4, the data security center system may include a management and control service module and an engine service module.
The management and control service module can maintain and manage industry classification templates, sensitive data identification tasks and sensitive data identification results. Specifically, a template management unit in the management and control service module provides a plurality of industry classification hierarchical templates. The industry classification templates comprise system-level industry classification templates provided by the service side of the data security center system and custom industry classification templates provided by users. The template management unit can, in response to a creation operation of a user for a custom industry classification template, create the custom industry classification template for the user. A task management unit in the management and control service module maintains and manages sensitive data identification tasks. The management and control service module can create a sensitive data identification task in response to a task creation instruction of a user. The management and control service module receives, through the interface layer, the sensitive data identification result returned by the engine service module and stores the sensitive data identification result in the database. Of course, the management and control service module can also perform statistical analysis on the sensitive data identification results in the database, such as data classification hierarchy statistics, data tag statistics, and data identification rate statistics, among others. The data classification hierarchy statistics, for example, count how many pieces of sensitive data there are for each sensitive classification or for each risk level. The data tag statistics, for example, count how many pieces of sensitive data there are for each data tag. The data identification rate statistics, for example, count how many data columns there are in total, how many data columns have been identified, how many data columns are sensitive, and so on.
A data column refers to a field of structured data. Structured data is composed of rows and columns; generally, a "column" of structured data is referred to as a field and a "row" of structured data is referred to as a record.
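The three kinds of statistics mentioned above can be illustrated with a short sketch over per-column identification results. The record layout, column names, classification names, risk levels and tags are hypothetical and serve only to show the counting.

```python
from collections import Counter

# Hypothetical per-column identification results; field names are assumptions.
results = [
    {"column": "users.id_card", "identified": True, "sensitive": True,
     "classification": "personal information", "risk_level": "S3", "tag": "identity"},
    {"column": "users.email", "identified": True, "sensitive": True,
     "classification": "personal information", "risk_level": "S2", "tag": "contact"},
    {"column": "orders.amount", "identified": True, "sensitive": False,
     "classification": None, "risk_level": None, "tag": None},
    {"column": "orders.note", "identified": False, "sensitive": False,
     "classification": None, "risk_level": None, "tag": None},
]

sensitive = [r for r in results if r["sensitive"]]

# Data classification hierarchy statistics: counts per classification and per risk level.
by_classification = Counter(r["classification"] for r in sensitive)
by_risk_level = Counter(r["risk_level"] for r in sensitive)

# Data tag statistics: counts per data tag.
by_tag = Counter(r["tag"] for r in sensitive)

# Data identification rate statistics: total, identified and sensitive columns.
identification_rate = {
    "total_columns": len(results),
    "identified_columns": sum(r["identified"] for r in results),
    "sensitive_columns": len(sensitive),
}

print(by_classification, by_risk_level, by_tag, identification_rate)
```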
Referring to (1) in fig. 4, the terminal device sends a task creation instruction to the data security center system, and the task management unit in the data security center system creates a sensitive data identification task in response to the user's task creation instruction. Referring to (2) in fig. 4, the engine service module pulls the sensitive data identification task from the task management unit. Referring to (3) in fig. 4, the engine service module performs full-scale sensitive data identification, that is, performs industry-specific sensitive data identification on the data to be identified by using the industry classification hierarchical template, and performs general sensitive data identification on the data to be identified by using the general identification model. Referring to (4) in fig. 4, the management and control service module receives, through the interface layer, the sensitive data identification result output by the engine service module and writes the sensitive data identification result into the database. The management and control service module can also perform statistical analysis on the sensitive data identification results in the database for the user to query. Referring to (5) in fig. 4, the terminal device sends a query instruction to the data security center system, and the management and control service module queries the database to obtain the sensitive data identification result and returns it to the user. For example, the identification result of a certain data column includes the proprietary recognition model hit by the data column, the data tag, the risk level, the sensitive classification, and so forth. Of course, the user can also query the statistics of the sensitive data identification results, which is not limited here.
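The interaction (1) to (5) between the two modules can be sketched as a small in-memory simulation. The class and function names, the in-memory queue and dictionary standing in for the task management unit and the database, and the placeholder identification function are all assumptions for illustration; the application does not prescribe how the modules are implemented.

```python
from collections import deque


class ControlService:
    """Stand-in for the management and control service module."""

    def __init__(self):
        self.tasks = deque()   # pending sensitive data identification tasks
        self.database = {}     # task id -> identification results
        self._next_id = 1

    def create_task(self, scope, template_name):      # step (1)
        task = {"id": self._next_id, "scope": scope, "template": template_name}
        self._next_id += 1
        self.tasks.append(task)
        return task["id"]

    def pull_task(self):                               # step (2)
        return self.tasks.popleft() if self.tasks else None

    def store_result(self, task_id, result):           # step (4)
        self.database[task_id] = result

    def query(self, task_id):                          # step (5)
        return self.database.get(task_id)


def run_engine_once(control, identify):
    """Stand-in for the engine service module: pull one task and identify it."""
    task = control.pull_task()
    if task is None:
        return False
    # Step (3): identify every column in the task scope.
    result = [identify(column, task["template"]) for column in task["scope"]]
    control.store_result(task["id"], result)
    return True


def fake_identify(column, template_name):
    """Placeholder for full-scale identification; flags columns by name only."""
    return {"column": column, "template": template_name, "sensitive": "card" in column}


control = ControlService()
task_id = control.create_task(["users.card_no", "users.nickname"], "finance")
run_engine_once(control, fake_identify)
print(control.query(task_id))
```

The engine pulls tasks rather than having them pushed to it, which matches the description of the engine service module pulling the task from the task management unit.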
Fig. 5 is a flowchart of another method for identifying sensitive data according to an embodiment of the present application. The method may be performed by a sensitive data identification device, which may be comprised of software and/or hardware, and may be generally configured in a cloud server. Referring to fig. 5, the method may include the steps of:
501. In response to a task creation request triggered by the terminal device, create a sensitive data identification task.
502. In response to the sensitive data identification task being triggered, determine, based on the sensitive data identification task, the data to be identified and the target industry classification hierarchical template of the target industry to which the data to be identified belongs.
503. Perform industry exclusive sensitive data identification on the data to be identified by using the target industry classification hierarchical template to obtain an industry exclusive sensitive data identification result.
504. Perform universal sensitive data identification on the data to be identified by using the universal identification model to obtain a universal sensitive data identification result.
505. Return the industry exclusive sensitive data identification result and the universal sensitive data identification result to the terminal device.
For implementation of each step in this method embodiment, reference may be made to the related description in the foregoing embodiment, which is not repeated here.
According to the technical scheme provided by the embodiment, the industry classification hierarchical template is utilized to identify the special sensitive data of the industry, and the universal recognition model is utilized to identify the universal sensitive data of the data to be identified, so that the full sensitive data identification of the data to be identified is realized, and the sensitive data is identified more comprehensively.
Fig. 6 is a schematic structural diagram of a sensitive data identification apparatus according to an embodiment of the present application. As shown in fig. 6, the apparatus may include:
the acquiring module 61 is configured to acquire data to be identified and a target industry classification and classification template of a target industry to which the data belong;
the industry exclusive identification module 62 is configured to identify industry exclusive sensitive data of the data to be identified by using the target industry classification template, so as to obtain an industry exclusive sensitive data identification result; and
the universal recognition module 63 is configured to perform universal sensitive data recognition on the data to be recognized by using the universal recognition model, so as to obtain a universal sensitive data recognition result.
Further optionally, the target industry classification template includes at least one sensitive classification associated with the target industry, at least one data type associated with each sensitive classification, and a proprietary recognition model associated with each data type, the proprietary recognition model indicating a risk level of the data type and at least one sensitive data recognition manner associated with the data type;
accordingly, the industry exclusive identification module 62 performs industry exclusive sensitive data identification on the data to be identified by using the target industry classification hierarchical template, and is specifically configured to: determining a target exclusive identification model matched with a target data type of data to be identified in a target industry classification and classification template; carrying out industry exclusive sensitive data identification on the data to be identified by utilizing at least one sensitive data identification mode associated with the target data type in the target exclusive identification model to obtain an industry exclusive sensitive data identification result; the industry-specific sensitive data identification result comprises whether the data to be identified is sensitive data or not, and if the data to be identified is sensitive data, the risk level of the data to be identified is the risk level of the target data type, and the sensitive classification corresponding to the data to be identified is the sensitive classification corresponding to the target data type.
Further optionally, when the obtaining module 61 obtains the data to be identified and the target industry classification template of the target industry to which the data belongs, the obtaining module is specifically configured to: in response to the sensitive data identification task being triggered, determine, based on the sensitive data identification task, the data to be identified and the target industry classification hierarchical template of the target industry to which the data to be identified belongs.
Further optionally, the sensitive data identification task includes at least a scope and a target industry classification template for defining a sensitive data identification range; the obtaining module 61 is specifically configured to, when determining the to-be-identified data and the target industry classification and classification template of the target industry to which the to-be-identified data belongs based on the sensitive data identification task: and determining data to be identified according to the scope, and acquiring a target industry classification template from the sensitive data identification task.
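To illustrate how a task's scope and template might jointly determine what is identified, the following sketch resolves a task against a small catalog of structured data. The catalog, the scope format (a list of table names) and the template identifier are hypothetical; the application only states that the scope defines the sensitive data identification range.

```python
# Hypothetical catalog of structured data: table name -> column names.
catalog = {
    "crm.customers": ["name", "phone", "id_card"],
    "crm.orders": ["order_id", "amount"],
    "hr.staff": ["name", "salary"],
}

task = {
    "scope": {"tables": ["crm.customers", "hr.staff"]},  # assumed scope format
    "template": "finance-template",                      # assumed template identifier
}


def resolve_task(task, catalog):
    """Return the columns to identify and the target template named by the task."""
    columns = [
        f"{table}.{column}"
        for table in task["scope"]["tables"]
        if table in catalog  # tables outside the catalog are skipped
        for column in catalog[table]
    ]
    return columns, task["template"]


columns, template_name = resolve_task(task, catalog)
print(columns, template_name)
```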
Further optionally, the apparatus further includes: the task creation module is used for responding to the task creation request and displaying a task creation interface, wherein the task creation interface at least comprises a scope configuration item and a template configuration item; configuring a scope of the sensitive data identification task in response to a configuration operation initiated through the scope configuration item; responding to the triggering operation of the template configuration items, and displaying a plurality of industry classification grading templates; in response to a selection operation of the plurality of industry classification hierarchical templates, the selected industry classification hierarchical template is configured as a target industry classification hierarchical template for the sensitive data identification task to create the sensitive data identification task.
Further optionally, the plurality of industry classification templates includes at least one system level industry classification template and/or at least one custom industry classification template.
Further optionally, the apparatus further includes a template creation module for selecting a system-level industry classification template from at least one system-level industry classification template; updating the selected system-level industry classification template to obtain a custom industry classification template; if the newly added exclusive identification model is created in the process of updating the selected system-level industry classification template, displaying an exclusive identification model configuration interface; and configuring the newly added exclusive identification model in response to the configuration operation aiming at the exclusive identification model configuration interface.
Further optionally, the dedicated recognition model configuration interface includes at least a data type configuration item, a classification hierarchy configuration item, and a recognition mode configuration item; the template creation module is specifically configured to, when configuring the newly added proprietary recognition model, in response to a configuration operation for the proprietary recognition model configuration interface: responding to configuration operation triggered by the data type configuration item, and configuring the data type associated with the newly added exclusive identification model; responding to configuration operation triggered by the classification configuration item, and configuring an industry classification template, a sensitive classification and a risk level associated with the newly added exclusive identification model; and configuring at least one sensitive data identification mode of the data type associated with the newly added exclusive identification model in response to the configuration operation triggered by the identification mode configuration item.
Further optionally, the proprietary recognition model configuration interface further includes: a data tag configuration item; the template creation module is further configured to: and responding to configuration operation triggered by the data tag configuration item, and configuring the data tag to which the sensitive data identified by the newly added exclusive identification model belongs.
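The creation of a custom template from a system-level template, together with the configuration items of a newly added proprietary recognition model, might be sketched as below. The dictionary layout, function names and all example values are assumptions for illustration; the parameters of `add_exclusive_model` simply mirror the data type, classification hierarchy, recognition mode and data tag configuration items named above.

```python
import copy

# Hypothetical system-level template; keys and values are assumptions.
system_template = {
    "name": "system-finance",
    "level": "system",
    "models": [
        {"data_type": "bank_card_number", "classification": "account information",
         "risk_level": "S3", "modes": ["regex"], "tag": "financial"},
    ],
}


def create_custom_template(base, name):
    """Copy a system-level template so that edits do not affect the original."""
    custom = copy.deepcopy(base)
    custom["name"] = name
    custom["level"] = "custom"
    return custom


def add_exclusive_model(template, data_type, classification, risk_level, modes, tag=None):
    """Configure a newly added model from the items the configuration interface exposes."""
    if not modes:
        raise ValueError("at least one identification mode is required")
    model = {
        "data_type": data_type,            # data type configuration item
        "classification": classification,  # classification hierarchy configuration item
        "risk_level": risk_level,          # classification hierarchy configuration item
        "modes": list(modes),              # recognition mode configuration item
        "tag": tag,                        # optional data tag configuration item
    }
    template["models"].append(model)
    return model


custom = create_custom_template(system_template, "my-finance")
add_exclusive_model(custom, "loan_contract_id", "contract information", "S2",
                    ["regex", "keyword"], tag="contract")
print(len(system_template["models"]), len(custom["models"]))
```

Copying before editing keeps the system-level template unchanged, so other users who select it are unaffected by one user's customization.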
The apparatus shown in fig. 6 may perform the method of the embodiment shown in fig. 1, and its implementation principles and technical effects will not be repeated. The specific manner in which the various modules and units perform the operations in the apparatus of fig. 6 in the above embodiments has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
It should be noted that, the execution subjects of each step of the method provided in the above embodiment may be the same device, or the method may also be executed by different devices. For example, the execution subject of steps 101 to 103 may be device a; for another example, the execution subject of steps 101 and 102 may be device a, and the execution subject of step 103 may be device B; etc.
In addition, some of the flows described in the above embodiments and the drawings include a plurality of operations appearing in a specific order, but it should be clearly understood that the operations may be performed out of the order in which they appear herein or performed in parallel. The sequence numbers of the operations, such as 101, 102, etc., are merely used to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device includes: a memory 71 and a processor 72;
memory 71 for storing a computer program and may be configured to store other various data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on a computing platform, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 71 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random-access memory (Static Random-Access Memory, SRAM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), programmable read-only memory (Programmable Read-Only Memory, PROM), read-only memory (Read-Only Memory, ROM), magnetic memory, flash memory, magnetic disk or optical disk.
A processor 72, coupled to the memory 71, for executing the computer program in the memory 71 so as to perform the steps in the sensitive data identification method.
Further, as shown in fig. 7, the electronic device further includes: a communication component 73, a display 74, a power component 75, an audio component 76, and other components. Only some of the components are schematically shown in fig. 7, which does not mean that the electronic device only comprises the components shown in fig. 7. In addition, the components within the dashed box in fig. 7 are optional components rather than mandatory components, depending on the product form of the electronic device. The electronic device in this embodiment may be implemented as a terminal device such as a desktop computer, a notebook computer, a smart phone, or an IoT (Internet of Things) device, or may be a server device such as a conventional server, a cloud server, or a server array. If the electronic device of the embodiment is implemented as a terminal device such as a desktop computer, a notebook computer, or a smart phone, the electronic device may include the components within the dashed box in fig. 7; if the electronic device of the embodiment is implemented as a server device such as a conventional server, a cloud server, or a server array, the components within the dashed box in fig. 7 may not be included.
The detailed implementation process of each action performed by the processor may refer to the related description in the foregoing method embodiment or the apparatus embodiment, and will not be repeated herein.
Accordingly, the present application further provides a computer readable storage medium storing a computer program, where the computer program is executed to implement the steps executable by the electronic device in the above method embodiments.
Accordingly, embodiments of the present application also provide a computer program product comprising a computer program/instructions which, when executed by a processor, cause the processor to carry out the steps of the above-described method embodiments that are executable by an electronic device.
The communication component is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device where the communication component is located may access a wireless network based on a communication standard, such as WiFi (Wireless Fidelity), or a mobile communication network such as 2G (2nd generation), 3G (3rd generation), 4G (4th generation)/LTE (Long Term Evolution) or 5G (5th generation), or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a near field communication (Near Field Communication, NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on radio frequency identification (Radio Frequency Identification, RFID) technology, Infrared Data Association (IrDA) technology, ultra wideband (Ultra Wideband, UWB) technology, Bluetooth (BT) technology, and other technologies.
The display includes a screen, which may include a liquid crystal display (Liquid Crystal Display, LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation.
The power supply component provides power for various components of equipment where the power supply component is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
The audio component described above may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (central processing units, CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as random access memory (Random Access Memory, RAM), and/or non-volatile memory, such as read-only memory (Read-Only Memory, ROM) or flash RAM. Memory is an example of computer-readable media.
Computer readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change RAM (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium operable to store information that may be accessed by the computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (12)

1. A method for identifying sensitive data, comprising:
acquiring data to be identified and a target industry classification and grading template of a target industry to which the data to be identified belong;
carrying out industry exclusive sensitive data identification on the data to be identified by utilizing the target industry classification template to obtain an industry exclusive sensitive data identification result; and
carrying out universal sensitive data identification on the data to be identified by using a universal identification model to obtain a universal sensitive data identification result.
2. The method of claim 1, wherein the target industry classification template comprises at least one sensitive classification associated with the target industry, at least one data type associated with each sensitive classification, and a proprietary recognition model associated with each data type, the proprietary recognition model indicating a risk level of the data type and at least one sensitive data identification mode associated with the data type;
correspondingly, carrying out industry exclusive sensitive data identification on the data to be identified by utilizing the target industry classification template to obtain an industry exclusive sensitive data identification result comprises the following steps:
determining a target exclusive identification model matched with the target data type of the data to be identified in the target industry classification template;
carrying out industry exclusive sensitive data identification on the data to be identified by utilizing at least one sensitive data identification mode associated with the target data type in the target exclusive identification model to obtain an industry exclusive sensitive data identification result;
wherein the industry exclusive sensitive data identification result indicates whether the data to be identified is sensitive data; if the data to be identified is sensitive data, the risk level of the data to be identified is the risk level of the target data type, and the sensitive classification corresponding to the data to be identified is the sensitive classification corresponding to the target data type.
3. The method of claim 1, wherein acquiring the data to be identified and the target industry classification template of the target industry to which the data to be identified belongs comprises:
in response to a sensitive data identification task being triggered, determining, based on the sensitive data identification task, the data to be identified and the target industry classification template of the target industry to which the data to be identified belongs.
4. A method according to claim 3, wherein the sensitive data identification task comprises at least a scope for defining a sensitive data identification range and the target industry classification ranking template;
determining, based on the sensitive data identification task, the data to be identified and the target industry classification template of the target industry to which the data to be identified belongs comprises the following steps:
and determining the data to be identified according to the scope, and acquiring the target industry classification template from the sensitive data identification task.
5. A method according to claim 3, further comprising, before the sensitive data identification task is triggered:
responding to a task creation request, and displaying a task creation interface, wherein the task creation interface at least comprises a scope configuration item and a template configuration item;
configuring a scope of the sensitive data identification task in response to a configuration operation initiated by the scope configuration item;
responding to the triggering operation of the template configuration item, and displaying a plurality of industry classification grading templates;
in response to a selection operation of the plurality of industry classification ranking templates, configuring the selected industry classification ranking template as the target industry classification ranking template of the sensitive data identification task to create the sensitive data identification task.
6. The method of claim 5, wherein the plurality of industry classification templates comprises at least one system level industry classification template and/or at least one custom industry classification template.
7. The method of claim 6, wherein the custom industry classification template is created by:
selecting a system-level industry classification template from the at least one system-level industry classification template;
updating the selected system-level industry classification template to obtain the custom industry classification template;
if a new exclusive identification model is created in the process of updating the selected system-level industry classification template, displaying an exclusive identification model configuration interface;
and configuring the newly added exclusive identification model in response to a configuration operation on the exclusive identification model configuration interface.
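For illustration only, the derivation of a custom template from a system-level template in claim 7 can be sketched as follows. The Template class, the derive_custom_template function and the use of a deep copy are assumptions of this sketch; the claims do not specify how the update is implemented.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Template:
    """Hypothetical template: a name, a level, and its identification models."""
    name: str
    level: str                                   # "system" or "custom"
    models: Dict[str, dict] = field(default_factory=dict)


def derive_custom_template(base: Template, new_name: str,
                           new_models: Optional[Dict[str, dict]] = None) -> Template:
    """Copy a system-level template and add newly created exclusive models."""
    if base.level != "system":
        raise ValueError("a custom template must start from a system-level template")
    custom = copy.deepcopy(base)   # the system-level template stays untouched
    custom.name = new_name
    custom.level = "custom"
    for model_name, config in (new_models or {}).items():
        custom.models[model_name] = dict(config)
    return custom
```

Copying rather than editing in place keeps the system-level template available for other users, which is one plausible reading of the distinction drawn in claim 6.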
8. The method of claim 7, wherein the exclusive identification model configuration interface comprises at least a data type configuration item, a classification hierarchy configuration item, and an identification mode configuration item;
configuring the newly added exclusive identification model in response to a configuration operation on the exclusive identification model configuration interface comprises:
in response to a configuration operation on the data type configuration item, configuring the data type associated with the newly added exclusive identification model;
in response to a configuration operation on the classification hierarchy configuration item, configuring the industry classification template, the sensitive classification and the risk level associated with the newly added exclusive identification model;
and in response to a configuration operation on the identification mode configuration item, configuring at least one sensitive data identification mode for the data type associated with the newly added exclusive identification model.
9. The method of claim 8, wherein the exclusive identification model configuration interface further comprises a data tag configuration item;
and the method further comprises: in response to a configuration operation on the data tag configuration item, configuring the data tag to which the sensitive data identified by the newly added exclusive identification model belongs.
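For illustration only, the configuration items of claims 8 and 9 can be pictured as fields of one record. The ExclusiveModel class below is hypothetical, and the two identification modes shown (regular expressions and keywords) are assumptions chosen for the sketch; the claims only require at least one sensitive data identification mode.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExclusiveModel:
    """Hypothetical record holding the configured items of one exclusive model."""
    data_type: str                   # data type configuration item
    template_name: str               # associated industry classification template
    sensitive_class: str             # sensitive classification
    risk_level: int                  # risk level
    patterns: List[str] = field(default_factory=list)   # assumed mode: regex
    keywords: List[str] = field(default_factory=list)   # assumed mode: keyword
    data_tag: Optional[str] = None   # data tag configuration item (claim 9)

    def matches(self, value: str) -> bool:
        """Return True if any configured identification mode flags the value."""
        if any(re.search(p, value) for p in self.patterns):
            return True
        lowered = value.lower()
        return any(k.lower() in lowered for k in self.keywords)
```

A model configured with several modes flags a value as soon as one mode matches; other combination rules are equally possible.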
10. A sensitive data identification method, applied to a cloud server, the method comprising:
in response to a task creation request triggered by a terminal device, creating a sensitive data identification task;
in response to the sensitive data identification task being triggered, determining, based on the sensitive data identification task, the data to be identified and a target industry classification template of the target industry to which the data to be identified belongs;
performing industry-exclusive sensitive data identification on the data to be identified by using the target industry classification template, to obtain an industry-exclusive sensitive data identification result;
performing universal sensitive data identification on the data to be identified by using a universal identification model, to obtain a universal sensitive data identification result;
and returning the industry-exclusive sensitive data identification result and the universal sensitive data identification result to the terminal device.
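For illustration only, the two identification passes of claim 10 can be sketched as follows. The rule tables UNIVERSAL_RULES and INDUSTRY_TEMPLATES, the regex-based matching, and the function names are assumptions of this sketch and are not taken from the claims.

```python
import re
from typing import Dict, List

# Hypothetical rule sets: label -> regular expression.
UNIVERSAL_RULES: Dict[str, str] = {"email": r"[\w.+-]+@[\w-]+\.[\w.]+"}
INDUSTRY_TEMPLATES: Dict[str, Dict[str, str]] = {
    "finance": {"card_number": r"\b\d{16}\b"},
}


def identify(rules: Dict[str, str], data: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per field, the labels whose pattern matches any of its values."""
    hits: Dict[str, List[str]] = {}
    for field_name, values in data.items():
        for label, pattern in rules.items():
            if any(re.search(pattern, v) for v in values):
                hits.setdefault(field_name, []).append(label)
    return hits


def run_task(data: Dict[str, List[str]], template_name: str) -> Dict[str, Dict[str, List[str]]]:
    """Run both passes over the same data and return the two results separately."""
    return {
        "industry_exclusive": identify(INDUSTRY_TEMPLATES[template_name], data),
        "universal": identify(UNIVERSAL_RULES, data),
    }
```

Keeping the two results under separate keys mirrors the claim, which returns an industry-exclusive result and a universal result rather than one merged list.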
11. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is coupled to the memory and configured to execute the computer program to perform the steps of the method of any one of claims 1-10.
12. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method of any one of claims 1-10.
CN202310344068.0A 2023-03-28 2023-03-28 Sensitive data identification method, electronic equipment and storage medium Pending CN116401704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310344068.0A CN116401704A (en) 2023-03-28 2023-03-28 Sensitive data identification method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310344068.0A CN116401704A (en) 2023-03-28 2023-03-28 Sensitive data identification method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116401704A (en) 2023-07-07

Family

ID=87013709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310344068.0A Pending CN116401704A (en) 2023-03-28 2023-03-28 Sensitive data identification method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116401704A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116776390A (en) * 2023-08-15 2023-09-19 上海观安信息技术股份有限公司 Method, device, storage medium and equipment for monitoring data leakage behavior

Similar Documents

Publication Publication Date Title
CN109154935B (en) Method, system and readable storage device for analyzing captured information for task completion
CN110781376A (en) Information recommendation method, device, equipment and storage medium
US10936609B2 (en) Presenting user information suggestions
CN106033418B (en) Voice adding and playing method and device, and picture classifying and retrieving method and device
US20200050906A1 (en) Dynamic contextual data capture
CN107103011B (en) Method and device for realizing terminal data search
CN102160073B (en) Interest manager
CN116401704A (en) Sensitive data identification method, electronic equipment and storage medium
KR20210096230A (en) Data processing methods and devices, electronic devices and storage media
US11257029B2 (en) Pickup article cognitive fitment
CN114385623A (en) Data table acquisition method, device, apparatus, storage medium, and program product
CN113553521A (en) Content searching method and device
CN111553749A (en) Activity push strategy configuration method and device
CN104391844A (en) Data management system and tool
CN104240107A (en) Community data screening system and method thereof
US11314793B2 (en) Query processing
CN115686455A (en) Application development method, device and equipment based on spreadsheet and storage medium
CN110895552A (en) Personnel information acquisition method and device
CA2945505C (en) Electronic device and method of searching data records
CN114168183A (en) Front-end resource information processing method, device, equipment and storage medium
KR20210089242A (en) Storage and reading access methods, devices, electronic devices and storage media
Archana Acharya et al. A stitch in time saves nine: a Big Data analytics perspective
US11934413B2 (en) Techniques and systems for smart natural language processing of search results
US11366958B1 (en) Intelligent automated note annotation
US12032644B2 (en) Systems and methods for displaying contextually relevant links

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination