CN114254350A - Multi-dimensional fine-grained hierarchical classification management system and method and data access method - Google Patents

Info

Publication number
CN114254350A
Authority
CN
China
Prior art keywords
data
classification
metadata
resources
management module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111574317.2A
Other languages
Chinese (zh)
Inventor
刘亮亮
杜渂
王聚全
索涛
刘冉东
何之栋
梁铮
刘琦
周吉
李帅帅
侯俊丞
穆青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ds Information Technology Co ltd
Original Assignee
Ds Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ds Information Technology Co ltd
Priority to CN202111574317.2A
Publication of CN114254350A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a multi-dimensional fine-grained hierarchical classification management system and method and a data access method. The system comprises: a metadata management module for managing the data resources to be managed according to their category attributes; a data classification management module for storing the data resources managed by the metadata management module by class, according to preset categories; a data grading management module for storing the data resources managed by the metadata management module by grade, according to preset levels, where a preset level is the sensitivity level of the metadata; and a user identification service module for identifying different users based on how the data classification management module and the data grading management module store the metadata resources, thereby limiting each user's permission to access data categories or levels. The invention effectively addresses the technical problems that existing data classification and grading control methods are too coarse-grained, consider too few dimensions when serving business needs, and are not flexible enough in their security controls.

Description

Multi-dimensional fine-grained hierarchical classification management system and method and data access method
Technical Field
The invention relates to the field of data security sharing service, in particular to a multi-dimensional fine-grained classification management system and method and a data access method.
Background
Data governance is the complete set of management activities concerning the use of data within an organization. It is initiated and enforced by the enterprise's data governance department and covers the policies and procedures that define how business applications and technical management of data are formulated and implemented across the enterprise. The ultimate goal of data governance is to increase the value of data. It is the foundation on which an enterprise realizes its digital strategy, and it is a management system comprising standards, organizations, systems, methods, processes, tools, and the like.
In mass data management, data shared between services must be managed so that particular data, or particular types of data, are not leaked; a specific difficulty is the lack of a standard for measuring data sensitivity when data is opened up. The usual approach is to grade and classify the data and to combine the grades and classes with user permissions for security control, which prevents the spread of sensitive information and avoids the risk of data abuse. Data grading assigns a level to each data resource according to the sensitivity of its content and controls the scope of use of the resource according to that level. Data classification assigns categories to data resources along multiple dimensions, such as data source, data resource category and field meaning, and controls the scope of use of the resource according to those categories. Data access rights are controlled by combining data grading and data classification with user permissions, which safeguards the data during sharing.
In mass storage media, the common existing approach is to achieve secure data sharing by combining a security framework with security control rules. The main solutions include Kerberos, Apache Sentry and Apache Ranger. Their control scope is mainly the file level or the table level, so the granularity is not fine enough, the dimensions considered are not comprehensive enough when serving business needs, and security is not flexible enough at the business level. Specifically:
1) Kerberos is an identity authentication protocol based on symmetric keys. Acting as an independent third-party authentication service, it provides authentication for other services and supports SSO (once the client has been authenticated, it can access multiple services such as HBase and HDFS). However, it can only allow or deny access to a service as a whole and cannot control access at a fine granularity, for example to a particular HDFS path or a particular Hive table; user-level authentication requires LDAP in addition.
2) Apache Sentry is an open-source Hadoop security component released by Cloudera that provides fine-grained, role-based authorization. However, it supports only Hive, HDFS and Impala, so the data storage media are limited.
3) Apache Ranger supports column-level control only for Hive databases; other database components are not supported at that level. Although the security granularity reaches the row and column level, it is not flexible enough, because it must be customized for each business service, and the storage media are limited.
Disclosure of Invention
The invention aims to provide a multi-dimensional fine-grained hierarchical classification management system and method and a data access method that effectively address the technical problems that existing data classification and grading control methods are too coarse-grained, consider too few dimensions when serving business needs, and are not flexible enough in their security controls.
The technical scheme provided by the invention is as follows:
in one aspect, the present invention provides a data service-oriented multi-dimensional fine-grained hierarchical classification management system, including:
the metadata management module is used for managing the data resources to be managed according to the corresponding category attributes;
the data classification management module is used for classifying and storing the data resources managed by the metadata management module according to preset classes;
the data grading management module is used for grading and storing the data resources managed by the metadata management module according to a preset grade; the preset level is the sensitivity level of the metadata;
and the user identification service module is used for identifying different users based on the storage of the data resources by the data classification management module and the data grading management module, and limiting the data access permission of the users.
In another aspect, the present invention provides a data service-oriented multidimensional fine-grained classification management method, including:
acquiring data resources to be managed;
managing the metadata resources according to the category attributes;
classifying and storing the metadata resources according to preset categories;
storing the metadata resources in a grading manner according to a preset grade; the preset level is the sensitivity level of the metadata;
and identifying different users based on the storage of the metadata resources, and limiting the data access permission of the users.
In another aspect, the present invention provides a data access method, including:
receiving a request to access target data, the target data being stored and managed by the above data service-oriented multi-dimensional fine-grained hierarchical classification management method;
acquiring, according to the request, the user identification information and the hierarchical classification identification information of the target data;
matching the acquired identification information of the target data against the identification information of the pre-identified user, which comprises the data categories or data levels accessible to the user;
and if they match, notifying the data access service that the target data may be accessed.
In another aspect, the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the foregoing service-oriented multi-dimensional fine-grained hierarchical classification management method or the foregoing data access method when executing the computer program.
In another aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-mentioned data service-oriented multidimensional fine-grained hierarchical classification management method or the above-mentioned data access method.
The multi-dimensional fine-grained classification management system, the multi-dimensional fine-grained classification management method and the data access method provided by the invention can at least bring the following beneficial effects:
1. Compared with the traditional approach of managing access at a specific storage medium, secure access control is unaffected by the medium in which the data is stored, which greatly improves the security of the stored data.
2. A multi-dimensional security system is established at the field level for the data to be managed, covering the metadata dimension, the dimension of relationships among metadata, the sensitivity dimension and so on. Compared with prior-art access control methods (which restrict the storage media when they support field granularity, and support only table-level granularity when they support many storage media), data security is ensured down to the metadata dimension. Because the dimensions vary with industry and business requirements, application-layer data can be controlled more flexibly and conveniently, which addresses problems such as the lack of a data sensitivity standard in open data sharing.
3. A rule base is introduced, and users are identified by grade and class through the user identification service, so that user access control is finer-grained and more flexible.
Drawings
The above characteristics, technical features, advantages and implementations are further described below, in a clear and easily understood manner, through preferred embodiments and with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of an embodiment of a data service-oriented multidimensional fine-grained classification management system according to the present invention;
FIG. 2 is a schematic flow chart of an embodiment of a data service-oriented multidimensional fine-grained classification management method according to the present invention;
FIG. 3 is a flow chart of a data access method of the present invention;
FIG. 4 is a schematic structural diagram of a terminal device in the present invention.
The reference numbers illustrate:
100-a multi-dimensional fine-grained hierarchical classification management system, 110-a metadata management module, 120-a data classification management module, 130-a data hierarchical management module and 140-a user identification service module.
Detailed Description
In order to illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, specific embodiments of the present invention are described below with reference to the accompanying drawings. The drawings described below are merely examples of the invention; those skilled in the art can derive other drawings and embodiments from them without inventive effort.
A first embodiment of the present invention is a data service-oriented multi-dimensional fine-grained hierarchical classification management system 100, including: the metadata management module 110 is configured to manage data resources to be managed according to corresponding category attributes; the data classification management module 120 is configured to store the data resources managed by the metadata management module 110 in a classification manner according to preset categories; a data hierarchical management module 130, configured to store the data resources managed by the metadata management module 110 in a hierarchical manner according to a preset level; the preset level is the sensitive level of the metadata; the user identification service module 140 is configured to identify different users based on the storage of the data resources by the data classification management module 120 and the data hierarchy management module 130, and limit their rights to access data.
Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes the properties of data and supports functions such as indicating storage locations, recording history, resource lookup and file records. In short, metadata is information about the organization of data, the data fields and the relationships among them.
Metadata management is the most basic requirement for establishing the hierarchical classification system in this embodiment and is essential to the subsequent grading and classification of resources. For example, in the insurance industry, the metadata of a personnel-information data resource includes items such as the identity card and the household address. A user who sees that the personnel information contains the identity card and household address, and who wants to use that information in another service to implement a target application, clearly cannot obtain the data directly from the system and needs the corresponding permission. To control user permissions in this way, the metadata must be managed first; metadata management is therefore a basic module of the system and the main body on which the hierarchical classification system is built.
While managing data resources, the metadata management module 110 provides a metadata information list for each data resource, which explains the resource's metadata and supplies the information needed to establish field-granularity control. To further improve management efficiency, in an improved embodiment the metadata management module 110 also catalogs the data resources to be managed according to their category attributes and assigns each specific resource a unique code. The unique code defines the corresponding metadata information, so a user can view, through metadata management, the information a resource contains and then apply for permission to that resource. The metadata in the generated metadata information list is identified according to the security system and rules, and the identification remains valid as long as the metadata of the data resource does not change.
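By way of non-limiting illustration, the cataloguing step described above might be sketched as a small registry that assigns each data resource a unique code and binds that code to its metadata information list. The class name `MetadataCatalog`, the `RES-` code format and the field names are assumptions made for this sketch; the source does not specify them.

```python
from itertools import count


class MetadataCatalog:
    """Hypothetical registry: unique code -> category attribute and field list."""

    def __init__(self):
        self._entries = {}
        self._seq = count(1)

    def register(self, category, fields):
        """Catalogue a resource under a category and return its unique code."""
        code = f"RES-{next(self._seq):06d}"  # assumed code format
        self._entries[code] = {"category": category, "fields": list(fields)}
        return code

    def describe(self, code):
        """Return the metadata information list (field names) of a resource."""
        return list(self._entries[code]["fields"])


catalog = MetadataCatalog()
code = catalog.register(
    "personnel information", ["name", "id_card", "household_address"]
)
print(code, catalog.describe(code))
```

A user browsing the catalogue sees only the field names returned by `describe`, which is enough to decide which permission to apply for without exposing the data itself.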
The data classification management module 120 stores the data resources managed by the metadata management module 110 by class, according to preset categories. The categories differ between industries and business needs and can be determined according to the actual requirements of the application. In the insurance industry, for example, data resources can be divided into a personal certificate information class, a personal highly sensitive privacy information class, a social characteristic information class, and so on. The personal highly sensitive privacy information class includes disease history, social security information and the like; the social characteristic information class includes information such as a person's name, nationality, height, age and occupation.
To improve management efficiency and address data security at a finer granularity, in an improved embodiment the data classification management module 120 stores metadata resources by class in a multi-dimensional and/or multi-level manner. The storage dimensions include the data acquisition mode, the data resource type, the field classification, the field relationship classification and so on (covering the metadata dimension of the data, the dimension of relationships among multiple metadata in the data, the metadata sensitivity dimension of the data, and the like), and can be extended as needed. Multi-level storage means that, within a single dimension, categories are stored in several levels, and any dimension can be extended with further levels as needed. In the insurance industry, for example, when the data acquisition mode is taken as one dimension, the first-level categories include management, public and so on, and the management category can be further divided at the second level into enterprise management system and so on.
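One possible way to picture the multi-dimensional, multi-level arrangement is as one category tree per dimension, each extensible at any depth. The following is a minimal sketch under that assumption; the function names and the nested-dictionary layout are illustrative only, and the category names are taken from the insurance example above.

```python
# Each dimension maps to a nested dictionary of categories (a tree).
taxonomy = {}


def add_category(dimension, *path):
    """Add a category path (level 1, level 2, ...) under a dimension."""
    node = taxonomy.setdefault(dimension, {})
    for name in path:
        node = node.setdefault(name, {})


def has_category(dimension, *path):
    """Return True if the full category path exists under the dimension."""
    node = taxonomy.get(dimension)
    for name in path:
        if node is None or name not in node:
            return False
        node = node[name]
    return node is not None


add_category("data acquisition mode", "management", "enterprise management system")
add_category("data acquisition mode", "public")
add_category("field classification", "personal certificate information")

print(has_category("data acquisition mode", "management", "enterprise management system"))
```

New dimensions or deeper levels are added with further `add_category` calls, without changing existing entries.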
To facilitate management and storage of data resources, in an improved embodiment the data classification management module 120 also uniquely identifies the different classification categories with preset identifiers. The identifiers can be defined as needed, provided they are easy to distinguish. In the insurance industry, for example, the identifier is defined as a 6-character string beginning with GA. When the data resources are divided into the personal certificate information class, the personal highly sensitive privacy information class, the social characteristic information class and so on, the personal certificate information class is marked GA0001, the personal highly sensitive privacy information class GA0002, the social characteristic information class GA0003, and so on.
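The identifier scheme in this example can be sketched as follows. This is only an illustration: the function name and the sequential numbering rule are assumptions, while the `GA` prefix and the 6-character length come from the example above.

```python
def make_category_ids(categories, prefix="GA", width=4):
    """Assign sequential identifiers such as GA0001 to category names."""
    categories = list(categories)
    if len(categories) >= 10 ** width:
        raise ValueError("too many categories for the identifier width")
    return {
        name: f"{prefix}{index:0{width}d}"
        for index, name in enumerate(categories, start=1)
    }


categories = [
    "personal certificate information",
    "personal highly sensitive privacy information",
    "social characteristic information",
]
category_ids = make_category_ids(categories)
print(category_ids)
```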
Data resources can be classified manually or by artificial intelligence algorithms. In the manual approach, the resource categories are assigned on the basis of experience and knowledge of the data resources; this suits applications with small data volumes. The artificial intelligence approach is implemented mainly with Bayesian and neural network algorithms and, compared with the manual approach, can save labor cost and improve classification accuracy. Specifically, the multi-dimensional fine-grained hierarchical classification management system 100 further includes a data classification module, which classifies the data resources managed by the metadata management module 110 according to a pre-trained classification model; the data classification management module 120 then stores the metadata resources by class based on the classification result. The data classification module is embodied in the system as an algorithm model service that classifies data resources using Bayesian and neural network models. Before classification, the models are trained on training data (training may be supervised, semi-supervised, and so on) to improve classification accuracy. In practice, the manual and artificial intelligence approaches can of course be combined to improve classification accuracy further.
After the data resources have been classified, the data hierarchical management module 130 stores the data resources managed by the metadata management module 110 by grade, according to preset levels. Data grading is the basis for defining the access level of a data resource. By grading the data, access to records and fields containing sensitive content, private content, positioning information and the like is restricted by level, which prevents the spread of sensitive information and avoids the risk of data abuse.
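As a hedged illustration of the Bayesian option mentioned above, the following sketch trains a small multinomial naive Bayes classifier with add-one smoothing on tokenized field names and predicts a category identifier. The training samples, the tokenization and the class name are hypothetical; the source does not describe the model's features or training data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFieldClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        # samples: iterable of (tokens, label) pairs
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in samples:
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled field names, tokenized by hand.
training = [
    (["id", "card", "number"], "GA0001"),
    (["passport", "number"], "GA0001"),
    (["disease", "history"], "GA0002"),
    (["social", "security", "record"], "GA0002"),
    (["name"], "GA0003"),
    (["age"], "GA0003"),
    (["occupation"], "GA0003"),
]
clf = NaiveBayesFieldClassifier().fit(training)
print(clf.predict(["passport", "number"]))
```

In a real deployment the predictions would typically be reviewed manually for low-confidence cases, in line with the combined manual and algorithmic approach described above.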
The data grading process is to set sensitivity rules for the fields possibly containing sensitive contents and set the sensitivity level of the data record contents according to the sensitivity rules. For example, in the insurance industry, sensitive content includes certificate numbers, bank card numbers, portrait features, cell phone numbers, full text content, pictures, and the like. Sensitive rules may include sensitive identities, sensitive keywords, sensitive pictures, sensitive voice, sensitive value ranges, other sensitive information, and the like. For a sensitive identity, different sensitivity levels may be set for different levels of a particular access object; for the sensitive keywords, different sensitivity levels can be set according to different sensitivity degrees of the content of the keywords.
To facilitate management and storage of data resources, in an improved embodiment the data hierarchical management module 130 also uniquely identifies the different sensitivity levels with preset identifiers. The identifiers can be defined as needed, provided they are easy to distinguish, for example as two-digit or three-digit numbers according to the grading level. In the insurance industry, for example, the sensitivity level is represented by two digits, with the major levels identified as 0X to 9X. Each major level can be further refined as required, that is, the metadata resources are stored by grade in a multi-level manner, giving at most levels 01 to 99; the smaller the value, the higher the sensitivity level.
The identification performed by the data classification management module 120 and the data hierarchical management module 130 applies to the resource data itself, that is, to the data content (the data records), and therefore requires traversing the content of the data resource.
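The two-digit level identifiers can be illustrated with a short sketch. It assumes that the first digit is the major level, the second digit is the refinement within it, and that "00" is not a valid identifier (the text gives the range as 01 to 99); the function names are hypothetical.

```python
import re


def parse_level(level_id):
    """Split a two-digit level identifier into (major level, sub-level)."""
    if not re.fullmatch(r"[0-9]{2}", level_id) or level_id == "00":
        raise ValueError(f"invalid sensitivity level identifier: {level_id!r}")
    return int(level_id[0]), int(level_id[1])


def is_more_sensitive(a, b):
    """Return True if level a is more sensitive than level b (smaller value)."""
    parse_level(a)  # validate both identifiers
    parse_level(b)
    return int(a) < int(b)


print(parse_level("35"), is_more_sensitive("01", "35"))
```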
Data grading is implemented by means of a rule base. Specifically, the multi-dimensional fine-grained hierarchical classification management system 100 further includes a data grading module, which performs text analysis on the data resources managed by the metadata management module 110 according to a pre-trained grading model and generates a grading result; the data hierarchical management module 130 then stores the metadata resources by grade based on that result. The data grading module is embodied in the system as an algorithm model service. A grading model is built on text matching or NLP (Natural Language Processing); it analyzes the text content of a data resource and judges whether the content matches a sensitive-content rule, thereby grading the data resource. Before grading, the model is trained on training data (training may be supervised, semi-supervised, and so on) to improve grading accuracy, which improves the accuracy of the security system and in turn the experience of users accessing data.
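The text-matching variant of grading can be pictured as a rule base of patterns, each tied to a sensitivity level, with a record receiving the most sensitive level among the rules it matches. The sketch below is illustrative only: the patterns, the levels attached to them and the default level are assumptions, and it does not cover the NLP-based or trained-model variants.

```python
import re

# Hypothetical rule base: (pattern, sensitivity level). Smaller level = more sensitive.
RULES = [
    (re.compile(r"\b\d{17}[\dXx]\b"), "01"),         # 18-character certificate number
    (re.compile(r"\b1\d{10}\b"), "12"),              # 11-digit mobile phone number
    (re.compile(r"diagnosis|disease", re.I), "05"),  # sensitive keywords
]
DEFAULT_LEVEL = "99"  # assumed level for content that matches no rule


def grade_text(text):
    """Return the most sensitive level among all matching rules."""
    matched = [level for pattern, level in RULES if pattern.search(text)]
    # Fixed-width two-digit strings compare in numeric order.
    return min(matched, default=DEFAULT_LEVEL)


print(grade_text("ID 110101199003071234, phone 13812345678"))
```

Because every rule is checked, a record that contains several kinds of sensitive content is graded by its most sensitive part.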
After the data resources have been stored by grade and class, the user identification service module 140 identifies different users based on how the data classification management module 120 and the data hierarchical management module 130 store the metadata resources. The identification assigned to a user consists of grading and classification identifiers, which limit the data categories or levels the user may access. Once a user has been identified, the user's identification data can be retrieved whenever that user subsequently accesses resources.
The user identification distinguishes the categories or levels of data resources that a user may access. Its granularity can be a user role, or it can be as specific as the individual object accessing the resource. The sensitivity of the data accessible to a user is defined according to the industry and the role of the organization or user, and the user is identified according to the established multi-dimensional security system. Introducing user identification refines the granularity of user data access control: control over the data a user may access is more precise (identification is performed in multiple dimensions) and more flexible. For example, different users in the same department of the same industry, even at the same job level, receive different identifications because the data they work with differ, and therefore end up with different data access permissions.
The embodiment is improved to obtain the embodiment, in this embodiment, the multidimensional fine-grained hierarchical classification management system 100 includes, in addition to the metadata management module 110, the data classification management module 120, the data hierarchical management module 130, the user identification service module 140, the data classification module and the data classification module, a data authority control module, which is used for acquiring a request for accessing target data and acquiring user identification information and hierarchical classification identification information of access data from the request; and matches it with the identification information of the user identification service module 140, if matching is successful, the user is released, and the target data is accessed.
In this embodiment, when a user needs to access a specific data resource, the user identifier and the identifier of the data resource to be accessed are first obtained from the request and then compared according to the business rules. If the user identifier and the identifier corresponding to the data resource satisfy the comparison rule, the user is released to access the data resource. Specifically, if the identifier corresponding to the target data is within the range of the user identifier (the identification information of the user identification service module 140), the data access service is notified that the user can access the target data. If the identifier corresponding to the target data exceeds the range of the user identifier, the data access service is notified that the user cannot access the target data, the user is prevented from accessing the resource, and a prompt is given indicating that the user does not have the right to access the resource.
Another embodiment of the present invention provides a data service-oriented multidimensional fine-grained classification management method which, as shown in Fig. 2, includes:
S10, acquiring data resources to be managed;
S20, managing the metadata resources according to the category attributes;
S30, classifying and storing the metadata resources according to preset categories;
S40, storing the metadata resources in a graded manner according to preset levels, the preset level being the sensitivity level of the metadata;
S50, identifying different users based on the storage of the metadata resources, and defining their rights to access data categories or levels.
Metadata management is the most basic requirement for establishing the hierarchical classification system in this embodiment, and it is important for the subsequent hierarchical classification management of resources. For example, in the insurance industry, the metadata of a personnel information data resource may show that the resource contains identity card and household address information. A data resource access user who sees this and wants to use the information in another service cannot obtain the data directly from the system; the corresponding authority is required. To implement this authority control, the metadata must be managed first, which is why metadata management is a basic module of the system and the main body on which the hierarchical classification system is built.
In step S20, managing the metadata resources yields a metadata information list for the data resources, which explains the metadata of each data resource and provides the information needed to establish field-granularity control. To further improve management efficiency, in a modified embodiment, step S20 includes: cataloging the metadata resources according to the category attributes and uniquely encoding each specific resource, where the unique code defines the corresponding metadata information. Through metadata management, a user can check what information a resource contains and then apply for authority over that resource. The metadata in the generated metadata information list is identified according to the security system and rules, and the identifiers remain valid as long as the metadata of the data resource does not change.
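The cataloging described for step S20 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `MetadataCatalog` class, the `MD-<category>-<serial>` code format and the field names are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class MetadataEntry:
    code: str          # unique code of the resource (format is an assumption)
    category: str      # category attribute used for cataloging
    fields: List[str]  # metadata information that the code resolves to


@dataclass
class MetadataCatalog:
    """Catalogs resources by category attribute and assigns unique codes."""
    _entries: Dict[str, MetadataEntry] = field(default_factory=dict)
    _serial: count = field(default_factory=lambda: count(1))

    def register(self, category: str, fields: List[str]) -> str:
        # The code stays the same for as long as the entry is not re-registered.
        code = f"MD-{category}-{next(self._serial):04d}"
        self._entries[code] = MetadataEntry(code, category, list(fields))
        return code

    def describe(self, code: str) -> List[str]:
        # Lets a user inspect what a resource contains before requesting access.
        return list(self._entries[code].fields)

    def by_category(self, category: str) -> List[str]:
        return [c for c, e in self._entries.items() if e.category == category]


catalog = MetadataCatalog()
person_code = catalog.register("personnel", ["name", "id_card", "household_address"])
policy_code = catalog.register("policy", ["policy_no", "premium"])
print(person_code, catalog.describe(person_code))
```

Only the metadata (field names) is exposed by `describe`; the underlying data values are never returned, which matches the idea that a user inspects metadata first and applies for authority afterwards.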
In step S30, the data resources are classified and stored according to preset categories. The categories differ between industries and business needs and can be determined according to actual requirements. For example, in the insurance industry, the categories may include a personal certificate information category, a personal highly sensitive privacy information category, a social characteristic information category, and the like. The personal highly sensitive privacy information category includes disease history, social security information, and the like; the social characteristic information category includes information such as personnel name, nationality, height, age and occupation.
In order to improve management efficiency, in an improved embodiment, the metadata resources in step S30 are classified and stored in a multi-dimensional and/or multi-level manner. The storage dimensions include the data acquisition mode, the data resource category, the field classification, the field relationship classification, and the like, and can be expanded as needed. Multi-level storage means that resources are stored in multiple levels within the same dimension, and any dimension can be expanded with further levels as needed. For example, in the insurance industry, when the data acquisition mode is taken as one dimension, the first level is divided into management, public and the like, and the second level under the management category can be further expanded into enterprise management system and the like.
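One way to picture the multi-dimensional, multi-level structure is a nested mapping with one tree per dimension. The sketch below is an assumption about layout only; the helper names `add_path` and `list_paths` are hypothetical, and the level names come from the insurance example above.

```python
from typing import Dict, List, Tuple

# dimension name -> nested levels (each level maps to its sub-levels)
Taxonomy = Dict[str, dict]


def add_path(taxonomy: Taxonomy, dimension: str, path: List[str]) -> None:
    """Extend a dimension with a chain of levels, creating nodes as needed."""
    node = taxonomy.setdefault(dimension, {})
    for level in path:
        node = node.setdefault(level, {})


def list_paths(taxonomy: Taxonomy, dimension: str) -> List[Tuple[str, ...]]:
    """Return every level path under a dimension, depth-first."""
    out: List[Tuple[str, ...]] = []

    def walk(node: dict, prefix: Tuple[str, ...]) -> None:
        for name, child in node.items():
            out.append(prefix + (name,))
            walk(child, prefix + (name,))

    walk(taxonomy.get(dimension, {}), ())
    return out


taxonomy: Taxonomy = {}
# Dimension: data acquisition mode; first level: management, public;
# second level under management: enterprise management system.
add_path(taxonomy, "data_acquisition_mode", ["management", "enterprise_management_system"])
add_path(taxonomy, "data_acquisition_mode", ["public"])
# A further dimension can be added at any time without touching the first one.
add_path(taxonomy, "data_resource_category", [])

for path in list_paths(taxonomy, "data_acquisition_mode"):
    print(" / ".join(path))
```

Because each dimension is an independent tree, expanding one dimension by another level does not affect resources already classified under other dimensions.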
In order to facilitate management and storage of the data resources, in a modified embodiment, step S30 further uniquely identifies the different classification categories using preset identifiers. The identifiers can be defined as needed, as long as they are easy to distinguish. For example, in the insurance industry, a 6-character identifier beginning with GA is used. When the data resources are classified into the personal certificate information category, the personal highly sensitive privacy information category, the social characteristic information category and the like, the personal certificate information category is marked GA0001, the personal highly sensitive privacy information category is marked GA0002, the social characteristic information category is marked GA0003, and so on.
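The GA-prefixed identifier scheme can be sketched as a small registry. The `CategoryRegistry` class and its sequential assignment policy are assumptions for illustration; the text only fixes the 6-character `GA` format and the three example values.

```python
import re
from typing import Dict, Optional

PREFIX = "GA"
ID_PATTERN = re.compile(r"^GA[0-9]{4}$")  # "GA" plus four digits: 6 characters


class CategoryRegistry:
    """Assigns one stable identifier to each classification category."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def assign(self, category: str) -> str:
        # Identifiers are handed out sequentially (an assumption) and reused
        # when the same category is registered again.
        if category not in self._ids:
            self._ids[category] = f"{PREFIX}{len(self._ids) + 1:04d}"
        return self._ids[category]

    def lookup(self, identifier: str) -> Optional[str]:
        for category, ident in self._ids.items():
            if ident == identifier:
                return category
        return None


registry = CategoryRegistry()
cert_id = registry.assign("personal certificate information")
privacy_id = registry.assign("personal highly sensitive privacy information")
social_id = registry.assign("social characteristic information")
print(cert_id, privacy_id, social_id)
```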
The data resources can be classified in two ways: manually or by an artificial intelligence algorithm. In the manual mode, the categories of the resources are assigned according to empirical knowledge of the data resources; this is suitable for applications with a small data volume. The artificial intelligence mode is mainly implemented with Bayes and neural network algorithms and, compared with the manual mode, can save labor cost and improve classification accuracy. Specifically, step S30 includes: S31, classifying the metadata resources according to a pre-trained classification model; and S32, uniquely identifying the different classification categories according to preset identifiers, so that the metadata resources can subsequently be classified and stored according to the preset categories. In the system, this step is embodied as an algorithm model service: data resources are classified by building Bayes and neural network models, and before classification the constructed models are trained with training data (supervised, semi-supervised and similar training modes can be used) to improve classification accuracy. Of course, in practical applications, the manual mode and the artificial intelligence algorithm can be combined to further improve classification accuracy.
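As an illustration of steps S31 and S32 with a Bayes model, the following is a minimal multinomial naive Bayes classifier with Laplace smoothing, written in plain Python. It is a sketch under stated assumptions: the training samples, the tokenization of field names and the class name `NaiveBayesFieldClassifier` are hypothetical, and a production system would likely use a larger corpus or a neural network as the text mentions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class NaiveBayesFieldClassifier:
    """Multinomial naive Bayes over the tokens of a metadata field name."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return text.lower().replace("_", " ").split()

    def fit(self, samples: List[Tuple[str, str]]) -> None:
        # Supervised training: each sample is (field name, category identifier).
        for text, label in samples:
            self.class_counts[label] += 1
            for tok in self.tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(n / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in self.tokenize(text):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data; labels are the category identifiers of step S32.
TRAINING = [
    ("id card number", "GA0001"),
    ("passport number", "GA0001"),
    ("disease history", "GA0002"),
    ("social security record", "GA0002"),
    ("name", "GA0003"),
    ("nationality", "GA0003"),
    ("height", "GA0003"),
    ("occupation", "GA0003"),
]

model = NaiveBayesFieldClassifier()
model.fit(TRAINING)
print(model.predict("passport_number"), model.predict("disease_history"))
```

The predicted label is directly the preset category identifier, so the classification result can be handed to the classified-storage step without a separate mapping.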
After the classification of the data resources is completed, step S40 further stores the metadata resources in a graded manner according to preset levels. Data grading is the basis for defining the access level of a data resource. Through data grading, graded access restrictions are applied to records and fields containing sensitive content, private content, positioning information and the like, which prevents the diffusion of sensitive information and avoids the risk of abuse.
The data grading process is to set sensitivity rules for the fields possibly containing sensitive contents and set the sensitivity level of the data record contents according to the sensitivity rules. For example, in the insurance industry, sensitive content includes certificate numbers, bank card numbers, portrait features, cell phone numbers, full text content, pictures, and the like. Sensitive rules may include sensitive identities, sensitive keywords, sensitive pictures, sensitive voice, sensitive value ranges, other sensitive information, and the like. For a sensitive identity, different sensitivity levels may be set for different levels of a particular access object; for the sensitive keywords, different sensitivity levels can be set according to different sensitivity degrees of the content of the keywords.
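The rule-based grading described above can be sketched with a small rule base of regular expressions. Everything specific here is an assumption made for illustration: the patterns, the numeric levels, the default level, and the convention that a smaller number means a more sensitive level.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SensitiveRule:
    name: str
    pattern: re.Pattern
    level: int  # hypothetical level; smaller means more sensitive


RULES: List[SensitiveRule] = [
    SensitiveRule("certificate_number", re.compile(r"\b\d{17}[\dXx]\b"), 10),
    SensitiveRule("bank_card_number", re.compile(r"\b\d{16,19}\b"), 20),
    SensitiveRule("sensitive_keyword", re.compile(r"disease|diagnosis", re.I), 20),
    SensitiveRule("mobile_number", re.compile(r"\b1\d{10}\b"), 30),
]
DEFAULT_LEVEL = 90  # assumed level for content that matches no rule


def matched_rules(text: str) -> List[SensitiveRule]:
    return [rule for rule in RULES if rule.pattern.search(text)]


def grade_record(record: Dict[str, str]) -> int:
    """Return the sensitivity level of a record: the most sensitive rule hit."""
    levels = [
        rule.level
        for value in record.values()
        for rule in matched_rules(str(value))
    ]
    return min(levels, default=DEFAULT_LEVEL)


print(grade_record({"note": "customer phone 13800138000"}))
print(grade_record({"remark": "renewal reminder sent"}))
```

Taking the most sensitive matching rule as the record level is one possible policy; a deployment could equally grade each field separately to support field-granularity control.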
In order to facilitate management and storage of data resources, in a modified embodiment, step S40 further uniquely identifies the different sensitivity levels using preset identifiers. The identifiers can be defined as needed, as long as they are easy to distinguish; for example, a two-digit or three-digit number can be used depending on the number of levels. For example, in the insurance industry, two digits are used to represent the sensitivity level, and the levels are identified as 0X to 9X. To improve efficiency, each level can be further refined as required, that is, the metadata resources are stored in a graded manner with multiple sub-levels; at most, the levels can be refined into 01 to 99, and the smaller the value, the higher the sensitivity level.
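The two-digit level identifiers can be handled with a few helper functions. The function names below are hypothetical; the 01 to 99 range, the 0X to 9X bands and the rule that a smaller value is more sensitive are taken from the description above.

```python
import re

LEVEL_PATTERN = re.compile(r"[0-9]{2}")


def parse_level(identifier: str) -> int:
    """Validate a two-digit level identifier (01 to 99) and return its value."""
    if not LEVEL_PATTERN.fullmatch(identifier) or identifier == "00":
        raise ValueError(f"invalid sensitivity level identifier: {identifier!r}")
    return int(identifier)


def coarse_band(identifier: str) -> str:
    """Map a refined level such as '07' to its coarse band such as '0X'."""
    parse_level(identifier)
    return identifier[0] + "X"


def is_more_sensitive(a: str, b: str) -> bool:
    """True if level a is more sensitive than level b (smaller value wins)."""
    return parse_level(a) < parse_level(b)


levels = ["35", "07", "12"]
print(sorted(levels, key=parse_level))      # most sensitive first
print([coarse_band(lv) for lv in levels])
```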
Data grading is implemented using a rule base. Specifically, step S40 further includes: S41, performing text analysis on the metadata resources according to a pre-trained grading model to generate a grading result; and S42, uniquely identifying the different sensitivity levels according to preset identifiers. The metadata resources can then be stored in a graded manner according to the grading result. In the system, this step is embodied as an algorithm model service: a grading model is established based on text matching or NLP (Natural Language Processing), the text content of the data resource is analyzed, and whether the content satisfies the sensitive content rules is judged, thereby grading the data resource. Before grading, the constructed model is trained with training data (supervised, semi-supervised and similar training modes can be used) to improve grading accuracy, which in turn improves the accuracy of the security system and the user experience of data access.
After the hierarchical classification storage of the data resources is completed, in step S50, different users are identified based on the storage of the metadata resources. The content of each identifier is the hierarchical classification identification content, which defines the data categories or levels that the user is permitted to access. Once a user has been identified, the user's identification data can be obtained whenever that user subsequently accesses a resource.
The user identifier is used for distinguishing the categories or levels of data resources that a user can access; its granularity can be a user role, or it can be specific to an individual access object. The sensitivity of the accessible data is defined according to the industry and the role of the organization or user, and the user is identified according to the multidimensional security system. Introducing user identifiers refines the granularity of data access control and makes it more precise and flexible. For example, different persons in the same department of the same industry, even at the same job level, may be given different identifiers because they work with different data, and therefore end up with different data access rights.
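The two granularities mentioned above, role-level identifiers and identifiers specific to an individual access object, can be sketched as a role default with an optional per-user override. The roles, user IDs, level values and the `identify_user` function are all hypothetical, and a smaller level number is assumed to mean higher sensitivity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class UserIdentifier:
    categories: FrozenSet[str]   # category identifiers the user may access
    most_sensitive_level: int    # most sensitive level the user may access


# Role-level granularity (hypothetical roles and values).
ROLE_DEFAULTS: Dict[str, UserIdentifier] = {
    "claims_adjuster": UserIdentifier(frozenset({"GA0001", "GA0003"}), 30),
    "analyst": UserIdentifier(frozenset({"GA0003"}), 60),
}

# Access-object granularity: one specific user gets a different identifier
# from colleagues in the same role because the data involved is different.
USER_OVERRIDES: Dict[str, UserIdentifier] = {
    "u1002": UserIdentifier(frozenset({"GA0001", "GA0002", "GA0003"}), 10),
}


def identify_user(user_id: str, role: str) -> UserIdentifier:
    """Return the per-user identifier if one exists, else the role default."""
    return USER_OVERRIDES.get(user_id, ROLE_DEFAULTS[role])


first = identify_user("u1001", "claims_adjuster")
second = identify_user("u1002", "claims_adjuster")
print(first.most_sensitive_level, second.most_sensitive_level)
```

Here two users with the same role end up with different identifiers, which is the situation the example in the text describes.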
Another embodiment of the present invention provides a data access method which, as shown in Fig. 3, includes:
S60, receiving a request to access target data, the target data being stored and managed using the data service-oriented multidimensional fine-grained classification management method described above;
S70, according to the request to access the target data, obtaining the user identification information and the hierarchical classification identification information of the data to be accessed;
S80, matching the acquired identification information of the target data against the identification information of the pre-identified user, the identification information of the pre-identified user including the data categories or levels accessible to the user;
S90, if they match, notifying the data access service to access the target data.
In this embodiment, when a user needs to access a specific data resource, a request is sent to the target resource access service. The data authority control service then intervenes and acquires the user identification information and the hierarchical classification identification information of the data to be accessed (including the metadata information, grading information, classification information and the like of the data resource). Next, the data authority control service compares the user identification information with the hierarchical classification information corresponding to the target data according to the business rules. If the user identifier and the identifier corresponding to the data resource satisfy the comparison rule, the user is released to access the data resource. Specifically, if the identifier corresponding to the target data is within the range of the user identifier (the user identifier from step S50), the data access service is notified that the user can access the target data. If the identifier corresponding to the target data exceeds the range of the user identifier, the data access service is notified that the user cannot access the target data, the user is prevented from accessing the resource, and a prompt is given indicating that the user does not have the right to access the resource.
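The matching in steps S60 to S90 can be sketched as a range check of the resource identifier against the user identifier. This is one possible reading of "within the range of the user identifier", not the comparison rule of the embodiment: the data classes, the request format, the stores and the prompt texts are assumptions, and a smaller level number is assumed to mean higher sensitivity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class UserIdentifier:
    categories: FrozenSet[str]
    most_sensitive_level: int


@dataclass(frozen=True)
class ResourceIdentifier:
    category: str  # classification identifier, e.g. "GA0002"
    level: int     # sensitivity level of the resource


def check_access(user: UserIdentifier, resource: ResourceIdentifier) -> Tuple[bool, str]:
    """S80: the resource identifier must fall within the user's range."""
    if resource.category not in user.categories:
        return False, "no permission: data category not accessible"
    if resource.level < user.most_sensitive_level:
        return False, "no permission: data level too sensitive"
    return True, "access granted"


def handle_request(
    request: Dict[str, str],
    users: Dict[str, UserIdentifier],
    resources: Dict[str, ResourceIdentifier],
) -> Tuple[bool, str]:
    """S60/S70: read both identifiers from the request, then S80/S90."""
    user = users[request["user_id"]]
    resource = resources[request["resource_id"]]
    return check_access(user, resource)


USERS = {"u1001": UserIdentifier(frozenset({"GA0001", "GA0003"}), 30)}
RESOURCES = {
    "r1": ResourceIdentifier("GA0003", 50),  # social characteristics
    "r2": ResourceIdentifier("GA0002", 10),  # highly sensitive privacy data
    "r3": ResourceIdentifier("GA0001", 20),  # certificate data, very sensitive
}

for rid in RESOURCES:
    print(rid, handle_request({"user_id": "u1001", "resource_id": rid}, USERS, RESOURCES))
```

The returned message plays the role of the prompt given to the user when access is refused; the boolean is what would be passed on to the data access service.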
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of program modules is illustrated, and in practical applications, the above-described distribution of functions may be performed by different program modules, that is, the internal structure of the apparatus may be divided into different program units or modules to perform all or part of the above-described functions. Each program module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one processing unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software program unit. In addition, the specific names of the program modules are only used for distinguishing the program modules from one another, and are not used for limiting the protection scope of the application.
Fig. 4 is a schematic structural diagram of a terminal device provided in an embodiment of the present invention. As shown, the terminal device 200 includes a processor 220, a memory 210, and a computer program 211 stored in the memory 210 and executable on the processor 220, such as a data service-oriented multidimensional fine-grained hierarchical classification management program or a data access program. The processor 220 executes the computer program 211 to implement the steps in the above embodiments of the data service-oriented multidimensional fine-grained hierarchical classification management method or the data access method, or to implement the functions of the modules in the above embodiments of the data service-oriented multidimensional fine-grained hierarchical classification management system.
The terminal device 200 may be a notebook, a palm computer, a tablet computer, a mobile phone, or the like. Terminal device 200 may include, but is not limited to, processor 220, memory 210. Those skilled in the art will appreciate that fig. 4 is merely an example of terminal device 200, does not constitute a limitation of terminal device 200, and may include more or fewer components than shown, or some components may be combined, or different components, such as: terminal device 200 may also include input-output devices, display devices, network access devices, buses, and the like.
The Processor 220 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor 220 may be a microprocessor or the processor may be any conventional processor or the like.
The memory 210 may be an internal storage unit of the terminal device 200, such as: a hard disk or a memory of the terminal device 200. The memory 210 may also be an external storage device of the terminal device 200, such as: a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device 200. Further, the memory 210 may also include both an internal storage unit of the terminal device 200 and an external storage device. The memory 210 is used to store the computer program 211 and other programs and data required by the terminal device 200. The memory 210 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or recited in detail in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed terminal device and method may be implemented in other ways. For example, the terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods according to the embodiments of the present invention may also be implemented by the computer program 211 instructing relevant hardware. The computer program 211 may be stored in a computer-readable storage medium, and when the computer program 211 is executed by the processor 220, the steps of the method embodiments may be implemented. The computer program 211 comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the code of the computer program 211, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable storage medium can be increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention. For those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (12)

1. A data service-oriented multi-dimensional fine-grained hierarchical classification management system is characterized by comprising:
the metadata management module is used for managing the data resources to be managed according to the corresponding category attributes;
the data classification management module is used for classifying and storing the data resources managed by the metadata management module according to preset classes;
the data grading management module is used for storing the data resources in a graded manner according to preset levels, based on the storage of the data resources by the data classification management module;
and the user identification service module is used for identifying different users based on the storage of the metadata resources by the data classification management module and limiting the authority of the users to access the data categories or levels.
2. The data service-oriented multi-dimensional fine-grained hierarchical classification management system according to claim 1, wherein the metadata management module is further configured to uniquely encode the data resources to be managed according to the corresponding category attributes, so as to define metadata information for the corresponding metadata resources; or
the data classification management module is also used for uniquely identifying different classification categories through preset identifiers; or
the data grading management module is also used for uniquely identifying different sensitivity levels through preset identifiers.
3. The data service oriented multi-dimensional fine-grained hierarchical classification management system according to claim 1 or 2,
the multi-dimensional fine-grained hierarchical classification management system also comprises a data classification module, which is used for classifying the data resources managed by the metadata management module according to a pre-trained classification model; the data classification management module classifies and stores metadata resources based on the classification result of the data classification module; or
the multi-dimensional fine-grained hierarchical classification management system also comprises a data grading module, which is used for performing text analysis on the data resources managed by the metadata management module according to a pre-trained grading model to generate a grading result; and the data grading management module is used for storing the metadata resources in a graded manner based on the grading result of the data grading module.
4. The data service-oriented multi-dimensional fine-grained classification management system according to claim 1 or 2, wherein in the data classification management module, metadata resources are classified and stored in a multi-dimensional and/or multi-level manner, wherein the storage dimension includes a data acquisition manner, a data resource type, a field classification and a field relationship classification; the multi-level storage is carried out in a mode of multiple levels under the same dimensionality; or, in the data hierarchical management module, the metadata resources are stored in a hierarchical manner according to a multi-hierarchy mode.
5. The data service-oriented multidimensional fine-grained hierarchical classification management system according to claim 1 or 2, characterized in that the multidimensional fine-grained hierarchical classification management system further comprises a data authority control module for obtaining a request for accessing target data and obtaining user identification information and hierarchical classification identification information of the access data from the request; and matching the identification information with the identification information of the user identification service module, if the matching is successful, releasing the user and accessing the target data.
6. A data service-oriented multi-dimensional fine-grained classification management method is characterized by comprising the following steps:
acquiring data resources to be managed;
managing the metadata resources according to the category attributes;
classifying and storing the metadata resources according to preset categories;
storing the metadata resources in a grading manner according to a preset grade; the preset level is the sensitivity level of the metadata;
and identifying different users based on the storage of the metadata resources, and limiting the authority of the users to access the data categories or levels.
7. The data service oriented multi-dimensional fine-grained hierarchical classification management method according to claim 6,
the management of the metadata resources according to the category attributes comprises the following steps: uniquely encoding the metadata resources according to the category attributes to define metadata information for the corresponding metadata resources; or
In classifying and storing the metadata resources according to preset categories, the method comprises the following steps: carrying out unique identification on different classification categories according to a preset identifier; or
The step of storing the metadata resources in a grading manner according to the preset level comprises the following steps: and uniquely identifying different sensitivity levels according to a preset identifier.
8. The data service oriented multi-dimensional fine-grained hierarchical classification management method according to claim 6,
in classifying and storing the metadata resources according to preset categories, the method comprises the following steps:
classifying the metadata resources according to a pre-trained classification model;
carrying out unique identification on different classification categories according to a preset identifier; or
The step of storing the metadata resources in a grading manner according to the preset level comprises the following steps:
performing text analysis on the metadata resources according to the pre-trained hierarchical model;
and uniquely identifying different sensitivity levels according to a preset identifier.
9. The data service-oriented multidimensional fine-grained hierarchical classification management method according to claim 6, 7 or 8, wherein in the classification storage of the metadata resources according to preset categories, the metadata resources are classified and stored in a multidimensional and/or multi-level manner, wherein the storage dimensions include data acquisition manners, data resource categories, field classifications and field relationship classifications; the multi-level storage is carried out in a mode of multiple levels under the same dimensionality; or, in the step of storing the metadata resources in a grading manner according to the preset grade, the metadata resources are stored in a grading manner according to a multi-level manner.
10. A method of data access, comprising:
receiving a request to access target data; the target data are stored and managed by adopting the data service-oriented multi-dimensional fine-grained classification management method according to any one of claims 6 to 9;
acquiring user identification information and hierarchical classification identification information of the access data according to the request of accessing the target data;
matching the acquired identification information of the target data with identification information of a user identified in advance, wherein the identification information of the user identified in advance comprises a data category or a data level accessible to the user;
and if they match, notifying the data access service to access the target data.
11. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data service oriented multidimensional fine grained hierarchical classification management method according to any one of claims 6 to 9 or the data access method according to claim 10 when executing the computer program.
12. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the data service oriented multidimensional fine-grained classification management method according to any one of claims 6 to 9 or the data access method according to claim 10.
CN202111574317.2A 2021-12-21 2021-12-21 Multi-dimensional fine-grained hierarchical classification management system and method and data access method Pending CN114254350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111574317.2A CN114254350A (en) 2021-12-21 2021-12-21 Multi-dimensional fine-grained hierarchical classification management system and method and data access method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111574317.2A CN114254350A (en) 2021-12-21 2021-12-21 Multi-dimensional fine-grained hierarchical classification management system and method and data access method

Publications (1)

Publication Number Publication Date
CN114254350A true CN114254350A (en) 2022-03-29

Family

ID=80793869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111574317.2A Pending CN114254350A (en) 2021-12-21 2021-12-21 Multi-dimensional fine-grained hierarchical classification management system and method and data access method

Country Status (1)

Country Link
CN (1) CN114254350A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340975A (en) * 2023-03-16 2023-06-27 江苏骏安信息测评认证有限公司 Cache data safety protection system based on cloud computing

Similar Documents

Publication Publication Date Title
CN112699175B (en) Data management system and method thereof
US11972006B2 (en) System of decentralized zero-trust services for creating, using and analyzing securely commingled self-governing data sets
EP3387575B1 (en) Policy enforcement for compute nodes
US9411977B2 (en) System and method for enforcing role membership removal requirements
CN100430951C (en) Systems and methods of access control enabling ownership of access control lists to users or groups
US11227068B2 (en) System and method for sensitive data retirement
US20180197145A1 (en) Multi-stage service record collection and access
US11734351B2 (en) Predicted data use obligation match using data differentiators
US11755768B2 (en) Methods, apparatuses, and systems for data rights tracking
CN109597843A (en) Data managing method, device, storage medium and the electronic equipment of big data environment
CN111625809A (en) Data authorization method and device, electronic equipment and storage medium
WO2020190309A1 (en) Method and system for managing personal digital identifiers of a user in a plurality of data elements
US11321479B2 (en) Dynamic enforcement of data protection policies for arbitrary tabular data access to a corpus of rectangular data sets
WO2019244036A1 (en) Method and server for access verification in an identity and access management system
CN114254350A (en) Multi-dimensional fine-grained hierarchical classification management system and method and data access method
CN114036549A (en) Database access control method and device based on data labels
JP2003108440A (en) Data disclosing method, data disclosing program, and data disclosing device
CN110928963A (en) Column-level authority knowledge graph construction method for operation and maintenance service data table
CN114186277A (en) Information protection method and system
CN113221177A (en) Data access method, device and system in distributed system
CN108171390A (en) A kind of secrecy department devices account informationization dynamic management system
CN116910651A (en) Data security treatment method and device based on hierarchical classification and readable medium
Gesicho et al. Ethical issues in implementing national-level health data warehouses in developing countries
CN117675330A (en) Data platform access control method and device, storage medium and electronic equipment
CN115587393A (en) Distributed performance data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination