CN106202452B - Unified data resource management system and method for big data platform - Google Patents


Info

Publication number
CN106202452B
Authority
CN
China
Prior art keywords
data
user
container
module
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610555871.9A
Other languages
Chinese (zh)
Other versions
CN106202452A (en)
Inventor
谢志鹏
胡俊峰
王鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201610555871.9A
Publication of CN106202452A
Application granted
Publication of CN106202452B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2117 User registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices

Abstract

The invention belongs to the technical field of big data platforms, and particularly relates to a unified data resource management system and method for a big data platform. Because a big data platform holds many different types of data resources, those resources are difficult to manage. To address this, the invention provides a unified metadata description for the different types of data resources and a unified adapter interface specification for the different types of data management components, and on this basis designs a unified data resource management method and system offering unified data package upload and download, unified data discovery, unified data access application and authorization, and other functions. The invention makes the system dynamically extensible, supports multi-tenant management and unified access control, and makes it convenient for users to manage and use the different types of data resources in a big data platform.

Description

Unified data resource management system and method for big data platform
Technical Field
The invention belongs to the technical field of big data platforms, and particularly relates to a unified data resource management system and a unified data resource management method for a big data platform.
Background
With the development of information technology, data has penetrated every industry and business field and has become an important factor of production. Effectively managing massive data, and further mining and applying it, has become key to improving enterprises' core competitiveness and seizing market opportunities. Against this background, big data technology has emerged; big data is characterized by large volume, many data types, low value density, and high processing speed.
On existing big data platforms (typified by Hadoop), different types of data resources are usually managed by different data management components of the big data ecosystem. For example, the distributed file system (typically HDFS) manages file-type data objects such as directories and files; the distributed column-oriented database (typically HBase) manages data objects such as namespaces, tables, column families, and columns; and the distributed table system (typically Hive) manages data objects such as databases, tables, and fields. Moreover, each data management component uses its own access control policy to protect the data objects it manages from unauthorized access.
However, on existing Hadoop big data platforms, different types of data resources have distinct logical data models and there is no unified metadata management. As a result, platform users, and even platform administrators, cannot obtain a unified view of the data resources in the platform: it is difficult to find and locate all the data resources that a particular analysis task may need to access, and difficult to apply simple, effective access control to one's own data resources. To complete such tasks, users must be familiar with the different operating interfaces of each data management component.
Therefore, providing a unified view and operating interface for the metadata description, discovery, access control, and related management of the different types of data resources in a big data platform is a problem to be solved in the art, and one with practical application value.
Disclosure of Invention
The invention aims to provide a unified data resource management method and a system for a big data platform.
Because a big data platform holds many different types of data resources, those resources are difficult to manage. To address this, the invention provides a unified metadata description for the different types of data resources and a unified adapter interface specification for the different types of data management components, and on this basis designs a unified data resource management method and system offering unified data package upload and download, unified data discovery, unified data access application and authorization, and other functions.
The invention first provides a unified hierarchical meta-model to describe the different types of data resources in a big data platform. The hierarchical meta-model gives the metadata of all types of data resources a uniform representation: different types of data resources share the same metadata schema, which provides the basis for managing them uniformly.
The unified meta-model consists of four layers: the first is the data container layer, the second the data package layer, the third the data asset (data resource) layer, and the fourth the data field layer. A data container corresponds to a running instance of a data management component (typically deployed on a cluster); a data package is a collection of related data assets; a data asset is a relatively independent data object managed by a data container; and data fields are the fields of relational data assets. Physically, data packages are stored and managed by data containers. The data container layer is transparent to ordinary users, who cannot perceive its existence; the system is responsible for storing and maintaining the mapping between data packages and data containers.
Adjacent layers of the meta-model have a top-down one-to-many mapping: each data container stores and manages one or more data packages, each data package is a collection of one or more interrelated data assets, and each data asset is composed of zero or more data fields. The system stores and maintains these mappings bidirectionally: given any data object, it can obtain either the data object in the layer above it or the list of data objects in the layer below it.
In addition to the mappings between layers, the meta-model defines several groups of attributes for the data objects at each layer: basic description attributes, access control attributes, derivation relationship attributes, and user access information. The basic description attributes record the related industries, related disciplines, keywords, owner, creation date, and similar descriptive information of the data object; the access control attributes describe which users may access the data object and in what way; the derivation relationship attributes describe derivation relationships between data objects; and the user access information records which users accessed the data object, when, and how.
On the basis of the layered unified meta-model, the invention further defines a unified adapter interface specification: any type of data management component can be connected to the system dynamically and extensibly by customizing an adapter plug-in that conforms to the specification. The functions covered by the adapter interface include, but are not limited to: (1) translating the symbolic name of a data object into the concrete data object in the corresponding data container; (2) uploading and downloading data packages; and (3) granting and revoking data permissions. Through this abstraction the system is dynamically extensible and can easily support different types of data management components: it is only necessary to develop a customized adapter plug-in for a data management component and add that adapter to the system.
Any type of data management component may have multiple deployment instances (that is, multiple containers of the same type). Once the adapter for a data management component has been added to the system, each of its deployed clusters can also be added to the system as a data container. Data packages in the big data platform are handed over to data containers for physical storage and management. The adapter hides the implementation differences between the types of data management components and presents a uniform interface that works in both directions: on one hand, it converts abstract operation instructions on data objects into command sequences for a specific data container and executes them; on the other hand, it obtains description information about data packages from the specific container and converts it into the unified description format.
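A minimal Python sketch of the four-layer meta-model follows, assuming an in-memory tree. The class `DataObject`, the layer strings, and the keys inside each attribute group are hypothetical; the patent does not specify a data structure. Each object keeps a link to its parent and a list of its children, which is one straightforward way to hold the mappings in both directions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Top-down layer order of this hypothetical meta-model implementation.
LAYERS = ["container", "package", "asset", "field"]


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/child links
class DataObject:
    name: str
    layer: str
    # The four attribute groups; the keys stored in each are illustrative only.
    description: dict = field(default_factory=dict)     # industry, discipline, keywords, owner, created
    access_control: dict = field(default_factory=dict)  # user -> set of permitted operations
    derived_from: list = field(default_factory=list)    # names of source data objects
    access_log: list = field(default_factory=list)      # (user, operation, timestamp) tuples
    parent: Optional[DataObject] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_child(self, child: DataObject) -> DataObject:
        """Attach a child exactly one layer down and record the link in both directions."""
        if LAYERS.index(child.layer) != LAYERS.index(self.layer) + 1:
            raise ValueError(f"a {child.layer} cannot be placed under a {self.layer}")
        child.parent = self          # upward mapping
        self.children.append(child)  # downward mapping
        return child


# Usage: an HBase cluster (container) holding one package with one table and one column.
cluster = DataObject("hbase-cluster-1", "container")
sales = cluster.add_child(DataObject("sales", "package", description={"owner": "alice"}))
orders = sales.add_child(DataObject("orders", "asset"))
amount = orders.add_child(DataObject("info:amount", "field"))
```

Walking `parent` links answers "which container holds this package?", and `children` lists the objects one layer down.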
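The three adapter functions can be expressed as an abstract base class. This is a sketch under assumptions: the class `DataAdapter`, its method names, and their signatures are illustrative rather than taken from the patent, and `InMemoryAdapter` is a toy plug-in included only to show how small a conforming implementation can be.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Optional


class DataAdapter(ABC):
    """Hypothetical adapter interface covering functions (1) to (3)."""

    @abstractmethod
    def resolve(self, package: str, asset: Optional[str] = None,
                field: Optional[str] = None) -> str:
        """(1) Translate a symbolic name into a container-native object name."""

    @abstractmethod
    def upload_package(self, package: str, archive_path: str) -> None:
        """(2) Load a package archive into the data container."""

    @abstractmethod
    def download_package(self, package: str) -> str:
        """(2) Export a package and return the location of the archive."""

    @abstractmethod
    def grant(self, user: str, op: str, package: str,
              asset: Optional[str] = None, field: Optional[str] = None) -> None:
        """(3) Grant "read" or "write" on a package, asset or field."""

    @abstractmethod
    def revoke(self, user: str, op: str, package: str,
               asset: Optional[str] = None, field: Optional[str] = None) -> None:
        """(3) Revoke a previously granted permission."""


class InMemoryAdapter(DataAdapter):
    """Toy plug-in that keeps everything in memory; handy for testing callers."""

    def __init__(self) -> None:
        self.archives: dict[str, str] = {}
        self.acl: set[tuple[str, str, str]] = set()

    def resolve(self, package, asset=None, field=None):
        return "/".join(part for part in (package, asset, field) if part)

    def upload_package(self, package, archive_path):
        self.archives[package] = archive_path

    def download_package(self, package):
        return self.archives[package]  # raises KeyError for unknown packages

    def grant(self, user, op, package, asset=None, field=None):
        self.acl.add((user, op, self.resolve(package, asset, field)))

    def revoke(self, user, op, package, asset=None, field=None):
        self.acl.discard((user, op, self.resolve(package, asset, field)))
```

Because `DataAdapter` is abstract, a plug-in that forgets one of the five methods cannot be instantiated, which gives an early check that it conforms to the specification.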
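As an example of the first direction (abstract operation to concrete command), the sketch below turns an abstract grant into an HBase shell `grant` command, whose general form is `grant <user>, <permissions> [, <@namespace> [, <table> [, <column family> [, <column qualifier>]]]]`. The function name, the mapping of "read" to `R` and "write" to `W`, and writing a field as `family:qualifier` are assumptions; the package-to-namespace and asset-to-table mapping follows the embodiment of Fig. 1. A field passed without an asset is ignored.

```python
from __future__ import annotations

from typing import Optional

# Abstract operation -> HBase permission letter (assumed mapping).
HBASE_PERMS = {"read": "R", "write": "W"}


def hbase_grant_command(user: str, op: str, package: str,
                        asset: Optional[str] = None,
                        field: Optional[str] = None) -> str:
    """Render an abstract grant as an HBase shell `grant` command string.

    Mapping: package -> namespace, asset -> table,
    field -> column written as "family:qualifier" (an assumption).
    """
    perm = HBASE_PERMS[op]  # KeyError for anything other than "read"/"write"
    if asset is None:
        args = [user, perm, f"@{package}"]          # namespace-level grant
    else:
        args = [user, perm, f"{package}:{asset}"]   # table-level grant
        if field is not None:
            family, _, qualifier = field.partition(":")
            args.append(family)                     # column-family scope
            if qualifier:
                args.append(qualifier)              # single-column scope
    return "grant " + ", ".join(f"'{arg}'" for arg in args)
```

An adapter would send the resulting string to the container (for example through the HBase shell); the reverse direction, reading namespace and table listings back into the unified format, is not shown.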
On the basis of the hierarchical unified meta-model and the unified adapter interface specification, the invention provides a unified management system for different types of data resources, namely the unified data resource management system for a big data platform. The system supports unified data package upload and download, unified data discovery, unified data access application and authorization, and other functions.
Unified data access application and authorization uses a fine-grained access control model that defines two abstract operations, "read" and "write", on each of the three layers of abstract data objects (data packages, data assets, and data fields). The owner of a data package may grant a user "read" or "write" rights on the entire data package, on specific data assets in the package, or on specific data fields of a data asset. The user expresses a data access request in terms of the abstract model; once the data owner approves it, the authorization function of the corresponding adapter converts the authorization on the abstract object into a concrete instruction sequence for the corresponding data container and executes it.
In summary, in this system the unified meta-model provides a unified description language for managing data resources, while adapter implementations that conform to the adapter interface specification act as translation layers. The adapter's role is bidirectional: on one hand, it obtains description information about data resources from a specific data management component and converts it into the unified description language; on the other hand, it converts abstract operations on data resources into concrete command sequences on the corresponding data objects, which the data management component then executes.
The unified data resource management system for a big data platform provided by the invention is shown in Fig. 2. It comprises a data package loading module, a data resource discovery module, a user registration and login module, a data access application and authorization module, a metadata management module, a data container management module, an adapter management module, and so on.
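The following sketch shows one way to evaluate a request against this fine-grained model. It assumes, without confirmation from the text, that a grant on a package also covers its assets and fields and that a grant on an asset covers its fields; the names `Target` and `is_allowed` are hypothetical. Grants are stored as `(user, operation, target)` tuples.

```python
from __future__ import annotations

from typing import Iterator, NamedTuple, Optional


class Target(NamedTuple):
    """An abstract data object: a package, an asset in it, or a field of that asset."""
    package: str
    asset: Optional[str] = None
    field: Optional[str] = None

    def enclosing(self) -> Iterator[Target]:
        """Yield this target, then each enclosing object up to the package."""
        yield self
        if self.field is not None:
            yield Target(self.package, self.asset)
        if self.asset is not None:
            yield Target(self.package)


def is_allowed(grants: set, owners: dict, user: str, op: str, target: Target) -> bool:
    """Check a "read"/"write" request against owner rights and explicit grants.

    Assumption: a grant on a package covers its assets and fields, and a grant
    on an asset covers its fields.
    """
    if owners.get(target.package) == user:
        return True  # by default only the owner can read and write a package
    return any((user, op, t) in grants for t in target.enclosing())
```

With this rule a field-level grant stays narrow (it never unlocks the whole table), while a package-level grant behaves like a grant on everything inside the package.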
The adapter management module manages information related to adapters. It receives, from the access terminal, an adapter plug-in (a routine library implementing the adapter interface specification) uploaded by a system administrator together with the type name of the corresponding data management component, stores the plug-in in the /Adapters subdirectory of the system root directory, and records the mapping between the plug-in and the type name through the metadata management module. At system start-up, it loads all existing adapter plug-ins and builds the in-memory mapping between plug-ins and type names; while the system is running, it provides other modules with a mapping service from type names to adapter plug-in objects.
The data container management module manages information related to data containers. It receives, from the access terminal, a data container name, the type name of the corresponding data management component, and related configuration information entered by a system administrator, and records them through the metadata management module. At system start-up, it builds the in-memory mapping between data containers and data management component type names; while the system is running, it provides other modules with a mapping service from data container names to data management component type names, and it also monitors and maintains the online working state of each data container.
The user registration and login module handles registration and login for ordinary users. To register a new user, the module receives the account name and password entered at the access terminal, creates a Kerberos principal and a corresponding key for the account, and writes the principal and key into a keytab file that only the system service program can access. To log a user in, the module receives the account name and password entered at the access terminal and checks whether the password is correct: if it is wrong, a login-failure message is returned; if it is correct, the module performs Kerberos authentication once using the principal and key of that account and caches the user's Configuration object to avoid frequent repeated authentication.
The metadata management module stores, maintains, and manages the metadata of four types of objects: data containers, data packages, data assets, and data fields. It also provides a query interface to the other functional modules.
The data package loading module receives a compressed data package file uploaded by an ordinary user from the access terminal, uses a load balancing mechanism to select a data container of the data management component type specified by the user, and calls the data package loading function of that type's adapter plug-in to store and manage the package physically in the selected container. After the package has been loaded successfully, the module records the package description entered by the user through the metadata management module.
The data resource discovery module serves user queries. It receives a query request from the access terminal, first verifies the user's identity and the validity of the query content, then retrieves from the metadata store the list of data packages matched by the query, filters the list according to the working state of the data containers holding those packages, and finally returns the filtered list to the access terminal as the query result.
The data access application and authorization module receives an ordinary user's application to access a data package, obtains the package owner through the metadata management module, and forwards the application to the owner. If the owner approves the application, the module authenticates the owner's identity; if authentication succeeds, it calls the metadata management module to obtain the data container holding the related data resources and calls the authorization routine in the corresponding adapter plug-in to carry out the concrete authorization action in that container.
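A sketch of the adapter management module using Python's `importlib` to load plug-ins from `<root>/Adapters`. The plug-in convention (each module exposes a `create_adapter()` function), the class name `AdapterRegistry`, and the use of a plain dictionary in place of the metadata management module are assumptions for illustration.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


class AdapterRegistry:
    """Sketch of the adapter management module (all names are hypothetical)."""

    def __init__(self, system_root, metadata: dict) -> None:
        self.adapter_dir = Path(system_root) / "Adapters"  # plug-ins live in <root>/Adapters
        self.metadata = metadata  # stand-in for the metadata store: type name -> plug-in file name
        self._loaded: dict = {}   # in-memory mapping: type name -> adapter object

    def register(self, type_name: str, file_name: str, source: str) -> None:
        """Store an uploaded plug-in, record its type name, and load it."""
        self.adapter_dir.mkdir(parents=True, exist_ok=True)
        (self.adapter_dir / file_name).write_text(source)
        self.metadata[type_name] = file_name
        self._loaded[type_name] = self._load(file_name)

    def load_all(self) -> None:
        """At start-up, load every plug-in recorded in the metadata store."""
        for type_name, file_name in self.metadata.items():
            self._loaded[type_name] = self._load(file_name)

    def adapter_for(self, type_name: str):
        """Mapping service offered to the other modules."""
        return self._loaded[type_name]

    def _load(self, file_name: str):
        path = self.adapter_dir / file_name
        spec = importlib.util.spec_from_file_location(f"adapter_{path.stem}", str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Assumed plug-in convention: each module defines create_adapter().
        return module.create_adapter()
```

Loading plug-ins at run time is what lets a new component type be added without rebuilding the system; a production version would also need to validate or sandbox uploaded code.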
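The container bookkeeping and online-state monitoring could look like the following sketch. `Container`, `ContainerRegistry`, the `probe` callback, and the `free_bytes` metric are hypothetical; how the real system probes a cluster is not described.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Container:
    name: str
    type_name: str       # data management component type, e.g. "HDFS" or "HBase"
    config: dict         # connection settings entered by the administrator (illustrative)
    online: bool = False
    free_bytes: int = 0  # storage headroom reported by monitoring (an assumed metric)


class ContainerRegistry:
    """Sketch of the data container management module (hypothetical names)."""

    def __init__(self) -> None:
        self._containers: dict[str, Container] = {}

    def add(self, name: str, type_name: str, config: dict) -> None:
        self._containers[name] = Container(name, type_name, config)

    def remove(self, name: str) -> None:
        self._containers.pop(name, None)

    def type_of(self, name: str) -> str:
        """Mapping service: container name -> component type name."""
        return self._containers[name].type_name

    def refresh(self, probe: Callable[[Container], tuple[bool, int]]) -> None:
        """Monitoring pass: probe(container) returns (is_online, free_bytes)."""
        for c in self._containers.values():
            try:
                c.online, c.free_bytes = probe(c)
            except Exception:
                c.online = False  # an unreachable container is treated as offline

    def online_of_type(self, type_name: str) -> list[Container]:
        return [c for c in self._containers.values()
                if c.type_name == type_name and c.online]
```

Other modules would call `type_of` to pick an adapter and `online_of_type` to filter discovery results or choose a container for loading.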
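The login flow (check the password, authenticate with Kerberos once, cache the result) is sketched below. Real Kerberos calls are out of scope, so principal creation and keytab-based login are passed in as hypothetical callbacks; in an MIT Kerberos deployment the former might wrap the `kadmin` commands `addprinc` and `ktadd`. Storing salted PBKDF2 password hashes is an assumption; the patent does not say how passwords are kept.

```python
from __future__ import annotations

import hashlib
import hmac
import os
from typing import Callable


class UserService:
    """Sketch of the user registration and login module (hypothetical names)."""

    def __init__(self, create_principal: Callable[[str], None],
                 kerberos_login: Callable[[str], object]) -> None:
        self._create_principal = create_principal  # hypothetical: creates principal + keytab entry
        self._kerberos_login = kerberos_login      # hypothetical: keytab login, returns a Configuration-like object
        self._passwords: dict[str, tuple[bytes, bytes]] = {}  # account -> (salt, hash)
        self._sessions: dict[str, object] = {}     # cached per-user Configuration objects

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, account: str, password: str) -> None:
        if account in self._passwords:
            raise ValueError("account already exists")
        salt = os.urandom(16)
        self._passwords[account] = (salt, self._hash(password, salt))
        self._create_principal(account)

    def login(self, account: str, password: str):
        """Return the cached Configuration object, or None if login fails."""
        record = self._passwords.get(account)
        if record is None or not hmac.compare_digest(record[1], self._hash(password, record[0])):
            return None  # wrong account name or password
        if account not in self._sessions:  # authenticate with Kerberos only once
            self._sessions[account] = self._kerberos_login(account)
        return self._sessions[account]
```

A real deployment would also need to expire cached entries when Kerberos tickets lapse; that is omitted here.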
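A sketch of container selection and loading. The patent only says a load-balancing mechanism is used; choosing the online container with the most free space is an assumed policy, and the `loaders` callbacks stand in for each adapter's hypothetical package-loading function. Metadata is written only after loading succeeds, matching the order described above.

```python
from __future__ import annotations

from typing import Callable


def choose_container(containers: list[dict], type_name: str) -> dict:
    """Pick an online container of the requested type with the most free space.

    "Most free space" is only one possible load-balancing policy (an assumption).
    """
    candidates = [c for c in containers if c["type"] == type_name and c["online"]]
    if not candidates:
        raise RuntimeError(f"no online data container of type {type_name!r}")
    return max(candidates, key=lambda c: c["free_bytes"])


def load_package(package: str, archive: str, type_name: str, description: dict,
                 containers: list[dict],
                 loaders: dict[str, Callable[[str, str, str], None]],
                 metadata: dict) -> str:
    """Load a package through its type's adapter, then record its metadata.

    loaders maps a type name to a hypothetical adapter function
    load(container_name, package, archive) that raises on failure.
    """
    target = choose_container(containers, type_name)
    loaders[type_name](target["name"], package, archive)
    # Only reached if loading succeeded: record the description and location.
    metadata[package] = {**description, "type": type_name, "container": target["name"]}
    return target["name"]
```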
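The query path (check identity and query, match metadata, filter by container state) might be sketched as follows. Matching every query term as a case-insensitive substring of the description fields is a simplifying assumption, as are the field names in each package record.

```python
from __future__ import annotations

from typing import Optional


def discover(user: Optional[str], query: str, packages: list[dict],
             container_online: dict[str, bool]) -> list[str]:
    """Return names of packages whose description metadata matches every query term.

    Substring matching on lower-cased text is a simplification; a real system
    would more likely use a full-text index.
    """
    if user is None:
        raise PermissionError("user is not logged in")  # identity check
    terms = query.lower().split()
    if not terms:
        raise ValueError("empty query")  # validity check on the query content
    hits = []
    for pkg in packages:
        text = " ".join([pkg["text"], pkg["industry"], pkg["discipline"],
                         pkg["type"], *pkg["keywords"]]).lower()
        if all(term in text for term in terms):
            hits.append(pkg)
    # Drop packages whose container is currently offline.
    return [p["name"] for p in hits if container_online.get(p["container"], False)]
```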
Compared with the prior art, the invention has the following advantages:
(1) The system is dynamically extensible: a new data management component can be connected to the system dynamically by developing a customized adapter that conforms to the interface specification, without recompiling and redeploying the whole system.
(2) Multi-tenant management is supported: users can manage the different types of data resources they own, authorize other users to access those resources, and apply for access to resources owned by others.
(3) Data access application and authorization are unified: access applications for different types of data resources share a uniform interface, and the adapter serves as a translation layer that maps the abstract authorization interface onto each component's native access control, so users can conveniently apply for and grant access rights to data resources.
Drawings
FIG. 1 is a diagram of the correspondence between abstract four-level meta-models and concrete data management modules.
Fig. 2 is a schematic diagram of a data resource unified management system.
Detailed Description
The technical embodiments of the present invention will be described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the examples.
Fig. 1 shows the correspondence between the HDFS and HBase data management components and the four-layer meta-model. In this embodiment, each HDFS cluster or HBase cluster corresponds to one data container. Each HDFS cluster maintains a three-level directory structure "/username/directory name/file name", in which "/username/directory name" corresponds to a data package ("username" being the username of the package owner) and "file name" corresponds to a data asset. In each HBase cluster, each namespace corresponds to a data package, the tables in the namespace correspond to the data assets in the package, and the columns in the tables correspond to the data fields of the data assets.
Fig. 2 shows the overall configuration of the system. In this embodiment, the invention provides a unified management method for different types of data resources and a system using the method. The system comprises a data package loading module, a data resource discovery module, a user registration and login module, a data access application and authorization module, a metadata management module, a data container management module, an adapter management module, and so on.
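Under the HDFS mapping of this embodiment, an adapter's name-resolution function reduces to path construction and parsing. The sketch below also renders a grant as an `hdfs dfs -setfacl` command line. It assumes HDFS ACLs are enabled (`dfs.namenode.acls.enabled`); the function names and the chosen permission strings are hypothetical, and it does not handle the traverse permission a grantee also needs on the owner's top-level directory.

```python
from __future__ import annotations

import posixpath
from typing import Optional


def hdfs_path(owner: str, package: str, asset: Optional[str] = None) -> str:
    """Symbolic name -> HDFS path, following "/username/directory name/file name"."""
    parts = [owner, package] + ([asset] if asset else [])
    return posixpath.join("/", *parts)


def parse_hdfs_path(path: str) -> tuple[str, str, Optional[str]]:
    """HDFS path -> (owner, package, asset or None); the inverse of hdfs_path."""
    parts = path.strip("/").split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"not a package or asset path: {path!r}")
    return parts[0], parts[1], parts[2] if len(parts) == 3 else None


def hdfs_grant_argv(user: str, op: str, owner: str, package: str,
                    asset: Optional[str] = None) -> list[str]:
    """Render an abstract grant as an `hdfs dfs -setfacl` argument list.

    Package directories get the execute bit so they can be traversed, and a
    package-level grant is applied recursively with -R.
    """
    path = hdfs_path(owner, package, asset)
    if asset is None:
        perms = {"read": "r-x", "write": "rwx"}[op]
        return ["hdfs", "dfs", "-setfacl", "-R", "-m", f"user:{user}:{perms}", path]
    perms = {"read": "r--", "write": "rw-"}[op]
    return ["hdfs", "dfs", "-setfacl", "-m", f"user:{user}:{perms}", path]
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` without shell quoting.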
Step 1: the system administrator user adds different adapter plug-ins through the "adapter management module". Adapter plugins are libraries of routines that conform to the adapter interface specification, each adapter plugin being specifically tailored to a certain data management module (e.g., HDFS, HBase, or Hive).
Step 2: The system administrator adds or deletes data containers through the data container management module. Each data container is associated with one adapter plug-in that indicates the container's type, and operations on the container are carried out through the routines in that plug-in. At system start-up, the data container management module automatically obtains the types and physical addresses of all data containers and connects to them. It also monitors and maintains the online working state of each data container, for example whether the container is currently online or offline and how much of its storage space is in use. Based on this monitoring information, the system migrates data between data containers and performs tasks such as load balancing across containers.
Step 3: Ordinary users register accounts and log in through the user registration and login module. Because the big data platform on which the system is deployed runs in secure mode and uses Kerberos for user authentication, when a user registers an account the system creates a Kerberos principal and corresponding key for that user and writes them into a keytab file that only the system service program can access. When a user logs in, the system performs Kerberos authentication only once and, after it succeeds, caches the user's Configuration object to avoid frequent repeated authentication.
Step 4: After logging in, a registered ordinary user can upload a data package to the platform through the data package loading module, after which the platform loads and manages it. Uploading takes two steps. First, the user packs the data package into a compressed file and uploads it through the file upload page provided by the system; the user can view all files he or she has uploaded together with their file ID numbers (assigned by the system). Second, the user specifies the data package type in a form, enters the basic description of the package, and selects the file ID of the previously uploaded file, which creates a data package object on the platform. According to the package type, the system dynamically selects one of the data containers of that type through the load balancing mechanism, loads the package into that container through the adapter, adds the package's metadata to the metadata database through the metadata management module, and records the package's physical address.
Step 5: After logging in, a registered ordinary user can search for data packages of interest through the data resource discovery module. The user submits a query at the access terminal; the data resource discovery module first verifies the user's identity and the validity of the query content, retrieves from the metadata store the list of data packages matched by the query, filters it according to the working state of the data containers holding those resources, and returns the filtered list to the access terminal as the query result. The user can then select packages of interest at the access terminal to view their details. The descriptive metadata of a data package includes a text description, industry, discipline, keywords, package type, and other information, and unified data resource discovery searches this descriptive metadata for the packages that match the user's query. In this embodiment, by default a data package uploaded by a user can be read and written only by its owner; other users who need access must apply as described in Step 6 and can access the package only after the owner approves the application.
Step 6: The user can view the details of a data package in the data resource discovery module. To use the package, the user submits a data access request to the resource owner through the access terminal, filling in a form with the name of the data resource to access, the type of access operation (read or write), and the requested access period; the application is then sent to the owner of the data package for approval. After logging in, the owner can view the application and approve or reject it. If it is approved, the authorization work is handed to the data access authorization module. That module parses the authorization request record to obtain the list of requested data objects and operations, locates the data container holding the data package according to the package's type and name, and calls the authorization routine in the corresponding adapter plug-in, which converts the authorization into a series of function calls on the data container and grants the requested access rights to the applicant. After the authorization succeeds, the system notifies the applicant.
The system processes a data resource authorization as follows: (1) it receives the authorization command sent by a user from the access terminal, parses it, and extracts the data resource, authorization type, grantee, time limit, and similar information; (2) it verifies the identity of the user at the access terminal and checks whether that user owns the related data resources; if identity verification fails or the user is not the owner, it returns a failure message to the client and stops, and otherwise it continues; (3) it queries the metadata database for the data container holding the related data resources and calls the authorization routine in the adapter plug-in to perform the concrete authorization action in that container.
Every grant of data access rights carries a time limit, and the grant is revoked once that limit expires. A user who wants to continue using the data must apply again and wait for the owner's approval. In the concrete implementation, the system creates a background thread at start-up that periodically scans the list of currently valid grants; when a grant expires, the thread calls the permission revocation function in the corresponding adapter to revoke it.
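The two-step upload can be modelled as a staging area keyed by system-assigned file IDs. In this sketch the class name `UploadStaging`, the use of sequential integer IDs, and the choice to consume a staged file once it becomes a package are assumptions; the returned dictionary is what would then be handed to container selection and loading.

```python
from __future__ import annotations

import itertools


class UploadStaging:
    """Sketch of the two-step upload: stage a file first, then create a package from it."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)  # system-assigned file ID numbers
        self._files: dict[int, tuple[str, str]] = {}  # file ID -> (owner, file name)

    def upload_file(self, owner: str, file_name: str) -> int:
        """Step one: store an uploaded compressed file and return its ID."""
        file_id = next(self._next_id)
        self._files[file_id] = (owner, file_name)
        return file_id

    def list_files(self, owner: str) -> dict[int, str]:
        """Let a user see the files he or she has uploaded, with their IDs."""
        return {fid: name for fid, (o, name) in self._files.items() if o == owner}

    def create_package(self, owner: str, file_id: int, type_name: str,
                       description: dict) -> dict:
        """Step two: turn a staged file into a package object (consuming the file)."""
        staged = self._files.get(file_id)
        if staged is None or staged[0] != owner:
            raise PermissionError(f"file {file_id} was not uploaded by {owner}")
        del self._files[file_id]
        return {"owner": owner, "type": type_name, "archive": staged[1], **description}
```

Separating the slow file transfer from package creation means a failed form submission does not force the user to upload the archive again.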
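A minimal sketch of the owner-review step follows. `AccessApplication`, `Status`, and the `authorize`/`notify` callbacks are hypothetical; the applicant is notified only after authorization succeeds, as described above, and a rejection simply records the decision.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessApplication:
    applicant: str
    package: str
    resource: str      # name of the package, asset or field being requested
    op: str            # "read" or "write"
    period: timedelta  # requested access period
    status: Status = Status.PENDING


def review(app: AccessApplication, reviewer: str, owners: dict, approve: bool,
           authorize: Callable[[AccessApplication], None],
           notify: Callable[[str, str], None]) -> Status:
    """Owner decision on a pending application (hypothetical workflow)."""
    if owners.get(app.package) != reviewer:
        raise PermissionError("only the package owner may review this application")
    if app.status is not Status.PENDING:
        raise ValueError("application has already been reviewed")
    if approve:
        authorize(app)  # hand the grant to the data access authorization module
        app.status = Status.APPROVED  # only reached if authorization did not raise
        notify(app.applicant, f"{app.op} access to {app.resource} granted")
    else:
        app.status = Status.REJECTED
    return app.status
```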
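The three processing steps map naturally onto a small function. The textual command syntax parsed here (`GRANT <op> ON <package>[/<asset>[/<field>]] TO <user> FOR <n> DAYS`) is invented for illustration, since the patent does not define a command format, and `grant_routines` stands in for the adapter plug-ins' authorization routines.

```python
from __future__ import annotations

from typing import Callable, Optional


def parse_grant(text: str) -> dict:
    """Step (1): parse GRANT <op> ON <pkg>[/<asset>[/<field>]] TO <user> FOR <n> DAYS."""
    words = text.split()
    if len(words) != 9 or [w.upper() for w in words[0:9:2]] != ["GRANT", "ON", "TO", "FOR", "DAYS"]:
        raise ValueError(f"malformed authorization command: {text!r}")
    op = words[1].lower()
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {words[1]!r}")
    path = words[3].split("/")
    return {"op": op, "package": path[0], "resource": path,
            "grantee": words[5], "days": int(words[7])}


def authorize(text: str, caller: Optional[str], owners: dict, locations: dict,
              grant_routines: dict[str, Callable[..., None]]) -> str:
    """Run steps (1) to (3); return "ok" or a failure message for the client."""
    try:
        cmd = parse_grant(text)
    except ValueError as exc:
        return f"failure: {exc}"
    # Step (2): only an authenticated owner of the package may grant access.
    if caller is None or owners.get(cmd["package"]) != caller:
        return "failure: caller is not the authenticated owner"
    # Step (3): locate the container, then call the adapter's authorization routine.
    container, type_name = locations[cmd["package"]]
    grant_routines[type_name](container, cmd["grantee"], cmd["op"], cmd["resource"], cmd["days"])
    return "ok"
```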
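The periodic expiry scan could be implemented with a daemon thread, as in this sketch. The grant record layout (an `expires` timestamp), the class name `ExpiryScanner`, and the `revoke` callback, which would dispatch to the matching adapter's revocation function, are assumptions; `revoke_expired` is kept separate so the scanning logic can be tested without threads.

```python
from __future__ import annotations

import threading
from datetime import datetime
from typing import Callable


def revoke_expired(grants: list[dict], now: datetime,
                   revoke: Callable[[dict], None]) -> list[dict]:
    """Call revoke() for every grant whose deadline has passed; return the rest."""
    remaining = []
    for grant in grants:
        if grant["expires"] <= now:
            revoke(grant)  # e.g. the revocation function of the grant's adapter
        else:
            remaining.append(grant)
    return remaining


class ExpiryScanner:
    """Background thread that rescans the valid-grant list every interval_s seconds."""

    def __init__(self, revoke: Callable[[dict], None], interval_s: float = 60.0) -> None:
        self._grants: list[dict] = []
        self._revoke = revoke
        self._interval = interval_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add(self, grant: dict) -> None:
        with self._lock:
            self._grants.append(grant)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout, so this loops once per interval until stop().
        while not self._stop.wait(self._interval):
            with self._lock:
                self._grants = revoke_expired(self._grants, datetime.now(), self._revoke)
```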

Claims (2)

1. A unified data resource management system for a big data platform based on a unified meta-model, characterized in that:
the unified meta-model consists of the following four layers: the first layer is the data container layer, the second layer is the data package layer, the third layer is the data asset (data resource) layer, and the fourth layer is the data field layer; wherein a data container corresponds to a running instance of a data management component, a data package is a collection of related data assets, a data asset is a relatively independent data object managed by a data container, and data fields are the fields of relational data assets; physically, data packages are stored and managed by data containers; the data container layer is transparent to ordinary users, who cannot perceive its existence, and the system is responsible for storing and maintaining the mapping between data packages and data containers;
between adjacent layers the meta-model has a one-to-many mapping from top to bottom, that is, each data container stores and manages one or more data packages, each data package is a set of one or more interrelated data assets, and each data asset consists of zero or more data fields; the system stores and maintains these mappings bidirectionally: given a data object, both its parent data object in the layer above and the list of its child data objects in the layer below can be obtained;
in addition to the inter-layer mappings, the meta-model defines several attributes for the data objects at each layer, including basic description attributes, access control attributes, derivation relationship attributes, and user access information; wherein the basic description attributes record the related industries, related disciplines, keywords, owner, and creation date of a data object; the access control attributes describe which users may access the data object and in what manner; the derivation relationship attributes describe the derivation relationships between data objects; and the user access information records which users accessed the data object, when, and in what manner;
the unified data resource management system comprises a data package loading module, a data resource discovery module, a user registration and login module, a data access application and authorization module, a metadata management module, a data container management module, and an adapter management module; wherein:
the adapter management module manages adapter-related information; it receives adapter plug-ins, together with the type names of the corresponding data management components, uploaded by a system administrator from the access end, stores the plug-ins in the '/Adapters' subdirectory of the system root directory, and records the mapping between plug-ins and type names through the metadata management module; at system startup, it loads all adapter plug-ins present in the system and builds the mapping between plug-ins and type names in memory; while the system is running, it provides other modules with a service that maps a type name to the corresponding adapter plug-in object;
the data container management module manages data-container-related information; it receives, from the access end, a data container name, the type name of the corresponding data management component, and related configuration information entered by a system administrator, and records this information through the metadata management module; at system startup, it builds the mapping between data containers and data management component type names in memory; while the system is running, it provides other modules with a service that maps a data container name to the corresponding data management component type name, and it also monitors and maintains the online working state of each data container;
the user registration and login module handles registration and login for ordinary users; to register a new user, the module receives the account name and password entered by the user at the access end, creates a Kerberos Principal and corresponding key for the account, and writes the Principal and key into a keytab file accessible only to the system service program; to log a user in, the system receives the account name and password entered at the access end and checks whether the password is correct: if the password is wrong, a login-failure message is returned; if it is correct, the system performs Kerberos authentication once using the Kerberos Principal and key corresponding to the account name, and caches the user's Configuration object to avoid frequent repeated authentication;
the metadata management module stores, maintains, and manages the metadata of four types of objects, namely data containers, data packages, data assets, and data fields, and provides query interfaces to the other functional modules;
the data package loading module receives a compressed data package file uploaded by an ordinary user from the access end, uses a load balancing mechanism to select a data container of the data management component type specified by the user, and calls the data package loading function provided by that type's adapter plug-in to physically store and manage the data package in the selected data container; after the data package has been successfully loaded, the module also records, through the metadata management module, the description information of the data package entered by the user;
the data resource discovery module provides query services to users; it receives a query request entered by a user at the access end, first verifies the user's identity and the validity of the query content, then retrieves the list of data packages matching the query from the metadata store, filters this list according to the working state of the data containers holding those data packages, and finally returns the filtered list to the access end as the query result;
the data access application and authorization module receives an ordinary user's application for access to a data package, obtains the owner of the data package through the metadata management module, and forwards the application to the owner; if the owner approves the application, the module authenticates the owner's identity and, if authentication succeeds, calls the metadata management module to obtain the data container in which the data resources concerned reside, and calls the authorization routine in the corresponding adapter plug-in to carry out the concrete authorization action in that data container.
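The four-layer meta-model of claim 1 and its bidirectional mappings can be illustrated with the following sketch. It assumes globally unique object names and models the attribute groups as plain fields; the `DataObject` and `MetaModel` classes and the `visible_parent` method (which hides the container layer from ordinary users) are hypothetical and not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

LAYERS = ("container", "package", "asset", "field")  # top to bottom


@dataclass
class DataObject:
    name: str
    layer: str
    # Basic description attributes: industry, discipline, keywords, owner, creation date.
    description: Dict[str, str] = field(default_factory=dict)
    # Access control attribute: user -> permitted operations.
    acl: Dict[str, Set[str]] = field(default_factory=dict)
    # Derivation relationship attribute: names of the objects this one was derived from.
    derived_from: List[str] = field(default_factory=list)
    # User access information: (user, operation, timestamp) records.
    access_log: List[Tuple[str, str, str]] = field(default_factory=list)


class MetaModel:
    """Stores the one-to-many layer mappings in both directions."""

    def __init__(self) -> None:
        self.objects: Dict[str, DataObject] = {}
        self.parent: Dict[str, str] = {}          # child name -> parent name
        self.children: Dict[str, List[str]] = {}  # parent name -> child names

    def add(self, obj: DataObject, parent: Optional[str] = None) -> None:
        depth = LAYERS.index(obj.layer)  # raises ValueError for an unknown layer
        if obj.name in self.objects:
            raise ValueError(f"duplicate object name {obj.name!r}")
        if depth == 0:
            if parent is not None:
                raise ValueError("data containers have no parent")
        else:
            expected = LAYERS[depth - 1]
            if parent not in self.objects or self.objects[parent].layer != expected:
                raise ValueError(f"a {obj.layer} must sit under a {expected}")
            self.parent[obj.name] = parent
            self.children[parent].append(obj.name)
        self.objects[obj.name] = obj
        self.children[obj.name] = []

    def parent_of(self, name: str) -> Optional[str]:
        return self.parent.get(name)

    def children_of(self, name: str) -> List[str]:
        return list(self.children.get(name, []))

    def visible_parent(self, name: str) -> Optional[str]:
        """Parent as seen by ordinary users: the container layer stays hidden."""
        p = self.parent.get(name)
        if p is not None and self.objects[p].layer == "container":
            return None
        return p
```

Validating the parent's layer on insertion keeps the one-to-many, top-to-bottom structure intact, so both lookup directions always agree.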
2. A unified data resource management method based on the unified data resource management system of claim 1, characterized by the following specific steps:
Step 1: a system administrator adds the various adapter plug-ins through the adapter management module;
Step 2: a system administrator adds or deletes data containers through the data container management module; each data container corresponds to one adapter plug-in, which represents the container's type, and operations on the container are performed through the routines in that adapter plug-in; at system startup, the data container management module automatically obtains the types and physical addresses of all data containers and establishes connections to them; the data container management module also monitors and maintains the online working state of each data container, and based on this monitoring information the system migrates data between data containers to balance the load among them;
Step 3: an ordinary user registers an account and logs in to the system through the user registration and login module; because the big data platform on which the system is deployed runs in secure mode and uses Kerberos for user identity authentication, when a user registers an account the system creates a Kerberos Principal and corresponding key for that user and writes them into a keytab file accessible only to the system service program; when a user logs in, the system performs Kerberos authentication only once and, after authentication succeeds, caches the user's Configuration object to avoid frequent repeated authentication;
Step 4: after logging in, a registered ordinary user uploads a data package to the platform through the data package loading module, and the platform loads and manages it; the upload takes two steps: first, the user packs the data package into a compressed file and uploads it through the file upload page provided by the system; second, the user specifies the data package type in a form, enters basic description information for the data package, and selects the file ID of the previously uploaded file, whereupon a data package object is created on the platform; according to the type of the data package object, the system dynamically selects one of the data containers of that type through the load balancing mechanism to manage the data package, loads the data package into that container through the adapter, adds the metadata of the data package to the metadata database through the metadata management module, and records the physical address of the data package;
Step 5: after logging in, a registered ordinary user searches for data packages of interest through the data resource discovery module; the user submits a query through the access end, and the data resource discovery module first verifies the user's identity and the validity of the query content, retrieves the list of data packages matching the query from the metadata store, filters it according to the working state of the data containers in which the data resources reside, and returns the filtered list to the access end as the query result; the user then selects data packages of interest at the access end to view their details; the description metadata of a data package comprises a text description, industry, discipline, keywords, and data package type, and unified data resource discovery finds the data packages of interest to the user in the description metadata database according to the user's query;
Step 6: the user views the details of a data package in the data resource discovery module; if the user needs to use the data package, the user submits a data access request to the resource owner through the access end by filling in a form with the name of the data resource to be accessed, the type of access operation to be performed, and the access time limit; the application is sent to the owner of the data package for approval; after logging in, the owner can view the pending access application and decide whether to approve or reject it; if the application is approved, the data access application and authorization module carries out the corresponding authorization: it parses the authorization request record, obtains the list of requested data objects and operations, locates the data container holding the data package according to the data package's type and name, and calls the authorization routine in the corresponding adapter plug-in, which converts the authorization information into a series of function calls on the data container and grants the corresponding access rights to the applicant; after the authorization operation succeeds, the system notifies the applicant.
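Steps 4 and 5 rely on two container-aware operations: choosing a container for a newly uploaded data package and filtering discovery results by container state. The patent does not specify the load balancing mechanism, so the sketch below assumes a least-loaded policy (fewest packages among the online containers of the requested type) and simple keyword or description matching; `Container`, `PackageMeta`, `choose_container`, `load_package`, and `discover` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    name: str
    type_name: str  # data management component type, e.g. "hdfs"
    online: bool = True
    packages: List[str] = field(default_factory=list)


@dataclass
class PackageMeta:
    name: str
    type_name: str
    container: str  # physical location recorded in the metadata database
    keywords: List[str]
    description: str = ""


def choose_container(containers: List[Container], type_name: str) -> Container:
    # Assumed policy: least-loaded online container of the requested type.
    candidates = [c for c in containers if c.type_name == type_name and c.online]
    if not candidates:
        raise LookupError(f"no online container of type {type_name!r}")
    return min(candidates, key=lambda c: len(c.packages))


def load_package(containers: List[Container], catalog: Dict[str, PackageMeta],
                 pkg_name: str, type_name: str, keywords: List[str],
                 description: str = "") -> str:
    target = choose_container(containers, type_name)
    target.packages.append(pkg_name)  # stands in for the adapter's loading routine
    catalog[pkg_name] = PackageMeta(pkg_name, type_name, target.name, keywords, description)
    return target.name


def discover(containers: List[Container], catalog: Dict[str, PackageMeta],
             query: str) -> List[str]:
    online = {c.name for c in containers if c.online}
    q = query.lower()
    hits = [m for m in catalog.values()
            if q in m.description.lower() or any(q == k.lower() for k in m.keywords)]
    # Hide packages whose container is not currently online.
    return [m.name for m in hits if m.container in online]
```

Because discovery consults container state at query time, packages reappear automatically once their container comes back online.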
CN201610555871.9A 2016-07-15 2016-07-15 Unified data resource management system and method for big data platform Active CN106202452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610555871.9A CN106202452B (en) 2016-07-15 2016-07-15 Unified data resource management system and method for big data platform

Publications (2)

Publication Number Publication Date
CN106202452A CN106202452A (en) 2016-12-07
CN106202452B true CN106202452B (en) 2020-05-26

Family

ID=57474356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610555871.9A Active CN106202452B (en) 2016-07-15 2016-07-15 Unified data resource management system and method for big data platform

Country Status (1)

Country Link
CN (1) CN106202452B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131289A (en) * 2020-08-17 2020-12-25 武汉旷视金智科技有限公司 Data processing method and device, electronic equipment and storage medium

Families Citing this family (24)

Publication number Priority date Publication date Assignee Title
CN106533967B (en) * 2016-12-08 2019-04-12 北京中安智达科技有限公司 A kind of data transmission method can customize load balancing
CN108243145B (en) * 2016-12-23 2019-04-26 中科星图股份有限公司 A kind of multi-source identity identifying method
CN106940867B (en) * 2017-02-24 2020-11-20 深圳国泰安教育技术有限公司 Financial transaction method and device
CN108809900B (en) * 2017-05-02 2021-09-07 武汉斗鱼网络科技有限公司 Framework and method for unified resource access
DE102017109703B3 (en) * 2017-05-05 2018-06-28 Technische Universität Braunschweig Method for coordinating access to a resource of a distributed computer system, computer system and computer program
US20190005066A1 (en) * 2017-06-29 2019-01-03 International Business Machines Corporation Multi-tenant data service in distributed file systems for big data analysis
CN108268798B (en) * 2017-06-30 2023-09-05 勤智数码科技股份有限公司 Data item authority allocation method and system
CN107798457B (en) * 2017-07-24 2021-08-03 深圳壹账通智能科技有限公司 Investment portfolio scheme recommending method, device, computer equipment and storage medium
CN107832440B (en) * 2017-11-17 2020-10-13 北京锐安科技有限公司 Data mining method, device, server and computer readable storage medium
CN108037919A (en) * 2017-12-01 2018-05-15 北京博宇通达科技有限公司 A kind of visualization big data workflow configuration method and system based on WEB
CN108052618B (en) * 2017-12-15 2020-06-30 北京搜狐新媒体信息技术有限公司 Data management method and device
CN109981698B (en) * 2017-12-27 2022-03-04 博元森禾信息科技(北京)有限公司 Metadata-based data networking cross-domain data access standardization system and method
CN108874971B (en) * 2018-06-07 2021-09-24 北京赛思信安技术股份有限公司 Tool and method applied to mass tagged entity data storage
CN110188887B (en) * 2018-09-26 2022-11-08 第四范式(北京)技术有限公司 Data management method and device for machine learning
CN109150908A (en) * 2018-10-08 2019-01-04 四川大学 A kind of big data platform protective device and its guard method being deployed in gateway
CN109522090B (en) * 2018-11-09 2020-12-22 中国联合网络通信集团有限公司 Resource scheduling method and device
CN109614167B (en) * 2018-12-07 2023-10-20 杭州数澜科技有限公司 Method and system for managing plug-ins
CN110223185A (en) * 2019-05-20 2019-09-10 中国平安财产保险股份有限公司 A kind of information benefit transmission method and relevant device based on data processing
CN110750218A (en) * 2019-10-18 2020-02-04 北京浪潮数据技术有限公司 Storage resource management method, device, equipment and readable storage medium
CN111143449B (en) * 2019-12-12 2023-05-30 北京中电普华信息技术有限公司 Data service method and device based on unified data model
CN112395340B (en) * 2020-11-16 2023-07-28 青岛海信网络科技股份有限公司 Data asset management method and device
CN112685425B (en) * 2021-01-08 2022-06-17 东云睿连(武汉)计算技术有限公司 Data asset meta-information processing system and method
CN112966036B (en) * 2021-03-10 2023-02-21 浪潮云信息技术股份公司 Method for constructing main data service based on logic model
CN115510121B (en) * 2022-10-08 2024-01-05 上海数禾信息科技有限公司 List data management method, device, equipment and readable storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102724291A (en) * 2012-05-23 2012-10-10 北京经纬恒润科技有限公司 Vehicle network data acquisition method, unit and system
CN104657214A (en) * 2015-03-13 2015-05-27 华存数据信息技术有限公司 Multi-queue multi-priority big data task management system and method for achieving big data task management by utilizing system

Non-Patent Citations (1)

Title
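An online streaming data service framework for massive data; Yang Zhen (杨臻) et al.; Computer Applications and Software (《计算机应用与软件》); 2016-03-31; Vol. 33, No. 3; pp. 56-61 *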

Also Published As

Publication number Publication date
CN106202452A (en) 2016-12-07

Similar Documents

Publication Publication Date Title
CN106202452B (en) Unified data resource management system and method for big data platform
US11550763B2 (en) Versioning schemas for hierarchical data structures
JP7090606B2 (en) Formation and operation of test data in a database system
US11574070B2 (en) Application specific schema extensions for a hierarchical data structure
JP4726545B2 (en) Method, system and apparatus for discovering and connecting data sources
RU2446456C2 (en) Integration of corporate search engines with access control application programming special interfaces
US9223817B2 (en) Virtual repository management
KR100959473B1 (en) Systems and methods for interfacing application programs with an item-based storage platform
US10089371B2 (en) Extensible extract, transform and load (ETL) framework
KR101024730B1 (en) Systems and methods for data modeling in an item-based storage platform
US9189507B2 (en) System and method for supporting agile development in an enterprise crawl and search framework environment
JP5589205B2 (en) Computer system and data management method
US20120246115A1 (en) Folder structure and authorization mirroring from enterprise resource planning systems to document management systems
US8321487B1 (en) Recovery of directory information
US20050234934A1 (en) System and method for controlling the release of updates to a database configuration
KR20120106544A (en) Method for accessing files of a file system according to metadata and device implementing the method
KR20140041499A (en) Brokered item access for isolated applications
EP2884408B1 (en) Content management systems for content items and methods of operating content management systems
US11609758B2 (en) Drug research and development software repository and software package management system
WO2010091607A1 (en) Method for providing custom access control mode in file system
Francia MongoDB and PHP: Document-Oriented Data for Web Developers
US11010361B1 (en) Executing code associated with objects in a hierarchial data structure
US9767313B2 (en) Method for automated separation and partitioning of data in a payroll and resource planning system
CN116737113B (en) Metadata catalog management system and method for mass scientific data
Garbus et al. Administrator's guide to Sybase ASE 15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant