CN116821921A - Method, device, system and storage medium for accessing file - Google Patents


Info

Publication number
CN116821921A
Authority
CN
China
Prior art keywords
file
information
access
granularity
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210511098.1A
Other languages
Chinese (zh)
Inventor
黄爽
许田立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to PCT/CN2023/070167 (WO2023173908A1)
Publication of CN116821921A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures

Abstract

The application discloses a method, a device, a system, and a storage medium for accessing files, and belongs to the field of computers. The method comprises: receiving a data access request, wherein the data access request comprises access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in an object file storage system; when the granularity of accessing the content in the first file is determined, based on the access requirement information, to be a first granularity, accessing the first file based on the account information of the first user and the access requirement information; and when the granularity is determined, based on the access requirement information, to be a second granularity, accessing the first file based on designated administrator account information and the access requirement information, wherein the second granularity is smaller than the first granularity. The application can enrich the access services provided to users.

Description

Method, device, system and storage medium for accessing file
The present application claims priority to Chinese patent application No. 202210264898.8, entitled "an efficient fine grained data lake authority management scheme" and filed on March 17, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, a system, and a storage medium for accessing files.
Background
A data lake with separated storage and compute includes a structured query language (SQL) engine and an object file storage system. The object file storage system includes at least one file, and each file stores data. The SQL engine receives an SQL statement from a user; the SQL statement indicates to the SQL engine the file that the user needs to access, and the SQL engine accesses that file in the object file storage system based on the SQL statement.
Typically, the files in an object file storage system are structured data files that store data in the form of a list. For example, the object file storage system holds a file that stores an employee data table. The file has four columns: the first stores employee names, the second stores employee addresses, the third stores employees' departments, and the fourth stores employees' job titles. Each row of the file stores the name, address, department, and job title of one employee.
Currently, a user can use the SQL engine to access an entire file in the object file storage system; that is, the SQL engine offers access at the granularity of a whole file. Because the SQL engine provides access services only at file granularity, the access services available to users are too limited.
Disclosure of Invention
The application provides a method, a device, a system and a storage medium for accessing files, which are used for enriching access services provided for users. The technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a method for accessing a file. In the method, a data access request is received, where the data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in an object file storage system. When the granularity of accessing the content in the first file is determined, based on the access requirement information, to be a first granularity, the first file is accessed based on the account information of the first user and the access requirement information. When the granularity of accessing the content in the first file is determined, based on the access requirement information, to be a second granularity, the first file is accessed based on designated administrator account information and the access requirement information, where the second granularity is smaller than the first granularity.
That is, the granularity of accessing the content in the first file is determined based on the access requirement information. When the determined granularity is the first granularity, the first file is accessed based on the account information of the first user and the access requirement information; when the determined granularity is the second granularity, the first file is accessed based on the designated administrator account information and the access requirement information. In this way, both first-granularity and second-granularity access services can be provided to users, which enriches the access services provided.
Because the first granularity is larger than the second granularity, when the determined granularity is the first granularity, the first file is accessed based on the account information of the first user and the access requirement information, so that the first file is accessed without borrowing the account information of an administrator, and the efficiency of accessing the first file and the performance of reading and writing the first file are improved.
If the first user were instead granted permission to access second-granularity content directly, the first user could also reach content in the first file other than the requested content, because such a permission automatically expands to cover any content in the first file. The permission would therefore be expanded too far, which hinders permission management. In the present application, however, when the determined granularity is the second granularity, the first file is accessed based on the designated administrator account information and the access requirement information. The administrator account information replaces the account information of the first user for accessing the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
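The account selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `Granularity`, `AccessRequest`, `select_account`, and the `ADMIN_ACCOUNT` value are hypothetical, and the granularity is assumed to have been determined already.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    FILE = "file"                  # first granularity
    PARTITION = "partition"        # first granularity
    ROW_COLUMN = "row_column"      # second granularity
    DATA_SEGMENT = "data_segment"  # second granularity


# Granularities that are accessed directly with the requesting user's account.
FIRST_GRANULARITIES = {Granularity.FILE, Granularity.PARTITION}

# Hypothetical administrator account designated for fine-grained access.
ADMIN_ACCOUNT = "lake-admin"


@dataclass(frozen=True)
class AccessRequest:
    user_account: str
    file_id: str
    granularity: Granularity


def select_account(request: AccessRequest) -> str:
    """Return the account used to access the file in object storage."""
    if request.granularity in FIRST_GRANULARITIES:
        # Coarse access: the user's own account is used.
        return request.user_account
    # Fine-grained access: substitute the designated administrator account,
    # so the user never needs storage-level permission on the whole file.
    return ADMIN_ACCOUNT
```

With this sketch, a file-granularity request from `"alice"` is served under `"alice"`, while a row-column request from the same user is served under the administrator account.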
In one possible implementation, the access requirement information includes identification information of a first file, and the first granularity is a file granularity; or the access requirement information comprises identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity. The first granularity may be file granularity or partition granularity, so that access service with the file granularity may be provided to the user, or access service with the partition granularity may be provided, which enriches the provided access service. In addition, partition granularity is a newly defined granularity, that is, the present application is also able to provide partition access services.
In one possible implementation, file granularity refers to the need to access the entire content of the first file. Partition granularity refers to the need to access the entire content in one partition in a first file.
In another possible implementation, an authentication request is sent to a file path authentication module. The authentication request includes authentication information that indicates the first user, the file path of the first file, and the access operation with which the first user accesses the file path; the authentication information is obtained based on the access requirement information and the account information of the first user. The authentication request triggers the file path authentication module to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation. The file path indicates the storage location of the first file, and the first permission information indicates the user identities and access operations that are allowed to access the file path. An authentication response, sent by the file path authentication module after the permission authentication passes, is then received. The authentication response includes a temporary credential, and the temporary credential, the file path, and the operation type of the access operation are stored in association with one another in the object file storage system. The first file is accessed based on the temporary credential, the access requirement information, and the file path.
Because the first file in the object file storage system is accessed with the temporary credential that the file path authentication module sends only after the permission authentication passes, the security of accessing the first file is improved.
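The temporary-credential flow can be sketched as below. This is an illustrative sketch only: the classes `ObjectFileStorage` and `FilePathAuthModule`, their methods, and the use of a random token as the credential are assumptions, and the sketch assumes the authentication module is the component that registers the credential with the storage system. Credential expiry is omitted.

```python
import secrets
from typing import Dict, Optional, Set, Tuple

# First permission information: (user, file path) -> allowed operation types.
PathPermissions = Dict[Tuple[str, str], Set[str]]


class ObjectFileStorage:
    """Toy object store that honours only credentials registered with it."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}
        # temporary credential -> (file path, operation type)
        self._credentials: Dict[str, Tuple[str, str]] = {}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def register_credential(self, credential: str, path: str, operation: str) -> None:
        self._credentials[credential] = (path, operation)

    def read(self, credential: str, path: str) -> bytes:
        # The credential must have been stored for exactly this path and operation.
        if self._credentials.get(credential) != (path, "read"):
            raise PermissionError("credential is not valid for this path and operation")
        return self._files[path]


class FilePathAuthModule:
    """Checks path-level permissions and issues temporary credentials."""

    def __init__(self, permissions: PathPermissions, storage: ObjectFileStorage) -> None:
        self._permissions = permissions
        self._storage = storage

    def authenticate(self, user: str, path: str, operation: str) -> Optional[str]:
        if operation not in self._permissions.get((user, path), set()):
            return None  # authentication failed: no credential is issued
        credential = secrets.token_hex(16)
        self._storage.register_credential(credential, path, operation)
        return credential
```

A caller would first call `authenticate(...)` and, if a credential is returned, pass it to `read(...)` together with the file path; a request carrying an unregistered credential is rejected by the storage system.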
In another possible implementation, the first file is a structured data file that stores data in list form, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column and/or at least one row of the first file, and the second granularity is row-column granularity. Alternatively, the first file is a semi-structured data file that includes at least one data segment, each data segment stores data with the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data segments in the first file, and the second granularity is data segment granularity. Because the second granularity may be row-column granularity or data segment granularity, access services at either granularity can be provided to users, which enriches the access services provided.
In another possible implementation, an access instruction is sent to the data filtering engine, the access instruction including the access requirement information, the data filtering engine including administrator account information, the access instruction being for triggering the data filtering engine to access the first file based on the administrator account information and the access requirement information.
Because the data filtering engine holds the administrator account information, an access instruction is sent to the data filtering engine, and the data filtering engine accesses the first file based on the designated administrator account information and the access requirement information. The administrator account information thus replaces the account information of the first user for accessing the first file, and the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
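A data filtering engine of this kind might look like the sketch below. The class name `DataFilteringEngine`, the `read_file` callback, and the representation of rows as dictionaries are assumptions made for illustration; the text above does not specify how the engine reads or filters the file.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, str]


class DataFilteringEngine:
    """Reads a file with the administrator account and returns only the
    rows and columns named in the access requirement information."""

    def __init__(self, admin_account: str,
                 read_file: Callable[[str, str], List[Row]]) -> None:
        self._admin_account = admin_account
        # read_file(account, file_id) stands in for the object storage client.
        self._read_file = read_file

    def access(self, file_id: str,
               columns: Optional[Sequence[str]] = None,
               row_filter: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        # The whole file is read under the administrator account, never the user's.
        rows = self._read_file(self._admin_account, file_id)
        if row_filter is not None:
            rows = [row for row in rows if row_filter(row)]
        if columns is not None:
            rows = [{name: row[name] for name in columns} for row in rows]
        return rows
```

The requesting user only ever sees the filtered result, so no storage-level permission on the file has to be granted to that user.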
In another possible implementation, the first user's permission to access the content is authenticated based on second permission information, the account information of the first user, and the access requirement information, where the second permission information indicates the user identities and access operations that are allowed to access the content. After this authentication passes, the granularity of accessing the content in the first file is determined based on the access requirement information.
After the authority authentication of the first user for accessing the content passes, the granularity of accessing the content in the first file is determined, and then the first file is accessed in different modes based on different granularities, so that the security of accessing the first file is improved.
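The content-level check that precedes the granularity decision could be sketched as follows. The `ContentGrant` record and the `is_authorized` function are hypothetical, and the sketch models only column-level grants; how the second permission information is actually encoded is not stated in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ContentGrant:
    """One entry of the (assumed) second permission information."""
    user: str
    file_id: str
    operation: str
    columns: FrozenSet[str]  # empty set means the grant covers the whole file


def is_authorized(grants: Iterable[ContentGrant], user: str, file_id: str,
                  operation: str, requested_columns: Iterable[str]) -> bool:
    """Return True if some grant covers the requested content.

    An empty requested_columns means the whole file is requested.
    """
    requested = frozenset(requested_columns)
    for grant in grants:
        if (grant.user, grant.file_id, grant.operation) != (user, file_id, operation):
            continue
        if not grant.columns:
            return True  # whole-file grant covers any request on the file
        if requested and requested <= grant.columns:
            return True  # every requested column is explicitly granted
    return False
```

Only when this check returns `True` would the request proceed to the granularity determination.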
In another possible implementation, the first permission information is generated based on the second permission information. The first permission information indicates the user identities and access operations that are allowed to access the file path of the first file, and the file path indicates the storage location of the first file. The first permission information can therefore be generated automatically, which makes it faster and cheaper to obtain.
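One way to picture this derivation is the sketch below. It assumes, hypothetically, that each content-level grant is a whole-file `(user, file_id, operation)` triple and that a catalog maps each file identifier to its file path; the function name `derive_path_permissions` and both data shapes are illustrative, and the sketch does not address how fine-grained grants are treated.

```python
from typing import Dict, Iterable, Set, Tuple


def derive_path_permissions(
    content_grants: Iterable[Tuple[str, str, str]],
    file_paths: Dict[str, str],
) -> Dict[Tuple[str, str], Set[str]]:
    """Translate content-level grants into path-level permissions.

    content_grants: (user, file_id, operation) triples (second permission info).
    file_paths:     file_id -> storage path of that file (assumed catalog).
    Returns:        (user, file path) -> allowed operations (first permission info).
    """
    path_permissions: Dict[Tuple[str, str], Set[str]] = {}
    for user, file_id, operation in content_grants:
        path = file_paths[file_id]
        path_permissions.setdefault((user, path), set()).add(operation)
    return path_permissions
```

Because the path-level entries are computed from the content-level entries, an administrator maintains a single set of grants rather than two.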
In a second aspect, the present application provides an access system comprising: a computing engine and an object file storage system.
A computing engine for receiving a data access request, the data access request including access requirement information indicating content in a first file that a first user needs to access, the first file stored in an object file storage system.
The computing engine is further configured to access the first file based on the account information of the first user and the access requirement information when it is determined that the granularity of accessing the content in the first file is the first granularity based on the access requirement information.
The computing engine is further configured to access the first file based on the specified administrator account information and the access requirement information when it is determined that the granularity of accessing the content in the first file is a second granularity, where the second granularity is smaller than the first granularity.
The computing engine determines the granularity of accessing the content in the first file based on the access requirement information. When the determined granularity is the first granularity, the computing engine accesses the first file based on the account information of the first user and the access requirement information; when the determined granularity is the second granularity, it accesses the first file based on the designated administrator account information and the access requirement information. In this way, both first-granularity and second-granularity access services can be provided to users, which enriches the access services provided.
Because the first granularity is larger than the second granularity, when the determined granularity is the first granularity, the computing engine accesses the first file based on the account information of the first user and the access requirement information, so that the first file is accessed without borrowing the account information of an administrator, and the efficiency of accessing the first file and the performance of reading and writing the first file are improved.
If the first user were instead granted permission to access second-granularity content directly, the first user could also reach content in the first file other than the requested content, because such a permission automatically expands to cover any content in the first file. The permission would therefore be expanded too far, which hinders permission management. In the present application, however, when the determined granularity is the second granularity, the computing engine accesses the first file based on the designated administrator account information and the access requirement information. The administrator account information replaces the account information of the first user for accessing the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
In one possible implementation, the access requirement information includes identification information of a first file, and the first granularity is a file granularity; or the access requirement information comprises identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity. The first granularity may be file granularity or partition granularity, so that access service with the file granularity may be provided to the user, or access service with the partition granularity may be provided, which enriches the provided access service. In addition, partition granularity is a newly defined granularity, that is, the present application is also able to provide partition access services.
In one possible implementation, file granularity refers to the need to access the entire content of the first file. Partition granularity refers to the need to access the entire content in one partition in a first file.
In another possible implementation, the system further includes a file path authentication module.
The computing engine is configured to send an authentication request to the file path authentication module. The authentication request includes authentication information that indicates the first user, the file path of the first file, and the access operation with which the first user accesses the file path; the authentication information is obtained based on the access requirement information and the account information of the first user, and the file path indicates the storage location of the first file.
The file path authentication module is configured to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation, and to send an authentication response to the computing engine after the permission authentication passes. The first permission information indicates the user identities and access operations that are allowed to access the file path, and the authentication response includes a temporary credential.
The object file storage system is configured to store the temporary credential, the file path, and the operation type of the access operation in association with one another.
The computing engine is further configured to access the first file based on the temporary credential, the access requirement information, and the file path.
The computing engine receives the temporary credential that the file path authentication module sends after the permission authentication passes, and uses it to access the first file in the object file storage system, which improves the security of accessing the first file.
In another possible implementation, the first file is a structured data file that stores data in list form, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column and/or at least one row of the first file, and the second granularity is row-column granularity. Alternatively, the first file is a semi-structured data file that includes at least one data segment, each data segment stores data with the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data segments in the first file, and the second granularity is data segment granularity. Because the second granularity may be row-column granularity or data segment granularity, access services at either granularity can be provided to users, which enriches the access services provided.
In another possible implementation, the system further includes a data filtering engine that includes administrator account information.
The computing engine is used for sending an access instruction to the data filtering engine, wherein the access instruction comprises a file path of the first file and the access requirement information, and the file path is used for indicating the storage position of the first file.
The data filtering engine is used for accessing the first file based on the administrator account information, the file path and the access requirement information.
Because the data filtering engine holds the administrator account information, the computing engine sends an access instruction to the data filtering engine, and the data filtering engine accesses the first file based on the designated administrator account information and the access requirement information. The administrator account information thus replaces the account information of the first user for accessing the first file, and the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
In another possible implementation, the computing engine is further configured to authenticate the first user's permission to access the content based on second permission information, the account information of the first user, and the access requirement information, where the second permission information indicates the user identities and access operations that are allowed to access the content. After this authentication passes, the computing engine determines the granularity of accessing the content in the first file based on the access requirement information, and then accesses the first file in different ways depending on the granularity, which improves the security of accessing the first file.
In another possible implementation, the system further includes a linkage authority module.
The linkage authority module is configured to generate the first permission information based on the second permission information. The first permission information indicates the user identities and access operations that are allowed to access the file path of the first file, and the file path indicates the storage location of the first file. The first permission information can therefore be generated automatically, which makes it faster and cheaper to obtain.
In a third aspect, the present application provides an apparatus for accessing a file, for performing the method of the first aspect or any one of the possible implementation manners of the first aspect. In particular, the apparatus comprises means for performing the method of the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides an apparatus for accessing a file, the apparatus comprising a processor and a memory. The processor and the memory can be connected through internal connection. The memory is for storing a program and the processor is for executing the program in the memory to cause the apparatus to perform the method of the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program stored in a computer readable storage medium and loaded by a processor to implement the method of the first aspect or any possible implementation of the first aspect.
In a sixth aspect, the present application provides a computer readable storage medium storing a computer program to be loaded by a processor for performing the method of the first aspect or any possible implementation of the first aspect.
In a seventh aspect, the present application provides a chip comprising a memory for storing computer instructions and a processor for calling and running the computer instructions from the memory to perform the method of the first aspect or any possible implementation of the first aspect.
Drawings
FIG. 1 is a schematic diagram of an access system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a file according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another access system according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for accessing a file according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another access system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another access system according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for obtaining first permission information according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a device for accessing files according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another device for accessing files according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of the present application provides an access system 100, the access system 100 comprising a compute engine 101 and an object file storage system 102, the compute engine 101 in communication with the object file storage system 102.
In some embodiments, the access system 100 is a database system with separated storage and compute, in which the object file storage system 102 is responsible for data storage and the compute engine 101 is responsible for data computation.
In some embodiments, the access system 100 is applied to fields such as data lakes with separated storage and compute, and to the field of big data processing.
The object file storage system 102 is configured to store at least one file, and each stored file is used to store data.
In some embodiments, the file may be a structured data file that stores data in a list format, so the structured data file is a data table. Each column in the file holds data with the same business attribute.
Because a structured data file is essentially a data table, the identification information of the file is the identification information of the data table. For example, the identification information of the file is its file name, which is also the table name of the data table.
For example, the file shown in Table 1 below is a structured data file. The file is a data table with five columns that stores company information. The table name of the data table is "Company information", and the file name of the file is also "Company information"; that is, the file is a company information table whose table name and file name are the same.
Referring to Table 1 below, each column of the file holds data with the same business attribute. The first column stores row numbers, so the business attribute of each item in the first column is the row number. The second column stores company names, so the business attribute of each item in the second column is the company name. The third column stores industry names, so the business attribute of each item in the third column is the industry name. The fourth column stores cities, so the business attribute of each item in the fourth column is the city name. The fifth column stores countries, so the business attribute of each item in the fifth column is the country name.
Table 1: Company information
In some embodiments, the file is a semi-structured data file that includes at least one data segment for holding data having the same business attributes for any one of the data segments in the file.
For example, referring to the semi-structured data file shown in fig. 2, the semi-structured data file includes four data segments, namely a first data segment, a second data segment, a third data segment, and a fourth data segment. The data stored in the first data segment is a company name, for example, the data stored in the first data segment includes "company 1", "company 2", "company 3", "company 4", "company 5" and "company 6", that is, each data stored in the first data segment has a business attribute that is a company name. The data stored in the second data segment is an industry name, for example, the data stored in the second data segment includes "internet", "communication", "logistics", "communication" and "logistics", that is, each data stored in the second data segment has a business attribute that is the industry name. The data stored in the third data segment are all cities, for example, the data stored in the third data segment includes "city 1", "city 2", "city 1" and "city 3", that is, each service attribute of the data stored in the third data segment is a city name. The data stored in the fourth data segment is a country, for example, the data stored in the fourth data segment includes "country 1", "country 2", "country 1", and "country 3", that is, each data stored in the fourth data segment has a service attribute that is a country name.
In some embodiments, the semi-structured data file is an extensible markup language (XML) file or the like, in which each tag block is a data segment.
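As a minimal sketch of how tag blocks in an XML file can serve as data segments, the following Python example parses a small XML text and collects the data stored in each segment. The element names (`company_information`, `company_name`, `industry_name`, `item`) and the function name `read_data_segments` are hypothetical and are chosen only for illustration; the embodiments do not prescribe any particular XML layout.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical layout: each child of the root element is one tag block,
# i.e. one data segment holding data with the same business attribute.
XML_TEXT = """
<company_information>
  <company_name><item>company 1</item><item>company 2</item></company_name>
  <industry_name><item>internet</item><item>communication</item></industry_name>
</company_information>
"""


def read_data_segments(xml_text: str) -> Dict[str, List[str]]:
    """Map each segment tag to the list of values stored in that segment."""
    root = ET.fromstring(xml_text.strip())
    return {
        segment.tag: [item.text or "" for item in segment]
        for segment in root
    }


if __name__ == "__main__":
    for tag, values in read_data_segments(XML_TEXT).items():
        print(tag, values)
```

Under this assumed layout, accessing the file at data segment granularity amounts to selecting one or more entries of the returned mapping.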
In some embodiments, the file may also include at least one partition.
In some embodiments, the file is stored in the object file storage system 102, and the file path of the file is used to indicate the storage location of the file in the object file storage system 102. For example, assume that the file path of the file shown in Table 1 is "C: \windows\systems 32\ Company information"; this file path indicates the storage location, in the object file storage system 102, of the file shown in Table 1.
Referring to fig. 1, a first user has a need to access content in a file stored in an object file storage system 102. The granularity of the first user accessing the content in the file may be a first granularity or may be a second granularity, the second granularity being smaller than the first granularity.
In some embodiments, the first granularity may be a file granularity, i.e., the first user needs to access the entire content of the file; alternatively, the first granularity may be a partition granularity, i.e., the first user needs to access a partition of the file. File granularity means that the entire content of the first file needs to be accessed. Partition granularity means that the entire content of one partition in the first file needs to be accessed.
In some embodiments, the file is a structured data file and the second granularity is a row-column granularity, i.e., the first user needs to access at least one row and/or at least one column of the file. Alternatively, the file is a semi-structured data file and the second granularity is a data segment granularity, i.e., the first user needs to access data segments in the file.
The access services provided to the user can be enriched by providing file-granularity access services, partition-granularity access services, row-column-granularity access services, or data-segment-granularity access services to the user.
The first user is a user with data access requirements, which may also be referred to as a service user. Alternatively, the first user may be an application program or the like.
When a first user needs to access data, the first user sends a data access request to the computing engine 101, where the data access request includes access requirement information, where the access requirement information is used to indicate content in a first file that the first user needs to access, and the first file is a file stored in the object file storage system 102.
The computing engine 101 is configured to receive the data access request, determine, based on the access requirement information, a granularity of accessing the content in the first file, and, when the determined granularity is the first granularity, access the first file based on the account information of the first user and the access requirement information. When the determined granularity is a second granularity, the computing engine 101 accesses the first file based on specified administrator account information and the access requirement information, where the second granularity is smaller than the first granularity.
In some embodiments, the computing engine 101 includes an interface that the first user can invoke, and the first user sends the data access request to the computing engine 101 through the interface. Optionally, the interface includes a Java Database Connectivity (JDBC) interface, an Open Database Connectivity (ODBC) interface, or the like.
In some embodiments, the first file is a structured data file, the access requirement information includes identification information of the first file, and the access requirement information includes neither identification information of a partition in the first file nor first information, the first information being used to indicate at least one column and/or at least one row in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is the file granularity.
In some embodiments, the first file is a semi-structured data file, the access requirement information includes identification information of the first file, and the access requirement information includes neither identification information of a partition in the first file nor identification information of a data segment in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is the file granularity.
In some embodiments, the first file is a structured data file, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the access requirement information does not include the first information. In this case, the granularity determined by the computing engine 101 based on the access requirement information is the partition granularity.
In some embodiments, the first file is a semi-structured data file, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the access requirement information does not include identification information of a data segment in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is the partition granularity.
In some embodiments, the first file is a structured data file, and the access requirement information includes identification information of the first file and first information used to indicate at least one column and/or at least one row in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is the row-column granularity.
In some embodiments, the first file is a semi-structured data file, and the access requirement information includes identification information of the first file and identification information of a data segment in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is the data segment granularity.
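The determination rules described above can be sketched as a single decision function. The following Python example is only an illustration under stated assumptions: the class `AccessRequirement`, its field names, and the function names are hypothetical, and the representation of the access requirement information as a record with optional fields is assumed rather than taken from the embodiments.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Granularity(Enum):
    FILE = "file"
    PARTITION = "partition"
    ROW_COLUMN = "row-column"
    DATA_SEGMENT = "data-segment"


@dataclass
class AccessRequirement:
    """Hypothetical shape of the access requirement information."""
    file_id: str
    structured: bool = True                 # False means semi-structured
    partition_id: Optional[str] = None
    columns: List[str] = field(default_factory=list)  # first information (columns)
    rows: List[int] = field(default_factory=list)     # first information (rows)
    segment_ids: List[str] = field(default_factory=list)


def determine_granularity(req: AccessRequirement) -> Granularity:
    """Apply the rules: row/column or segment info wins, then partition, else file."""
    if req.structured and (req.columns or req.rows):
        return Granularity.ROW_COLUMN
    if not req.structured and req.segment_ids:
        return Granularity.DATA_SEGMENT
    if req.partition_id is not None:
        return Granularity.PARTITION
    return Granularity.FILE


def is_first_granularity(granularity: Granularity) -> bool:
    """File and partition granularity are the larger, first granularity."""
    return granularity in (Granularity.FILE, Granularity.PARTITION)


if __name__ == "__main__":
    req = AccessRequirement("company_information", columns=["company_name"])
    print(determine_granularity(req))
```

In this sketch, a request that carries only the identification information of the first file yields the file granularity, and any row, column, or data segment information yields the second granularity.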
In some embodiments, referring to fig. 1 and 3, the computing engine 101 includes a computing module 1011 and a routing module 1012. The computing module 1011 in the computing engine 101 receives the data access request, and the routing module 1012 in the computing engine 101 determines the granularity of accessing the content in the first file based on the access requirement information.
In some embodiments, the access requirement information further includes a first operation type, the first operation type being used to indicate a first access operation of the first user to access the first file. Optionally, the first access operation includes querying, updating, inserting or deleting, etc.
In some embodiments, the operation of the computing engine 101 to access the first file may be: querying the content in the first file and returning the queried content to the first user. Alternatively, the access requirement information includes content to be updated, and the operation of the computing engine 101 to access the first file may be: updating all or part of the content in the first file to the content to be updated. Alternatively, the access requirement information includes content to be inserted, and the operation of the computing engine 101 to access the first file may be: inserting the content to be inserted into the first file. Alternatively, the operation of the computing engine 101 to access the first file may be: deleting all or part of the content in the first file, etc.
In some embodiments, the data access request further includes account information of the first user. Alternatively, the data access request may not include account information of the first user; in this case, the communication connection between the computing engine 101 and the first user is bound to the account information of the first user, and the computing engine 101 acquires the account information of the first user bound to the communication connection. Optionally, the communication connection is a session between the first user and the computing engine 101.
In some embodiments, the access system 100 includes one or more computing engines 101. Optionally, the computing engine 101 is a Hive engine, a Spark engine, or the like, i.e., the access system 100 includes one or more Hive engines and/or one or more Spark engines, etc.
In some embodiments, the Hive engine is a data warehouse tool based on Hadoop (which is a distributed system infrastructure) that can map structured data files into a table and provide query functionality.
In some embodiments, the Spark engine is a fast general-purpose computing engine designed for large-scale data processing.
In some embodiments, referring to fig. 3, the access system 100 further includes a file path authentication module 103, the file path authentication module 103 in communication with the computing engine 101 and the object file storage system 102, respectively.
The computing engine 101 is configured to send an authentication request to the file path authentication module 103 when the determined granularity is the first granularity, where the authentication request includes authentication information, the authentication information is used to indicate the first user, a file path of the first file, and a second access operation by which the first user accesses the file path, and the authentication information is obtained based on the access requirement information and the account information of the first user;
the file path authentication module 103 is configured to: receive the authentication request; authenticate, based on first permission information and the authentication information, the permission of the first user to access the file path using the second access operation, where the first permission information is used to indicate a user identity capable of accessing the file path and a third access operation capable of accessing the file path; after the permission authentication passes, send an authentication response to the computing engine 101, where the authentication response includes a temporary credential; and send storage information to the object file storage system 102, where the storage information includes the temporary credential, the file path, and a second operation type, and the second operation type is the operation type of the second access operation;
the object file storage system 102 is configured to receive the storage information, and store the temporary credential, the file path, and the second operation type in correspondence with one another;
the computing engine 101 is further configured to access the first file based on the temporary credential, the access requirement information, and the file path.
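The temporary credential exchange can be illustrated with a toy in-memory model. The following Python sketch assumes that the object file storage system checks a presented credential against the stored file path and operation type before serving a read; this check, the class `ObjectFileStorage`, and the function `issue_temporary_credential` are assumptions made for illustration and are not stated in the embodiments.

```python
import secrets
from typing import Dict, Tuple


class ObjectFileStorage:
    """Toy stand-in for the object file storage system 102."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}
        # temporary credential -> (file path, second operation type)
        self._grants: Dict[str, Tuple[str, str]] = {}

    def put(self, path: str, data: bytes) -> None:
        """Set-up helper that places a file at a path (no permission check)."""
        self._files[path] = data

    def store_grant(self, credential: str, path: str, op_type: str) -> None:
        """Store the credential, file path and operation type in correspondence."""
        self._grants[credential] = (path, op_type)

    def read(self, credential: str, path: str) -> bytes:
        """Serve a read only if the credential was stored for this path and 'read'."""
        if self._grants.get(credential) != (path, "read"):
            raise PermissionError("credential does not cover this read")
        return self._files[path]


def issue_temporary_credential(storage: ObjectFileStorage, path: str, op_type: str) -> str:
    """Role of the file path authentication module after authentication passes."""
    credential = secrets.token_hex(16)
    storage.store_grant(credential, path, op_type)
    return credential


if __name__ == "__main__":
    storage = ObjectFileStorage()
    storage.put("/warehouse/company_information", b"1,company 1\n")
    cred = issue_temporary_credential(storage, "/warehouse/company_information", "read")
    print(storage.read(cred, "/warehouse/company_information"))
```

In this sketch the computing engine never presents the account information of the first user to the storage system; it presents only the credential returned in the authentication response.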
The second access operation is an operation mapped to the first access operation, and the second access operation is an operation capable of accessing the object file storage system 102. Typically the second access operation comprises a read operation and/or a write operation.
For example, the first access operation is a query, and the second access operation mapped to the query operation is a read operation. Assuming that the content in the first file needs to be queried, the first file is read from the object file storage system 102, and the content needing to be queried is obtained from the read first file.
For another example, the first access operation is an update, and the second access operation mapped to the update operation includes a read operation and a write operation. Assuming that a part of the content in the first file needs to be updated to be the content to be updated, the first file is read from the object file storage system 102, the part of the content in the first file is updated to be the content to be updated, and the updated first file is written into the object file storage system 102 to cover the first file already stored in the object file storage system 102.
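The mapping from the first access operation to the second access operation can be sketched as a lookup table. In the following Python example, the entries for query and update follow the two examples above, while the entries for insert and delete are assumptions added for illustration; the names `FIRST_TO_SECOND` and `map_operation` are hypothetical.

```python
from typing import Dict, FrozenSet

# First access operation -> set of second access operations on the storage system.
FIRST_TO_SECOND: Dict[str, FrozenSet[str]] = {
    "query": frozenset({"read"}),             # from the example above
    "update": frozenset({"read", "write"}),   # from the example above
    "insert": frozenset({"read", "write"}),   # assumption, not stated in the text
    "delete": frozenset({"read", "write"}),   # assumption, not stated in the text
}


def map_operation(first_operation: str) -> FrozenSet[str]:
    """Return the second access operations mapped to a first access operation."""
    try:
        return FIRST_TO_SECOND[first_operation]
    except KeyError:
        raise ValueError(f"unsupported first access operation: {first_operation!r}") from None


if __name__ == "__main__":
    print(sorted(map_operation("update")))
```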
In some embodiments, the authentication information includes a user identity of the first user, a file path of the first file, and a second operation type. The user identity of the first user is obtained by the computing engine 101 based on account information of the first user, the second operation type is obtained by mapping the first operation type, and the file path of the first file is obtained by the computing engine 101 based on identification information of the first file.
In some embodiments, the authentication information includes account information of the first user, identification information of the first file, and the first operation type.
In some embodiments, the user identity of the first user includes a user group to which the first user belongs and/or a role of the first user, and the like.
In some embodiments, the first permission information includes the file path, a user identity capable of accessing the file path, and a third type of operation, the third type of operation being a type of third access operation capable of accessing the file path.
In some embodiments, referring to fig. 3, the access system 100 further includes a linkage permission module 104, and the linkage permission module 104 communicates with the computing engine 101 and the file path authentication module 103, respectively. The linkage permission module 104 stores the first permission information.
After receiving the authentication request, the file path authentication module 103 obtains the file path of the first file, the user identity of the first user, and the second operation type of the second access operation based on the authentication information included in the authentication request, and obtains the first permission information that includes the file path from the linkage permission module 104. If the user identity of the first user is the same as the user identity included in the first permission information and the second operation type of the second access operation is the same as the third operation type of the third access operation included in the first permission information, the permission authentication passes, and the first user has permission to access the file path using the second access operation.
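The comparison performed by the file path authentication module 103 can be sketched as follows. This Python example assumes that the first permission information is held as a collection of records, each with a file path, a user identity, and a third operation type; the class `PathPermission` and the function `authenticate_path_access` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PathPermission:
    """Hypothetical record form of the first permission information."""
    file_path: str
    user_identity: str
    operation_type: str   # third operation type, e.g. "read" or "write"


def authenticate_path_access(
    permissions: Iterable[PathPermission],
    file_path: str,
    user_identity: str,
    second_operation_type: str,
) -> bool:
    """Pass only if a record matches the path, the identity and the operation type."""
    return any(
        p.file_path == file_path
        and p.user_identity == user_identity
        and p.operation_type == second_operation_type
        for p in permissions
    )


if __name__ == "__main__":
    perms = [PathPermission("/warehouse/company_information", "analyst", "read")]
    print(authenticate_path_access(perms, "/warehouse/company_information", "analyst", "read"))
```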
In some embodiments, the linkage permission module 104 includes a first read-write interface, and the file path authentication module 103 invokes the first read-write interface of the linkage permission module 104 and obtains, through the first read-write interface, the first permission information that includes the file path from the linkage permission module 104.
In some embodiments, the authentication information includes a user identity of the first user, a file path of the first file, and a second operation type, and the file path authentication module 103 directly obtains the file path of the first file, the user identity of the first user, and the second operation type of the second access operation from the authentication information.
In some embodiments, the authentication information includes account information of the first user, identification information of the first file, and a first operation type, and the file path authentication module 103 obtains a user identity of the first user based on the account information of the first user, maps the first operation type to obtain a second operation type, and obtains a file path of the first file based on the identification information of the first file.
In some embodiments, referring to fig. 3, the access system 100 further includes a data filtering engine 105, the data filtering engine 105 including specified administrator account information; the data filtering engine 105 communicates with the computing engine 101 and the object file storage system 102, respectively.
The computing engine 101 is configured to send an access instruction to the data filtering engine 105 when the determined granularity is the second granularity, where the access instruction includes the file path of the first file and the access requirement information;
the data filtering engine 105 is configured to access the first file based on the specified administrator account information, the file path, and the access requirement information.
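One way to picture access at the row-column granularity is that the data filtering engine reads the whole file and returns only the requested rows and columns. The following Python sketch makes that assumption explicit; the delimiter-separated file layout with a header line, the use of 1-based row positions, and the function name `filter_rows_columns` are all assumptions for illustration, and the step of reading the file with the administrator account is omitted.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def filter_rows_columns(
    text: str,
    columns: Optional[Iterable[str]] = None,
    rows: Optional[Iterable[int]] = None,
    delimiter: str = ",",
) -> List[Dict[str, str]]:
    """Keep only the requested columns and 1-based data rows of a delimited file."""
    wanted_rows = None if rows is None else set(rows)
    wanted_columns = None if columns is None else list(columns)
    selected: List[Dict[str, str]] = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row_number, record in enumerate(reader, start=1):
        if wanted_rows is not None and row_number not in wanted_rows:
            continue
        if wanted_columns is not None:
            record = {name: record[name] for name in wanted_columns}
        selected.append(dict(record))
    return selected


if __name__ == "__main__":
    sample = "company,industry\ncompany 1,internet\ncompany 2,communication\n"
    print(filter_rows_columns(sample, columns=["company"], rows=[2]))
```

Because only the filtered result leaves the data filtering engine in this sketch, the first user does not need any permission on the file path itself.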
Optionally, the data filtering engine 105 also communicates with the linkage permission module 104.
Referring to fig. 3, in some embodiments, the computing module 1011 of the computing engine 101, upon receiving the data access request, authenticates the right of the first user to access the content based on the second rights information, the account information of the first user, and the access requirement information. The second rights information is used to indicate a user identity and a fourth access operation that are capable of accessing the content.
After authentication passes, if the granularity determined by the routing module 1012 of the compute engine 101 is the first granularity, the routing module 1012 of the compute engine 101 sends an authentication request to the file path authentication module 103. If the granularity determined by the routing module 1012 of the compute engine 101 is the second granularity, the routing module 1012 of the compute engine 101 sends access instructions to the data filtering engine 105.
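The routing decision can be sketched as a small dispatch function. In the following Python example, the two callables stand in for sending the authentication request to the file path authentication module 103 and sending the access instruction to the data filtering engine 105; the function name `route`, the string labels for the granularities, and the callables are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

FIRST_GRANULARITIES = frozenset({"file", "partition"})
SECOND_GRANULARITIES = frozenset({"row-column", "data-segment"})


def route(
    granularity: str,
    authenticated: bool,
    send_authentication_request: Callable[[], T],
    send_access_instruction: Callable[[], T],
) -> T:
    """Dispatch after content-level authentication, based on the determined granularity."""
    if not authenticated:
        raise PermissionError("first user is not permitted to access the content")
    if granularity in FIRST_GRANULARITIES:
        return send_authentication_request()   # to the file path authentication module
    if granularity in SECOND_GRANULARITIES:
        return send_access_instruction()       # to the data filtering engine
    raise ValueError(f"unknown granularity: {granularity!r}")


if __name__ == "__main__":
    target = route("partition", True, lambda: "file path authentication", lambda: "data filtering")
    print(target)
```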
In some embodiments, the second rights information includes content identification information of the content, a user identity capable of accessing the content, and a fourth operation type of a fourth access operation capable of accessing the content.
In some embodiments, the linkage permission module 104 stores the second permission information, and the linkage permission module 104 includes a second read-write interface. After receiving the data access request, the computing module 1011 of the computing engine 101 obtains content identification information of the content based on the access requirement information, and obtains the user identity of the first user based on the account information of the first user. The computing module 1011 then calls the second read-write interface of the linkage permission module 104 and acquires, through the second read-write interface, the second permission information that includes the content identification information from the linkage permission module 104. If the user identity of the first user is the same as the user identity included in the second permission information and the first operation type of the first access operation is the same as the fourth operation type of the fourth access operation included in the second permission information, the permission authentication of the first user for accessing the content passes, and the first user has permission to access the content.
The content identification information of the content is part of the access requirement information.
In some embodiments, when the content is the entire content of the first file, the content identification information of the content includes identification information of the first file. Alternatively, when the content is a partition of the first file, the content identification information of the content includes identification information of the first file and identification information of the partition. Alternatively, when the content is at least one column or at least one row in the first file, the content identification information of the content includes the identification information of the first file and the column identification of the at least one column, or includes the identification information of the first file and the row number of the at least one row. Alternatively, when the content is at least one data segment in the first file, the content identification information of the content includes identification information of the first file and identification information of each data segment in the at least one data segment.
In some embodiments, the authentication operation, which authenticates the permission of the first user to access the content, and the determination operation, which determines the granularity of accessing the content in the first file, may be performed in either order or simultaneously. After the data access request is received, the computing module 1011 of the computing engine 101 may perform the authentication operation first, and the routing module 1012 of the computing engine 101 then performs the determination operation. That is, the computing module 1011 of the computing engine 101 first authenticates the permission of the first user to access the content, and after the authentication passes, the routing module 1012 of the computing engine 101 determines the granularity of accessing the content in the first file based on the access requirement information. Alternatively,
after the data access request is received, the routing module 1012 of the computing engine 101 may perform the determination operation first, and the computing module 1011 of the computing engine 101 then performs the authentication operation. That is, the routing module 1012 of the computing engine 101 first determines the granularity of accessing the content in the first file based on the access requirement information, and the computing module 1011 of the computing engine 101 then authenticates the permission of the first user to access the content. Alternatively,
after the data access request is received, the computing module 1011 of the computing engine 101 performs the authentication operation while the routing module 1012 of the computing engine 101 performs the determination operation, i.e., the authentication operation and the determination operation are performed simultaneously.
In some embodiments, referring to fig. 3, the access system 100 further includes an identity authentication center 106, where the identity authentication center 106 is configured to store a correspondence between account information of a user and a user identity of the user.
In some embodiments, the operation of the computing module 1011 of the computing engine 101 to obtain the user identity of the first user is: the computing module 1011 of the computing engine 101 queries the user identity of the first user from the identity authentication center 106 based on the account information of the first user.
In some embodiments, the operation of the file path authentication module 103 to obtain the user identity of the first user is: the file path authentication module 103 queries the user identity of the first user from the identity authentication center 106 based on the account information of the first user.
In some embodiments, referring to fig. 3, the access system 100 further includes a metadata center 107, where the metadata center 107 is configured to receive and store metadata of the first file input by the second user, and the metadata includes identification information of the first file, an operation type of the operation to be performed on the first file, and a file path of the first file. Optionally, the operation type may be creating the first file, deleting the first file, querying the first file, modifying the first file, or the like.
In some embodiments, the first file is a structured data file, the metadata of the first file further comprising one or more of: column identification for each column in the first file, column type for each column in the first file, row separator for the first file, or column separator for the first file, etc.
In some embodiments, the first file is a semi-structured data file, the metadata of the first file further comprising one or more of: identification information of each data segment in the first file, a type of each data segment in the first file, or a line separator of the first file, which is used to distinguish each line of data in any one data segment in the first file, etc.
In some embodiments, the operation of the compute engine 101 to obtain the file path of the first file is: the calculation engine 101 acquires metadata including identification information of the first file from the metadata center 107, and acquires a file path of the first file from the metadata.
In some embodiments, the operation of the file path authentication module 103 to obtain the file path of the first file is: the file path authentication module 103 acquires metadata including identification information of the first file from the metadata center 107, and acquires a file path of the first file from the metadata.
In some embodiments, the metadata center 107 displays a first interface to the second user, the second user inputs the metadata of the first file in the first interface, and the metadata center 107 receives, through the first interface, the metadata of the first file input by the second user. Optionally, the first interface includes a web user interface (Web UI) or the like.
In some embodiments, the metadata center 107, upon receiving the metadata of the first file, also obtains account information of the second user and verifies the metadata of the first file based on the account information of the second user. The verification is implemented as follows:
the metadata center 107 verifies the legitimacy of the second user based on the account information of the second user. When the second user is verified to be legitimate, the metadata center 107 acquires the user identity of the second user and acquires, based on the user identity of the second user, the operation types that the second user is allowed to perform. If the operation type that is included in the metadata and is to be performed on the first file is an operation type that the second user is allowed to perform, the metadata of the first file passes verification, and the metadata center 107 then stores the metadata of the first file.
In some embodiments, the operations of the metadata center 107 verifying the legitimacy of the second user and obtaining the user identity of the second user are:
the identity authentication center 106 stores the correspondence between account information and user identities; the metadata center 107 queries whether the identity authentication center 106 stores the account information of the second user, and if the identity authentication center 106 stores the account information of the second user, the second user is verified to be a legitimate user. The metadata center 107 then queries the user identity of the second user from the identity authentication center 106 based on the account information of the second user.
In some embodiments, the operations of the metadata center 107 to obtain the operation types that the second user is allowed to perform are:
the metadata center 107 stores the correspondence between user identities and operation types, and the metadata center 107 obtains, based on the user identity of the second user, the corresponding operation type from the correspondence between user identities and operation types, and uses the obtained operation type as the operation type that the second user is allowed to perform.
In some embodiments, the operation of the metadata center 107 to save metadata of the first file is: the metadata center 107 inquires whether metadata including identification information of the first file has been saved, and if the metadata has been saved, updates the saved metadata to metadata of the first file. If the metadata is not saved, the metadata of the first file is directly saved.
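The verification and saving steps of the metadata center 107 can be sketched together. The following Python example assumes in-memory dictionaries for the correspondence between account information and user identities and for the correspondence between user identities and operation types; the class `MetadataCenter`, its method `submit`, and the dictionary keys `file_id`, `operation_type`, and `file_path` are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Set


class MetadataCenter:
    """Toy model of verifying and saving metadata of a file."""

    def __init__(
        self,
        account_to_identity: Dict[str, str],
        identity_to_operations: Dict[str, Set[str]],
    ) -> None:
        self._account_to_identity = account_to_identity        # identity authentication center data
        self._identity_to_operations = identity_to_operations  # allowed operation types per identity
        self._metadata: Dict[str, Dict[str, str]] = {}          # keyed by file identification

    def submit(self, account: str, metadata: Dict[str, str]) -> bool:
        """Verify the submitting user, then insert or overwrite the metadata."""
        identity = self._account_to_identity.get(account)
        if identity is None:
            return False                       # unknown account: not a legitimate user
        allowed = self._identity_to_operations.get(identity, set())
        if metadata["operation_type"] not in allowed:
            return False                       # operation not allowed for this identity
        # Update if metadata for this file is already saved, otherwise save directly.
        self._metadata[metadata["file_id"]] = dict(metadata)
        return True

    def get(self, file_id: str) -> Optional[Dict[str, str]]:
        return self._metadata.get(file_id)


if __name__ == "__main__":
    center = MetadataCenter({"alice": "data_admin"}, {"data_admin": {"create", "modify"}})
    ok = center.submit("alice", {"file_id": "company_information",
                                 "operation_type": "create",
                                 "file_path": "/warehouse/company_information"})
    print(ok, center.get("company_information"))
```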
Metadata center 107 includes specified administrator account information. After the metadata of the first file is verified, if the operation type included in the metadata is that of creating the first file, the metadata center 107 creates a file path of the first file in the object file storage system 102 based on the specified administrator account information, where a storage location corresponding to the file path is used to store the first file. If the metadata includes an operation type of deleting the first file, the metadata center 107 determines the first file in the object file storage system 102 based on the specified administrator account information and the file path of the first file, and deletes the determined first file. If the metadata includes an operation type of querying the first file, the metadata center 107 determines the first file in the object file storage system 102 based on the specified administrator account information and the file path of the first file, acquires the description information and/or attribute information of the first file, and returns the acquired content to the second user. If the metadata includes an operation type of modifying the first file, the metadata center 107 determines the first file in the object file storage system 102 based on the specified administrator account information and the file path of the first file, and modifies the description information and/or attribute information of the first file.
Referring to fig. 3, the linkage permission module 104 is further configured to: receive second permission information configured by a rights manager, the second permission information indicating a user identity and a fourth access operation capable of accessing the content in the first file; generate, based on the second permission information, first permission information indicating a user identity capable of accessing the file path of the first file and a third access operation capable of accessing the file path of the first file; and save the second permission information and the first permission information.
In some embodiments, the linkage permission module 104 is further configured to obtain the metadata of the first file from the metadata center 107, obtain at least one user identity from the identity authentication center 106, and display a second interface to the rights manager, where the second interface includes the metadata of the first file and the at least one user identity.
In this way, the rights manager selects content identification information of the content in the first file from the metadata of the first file, selects a user identity capable of accessing the content from the at least one user identity, and inputs, in the second interface, a fourth operation type of a fourth access operation capable of accessing the content, so that the second permission information is obtained. The second permission information includes the content identification information of the content, the selected user identity, and the input fourth operation type. Optionally, the second interface includes a Web UI or the like.
In some embodiments, the first file is a structured data file, the metadata of the first file includes a file identification of the first file and a column identification of each column of the first file, the content identification information of the content includes a file identification of the first file, or the content identification information of the content includes a file identification of the first file and a column identification of at least one column of the first file, or the content identification information of the content includes a file identification of the first file and a row number of at least one row of the first file.
In some embodiments, the first file is a semi-structured data file, the metadata of the first file includes a file identification of the first file and identification information of each data segment of the first file, and the content identification information of the content includes a file identification of the first file, or the content identification information includes a file identification of the first file and identification information of at least one data segment in the first file.
In some embodiments, the operation of the linkage permission module 104 to generate the first permission information is as follows:
(1): the linkage permission module 104 acquires a file path of the first file based on the content identification information of the content in the second permission information.
In some embodiments, the content identification information of the content includes identification information of the first file, metadata including the identification information of the first file is obtained from the metadata center 107, the metadata is metadata of the first file, and a file path of the first file is obtained from the metadata of the first file.
(2): the linkage permission module 104 maps the fourth operation type included in the second permission information to obtain a third operation type.
(3): the linkage permission module 104 reads the user identity from the second permission information, and forms the file path of the first file, the user identity and the third operation type into first permission information.
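Operations (1) to (3) can be illustrated with a minimal Python sketch. The dictionary used as a metadata center, the `obs://` file path, the user identity "analyst", and the mapping table from fourth to third operation types are all assumptions made for illustration; the description does not specify them.

```python
from dataclasses import dataclass

# Hypothetical mapping from content-level (fourth) operation types to
# file-path-level (third) operation types; the source gives no such table.
FOURTH_TO_THIRD = {"query": "read", "update": "write", "delete": "write"}


@dataclass(frozen=True)
class SecondAuthorityInfo:
    file_id: str
    user_identity: str
    fourth_operation_type: str


@dataclass(frozen=True)
class FirstAuthorityInfo:
    file_path: str
    user_identity: str
    third_operation_type: str


def generate_first_authority_info(second, metadata_center):
    # (1) Find the metadata of the first file by its identification
    #     information and read the file path from that metadata.
    file_path = metadata_center[second.file_id]["file_path"]
    # (2) Map the fourth operation type to a third operation type.
    third_operation_type = FOURTH_TO_THIRD[second.fourth_operation_type]
    # (3) Combine file path, user identity and third operation type.
    return FirstAuthorityInfo(file_path, second.user_identity, third_operation_type)


# Hypothetical metadata center content, keyed by file identification.
metadata_center = {
    "Company information": {"file_path": "obs://example-bucket/company_information"}
}
second = SecondAuthorityInfo("Company information", "analyst", "query")
first = generate_first_authority_info(second, metadata_center)
```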
In the embodiment of the application, a computing engine receives a data access request and determines the granularity of accessing content in a first file based on access requirement information in the data access request. When the determined granularity is the first granularity, the computing engine accesses the first file based on account information of a first user and the access requirement information. When the determined granularity is the second granularity, the computing engine accesses the first file based on specified administrator account information and the access requirement information. In this way, access services at both the first granularity and the second granularity can be provided to the user, which enriches the access services provided to the user. Because the first granularity is larger than the second granularity, when the determined granularity is the first granularity the computing engine accesses the first file using the account information of the first user rather than borrowing administrator account information, which improves the efficiency of accessing the first file and the performance of reading and writing the first file.
In addition, if the first user were configured with the rights to access content at the second granularity, the first user would also be able to access content in the first file other than the requested content; that is, the rights of the first user would automatically be expanded to cover any content in the first file. Such an expansion is excessive and is unfavorable for rights management. In the application, however, when the determined granularity is the second granularity, the computing engine replaces the account information of the first user with the specified administrator account information and accesses the first file using the administrator account information and the access requirement information. The first user therefore does not need to be configured with rights for second-granularity access, which avoids enlarging the access rights of the first user and facilitates rights management.
Referring to fig. 4, an embodiment of the present application provides a method 400 for accessing a file, the method 400 being applied to the access system 100 shown in fig. 1 or fig. 3, the method 400 including the following steps 401 to 410.
Step 401: the computing engine receives a data access request including access requirement information indicating content in a first file that the first user needs to access, the first file stored in the object file storage system.
In some embodiments, the first user is a business user performing a data access business, and the first user sends a data access request to the compute engine.
In some embodiments, the access requirement information is an access statement for accessing the database, e.g., the access requirement information is an SQL statement or the like.
In some embodiments, the access requirement information includes content identification information of the content and a first operation type for indicating a first access operation to access the content in the first file. Optionally, the first access operation is to query the first file, update the first file, delete the first file, or the like.
In some embodiments, when the first access operation indicated by the first operation type is to update the first file, the access requirement information further includes content to be updated.
In some embodiments, the data access request may also include account information of the first user.
In some embodiments, the access requirement information may include several types of information, which are described below, respectively.
Type 1, the access requirement information includes identification information of the first file and a first operation type.
The access requirement information does not include identification information of a partition of the first file. When the first file is a structured data file, the access requirement information does not include first information, the first information being used to indicate at least one column and/or at least one row in the first file. When the first file is a semi-structured data file, the access requirement information does not include identification information of a data segment in the first file.
In this case, the content identification information of the content is identification information of the first file.
For the type 1 access requirement information, the content is the whole content of the first file, which means that the first user needs to access the whole content of the first file, and the granularity of the first user accessing the content in the first file is the file granularity.
For example, the access requirement information is: Select * From Company information. The access requirement information includes the identification information "Company information" of the first file shown in table 1 and the first operation type "Select", which indicates a query of the first file.
Type 2, the access requirement information includes identification information of the first file, identification information of a partition in the first file, and a first operation type.
When the first file is a structured data file, the access requirement information does not include first information, the first information being used to indicate at least one column and/or at least one row in the first file. When the first file is a semi-structured data file, the access requirement information does not include identification information of a data segment in the first file.
In this case, the content identification information of the content includes identification information of the first file and identification information of the partition in the first file.
For the type 2 access requirement information, the content is the partition of the first file, which indicates that the first user needs to access the partition of the first file, and the granularity of the first user accessing the content in the first file is the partition granularity.
Type 3, the first file is a structured data file, and the access requirement information includes identification information of the first file, first information, and a first operation type, where the first information is used to indicate at least one column and/or at least one row in the first file.
For the type 3 access requirement information, the content is the at least one column or the at least one row of the first file, which indicates that the first user needs to access the at least one column or the at least one row of the first file, and the granularity of the first user accessing the content in the first file is the row-column granularity.
In some embodiments, the first information includes a column identification of the at least one column in the first file, the content being the at least one column in the first file indicating that the first user needs to access the at least one column of the first file. In this case, the content identification information of the content includes identification information of the first file and a column identification of the at least one column in the first file.
For example, the access requirement information is: Select Name, City From Company information. As shown in table 1, the access requirement information includes the identification information "Company information" of the first file, the column identification "Name" of the second column of the first file, the column identification "City" of the fourth column of the first file, and the first operation type "Select", which indicates a query of the first file.
In some embodiments, the first information includes a column identification of at least one column in the first file and row filtering information corresponding to each column in the at least one column, where the content is at least one row in the first file, indicating that the first user needs to access at least one row of the first file. In this case, the content identification information of the content includes identification information of the first file and a column identification of the at least one column in the first file.
For any one of the at least one column, one or more rows whose content in that column matches the row filtering information corresponding to that column may be located in the first file. The content is the content of the located row or rows.
For example, referring to table 1 above, assuming that the first information includes the column identification "City" of the fourth column in table 1 and the row filtering information "City 1" corresponding to the fourth column, the first, second, and fifth rows, whose value in the fourth column is "City 1", may be located in the first file shown in table 1 based on the first information. In this case the access requirement information is: Select * From Company information Where City = City 1, and the first operation type "Select" indicates a query of the first file.
In some embodiments, the first information includes a line number of at least one line in the first file, the content being the at least one line in the first file indicating that the first user needs to access the at least one line of the first file. In this case, the content identification information of the content includes identification information of the first file and a line number of the at least one line in the first file.
Type 4, the first file is a semi-structured data file, and the access requirement information includes identification information of the first file, identification information of at least one data segment in the first file, and a first operation type.
In this case, the content identification information of the content includes identification information of the first file and identification information of the at least one data segment in the first file.
For the type 4 access requirement information, the content is the at least one data segment of the first file, indicating that the first user needs to access the at least one data segment of the first file, and a granularity at which the first user accesses the content in the first file is data segment granularity.
For type 4 access requirement information, the data access request is a remote procedure call (RPC) request that includes the type 4 access requirement information.
For example, the access requirement information in the RPC request includes the identification information "Company information" of the first file shown in fig. 2, the identification information "Name" of the first data segment of the first file, the identification information "count" of the third data segment of the first file, and the first operation type "query the first file".
In summary, the access requirement information at least includes identification information of the first file and the first operation type, and may further include the first information, identification information of a partition in the first file, identification information of a data segment in the first file, and the like.
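The four types above can be summarized with a small data model. The following Python sketch is illustrative only: the class name `AccessRequirement`, its field names, and the partition identifier "p1" are hypothetical, and row filtering information is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AccessRequirement:
    file_id: str                         # identification information of the first file
    operation_type: str                  # first operation type
    partition_id: Optional[str] = None   # present for type 2
    column_ids: Tuple[str, ...] = ()     # first information: columns (type 3)
    row_numbers: Tuple[int, ...] = ()    # first information: rows (type 3)
    segment_ids: Tuple[str, ...] = ()    # data segments (type 4)


def requirement_type(req):
    """Return 1, 2, 3 or 4 according to which optional fields are present."""
    if req.segment_ids:
        return 4
    if req.column_ids or req.row_numbers:
        return 3
    if req.partition_id is not None:
        return 2
    return 1


whole_file = AccessRequirement("Company information", "Select")
one_partition = AccessRequirement("Company information", "Select", partition_id="p1")
two_columns = AccessRequirement("Company information", "Select", column_ids=("Name", "City"))
two_segments = AccessRequirement("Company information", "query", segment_ids=("Name", "count"))
```

The three requests other than `one_partition` mirror the examples given in the text: a whole-file query, a query of the "Name" and "City" columns, and an RPC request for the "Name" and "count" data segments.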
Step 402: the computing engine authenticates the right of the first user to access the content based on the second right information, the account information of the first user and the access requirement information, and after the right of the first user to access the content is authenticated, step 403 is executed.
The second rights information is used to indicate a user identity capable of accessing the content and a fourth access operation capable of accessing the content.
Referring to fig. 5, in some embodiments, the computing engine is a Hive engine, which authenticates the right of the first user to access the content through the following operations 4021 to 4023.
4021: for the content identification information of the content in the access requirement information, the Hive engine determines whether the content exists in the object file storage system based on the content identification information, and if the content exists in the object file storage system, performs the operation of 4022 below.
In 4021, the content identification information includes identification information of the first file, and the Hive engine obtains metadata of the first file from the metadata center based on the identification information of the first file, the metadata of the first file including the identification information of the first file. When the operation type included in the metadata of the first file is deleting the first file, which indicates that the first file has been deleted from the object file storage system, the Hive engine determines based on the metadata of the first file that the content does not exist in the object file storage system. When the operation type included in the metadata of the first file is creating the first file, modifying the first file or querying the first file, which indicates that the object file storage system stores the first file, the Hive engine determines whether the content exists in the object file storage system based on the metadata of the first file and the content identification information.
In some embodiments, the Hive engine obtains the metadata of the first file in either of the following ways:
The Hive engine sends a first acquisition command to the metadata center, the first acquisition command including identification information of the first file. The metadata center receives the first acquisition command, acquires from the stored metadata the metadata that includes the identification information of the first file, which is the metadata of the first file, and sends a first acquisition response to the Hive engine, the first acquisition response including the metadata of the first file. Alternatively,
the Hive engine sends a first acquisition command to the metadata center. The metadata center receives the first acquisition command, acquires each piece of stored metadata, and sends a first acquisition response to the Hive engine, the first acquisition response including each piece of metadata. The Hive engine receives the first acquisition response and acquires, from the pieces of metadata, the metadata that includes the identification information of the first file, which is the metadata of the first file.
In some embodiments, when the operation type included in the metadata of the first file is creating the first file, modifying the first file, or querying the first file, the Hive engine determines whether the content exists in the object file storage system, based on the metadata of the first file and the content identification information, as follows:
If the content identification information includes identification information of the first file and identification information of a partition in the first file, the Hive engine determines that the content exists in the object file storage system when the metadata of the first file also includes the identification information of the partition, and determines that the content does not exist in the object file storage system when the metadata of the first file does not include the identification information of the partition.
If the content identification information includes identification information of the first file and a column identification of at least one column in the first file, the Hive engine determines that the content exists in the object file storage system when the metadata of the first file also includes the column identification of the at least one column, and determines that the content does not exist in the object file storage system when the metadata of the first file does not include the column identification of the at least one column.
If the content identification information includes identification information of the first file and identification information of a data segment in the first file, the Hive engine determines that the content exists in the object file storage system when the metadata of the first file also includes the identification information of the data segment, and determines that the content does not exist in the object file storage system when the metadata of the first file does not include the identification information of the data segment.
If the content identification information includes identification information of the first file and a line number of at least one line in the first file, the content may be considered to exist in the object file storage system once it is determined that the object file storage system stores the first file.
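The existence check described above can be sketched in Python as follows. Metadata and content identification information are modeled as plain dictionaries with hypothetical keys, the metadata lists only the two columns named in the text, and treating missing metadata as "content does not exist" is an assumption rather than something the description states.

```python
def content_exists(metadata, content_id):
    """Decide whether the requested content exists, judged from file metadata."""
    # Assumption: no metadata at all is treated like a deleted file.
    if metadata is None or metadata.get("operation_type") == "delete":
        return False
    # Partition: its identification must be recorded in the metadata.
    partition_id = content_id.get("partition_id")
    if partition_id is not None and partition_id not in metadata.get("partition_ids", ()):
        return False
    # Columns: every requested column identification must be recorded.
    if not set(content_id.get("column_ids", ())) <= set(metadata.get("column_ids", ())):
        return False
    # Data segments: every requested segment identification must be recorded.
    if not set(content_id.get("segment_ids", ())) <= set(metadata.get("segment_ids", ())):
        return False
    # Line numbers are not recorded in metadata; the file being stored suffices.
    return True


# Hypothetical metadata; only the columns named in the text are listed.
metadata = {
    "file_id": "Company information",
    "operation_type": "create",
    "column_ids": ["Name", "City"],
}
deleted = dict(metadata, operation_type="delete")
```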
The metadata of the first file includes a file path of the first file, so the computing engine reads the file path of the first file from the metadata of the first file.
The operation of 4021 is optional. That is, the operation of 4022 may be performed directly without performing the operation of 4021. Alternatively, the operation of 4021 may be performed first, followed by the operation of 4022 below.
4022: the Hive engine acquires, from the linkage authority module, second authority information that includes the content identification information of the content, where the second authority information includes the content identification information of the content, a user identity capable of accessing the content, and a fourth operation type of a fourth access operation capable of accessing the content.
In some embodiments, the linkage authority module includes a second read-write interface, and the Hive engine reads each piece of second authority information stored in the linkage authority module through the second read-write interface, and obtains second authority information corresponding to the content from each piece of second authority information based on the content identification information of the content.
In some embodiments, the Hive engine sends a second get command to the linkage rights module, the second get command including content identification information for the content. The linkage authority module receives a second acquisition command, acquires second authority information corresponding to the content from each piece of stored second authority information based on the content identification information, and sends a second acquisition response to the Hive engine, wherein the second acquisition response comprises the acquired second authority information.
If the content identification information includes the identification information of the first file and does not include other information, the acquired second authority information is the second authority information that includes the identification information of the first file.
If the content identification information includes the identification information of the first file and the line number of at least one line in the first file, and does not include other information, the acquired second authority information is the second authority information that includes the identification information of the first file.
If the content identification information includes the identification information of the first file and the column identification of at least one column in the first file, and does not include other information, the acquired second authority information is the second authority information that includes the identification information of the first file and the column identification of the at least one column.
If the content identification information includes the identification information of the first file and the identification information of a partition in the first file, and does not include other information, the acquired second authority information is the second authority information that includes the identification information of the first file and the identification information of the partition.
If the content identification information includes the identification information of the first file and the identification information of a data segment in the first file, and does not include other information, the acquired second authority information is the second authority information that includes the identification information of the first file and the identification information of the data segment.
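The lookup rules above amount to deriving a key from the content identification information. The sketch below is one possible Python rendering; the in-memory `store`, the key layout, the user identities "auditor" and "analyst", and the exact-match treatment of column and data segment sets are assumptions for illustration.

```python
def authority_key(content_id):
    """Build the key under which second authority information is looked up."""
    file_id = content_id["file_id"]
    if content_id.get("column_ids"):
        return (file_id, "columns", frozenset(content_id["column_ids"]))
    if content_id.get("partition_id") is not None:
        return (file_id, "partition", content_id["partition_id"])
    if content_id.get("segment_ids"):
        return (file_id, "segments", frozenset(content_id["segment_ids"]))
    # File only, or file plus line numbers: keyed by the file identification alone.
    return (file_id, "file", None)


def get_second_authority_info(store, content_id):
    # Returns None when nothing is stored for this content (assumed behavior).
    return store.get(authority_key(content_id))


# Hypothetical content of the linkage authority module.
store = {
    ("Company information", "file", None): {
        "user_identity": "auditor", "fourth_operation_type": "query"},
    ("Company information", "columns", frozenset({"Name", "City"})): {
        "user_identity": "analyst", "fourth_operation_type": "query"},
}

by_rows = get_second_authority_info(
    store, {"file_id": "Company information", "row_numbers": [1, 2]})
by_columns = get_second_authority_info(
    store, {"file_id": "Company information", "column_ids": ["Name", "City"]})
```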
4023: the Hive engine authenticates the right of the first user to access the content based on the second right information, the account information of the first user and the first operation type.
In 4023, the Hive engine determines the user identity of the first user based on the account information of the first user, compares the user identity of the first user with the user identity included in the second authority information, and compares the first operation type with the fourth operation type included in the second authority information. If the user identity of the first user is the same as the user identity included in the second authority information, and the first operation type is the same as the fourth operation type included in the second authority information, the authority authentication for the first user to access the content passes.
In some embodiments, the identity authentication center stores a correspondence between account information of the user and the identity of the user. The Hive engine queries the user identity of the first user from the identity authentication center based on the account information of the first user.
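The comparison in 4023 can be sketched as below. The identity authentication center is modeled as a dictionary from account information to user identity, the account and identity values are hypothetical, and failing authentication for an unknown account is an assumption.

```python
def authenticate_content_access(second_authority_info, account, first_operation_type,
                                identity_center):
    """Return True when both the user identity and the operation type match."""
    # Query the user identity of the first user by account information.
    user_identity = identity_center.get(account)
    if user_identity is None:
        return False  # assumption: an unknown account never passes
    return (user_identity == second_authority_info["user_identity"]
            and first_operation_type == second_authority_info["fourth_operation_type"])


# Hypothetical correspondence between account information and user identity.
identity_center = {"user-001": "analyst"}
second_authority_info = {"user_identity": "analyst", "fourth_operation_type": "query"}

passed = authenticate_content_access(second_authority_info, "user-001", "query", identity_center)
wrong_operation = authenticate_content_access(second_authority_info, "user-001", "delete", identity_center)
```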
Referring to fig. 6, in some embodiments, the computing engine is a Spark engine, which authenticates the right of the first user to access the content through the following operations 4121 to 4125.
4121: the Spark engine sends the access requirement information to the metadata center.
In some embodiments, the Spark engine also sends account information of the first user to the metadata center.
4122: for the content identification information of the content in the access requirement information, the metadata center determines whether the content exists in the object file storage system based on the content identification information, and if the content exists in the object file storage system, performs the operation of 4123 below.
For the detailed process by which the metadata center determines whether the content exists in the object file storage system, refer to the process by which the Hive engine makes this determination in 4021; it is not described in detail here.
If the Spark engine also sends the account information of the first user to the metadata center, then upon determining that the content exists in the object file storage system, the metadata center acquires from the linkage authority module the second authority information that includes the content identification information of the content. The metadata center authenticates the right of the first user to access the content based on the second authority information, the account information of the first user and the first operation type, and performs the following operation 4123 after the authentication passes.
For the process by which the metadata center obtains the second authority information and authenticates the right of the first user to access the content, refer to the corresponding process of the Hive engine in 4022 and 4023; it is not described in detail here.
4123: the metadata center sends acknowledgement information to the Spark engine.
If the content does not exist in the object file storage system, or if the authority authentication for the first user to access the content does not pass, the metadata center sends negative acknowledgement information to the Spark engine.
4124: the Spark engine receives the confirmation information, and acquires second authority information comprising content identification information of the content from the linkage authority module, wherein the second authority information comprises the content identification information of the content, the user identity capable of accessing the content and a fourth operation type of a fourth access operation capable of accessing the content.
For the process by which the Spark engine obtains the second authority information, refer to the process by which the Hive engine obtains the second authority information in 4022; it is not described in detail here.
If the Spark engine receives negative acknowledgement information, the operation ends.
4125: the Spark engine authenticates the right of the first user to access the content based on the second right information, the account information of the first user and the first operation type.
For the process by which the Spark engine authenticates the right of the first user to access the content, refer to the process by which the Hive engine does so in 4023; it is not described in detail here.
Operations 4121 to 4123 are optional. That is, operations 4121 to 4123 may be skipped and the Spark engine may directly perform operations 4124 and 4125: the Spark engine obtains from the linkage authority module the second rights information that includes the content identification information of the content, and authenticates the right of the first user to access the content based on the second rights information, the account information of the first user and the first operation type.
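The Spark-side flow, including the optional pre-check by the metadata center, might be sketched as follows. The function and parameter names are hypothetical, the pre-check is passed in as a callable that returns True for acknowledgement and False for negative acknowledgement, and failing when no second authority information is found is an assumption.

```python
def spark_authenticate(request, fetch_authority, authenticate, precheck=None):
    """Run 4124 and 4125, preceded by 4121 to 4123 when a pre-check is supplied."""
    # 4121 to 4123 (optional): the metadata center acknowledges or rejects.
    if precheck is not None and not precheck(request):
        return False  # negative acknowledgement: the operation ends
    # 4124: obtain second authority information from the linkage authority module.
    info = fetch_authority(request)
    if info is None:
        return False  # assumption: no stored authority information means failure
    # 4125: authenticate the right of the first user to access the content.
    return authenticate(info, request)


fetch_calls = []


def fetch(request):
    # Stand-in for the linkage authority module (hypothetical data).
    fetch_calls.append(request["file_id"])
    return {"user_identity": "analyst", "fourth_operation_type": "query"}


def check(info, request):
    return (request["user_identity"] == info["user_identity"]
            and request["operation_type"] == info["fourth_operation_type"])


request = {"file_id": "Company information", "user_identity": "analyst",
           "operation_type": "query"}
allowed = spark_authenticate(request, fetch, check)
rejected = spark_authenticate(request, fetch, check, precheck=lambda r: False)
```

In this sketch a negative acknowledgement short-circuits the flow, so the linkage authority module is not consulted for the rejected request.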
Step 403: the compute engine determines the granularity of accessing the content of the first file based on the access requirement information, performs step 404 if the determined granularity is the first granularity, and performs step 409 if the determined granularity is the second granularity.
In step 403, the compute engine determines a granularity of accessing the content of the first file based on the content identification information of the content included in the access requirement information.
The above describes four types of information for the access requirement information, and the process of determining the granularity is described next for each type of access requirement information.
For the above type 1 access requirement information, the access requirement information includes identification information of the first file but does not include identification information of a partition of the first file. In addition, when the first file is a structured data file, the access requirement information does not include the first information; when the first file is a semi-structured data file, the access requirement information does not include identification information of the data segments in the first file. In this case, the content identification information of the content is the identification information of the first file, and the computing engine determines that the granularity of accessing the content of the first file is the file granularity.
For the above type 2 access requirement information, the access requirement information includes identification information of the first file and identification information of the partition in the first file. In addition, when the first file is a structured data file, the access requirement information does not include the first information; when the first file is a semi-structured data file, the access requirement information does not include identification information of the data segments in the first file. In this case, the content identification information of the content is the identification information of the first file and the identification information of the partition in the first file, and the computing engine determines that the granularity of accessing the content of the first file is the partition granularity.
The first granularity is the file granularity or the partition granularity. Therefore, if the determined granularity is the first granularity, the content identification information of the content includes identification information of the first file, or includes identification information of the first file and identification information of the partition in the first file.
For the above type 3 access requirement information, the first file is a structured data file, and the access requirement information includes identification information of the first file and first information, where the first information is used to indicate at least one row and/or at least one column in the first file. At this time, the content identification information of the content is the identification information of the first file and the column identification of the at least one column in the first file, or the content identification information of the content is the identification information of the first file and the row number of the at least one row in the first file, and the computing engine determines that the granularity of accessing the content of the first file is the row-column granularity.
For the above type 4 access requirement information, the first file is a semi-structured data file, and the access requirement information includes identification information of the first file and identification information of a data segment in the first file. At this time, the content identification information of the content is the identification information of the first file and the identification information of the data segment in the first file, and the computing engine determines the granularity of accessing the content of the first file as the data segment granularity.
The second granularity is the row-column granularity or the data segment granularity. Therefore, if the determined granularity is the second granularity, the content identification information of the content includes identification information of the first file and a column identification of the at least one column in the first file, and optionally, the access requirement information may further include row filtering information corresponding to the at least one column. Alternatively, the content identification information of the content includes identification information of the first file and a line number of the at least one line in the first file. Alternatively, the content identification information of the content includes identification information of the first file and identification information of the data segment in the first file.
The computing engine includes a computing module and a routing module. Step 402 above is performed by the computing module, and step 403 above is performed by the routing module. The computing module may perform step 402 first, after which the routing module performs step 403. Alternatively, the routing module may perform step 403 first, after which the computing module performs step 402. Alternatively, the routing module performs step 403 and the computing module performs step 402 at the same time. When the authority authentication for the first user to access the content passes, the computing engine performs the following step 404 if the granularity determined by the routing module is the first granularity, and performs the following step 409 if the granularity determined by the routing module is the second granularity.
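The granularity decision of step 403 and the resulting choice between step 404 and step 409 can be sketched as follows. Content identification information is modeled as a dictionary with hypothetical keys, and returning `None` when authentication has not passed is an assumption about how the flow ends.

```python
FIRST_GRANULARITY = {"file", "partition"}
SECOND_GRANULARITY = {"row-column", "data-segment"}


def determine_granularity(content_id):
    """Map content identification information to an access granularity."""
    if content_id.get("column_ids") or content_id.get("row_numbers"):
        return "row-column"
    if content_id.get("segment_ids"):
        return "data-segment"
    if content_id.get("partition_id") is not None:
        return "partition"
    return "file"


def next_step(content_id, authenticated):
    """Return the number of the step to perform next, or None to stop."""
    if not authenticated:
        return None  # assumption: the flow ends when authentication fails
    granularity = determine_granularity(content_id)
    # First granularity goes to step 404, second granularity to step 409.
    return 404 if granularity in FIRST_GRANULARITY else 409


whole_file = {"file_id": "Company information"}
two_columns = {"file_id": "Company information", "column_ids": ["Name", "City"]}
```

Because the two results are combined only at the end, the sketch is indifferent to whether authentication or granularity determination runs first.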
Step 404: the computing engine sends an authentication request to the file path authentication module, the authentication request including authentication information indicating the first user, a file path of the first file, and a second access operation of the first user to access the file path.
In some embodiments, the authentication information includes the file path of the first file, the user identity of the first user, and a second operation type. The second operation type is an operation type that corresponds to the first operation type and can be used to access the object file storage system; the computing engine obtains the second operation type by mapping the first operation type. The second operation type includes a read operation and/or a write operation.
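As a rough illustration of this mapping, the following Python sketch uses a lookup table. The table contents follow the examples given elsewhere in this description (a query maps to a read operation, an update maps to a read operation and a write operation); the names `SECOND_OPERATION_TYPES` and `map_operation_type`, and the lower-case keys, are assumptions made for the sketch.

```python
from typing import Dict, FrozenSet

# Assumed mapping from first operation type to second operation type(s).
SECOND_OPERATION_TYPES: Dict[str, FrozenSet[str]] = {
    "select": frozenset({"read"}),
    "update": frozenset({"read", "write"}),
}


def map_operation_type(first_operation_type: str) -> FrozenSet[str]:
    """Map a first operation type to the storage-level operation types."""
    key = first_operation_type.strip().lower()
    if key not in SECOND_OPERATION_TYPES:
        # Other operations (for example deletion) are not enumerated in the text.
        raise ValueError(f"no mapping defined for operation type {first_operation_type!r}")
    return SECOND_OPERATION_TYPES[key]
```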
In some embodiments, the authentication information includes identification information of the first file, account information of the first user, and a first operation type. The authentication information may also be information including other contents, which are not listed here.
Referring to fig. 5 or 6, in step 404, a routing module of a computing engine (Hive engine or Spark engine) sends an authentication request to a file path authentication module.
Step 405: the file path authentication module receives the authentication request, and authenticates the authority of the first user to access the file path by adopting a second access operation based on the first authority information and the authentication information, wherein the second access operation is an access operation corresponding to a second operation type.
The linkage authority module stores first authority information corresponding to at least one file. For any file, the first authority information corresponding to the file includes the file path of the file, a user identity permitted to access the file path, and a third operation type of a third access operation permitted on the file path.
In some embodiments, the authentication information includes the file path of the first file, the user identity of the first user, and the second operation type. In step 405, the file path authentication module reads, from the linkage authority module, the first authority information that includes the file path; the first authority information read in this way corresponds to the first file. The file path authentication module then compares the user identity of the first user with the user identity included in the read first authority information, and compares the second operation type with the third operation type included in the read first authority information. If the two user identities are the same and the second operation type is the same as the third operation type, the authentication of the first user's right to access the file path using the second access operation passes.
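The comparison performed in step 405 might look like the following Python sketch. `FirstAuthorityInfo` and `authenticate_file_path` are hypothetical names, and the equality test on operation types follows the wording above literally; a subset test would be a different design choice that the text does not describe.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class FirstAuthorityInfo:
    """Hypothetical record held by the linkage authority module."""
    file_path: str
    user_identity: str
    third_operation_types: FrozenSet[str]


def authenticate_file_path(
    records: Iterable[FirstAuthorityInfo],
    file_path: str,
    user_identity: str,
    second_operation_types: Iterable[str],
) -> bool:
    """Return True if a record for the path matches both user and operation type."""
    requested = frozenset(second_operation_types)
    for record in records:
        if record.file_path != file_path:
            continue  # only records that include the file path are read
        if record.user_identity == user_identity and record.third_operation_types == requested:
            return True
    return False
```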
In some embodiments, the authentication information includes identification information of the first file, account information of the first user, and a first operation type. The file path authentication module firstly obtains a file path of the first file based on the identification information of the first file, obtains the user identity of the first user based on the account information of the first user, and maps the first operation type to obtain a second operation type. And then authenticating the right of the first user to access the file path by adopting the second access operation.
In some embodiments, the file path authentication module obtains metadata including identification information of the first file from a metadata center, the metadata being metadata of the first file, and reads a file path of the first file from the metadata of the first file. And the file path authentication module acquires the user identity of the first user from the identity authentication center according to the account information of the first user.
Step 406: when the authentication of the first user's right to access the file path using the second access operation passes, the file path authentication module sends storage information to the object file storage system and sends an authentication response to the computing engine, where the storage information includes a temporary credential, the file path, and the second operation type, and the authentication response includes the temporary credential.
In step 406, the file path authentication module assigns the temporary credential once the first user's right to access the file path using the second access operation has been authenticated.
Step 407: the object file storage system receives the storage information and stores the correspondence among the temporary credential, the file path, and the second operation type.
The object file storage system maintains a correspondence among temporary credentials, file paths, and operation types. In step 407, the object file storage system receives the storage information and stores the temporary credential, the file path, and the second operation type as a record in that correspondence.
If the time for which a temporary credential has been stored in the correspondence reaches a specified duration, the object file storage system deletes the record that includes the temporary credential from the correspondence.
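A minimal Python sketch of such a correspondence with expiry is shown below. The class name `CredentialTable`, the injectable `clock` parameter, and the choice to delete an expired record lazily at lookup time are all assumptions made for illustration; the text only states that the record is deleted once the specified duration is reached.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple


class CredentialTable:
    """Hypothetical store of (temporary credential, file path, operation type) records."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._records: Dict[str, Tuple[str, FrozenSet[str], float]] = {}

    def store(self, credential: str, file_path: str, operation_types: Iterable[str]) -> None:
        # Step 407: save the record together with the time it was stored.
        self._records[credential] = (file_path, frozenset(operation_types), self._clock())

    def lookup(self, credential: str) -> Optional[Tuple[str, FrozenSet[str]]]:
        record = self._records.get(credential)
        if record is None:
            return None
        file_path, operation_types, stored_at = record
        if self._clock() - stored_at >= self._ttl:
            # The specified duration has been reached: delete the record.
            del self._records[credential]
            return None
        return file_path, operation_types
```

Passing a fake clock makes the expiry behaviour easy to exercise without waiting for real time to pass.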
Step 408: the computing engine receives the authentication response and accesses a first file in the object file storage system based on the temporary credential, the access requirement information, and the file path.
The access requirement information includes content identification information of the content and a first operation type of the first access operation, the content identification information of the content includes identification information of the first file, or the content identification information of the content includes identification information of the first file and identification information of a partition in the first file.
Assuming that the first access operation is to query the first file, the second access operation mapped by the first access operation includes a read operation, and in step 408, the first file is accessed as follows.
1-1: the compute engine sends a read request to the object file storage system, the read request including the temporary credential and a file path of the first file.
For example, the access requirement information listed above is: select From Company information. In this example, the first operation type "Select" is a query for the first file, the content identification information of the content is "Company information", and the second access operation mapped by the computing engine to "Select" includes a read operation.
Assume that the temporary credential assigned by the file path authentication module is "P1", and that the object file storage system holds the correspondence among temporary credentials, file paths, and operation types shown in Table 2 below. The first record in the correspondence includes the temporary credential "P1", the file path "C:\windows\system32\Company information" of the first file shown in Table 1, and the second operation type "read operation" of the second access operation for accessing the file path.
TABLE 2
Temporary credential | File path | Operation type
P1 | C:\windows\system32\Company information | Read operation
…… | …… | ……
In 1-1, the compute engine sends a read request to the object file storage system, the read request including the temporary credential "P1" and the file path "C:\windows\system32\Company information" of the first file.
1-2: the object file storage system receives the read request, and obtains a corresponding file path and a second operation type from the corresponding relation among the temporary credential, the file path and the operation type based on the temporary credential included in the read request.
For example, the object file storage system receives the read request, which includes the temporary credential "P1" and the file path "C:\windows\system32\Company information" of the first file. Based on the temporary credential "P1", the object file storage system obtains the corresponding file path "C:\windows\system32\Company information" and the second operation type "read operation" from the correspondence shown in Table 2.
1-3: if the file path included in the read request is the same as the acquired file path and the second access operation corresponding to the second operation type includes a read operation, the object file storage system reads the first file based on the file path of the first file and returns the first file to the computing engine.
The file path "C:\windows\system32\Company information" included in the read request is the same as the obtained file path "C:\windows\system32\Company information", and the second access operation corresponding to the obtained second operation type (read operation) includes a read operation. The object file storage system therefore reads the first file shown in Table 1 based on the file path "C:\windows\system32\Company information" of the first file, and returns the first file shown in Table 1 to the compute engine.
1-4: the computing engine receives the first file. If the content identification information includes only the identification information of the first file, the computing engine returns the first file to the first user. If the content identification information includes the identification information of the first file and the identification information of a partition in the first file, the computing engine obtains the content of the partition from the first file and returns the content of the partition to the first user.
Content identification information that includes only the identification information of the first file indicates that the first user needs to query the entire content of the first file. Content identification information that includes the identification information of the first file and the identification information of a partition in the first file indicates that the first user needs to query the content of that partition.
For example, the computing engine receives the first file as shown in table 1, where the content identification information includes identification information "Company information" of the file as shown in table 1, and returns the first file as shown in table 1 to the first user.
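The checks that the object file storage system performs in 1-2 and 1-3 can be sketched in Python as follows. The function `handle_read_request` and the use of plain dictionaries for the correspondence and for the stored files are assumptions made for illustration, not the storage system's actual interface.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical shape of one record: (file path, operation types).
Record = Tuple[str, FrozenSet[str]]


def handle_read_request(
    correspondence: Dict[str, Record],
    files: Dict[str, bytes],
    credential: str,
    file_path: str,
) -> bytes:
    """Validate a read request against the stored correspondence, then read the file."""
    record = correspondence.get(credential)
    if record is None:
        raise PermissionError("unknown or expired temporary credential")
    stored_path, operation_types = record
    if stored_path != file_path:
        raise PermissionError("file path does not match the temporary credential")
    if "read" not in operation_types:
        raise PermissionError("temporary credential does not permit a read operation")
    return files[file_path]
```

With the record from Table 2, a request carrying "P1" and the matching file path returns the file content, while a request with a different path or an unknown credential is rejected.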
Assuming that the first access operation is to update the first file, the second access operation mapped by the first access operation includes a read operation and a write operation, and the access requirement information includes content to be updated. In step 408, the first file is accessed as follows.
2-1: the compute engine sends a read request to the object file storage system, the read request including the temporary credential and the file path.
2-2: the object file storage system receives the read request, and obtains a corresponding file path and a second operation type from the corresponding relation among the temporary credential, the file path and the operation type based on the temporary credential included in the read request.
2-3: if the file path included in the read request is the same as the acquired file path and the second access operation corresponding to the second operation type includes a read operation, the object file storage system reads the first file based on the file path and returns the first file to the computing engine.
2-4: the computing engine receives the first file. If the content identification information includes only the identification information of the first file, the computing engine updates the content of the first file to the content to be updated. If the content identification information includes the identification information of the first file and the identification information of a partition in the first file, the computing engine updates the content of that partition in the first file to the content to be updated.
2-5: the compute engine sends a write request to the object file storage system, the write request including the temporary credential, the first file, and the file path.
2-6: the object file storage system receives the write request, and obtains a corresponding file path and a second operation type from the corresponding relation among the temporary credential, the file path and the operation type based on the temporary credential included in the write request.
2-7: if the file path included in the write request is the same as the acquired file path and the second access operation corresponding to the second operation type includes a write operation, the object file storage system replaces the first file stored at the file path with the first file included in the write request.
The first access operation may also be other operations, for example, the first access operation may also be deleting the first file, etc., which are not listed here.
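The update flow of 2-1 to 2-7 amounts to a read, a local modification, and a write, each validated against the temporary credential. The following Python sketch shows one way this could fit together; `ObjectStore`, `update_file`, and the in-memory dictionaries are hypothetical stand-ins for the object file storage system and the computing engine.

```python
from typing import Callable, Dict, FrozenSet, Tuple


class ObjectStore:
    """Hypothetical in-memory stand-in for the object file storage system."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.correspondence: Dict[str, Tuple[str, FrozenSet[str]]] = {}

    def _check(self, credential: str, file_path: str, operation: str) -> None:
        record = self.correspondence.get(credential)
        if record is None or record[0] != file_path or operation not in record[1]:
            raise PermissionError(f"{operation} denied for {file_path!r}")

    def read(self, credential: str, file_path: str) -> str:
        self._check(credential, file_path, "read")      # 2-2 and 2-3
        return self.files[file_path]

    def write(self, credential: str, file_path: str, content: str) -> None:
        self._check(credential, file_path, "write")     # 2-6 and 2-7
        self.files[file_path] = content


def update_file(store: ObjectStore, credential: str, file_path: str,
                apply_update: Callable[[str], str]) -> None:
    """Computing-engine side: read the file, update it, write it back."""
    current = store.read(credential, file_path)                    # 2-1
    store.write(credential, file_path, apply_update(current))      # 2-4 and 2-5
```

A credential that only permits a read operation lets the read succeed but causes the write to be rejected, leaving the stored file unchanged.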
Step 409: the computing engine sends an access instruction to the data filtering engine, the access instruction including a file path of the first file and the access requirement information.
Referring to fig. 5 or 6, in step 409, a routing module of a computing engine (Hive engine or Spark engine) sends an access instruction to a data filtering engine.
Step 410: the data filtering engine receives the access instruction and accesses the first file based on the administrator account information, the file path and the access requirement information.
In some embodiments, the data filtering engine authenticates the right of the first user to access the content based on the second rights information, the account information of the first user, and the access requirement information. After the right of the first user to access the content is authenticated, the first file is accessed based on the administrator account information, the file path and the access requirement information. The second rights information is used to indicate a user identity capable of accessing the content and a fourth access operation capable of accessing the content.
The details of the data filtering engine for authenticating the right of the first user to access the content, see the details of the foregoing step 402 for authenticating the right of the first user to access the content by the computing engine, will not be described in detail herein.
The first file is a structured data file, and the access requirement information includes the identification information of the first file, first information, and the first operation type of the first access operation. The first information includes a column identification of at least one column of the first file; or a column identification of at least one column of the first file and row filtering information corresponding to the at least one column; or a line number of at least one line in the first file. Alternatively, the first file is a semi-structured data file, and the content identification information of the content includes the identification information of the first file and the identification information of a data segment in the first file.
Assuming that the first access operation is to query the first file, the second access operation mapped by the first access operation includes a read operation, and in step 410, the first file is accessed as follows.
3-1: the data filtering engine sends a read request to the object file storage system, wherein the read request comprises administrator account information and a file path of the first file.
For example, the access requirement information listed above is: select From Company information Where City =city 1. In this example, the first operation type "Select" is a query for the first file, the content identification information of the content includes "Company information" and the column identification "City" of the fourth column, and the row filtering information "City 1" corresponding to the fourth column, and the second access operation mapped by the computing engine to the "Select" includes a read operation.
The data filtering engine sends a read request to the object file storage system, the read request including the administrator account information "administrators" and the file path "C:\windows\system32\Company information" of the first file.
3-2: the object file storage system receives the read request and, when it determines that the account information included in the read request is the administrator account information, reads the first file based on the file path of the first file and returns the first file to the data filtering engine.
An administrator has broad permissions, so when the object file storage system determines that the account information included in the read request is the administrator account information, it can read the first file directly based on the file path of the first file.
For example, the object file storage system receives the read request, reads the first file shown in Table 1 based on the file path "C:\windows\system32\Company information" of the first file, and returns the first file shown in Table 1 to the data filtering engine.
3-3: the data filtering engine receives the first file. If the access requirement information includes the first information, the data filtering engine obtains the content from the first file based on the first information. If the content identification information includes the identification information of a data segment in the first file, the data filtering engine obtains the content of that data segment from the first file.
In some embodiments, the first information includes a column identification of at least one column in the first file, and the data filtering engine obtains content of the at least one column from the first file based on the first information and returns the content of the at least one column to the computing engine.
In some embodiments, the first information includes a column identification of at least one column in the first file and row filtering information. Based on the first information, the data filtering engine obtains from the first file the one or more rows whose content in the at least one column matches the row filtering information, and returns the content of the one or more rows to the computing engine.
For example, the data filtering engine receives the first file shown in Table 1. Based on the column identification "City" of the fourth column and the row filtering information "City 1" corresponding to the fourth column, it obtains from the first file the three rows whose fourth-column content is "City 1", namely the first, second, and fifth rows of Table 1, and returns the content of the three rows to the computing engine, which returns it to the first user.
In some embodiments, the first information includes a line number of at least one line in the first file, and the data filtering engine obtains content of the at least one line from the first file based on the first information and returns the content of the at least one line to the computing engine.
In some embodiments, the first information includes identification information of a data segment in the first file, and the data filtering engine obtains content of the data segment from the first file based on the first information and returns the content of the data segment to the computing engine.
3-4: the data filtering engine returns the content in the first file to the computing engine.
3-5: the computing engine receives the content in the first file and returns the content in the first file to the first user.
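The three structured-file selections described in 3-3 (by column, by column plus row filtering information, and by line number) can be sketched in Python as below. The function names and the representation of the file as a list of dictionaries are assumptions. The sample rows are made up and merely stand in for Table 1, which is not reproduced here; they are arranged so that the first, second, and fifth rows have "City 1" in the "City" column, as in the example above.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]

# Made-up rows standing in for Table 1 (hypothetical data).
ROWS: List[Row] = [
    {"Name": "Company A", "City": "City 1"},
    {"Name": "Company B", "City": "City 1"},
    {"Name": "Company C", "City": "City 2"},
    {"Name": "Company D", "City": "City 3"},
    {"Name": "Company E", "City": "City 1"},
]


def select_columns(rows: Sequence[Row], columns: Sequence[str]) -> List[Row]:
    """Return only the named columns of every row."""
    return [{column: row[column] for column in columns} for row in rows]


def select_rows_by_filter(rows: Sequence[Row], column: str, value: str) -> List[Row]:
    """Return the rows whose content in `column` equals the row filtering value."""
    return [row for row in rows if row.get(column) == value]


def select_rows_by_number(rows: Sequence[Row], line_numbers: Sequence[int]) -> List[Row]:
    """Return the rows with the given 1-based line numbers."""
    return [rows[number - 1] for number in line_numbers]
```

Calling `select_rows_by_filter(ROWS, "City", "City 1")` on the sample data yields the first, second, and fifth rows.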
Assuming that the first access operation is to update the first file, the second access operation mapped by the first access operation includes a read operation and a write operation, and the access requirement information includes content to be updated. In step 410, a first file is accessed as follows.
4-1: the data filtering engine sends a read request to the object file storage system, the read request including administrator account information and a file path for the first file.
4-2: the object file storage system receives the read request and, when it determines that the account information included in the read request is the administrator account information, reads the first file based on the file path of the first file and returns the first file to the data filtering engine.
4-3: the data filtering engine receives the first file. If the access requirement information includes the first information, the data filtering engine updates the at least one column or at least one line indicated by the first information in the first file to the content to be updated. If the access requirement information includes the identification information of a data segment in the first file, the data filtering engine updates the content of that data segment in the first file to the content to be updated.
In some embodiments, the first information includes a column identification of at least one column in the first file, and the data filtering engine updates the at least one column in the first file to the content to be updated.
In some embodiments, the first information includes a column identification of at least one column in the first file and row filtering information. Based on the first information, the data filtering engine determines the one or more rows of the first file whose content in the at least one column matches the row filtering information, and updates the content of the one or more rows to the content to be updated.
In some embodiments, the first information includes a line number of at least one line in the first file, and the data filtering engine updates content of the at least one line in the first file to content to be updated based on the first information.
In some embodiments, the first information includes identification information of a data segment in the first file, and the data filtering engine updates content of the data segment in the first file to content to be updated based on the identification information of the data segment.
4-4: the data filtering engine sends a write request to the object file storage system, the write request including administrator account information, a file path of the first file, and the updated first file.
4-5: the object file storage system receives the write request and, when it determines that the account information included in the write request is the administrator account information, replaces the first file stored at the file path with the first file included in the write request.
The first access operation may also be other operations, for example, the first access operation may also be deleting the first file, etc., which are not listed here.
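For the case where the first information carries a column identification and row filtering information, the update performed by the data filtering engine and the administrator-checked write can be sketched in Python as follows. `update_matching_rows`, `write_as_admin`, the constant `ADMIN_ACCOUNT` (with the assumed spelling "administrators"), and the list-of-dictionaries file model are all hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]

# Assumed administrator account name used only in this sketch.
ADMIN_ACCOUNT = "administrators"


def update_matching_rows(rows: Sequence[Row], column: str, value: str,
                         new_values: Row) -> List[Row]:
    """Return a copy of the file in which rows matching the filter are updated."""
    return [
        {**row, **new_values} if row.get(column) == value else dict(row)
        for row in rows
    ]


def write_as_admin(files: Dict[str, List[Row]], account: str,
                   file_path: str, rows: List[Row]) -> None:
    """Storage side: replace the stored file only for the administrator account."""
    if account != ADMIN_ACCOUNT:
        raise PermissionError("write request is not from the administrator account")
    files[file_path] = rows
```

The original rows are left untouched and a new list is written back, which mirrors the replacement of the whole first file described in the write step.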
In some embodiments, for the first rights information and the second rights information in the linked rights module, the linked rights module receives the second rights information configured by the rights manager and generates the first rights information based on the second rights information.
In this way, the rights manager grants the right to access the content of a file by configuring the second rights information, and the linkage rights module generates the first rights information based on the second rights information. The computing engine in the access system authenticates using the second rights information, and the file path authentication module authenticates using the first rights information. The rights manager therefore needs to authorize only once for authentication in two dimensions to be possible.
In some embodiments, the first user accesses the first file in an open source manner. That is, the first user sends authentication information to the file path authentication module through a client, where the authentication information includes the file path of the first file, the user identity of the first user, and the second operation type of the second access operation. The file path authentication module authenticates, based on the authentication information, the first user's right to access the file path using the second access operation. After the authentication passes, the file path authentication module sends an authentication response including a temporary credential to the client, and sends storage information including the temporary credential, the file path, and the second operation type to the object file storage system. The client receives the authentication response and accesses the first file in the object file storage system based on the temporary credential, the file path, and the access requirement information. Transparent access to the file is thereby achieved.
In the embodiment of the application, the computing engine determines, based on the access requirement information, the granularity of accessing the content in the first file. When the determined granularity is the first granularity, the computing engine requests the file path authentication module to authenticate the first user's right to access the file path of the first file. After the authentication passes, the computing engine obtains the temporary credential assigned by the file path authentication module and accesses the object file storage system based on the temporary credential, the access requirement information, and the file path. Because the computing engine accesses the object file storage system directly, the read-write performance for the file is improved. When the determined granularity is the second granularity, the computing engine requests the data filtering engine to access the object file storage system. The data filtering engine holds the specified administrator account information, so it can read the first file from the object file storage system and extract from the first file, based on the second granularity, the data that the first user needs to access. Access services at a granularity smaller than the file granularity can thus be provided to the user, which enriches the access services provided to the user.
Referring to fig. 7, an embodiment of the present application provides a method 700 for obtaining first rights information. The first permission information in the embodiment shown in fig. 1 or fig. 3, or the first permission information in the embodiment shown in fig. 4 is obtained by the method 700. The method 700 includes the steps of:
Step 701: the linkage permission module receives second permission information, wherein the second permission information is used for indicating the identity of a user capable of accessing the content in the first file and a fourth access operation.
The linkage authority module acquires the metadata of a first file from the metadata center, where the metadata of the first file is any piece of metadata stored in the metadata center. The linkage authority module obtains at least one user identity from the identity authentication center and displays a second interface to the rights manager, where the second interface includes the metadata of the first file and the at least one user identity.
In this way, the rights manager selects content identification information of the content in the first file from the metadata of the first file, selects a user identity capable of accessing the content from the at least one user identity, and inputs a fourth operation type of a fourth access operation capable of accessing the content to the second interface, thus obtaining the second rights information. The second rights information includes content identification information of the content, a selected user identity, and a fourth operation type of the input. The linkage permission module reads second permission information from the second interface.
In some embodiments, the first file is a structured data file, and the metadata of the first file includes a file identification of the first file and a column identification of each column of the first file. Optionally, the content identification information of the content selected by the rights manager includes a file identification of the first file, or the content identification information of the content selected by the rights manager includes a file identification of the first file and a column identification of at least one column in the first file, or the content identification information of the content selected by the rights manager includes a file identification of the first file and a line number of at least one line in the first file.
In some embodiments, the first file is a semi-structured data file, and the metadata of the first file includes a file identification of the first file and identification information of each data segment of the first file. Optionally, the content identification information of the content selected by the rights manager includes a file identification of the first file, or the content identification information of the content selected by the rights manager includes a file identification of the first file and identification information of at least one data segment in the first file.
Step 702: the linkage permission module generates first permission information based on the second permission information.
In step 702, the linkage authority module generates the first authority information through the following operations 7021 to 7023:
7021: the linkage authority module obtains a file path of the first file based on the content identification information of the content in the second authority information.
In some embodiments, the content identification information of the content includes identification information of the first file, and the linking authority module obtains metadata including the identification information of the first file from a metadata center, where the metadata is metadata of the first file, and obtains a file path of the first file from the metadata of the first file.
7022: the linkage authority module maps the fourth operation type included in the second authority information to obtain a third operation type.
The access operation corresponding to the fourth operation type is a fourth access operation that is configured by an administrator and can access the content in the first file; the fourth access operation may be querying the first file, updating the first file, deleting the first file, or the like. The third operation type is an operation type that corresponds to the fourth operation type and can be used to access the object file storage system. The third operation type includes a read operation and/or a write operation, etc.
7023: the linkage authority module reads the user identity from the second authority information, and forms the first authority information from the file path of the first file, the user identity, and the third operation type.
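Operations 7021 to 7023 can be summarized in a short Python sketch. The record types `SecondAuthorityInfo` and `FirstAuthorityInfo`, the function `generate_first_authority_info`, the dictionary model of the metadata center, and the mapping table `THIRD_OPERATION_TYPES` are assumptions for illustration. In particular, the mapping of "query" to a read operation and "update" to read and write operations is assumed by analogy with the mapping of the first operation type to the second operation type.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class SecondAuthorityInfo:
    """Hypothetical content-level record configured by the rights manager."""
    file_id: str
    user_identity: str
    fourth_operation_type: str


@dataclass(frozen=True)
class FirstAuthorityInfo:
    """Hypothetical file-path-level record generated from the record above."""
    file_path: str
    user_identity: str
    third_operation_types: FrozenSet[str]


# Assumed mapping from fourth operation type to third operation type(s).
THIRD_OPERATION_TYPES: Dict[str, FrozenSet[str]] = {
    "query": frozenset({"read"}),
    "update": frozenset({"read", "write"}),
}


def generate_first_authority_info(
    second: SecondAuthorityInfo,
    metadata_center: Dict[str, Dict[str, str]],
) -> FirstAuthorityInfo:
    file_path = metadata_center[second.file_id]["file_path"]            # 7021
    third_types = THIRD_OPERATION_TYPES[second.fourth_operation_type]   # 7022
    return FirstAuthorityInfo(file_path, second.user_identity, third_types)  # 7023
```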
Step 703: the linkage authority module stores the first authority information and the second authority information.
The above-described process of steps 701-703 may be repeated such that the linked authority module generates a large amount of first authority information and second authority information.
In the embodiment of the application, the linkage authority module receives second authority information configured by an authority manager, generates first authority information based on the second authority information, and the first authority information is used for indicating the user identity and the access operation of a file path capable of accessing the first file. Therefore, the first authority information can be automatically generated, the efficiency of obtaining the first authority information is improved, and the cost of obtaining the first authority information is reduced. The linkage authority module automatically generates first authority information based on the second authority information, wherein the second authority information is used for authenticating the authority of the user to access the content in the first file, and the first authority information is used for authenticating the authority of the user to access the file path of the first file. Thus the rights manager only needs to authorize once (configure the second rights information) and the access system uses the second rights information and the first rights information for two-dimensional authentication.
Referring to FIG. 8, an embodiment of the present application provides an apparatus 800 for accessing files, where the apparatus 800 may be deployed on a computing engine in the system shown in FIG. 1 or FIG. 3, or on a computing engine in the embodiment shown in FIG. 4, FIG. 5, or FIG. 6. The apparatus 800 includes:
a communication unit 801 for receiving a data access request, the data access request including access requirement information for indicating contents in a first file that a first user needs to access, the first file being stored in an object file storage system;
a processing unit 802, configured to, when determining, based on the access requirement information, that the granularity of accessing the content in the first file is the first granularity, access the first file based on the account information of the first user and the access requirement information;
the processing unit 802 is further configured to, when determining, based on the access requirement information, that the granularity of accessing the content in the first file is a second granularity, access the first file based on the specified administrator account information and the access requirement information, where the second granularity is smaller than the first granularity.
Optionally, for the detailed implementation process in which the communication unit 801 receives the data access request, refer to the relevant content of step 401 of the embodiment shown in fig. 4, which is not described in detail here.
Optionally, for the detailed implementation process in which the processing unit 802 accesses the first file based on the account information of the first user and the access requirement information, refer to the relevant content of steps 405 to 408 of the embodiment shown in fig. 4, which is not described in detail here.
Optionally, for the detailed implementation process in which the processing unit 802 accesses the first file based on the specified administrator account information and the access requirement information, refer to the relevant content of steps 409-410 of the embodiment shown in fig. 4, which is not described in detail here.
Optionally, the access requirement information includes identification information of the first file, and the first granularity is a file granularity; or
the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is a partition granularity.
Optionally, the communication unit 801 is further configured to send an authentication request to the file path authentication module, where the authentication request includes authentication information, where the authentication information is used to indicate the first user, a file path of the first file, and an access operation of the first user to access the file path, the authentication information is obtained based on the access requirement information and account information of the first user, the authentication request is used to trigger the file path authentication module to authenticate, based on first permission information and the authentication information, a permission of the first user to access the file path using the access operation, where the file path is used to indicate a storage location of the first file, and the first permission information is used to indicate a user identity and an access operation that can access the file path;
The communication unit 801 is further configured to receive an authentication response sent by the file path authentication module after the authority authentication passes, where the authentication response includes a temporary credential, and the temporary credential, the file path, and an operation type of the access operation are correspondingly stored in the object file storage system;
a processing unit 802 for accessing a first file based on the temporary credential, the access requirement information and the file path.
Optionally, for the detailed implementation process in which the communication unit 801 sends the authentication request to the file path authentication module, refer to the relevant content of step 404 of the embodiment shown in fig. 4, which is not described in detail here.
Optionally, for the detailed implementation process in which the communication unit 801 receives the authentication response, refer to the relevant content of step 408 of the embodiment shown in fig. 4, which is not described in detail here.
Optionally, for the detailed implementation process in which the processing unit 802 accesses the first file based on the temporary credential, the access requirement information and the file path, refer to the relevant content of step 408 of the embodiment shown in fig. 4, which is not described in detail here.
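As a non-limiting illustration, the temporary-credential flow described above can be sketched as follows. The classes `ObjectStore` and `FilePathAuthenticator` are hypothetical in-memory stand-ins for the object file storage system and the file path authentication module. Which component registers the temporary credential in the object file storage system is an assumption of this sketch; the description only states that the credential, the file path and the operation type are stored there in correspondence.

```python
import secrets


class ObjectStore:
    """Toy stand-in for the object file storage system."""

    def __init__(self):
        self._files = {}
        self._credentials = {}  # temporary credential -> (file_path, op_type)

    def put(self, file_path, data):
        self._files[file_path] = data

    def register_credential(self, credential, file_path, op_type):
        # Store the credential together with the path and operation type.
        self._credentials[credential] = (file_path, op_type)

    def read(self, credential, file_path):
        # A credential is only valid for the path and operation it was issued for.
        if self._credentials.get(credential) != (file_path, "read"):
            raise PermissionError("credential not valid for this path/operation")
        return self._files[file_path]


class FilePathAuthenticator:
    """Toy stand-in for the file path authentication module."""

    def __init__(self, first_permissions, store):
        # first_permissions: set of (user_id, file_path, op_type) tuples
        self._permissions = set(first_permissions)
        self._store = store

    def authenticate(self, user_id, file_path, op_type):
        if (user_id, file_path, op_type) not in self._permissions:
            raise PermissionError("user may not perform this operation on the path")
        credential = secrets.token_hex(16)
        self._store.register_credential(credential, file_path, op_type)
        return credential  # carried back in the authentication response
```

In this sketch the computing engine would call `authenticate(...)` with the first user's identity, then pass the returned credential and the file path to `read(...)`.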
Optionally, the first file is a structured data file, the first file stores data in a list form, the access requirement information includes identification information of the first file and first information, the first information is used for indicating at least one column of the first file and/or at least one row of the first file, and the second granularity is a row-column granularity; or
the first file is a semi-structured data file, the first file comprises at least one data segment, the data segment is used for storing data with the same service attribute, the access requirement information comprises identification information of the first file and identification information of one or more data segments in the first file, and the second granularity is a data segment granularity.
Optionally, the communication unit 801 is further configured to send an access instruction to the data filtering engine, where the access instruction includes the access requirement information, and the data filtering engine includes administrator account information, and the access instruction is configured to trigger the data filtering engine to access the first file based on the administrator account information and the access requirement information.
Optionally, for the detailed implementation process in which the communication unit 801 sends the access instruction to the data filtering engine, refer to the relevant content of step 409 of the embodiment shown in fig. 4, which is not described in detail here.
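As a non-limiting illustration, access at the second granularity through the data filtering engine can be sketched as follows. The class `DataFilteringEngine`, the `storage_reader` callable and the parameters `columns` and `row_predicate` are hypothetical; they only stand in for the administrator account information held by the engine and for the row and column selection carried in the access requirement information.

```python
class DataFilteringEngine:
    """Toy data filtering engine that holds the administrator account."""

    def __init__(self, admin_account, storage_reader):
        self._admin_account = admin_account
        # storage_reader(account, file_path) -> list of dict rows (assumed interface)
        self._storage_reader = storage_reader

    def access(self, file_path, columns=None, row_predicate=None):
        # The whole file is read with the administrator account ...
        rows = self._storage_reader(self._admin_account, file_path)
        # ... and only the requested rows and columns are returned.
        if row_predicate is not None:
            rows = [row for row in rows if row_predicate(row)]
        if columns is not None:
            rows = [{name: row[name] for name in columns} for row in rows]
        return rows
```

Because filtering happens inside the engine, the first user never needs storage-level permission on the file path for row-column access in this sketch.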
Optionally, the processing unit 802 is further configured to:
authenticating the authority of the first user to access the content based on second authority information, account information of the first user and the access requirement information, wherein the second authority information is used for indicating the identity of the user capable of accessing the content and the access operation;
after the authentication of the right of the first user to access the content passes, determining a granularity of accessing the content in the first file based on the access requirement information.
Optionally, the processing unit 802 is further configured to:
first rights information indicating a user identity and an access operation of a file path capable of accessing the first file, the file path indicating a storage location of the first file, is generated based on the second rights information.
Optionally, for the detailed implementation process in which the processing unit 802 generates the first authority information, refer to the relevant content of step 702 of the embodiment shown in fig. 7, which is not described in detail here.
In the embodiment of the application, the first granularity is larger than the second granularity. When the determined granularity is the first granularity, the processing unit accesses the first file based on the account information of the first user and the access requirement information, so that the first file is accessed without borrowing the administrator account information, which improves the efficiency of accessing the first file and the performance of reading and writing the first file. When the determined granularity is the second granularity, the processing unit accesses the first file based on the specified administrator account information and the access requirement information, that is, the administrator account information is used in place of the account information of the first user. Permission to access at the second granularity therefore does not need to be configured for the first user, which prevents the access permission of the first user from being expanded and facilitates permission management.
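As a non-limiting illustration, the choice of account according to the determined granularity can be sketched as follows. The request keys `partition`, `columns`, `rows` and `segments` are hypothetical field names for the access requirement information, and the rule used to infer the granularity from them is an assumption of this sketch.

```python
from enum import Enum


class Granularity(Enum):
    FILE = "file"
    PARTITION = "partition"
    ROW_COLUMN = "row_column"
    DATA_SEGMENT = "data_segment"


# File and partition are the first (coarser) granularity;
# row-column and data segment are the second (finer) granularity.
FIRST_GRANULARITIES = {Granularity.FILE, Granularity.PARTITION}


def determine_granularity(requirement):
    """Infer the granularity from hypothetical access-requirement fields."""
    if requirement.get("columns") or requirement.get("rows"):
        return Granularity.ROW_COLUMN
    if requirement.get("segments"):
        return Granularity.DATA_SEGMENT
    if requirement.get("partition"):
        return Granularity.PARTITION
    return Granularity.FILE


def select_account(requirement, user_account, admin_account):
    """Use the first user's account for coarse access, the administrator's for fine access."""
    if determine_granularity(requirement) in FIRST_GRANULARITIES:
        return user_account
    return admin_account
```

For example, a request that names only a file or a partition resolves to the first user's own account, while a request that names columns, rows or data segments resolves to the specified administrator account.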
Referring to fig. 9, an apparatus 900 for accessing a file is provided in an embodiment of the present application. The apparatus 900 may be a computing engine as in any of the embodiments described above, for example, the computing engine provided in the embodiments shown in fig. 1, 3, 4, 5, or 6. The apparatus 900 comprises at least one processor 901, an internal connection 902, a memory 903, and at least one transceiver 904.
The apparatus 900 is a hardware configuration apparatus that may be used to implement the functional modules in the apparatus 800 illustrated in fig. 8. For example, it will be appreciated by those skilled in the art that the processing unit 802 in the apparatus 800 shown in fig. 8 may be implemented by the at least one processor 901 invoking code in the memory 903, and the communication unit 801 in the apparatus 800 shown in fig. 8 may be implemented by the transceiver 904.
Optionally, the apparatus 900 may also be used to implement the functionality of the computing engine of any of the embodiments described above.
Optionally, the processor 901 may be a general purpose central processing unit (CPU), a network processor (NP), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
The internal connection 902 may include a path for transferring information between the above components. Optionally, the internal connection 902 is a board, a bus, or the like.
The transceiver 904 is used to communicate with other devices or communication networks.
The memory 903 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be stand-alone and coupled to the processor via a bus. The memory may also be integrated with the processor.
The memory 903 is used for storing application program code for executing the scheme of the present application, and execution is controlled by the processor 901. The processor 901 is configured to execute the application program code stored in the memory 903 and to cooperate with the at least one transceiver 904, so that the apparatus 900 performs the functions of the methods of the present application.
In a particular implementation, processor 901 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 9, as an embodiment.
In a specific implementation, the apparatus 900 may include multiple processors, such as the processor 901 and the processor 907 in fig. 9, as one embodiment. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the present application is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present application are intended to be included within the scope of the present application.

Claims (23)

1. A method of accessing a file, the method comprising:
receiving a data access request, wherein the data access request comprises access requirement information, and the access requirement information is used for indicating the content in a first file which a first user needs to access, and the first file is stored in an object file storage system;
when determining that the granularity of accessing the content in the first file is a first granularity based on the access requirement information, accessing the first file based on the account information of the first user and the access requirement information;
and when the granularity of accessing the content in the first file is determined to be a second granularity based on the access requirement information, accessing the first file based on the specified administrator account information and the access requirement information, wherein the second granularity is smaller than the first granularity.
2. The method of claim 1, wherein the access requirement information includes identification information of the first file, the first granularity being a file granularity; or
the access requirement information comprises identification information of the first file and identification information of partitions in the first file, and the first granularity is partition granularity.
3. The method of claim 2, wherein the accessing the first file based on the account information of the first user and the access requirement information comprises:
sending an authentication request to a file path authentication module, wherein the authentication request comprises authentication information, the authentication information is used for indicating the first user, a file path of the first file and access operation of the first user for accessing the file path, the authentication information is obtained based on the access requirement information and account information of the first user, the authentication request is used for triggering the file path authentication module to authenticate the authority of the first user for accessing the file path by adopting the access operation based on first authority information and the authentication information, the file path is used for indicating the storage position of the first file, and the first authority information is used for indicating the identity and the access operation of the user capable of accessing the file path;
receiving an authentication response sent by the file path authentication module after the authority authentication passes, wherein the authentication response comprises a temporary certificate, and the temporary certificate, the file path and the operation type of the access operation are correspondingly stored in the object file storage system;
Accessing the first file based on the temporary credential, the access requirement information, and the file path.
4. The method of claim 1, wherein the first file is a structured data file, the first file storing data in a list, the access requirement information including identification information of the first file and first information indicating at least one column of the first file and/or at least one row of the first file, the second granularity being a row-column granularity; or
the first file is a semi-structured data file, the first file comprises at least one data segment, the data segment is used for storing data with the same service attribute, the access requirement information comprises identification information of the first file and identification information of one or more data segments in the first file, and the second granularity is data segment granularity.
5. The method of claim 4, wherein accessing the first file based on the specified administrator account information and the access requirement information comprises:
and sending an access instruction to a data filtering engine, wherein the access instruction comprises the access requirement information, the data filtering engine comprises the administrator account information, and the access instruction is used for triggering the data filtering engine to access the first file based on the administrator account information and the access requirement information.
6. The method of any one of claims 1-5, wherein the method further comprises:
authenticating the authority of the first user to access the content based on second authority information, account information of the first user and the access requirement information, wherein the second authority information is used for indicating the identity of the user capable of accessing the content and the access operation;
after the authentication of the right of the first user to access the content passes, determining granularity of accessing the content in the first file based on the access requirement information.
7. The method of claim 6, wherein the method further comprises:
and generating first authority information based on the second authority information, wherein the first authority information is used for indicating the user identity and access operation of a file path capable of accessing the first file, and the file path is used for indicating the storage position of the first file.
8. An access system, the system comprising: a computing engine and an object file storage system;
the computing engine is used for receiving a data access request, the data access request comprises access requirement information, the access requirement information is used for indicating content in a first file which a first user needs to access, and the first file is stored in the object file storage system;
The computing engine is further configured to, when determining that the granularity of accessing the content in the first file is a first granularity based on the access requirement information, access the first file based on account information of the first user and the access requirement information;
the computing engine is further configured to, when determining, based on the access requirement information, that a granularity of accessing the content in the first file is a second granularity, access the first file based on specified administrator account information and the access requirement information, where the second granularity is smaller than the first granularity.
9. The system of claim 8, wherein the access requirement information includes identification information of the first file, the first granularity being a file granularity; or
the access requirement information comprises identification information of the first file and identification information of partitions in the first file, and the first granularity is partition granularity.
10. The system of claim 9, further comprising a file path authentication module,
the computing engine is configured to send an authentication request to the file path authentication module, where the authentication request includes authentication information, where the authentication information is used to indicate the first user, a file path of the first file, and an access operation of the first user to access the file path, and the authentication information is obtained based on the access requirement information and account information of the first user, and the file path is used to indicate a storage location of the first file;
The file path authentication module is used for authenticating the authority of the first user for accessing the file path by adopting the access operation based on first authority information and authentication information, the first authority information is used for indicating the user identity and the access operation which can access the file path, and an authentication response is sent to the calculation engine after the authority authentication is passed, and the authentication response comprises a temporary certificate;
the object file storage system is used for correspondingly storing the temporary certificate, the file path and the operation type of the access operation;
the computing engine is further configured to access the first file based on the temporary credential, the access requirement information, and the file path.
11. The system of claim 8, wherein the first file is a structured data file, the first file storing data in a list, the access requirement information including identification information of the first file and first information indicating at least one column of the first file and/or at least one row of the first file, the second granularity being a row-column granularity; or
the first file is a semi-structured data file, the first file comprises at least one data segment, the data segment is used for storing data with the same service attribute, the access requirement information comprises identification information of the first file and identification information of one or more data segments in the first file, and the second granularity is data segment granularity.
12. The system of claim 11, further comprising a data filtering engine that includes the administrator account information;
the computing engine is used for sending an access instruction to the data filtering engine, wherein the access instruction comprises a file path of the first file and the access requirement information, and the file path is used for indicating the storage position of the first file;
the data filtering engine is used for accessing the first file based on the administrator account information, the file path and the access requirement information.
13. The system of any of claims 8-12, wherein the computing engine is further configured to:
authenticating the authority of the first user to access the content based on second authority information, account information of the first user and the access requirement information, wherein the second authority information is used for indicating the identity of the user capable of accessing the content and the access operation;
after the authentication of the right of the first user to access the content passes, determining granularity of accessing the content in the first file based on the access requirement information.
14. The system of claim 13, wherein the system further comprises a linked authority module,
the linkage authority module is used for generating first authority information based on the second authority information, wherein the first authority information is used for indicating the user identity and the access operation of a file path capable of accessing the first file, and the file path is used for indicating the storage position of the first file.
15. An apparatus for accessing a file, the apparatus comprising:
a communication unit configured to receive a data access request, where the data access request includes access requirement information, where the access requirement information is used to indicate content in a first file that a first user needs to access, and the first file is stored in an object file storage system;
a processing unit, configured to access the first file based on the account information of the first user and the access requirement information when the granularity of accessing the content in the first file is determined to be a first granularity based on the access requirement information;
The processing unit is further configured to, when determining, based on the access requirement information, that the granularity of accessing the content in the first file is a second granularity, access the first file based on specified administrator account information and the access requirement information, where the second granularity is smaller than the first granularity.
16. The apparatus of claim 15, wherein the access requirement information comprises identification information of the first file, the first granularity being a file granularity; or
the access requirement information comprises identification information of the first file and identification information of partitions in the first file, and the first granularity is partition granularity.
17. The apparatus of claim 16, wherein
the communication unit is further configured to send an authentication request to a file path authentication module, where the authentication request includes authentication information, the authentication information is used to indicate the first user, a file path of the first file, and an access operation of the first user to access the file path, the authentication information is obtained based on the access requirement information and account information of the first user, the authentication request is used to trigger the file path authentication module to authenticate a right of the first user to access the file path by using the access operation based on first right information and the authentication information, the file path is used to indicate a storage location of the first file, and the first right information is used to indicate a user identity and an access operation that can access the file path;
The communication unit is further configured to receive an authentication response sent by the file path authentication module after the authority authentication passes, where the authentication response includes a temporary credential, and the temporary credential, the file path, and an operation type of the access operation are correspondingly stored in the object file storage system;
the processing unit is configured to access the first file based on the temporary credential, the access requirement information, and the file path.
18. The apparatus of claim 15, wherein the first file is a structured data file, the first file storing data in a list, the access requirement information including identification information of the first file and first information indicating at least one column of the first file and/or at least one row of the first file, the second granularity being a row-column granularity; or
the first file is a semi-structured data file, the first file comprises at least one data segment, the data segment is used for storing data with the same service attribute, the access requirement information comprises identification information of the first file and identification information of one or more data segments in the first file, and the second granularity is data segment granularity.
19. The apparatus of claim 18, wherein
the communication unit is further configured to send an access instruction to a data filtering engine, where the access instruction includes the access requirement information, the data filtering engine includes the administrator account information, and the access instruction is configured to trigger the data filtering engine to access the first file based on the administrator account information and the access requirement information.
20. The apparatus of any of claims 15-19, wherein the processing unit is further configured to:
authenticating the authority of the first user to access the content based on second authority information, account information of the first user and the access requirement information, wherein the second authority information is used for indicating the identity of the user capable of accessing the content and the access operation;
after the authentication of the right of the first user to access the content passes, determining granularity of accessing the content in the first file based on the access requirement information.
21. The apparatus of claim 20, wherein the processing unit is further configured to:
and generating first authority information based on the second authority information, wherein the first authority information is used for indicating the user identity and access operation of a file path capable of accessing the first file, and the file path is used for indicating the storage position of the first file.
22. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a computer, implements the method according to any of claims 1-7.
23. A computer program product, characterized in that the computer program product comprises a computer program stored in a computer readable storage medium and that the computer program is loaded by a processor to implement the method according to any of claims 1-7.
CN202210511098.1A 2022-03-17 2022-05-11 Method, device, system and storage medium for accessing file Pending CN116821921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/070167 WO2023173908A1 (en) 2022-03-17 2023-01-03 Method, apparatus and system for accessing file, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022102648988 2022-03-17
CN202210264898 2022-03-17

Publications (1)

Publication Number Publication Date
CN116821921A true CN116821921A (en) 2023-09-29

Family

ID=88124518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210511098.1A Pending CN116821921A (en) 2022-03-17 2022-05-11 Method, device, system and storage medium for accessing file

Country Status (1)

Country Link
CN (1) CN116821921A (en)

Legal Events

Date Code Title Description
PB01 Publication