WO2023173908A1 - Method, apparatus and system for accessing file, and storage medium - Google Patents
- Publication number
- WO2023173908A1 (PCT/CN2023/070167)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- file
- access
- information
- granularity
- user
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
Definitions
- the present application relates to the field of computers, and in particular to a method, device, system and storage medium for accessing files.
- the data lake includes a structured query language (SQL) engine and an object file storage system.
- the object file storage system includes at least one file, each file is used to store data.
- the SQL engine receives a SQL statement from the user, and the SQL statement is used to indicate to the SQL engine the file that the user needs to access.
- the SQL engine accesses the file in the object file storage system based on the SQL statement.
- the files in the object file storage system are structured data files, which store data in list form.
- the object file storage system has a file used to save an employee data table.
- the file includes four columns. The first column is used to store the employee's name, the second column is used to store the employee's address, and the third column is used to store the employee's department. The fourth column is used to store the employee's position. Each row of the file stores the employee's name, address, department, and position.
- the SQL engine provides file-granularity access services to users, but the access services provided to users are too limited in variety.
- This application provides a method, device, system and storage medium for accessing files to enrich the access services provided to users.
- the technical solutions are as follows:
- embodiments of the present application provide a method for accessing files.
- a data access request is received.
- the data access request includes access requirement information.
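- the access requirement information indicates the content of a first file that a first user needs to access.
- the first file is stored in the object file storage system.
- when the granularity of accessing the content in the first file is determined, based on the access requirement information, to be the first granularity, the first file is accessed based on the account information of the first user and the access requirement information.
- when the granularity of accessing the content in the first file is determined, based on the access requirement information, to be the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information; the second granularity is smaller than the first granularity.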
- that is, the granularity of accessing the content in the first file is determined based on the access requirement information.
- if the determined granularity is the first granularity, the first file is accessed based on the account information of the first user and the access requirement information.
- if the determined granularity is the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information. In this way, both a first-granularity access service and a second-granularity access service can be provided to the user, which enriches the access services provided to the user.
- at the first granularity, the first file is accessed based on the first user's account information and the access requirement information. There is therefore no need to borrow the administrator account information to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing the first file.
- at the second granularity, if the first user were configured with permission to access the content, the first user could access not only that content but also all other content in the first file. The permissions configured for the first user would in effect be expanded to cover any content in the first file, which is excessive and hinders permission management.
- at the second granularity, the first file is therefore accessed based on the specified administrator account information and the access requirement information: the administrator account information replaces the first user's account information when the first file is accessed. This removes the need to configure second-granularity access permissions for the first user, which avoids expanding the first user's access permissions and facilitates permission management.
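- The account selection described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the names `Granularity`, `FIRST_GRANULARITIES` and `choose_account` are hypothetical, and the four granularity values are the ones this document names (file and partition as first granularity, row-column and data fragment as second granularity).

```python
from enum import Enum


class Granularity(Enum):
    FILE = "file"
    PARTITION = "partition"
    ROW_COLUMN = "row-column"
    DATA_FRAGMENT = "data-fragment"


# First (coarse) granularity values; everything else is second (fine) granularity.
FIRST_GRANULARITIES = {Granularity.FILE, Granularity.PARTITION}


def choose_account(granularity, user_account, admin_account):
    """Return the account under which the first file is accessed."""
    if granularity in FIRST_GRANULARITIES:
        # Coarse access: the user's own account is sufficient.
        return user_account
    # Fine access: the administrator account replaces the user's account.
    return admin_account
```

- For example, `choose_account(Granularity.ROW_COLUMN, "alice", "lake_admin")` selects the administrator account, so no path-level permission has to be granted to the user.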
- the access requirement information includes the identification information of the first file, and the first granularity is the file granularity; or, the access requirement information includes the identification information of the first file and the identification information of a partition in the first file, and the first granularity is the partition granularity.
- the first granularity may be file granularity or partition granularity, thereby providing users with file-granular access services, or providing partition-granular access services, thereby enriching the provided access services.
- the partition granularity is a newly defined granularity, which means that this application can also provide partition access services.
- file granularity refers to the need to access all contents of the first file.
- Partition granularity refers to the need to access the entire contents of a partition in the first file.
- an authentication request is sent to the file path authentication module.
- the authentication request includes authentication information, which indicates the first user, the file path of the first file, and the access operation with which the first user accesses the file path.
- the authentication information is obtained based on the access requirement information and the first user's account information.
- the authentication request triggers the file path authentication module to authenticate, based on the first permission information and the authentication information, the first user's permission to access the file path using the access operation.
- the file path indicates the storage location of the first file.
- the first permission information indicates the identities of the users who can access the file path and the access operations they may use.
- an authentication response sent by the file path authentication module after the permission authentication passes is received; the authentication response includes a temporary credential.
- the temporary credential, the file path, and the operation type of the access operation are stored in correspondence with one another in the object file storage system. The first file is accessed based on the temporary credential, the access requirement information and the file path.
- because the temporary credential is sent by the file path authentication module only after the permission authentication passes, and the temporary credential is then used to access the first file in the object file storage system, the security of accessing the first file is improved.
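- The authentication flow above can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: `FIRST_PERMISSIONS`, `CREDENTIAL_STORE`, `authenticate`, `access_file`, the user name and the file path are all hypothetical, and a random token stands in for the temporary credential, whose real format the text does not specify.

```python
import secrets

# Hypothetical first permission information: (user, file path) -> allowed operations.
FIRST_PERMISSIONS = {("alice", "/lake/company_information"): {"query"}}

# Stand-in for the object file storage system's record of
# temporary credential -> (file path, operation type).
CREDENTIAL_STORE = {}


def authenticate(user, file_path, operation):
    """File path authentication module: issue a temporary credential or None."""
    allowed = FIRST_PERMISSIONS.get((user, file_path), set())
    if operation not in allowed:
        return None
    credential = secrets.token_hex(16)
    CREDENTIAL_STORE[credential] = (file_path, operation)
    return credential


def access_file(credential, file_path, operation):
    """Storage side: accept only a credential stored for this path and operation."""
    return CREDENTIAL_STORE.get(credential) == (file_path, operation)
```

- In this sketch a credential issued for a query on one path is rejected for any other path or operation type, which is the property the stored correspondence is meant to provide.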
- the first file is a structured data file that stores data in list form; the access requirement information includes identification information of the first file and first information, where the first information indicates at least one column of the first file and/or at least one row of the first file; and the second granularity is row-column granularity.
- the first file is a semi-structured data file that includes at least one data fragment, where a data fragment is used to save data with the same business attributes; the access requirement information includes the identification information of the first file and the identification information of one or more data fragments; and the second granularity is the data fragment granularity.
- the second granularity may be row-column granularity or data fragment granularity, thereby providing users with row-column granular access services, or providing data fragment granular access services, thereby enriching the provided access services.
- an access instruction is sent to the data filtering engine.
- the access instruction includes the access requirement information.
- the data filtering engine holds the administrator account information. The access instruction triggers the data filtering engine to access the first file based on the administrator account information and the access requirement information.
- because the access instruction is sent to the data filtering engine, the data filtering engine accesses the first file based on the specified administrator account information and the access requirement information.
- in this way, the administrator account information replaces the account information of the first user when the first file is accessed.
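- A minimal sketch of what such a data filtering engine might do for a row-column request is shown below. Everything here is an assumption made for illustration: the function names, the account name `lake_admin`, the file path, and the sample rows (whose columns follow the employee table described earlier) are hypothetical, and a real engine would read from the object file storage system rather than from an in-memory list.

```python
ADMIN_ACCOUNT = "lake_admin"  # hypothetical administrator account held by the engine


def read_file_as(account, file_path):
    """Stand-in for reading the whole file from object storage under an account."""
    if account != ADMIN_ACCOUNT:
        raise PermissionError(f"{account} may not read {file_path}")
    # Illustrative rows only; not data from the application.
    return [
        {"name": "Employee 1", "address": "Address 1", "department": "Dept 1", "position": "Position 1"},
        {"name": "Employee 2", "address": "Address 2", "department": "Dept 2", "position": "Position 2"},
    ]


def filter_rows_columns(file_path, columns, row_filter=lambda row: True):
    """Read the file as the administrator, then return only the requested rows and columns."""
    rows = read_file_as(ADMIN_ACCOUNT, file_path)
    return [{c: row[c] for c in columns} for row in rows if row_filter(row)]
```

- The user's own account never touches the file in this sketch; the user only receives the filtered projection.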
- the first user's permission to access the content is authenticated based on the second permission information, the first user's account information and the access requirement information; the second permission information indicates the identities of the users who can access the content and the access operations they may use.
- the granularity of accessing the content in the first file is determined based on the access requirement information.
- the granularity of accessing the content in the first file is determined, and then different methods are used to access the first file based on different granularities, thereby improving the security of accessing the first file.
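- The content-level check against the second permission information might look like the sketch below. The shape of the permission table (keyed by user, file identification and column) and the name `is_authorized` are assumptions for illustration; the text only states that the second permission information indicates which users may access the content and with which operations.

```python
# Hypothetical second permission information:
# (user, file identification, column) -> permitted access operations.
SECOND_PERMISSIONS = {
    ("alice", "Company information", "company_name"): {"query"},
    ("alice", "Company information", "city"): {"query", "update"},
}


def is_authorized(user, file_id, columns, operation):
    """Return True only if the user may apply the operation to every requested column."""
    return bool(columns) and all(
        operation in SECOND_PERMISSIONS.get((user, file_id, column), set())
        for column in columns
    )
```

- Only after a check of this kind succeeds would the granularity be determined and the file be accessed.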
- the first permission information is generated based on the second permission information.
- the first permission information indicates the identities of the users who can access the file path of the first file and the access operations they may use.
- the file path indicates the storage location of the first file. The first permission information can therefore be generated automatically, which improves the efficiency of obtaining the first permission information and reduces the cost of obtaining it.
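- One possible reading of this generation step is sketched below. The text here does not spell out the derivation rule, so the rule used (only whole-file grants become path-level grants) is an assumption, as are the names `FILE_PATHS`, `SECOND_PERMISSIONS` and `generate_first_permissions` and the tuple layout of the entries.

```python
# Hypothetical catalog mapping file identification to file path.
FILE_PATHS = {"company_information": "/lake/company_information"}

# Hypothetical second permission information: (user, file id, scope, operation).
SECOND_PERMISSIONS = [
    ("alice", "company_information", "file", "query"),
    ("bob", "company_information", "column:city", "query"),
]


def generate_first_permissions(second_permissions, file_paths):
    """Derive path-level permissions from whole-file entries only (assumed rule)."""
    first = {}
    for user, file_id, scope, operation in second_permissions:
        if scope != "file":
            # Assumed: fine-grained grants do not become path-level grants.
            continue
        first.setdefault((user, file_paths[file_id]), set()).add(operation)
    return first
```

- Under this assumed rule a user with only a column-level grant receives no path-level permission, which is consistent with the stated goal of not expanding fine-grained permissions.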
- this application provides an access system, which includes: a computing engine and an object file storage system.
- the computing engine is configured to receive a data access request, where the data access request includes access requirement information.
- the access requirement information is used to indicate the content of the first file that the first user needs to access.
- the first file is stored in the object file storage system.
- the computing engine is also configured to access the first file based on the account information of the first user and the access requirement information when it is determined that the granularity of accessing the content in the first file is the first granularity based on the access requirement information.
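- the computing engine is also configured to access the first file based on the specified administrator account information and the access requirement information when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is the second granularity; the second granularity is smaller than the first granularity.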
- that is, the computing engine determines the granularity of accessing the content in the first file based on the access requirement information.
- if the determined granularity is the first granularity, the first file is accessed based on the account information of the first user and the access requirement information.
- if the determined granularity is the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information. In this way, both a first-granularity access service and a second-granularity access service can be provided to the user, which enriches the access services provided to the user.
- at the first granularity, the computing engine accesses the first file based on the first user's account information and the access requirement information, so there is no need to borrow the administrator account information to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing the first file.
- at the second granularity, if the first user were configured with permission to access the content, the first user could access not only that content but also all other content in the first file. The permissions configured for the first user would in effect be expanded to cover any content in the first file, which is excessive and hinders permission management.
- at the second granularity, the computing engine therefore accesses the first file based on the specified administrator account information and the access requirement information: the administrator account information replaces the first user's account information when the first file is accessed. This removes the need to configure second-granularity access permissions for the first user, which avoids expanding the first user's access permissions and facilitates permission management.
- the access requirement information includes the identification information of the first file, and the first granularity is the file granularity; or, the access requirement information includes the identification information of the first file and the identification information of a partition in the first file, and the first granularity is the partition granularity.
- the first granularity may be file granularity or partition granularity, thereby providing users with file-granular access services, or providing partition-granular access services, thereby enriching the provided access services.
- the partition granularity is a newly defined granularity, which means that this application can also provide partition access services.
- file granularity refers to the need to access all contents of the first file.
- Partition granularity refers to the need to access the entire contents of a partition in the first file.
- the system also includes a file path authentication module.
- the computing engine is configured to send an authentication request to the file path authentication module, where the authentication request includes authentication information, and the authentication information indicates the first user, the file path of the first file, and the access operation with which the first user accesses the file path.
- the authentication information is obtained based on the access requirement information and the first user's account information, and the file path indicates the storage location of the first file.
- the file path authentication module is configured to authenticate, based on the first permission information and the authentication information, the first user's permission to access the file path using the access operation, and, after the permission authentication passes, to send an authentication response to the computing engine.
- the first permission information indicates the identities of the users who can access the file path and the access operations they may use.
- the authentication response includes a temporary credential.
- the object file storage system is used to store the temporary credential, the file path, and the operation type of the access operation in correspondence with one another.
- the computing engine is also used to access the first file based on the temporary credential, the access requirement information and the file path.
- because the computing engine receives the temporary credential only after the file path authentication module has passed the permission authentication, and then uses the temporary credential to access the first file in the object file storage system, the security of accessing the first file is improved.
- the first file is a structured data file that stores data in list form; the access requirement information includes identification information of the first file and first information, where the first information indicates at least one column of the first file and/or at least one row of the first file; and the second granularity is row-column granularity.
- the first file is a semi-structured data file that includes at least one data fragment, where a data fragment is used to save data with the same business attributes; the access requirement information includes the identification information of the first file and the identification information of one or more data fragments; and the second granularity is the data fragment granularity.
- the second granularity may be row-column granularity or data fragment granularity, thereby providing users with row-column granular access services, or providing data fragment granular access services, thereby enriching the provided access services.
- the system further includes a data filtering engine, where the data filtering engine holds administrator account information.
- the computing engine is configured to send an access instruction to the data filtering engine.
- the access instruction includes the file path of the first file and the access requirement information.
- the file path is used to indicate the storage location of the first file.
- the data filtering engine is used to access the first file based on the administrator account information, the file path and the access requirement information.
- because the computing engine sends the access instruction to the data filtering engine, the data filtering engine accesses the first file based on the specified administrator account information and the access requirement information.
- in this way, the administrator account information replaces the account information of the first user when the first file is accessed.
- the computing engine is also used to authenticate the first user's permission to access the content based on the second permission information, the first user's account information, and the access requirement information; the second permission information indicates the identities of the users who can access the content and the access operations they may use.
- the granularity of accessing the content in the first file is determined based on the access requirement information.
- the computing engine determines the granularity of accessing the content in the first file, and then accesses the first file in different ways based on different granularities, thereby improving the security of accessing the first file.
- the system also includes a linkage permission module.
- the linkage permission module is configured to generate the first permission information based on the second permission information.
- the first permission information indicates the identities of the users who can access the file path of the first file and the access operations they may use.
- the file path indicates the storage location of the first file. The first permission information can therefore be generated automatically, which improves the efficiency of obtaining the first permission information and reduces the cost of obtaining it.
- this application provides a device for accessing a file, for performing the method in the first aspect or any possible implementation of the first aspect.
- the apparatus includes a unit for performing the method in the first aspect or any possible implementation of the first aspect.
- the present application provides a device for accessing files, where the device includes a processor and a memory.
- the processor and the memory may be connected through internal connections.
- the memory is used to store programs, and the processor is used to execute the programs in the memory, so that the device completes the method in the first aspect or any possible implementation of the first aspect.
- the present application provides a computer program product.
- the computer program product includes a computer program stored in a computer-readable storage medium, and the computer program is loaded by a processor to implement the method of the first aspect or any possible implementation of the first aspect.
- the present application provides a computer-readable storage medium for storing a computer program, which is loaded by a processor to execute the method of the above-mentioned first aspect or any possible implementation of the first aspect.
- this application provides a chip, including a memory and a processor.
- the memory is used to store computer instructions.
- the processor is used to call and run the computer instructions from the memory to execute the method of the first aspect or any possible implementation of the first aspect.
- Figure 1 is a schematic structural diagram of an access system provided by an embodiment of the present application.
- Figure 2 is a schematic diagram of a file provided by an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of another access system provided by an embodiment of the present application.
- Figure 4 is a flow chart of a method for accessing files provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of another access system according to the embodiment of the present application.
- FIG. 6 is a schematic structural diagram of another access system according to an embodiment of the present application.
- Figure 7 is a flow chart of a method for obtaining first permission information provided by an embodiment of the present application.
- Figure 8 is a schematic structural diagram of a device for accessing files provided by an embodiment of the present application.
- Figure 9 is a schematic structural diagram of another device for accessing files provided by an embodiment of the present application.
- an embodiment of the present application provides an access system 100.
- the access system 100 includes a computing engine 101 and an object file storage system 102.
- the computing engine 101 communicates with the object file storage system 102.
- the access system 100 is a database system in which storage and computing are separated: the object file storage system 102 is responsible for data storage, and the computing engine 101 is responsible for data computation.
- the access system 100 is applied in scenarios such as data lakes where storage and computing are separated, and in the field of big data processing.
- the object file storage system 102 is used to store at least one file, and for any file saved in the object file storage system 102, the file is used to store data.
- the file may be a structured data file.
- the structured data file uses a list form to store data, so the structured data file is a data table. For any column in the file, the column holds data with the same business attributes.
- the file is essentially a data table, and the identification information of the file is the identification information of the data table.
- the identification information of the file is the file name of the file, that is, the identification information of the file is the table name of the data table.
- the file shown in Table 1 below is a structured data file.
- the file is a data table.
- the data table includes five columns of data.
- the data table is used to store company information.
- the table name of the data table is "Company information", and the file name of this file is also "Company information".
- This file is a company information table, and the table name and file name are the same.
- each column of this file is used to save data with the same business attributes.
- the data stored in the first column are all row numbers, that is, the business attributes of each data stored in the first column are row numbers.
- the data stored in the second column are all company names, that is, the business attributes of each data stored in the second column are company names.
- the data stored in the third column are all industry names, that is, the business attributes of each data stored in the third column are industry names.
- the data stored in the fourth column are all cities, that is, the business attributes of each data stored in the fourth column are city names.
- the data stored in the fifth column are all countries, that is, the business attributes of each data stored in the fifth column are country names.
- Table 1: Company information
- the file is a semi-structured data file, and the file includes at least one data fragment.
- the data fragment is used to save data with the same business attributes.
- the semi-structured data file includes four data fragments, namely a first data fragment, a second data fragment, a third data fragment and a fourth data fragment.
- the data saved in the first data fragment are all company names.
- the data saved in the first data fragment include "Company 1", "Company 2", "Company 3", "Company 4", "Company 5" and "Company 6"; that is, the business attribute of each data item stored in the first data fragment is the company name.
- the data saved in the second data fragment are all industry names.
- the data saved in the second data fragment include "Internet", "Internet", "Communication", "Logistics", "Communication" and "Logistics"; that is, the business attribute of each data item stored in the second data fragment is the industry name.
- the data saved in the third data fragment are all cities.
- the data saved in the third data fragment include "City 1", "City 1", "City 2", "City 2", "City 1" and "City 3"; that is, the business attribute of each data item stored in the third data fragment is the city name.
- the data stored in the fourth data fragment are all countries.
- the data stored in the fourth data fragment include "Country 1", "Country 2", "Country 1", "Country 1", "Country 1" and "Country 3"; that is, the business attribute of each data item stored in the fourth data fragment is the country name.
- the semi-structured data file is, for example, an extensible markup language (XML) file.
- the tag blocks in the XML file are the data fragments.
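- To make the notion of a tag block as a data fragment concrete, the sketch below stores two of the fragments listed above in an XML document and reads one of them back with Python's standard `xml.etree.ElementTree` module. The tag names (`company_information`, `company_name`, `industry`, `item`) and the function `read_fragment` are hypothetical; the text does not specify the XML layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: each child of the root is one data fragment (tag block).
XML_TEXT = """
<company_information>
  <company_name>
    <item>Company 1</item><item>Company 2</item><item>Company 3</item>
    <item>Company 4</item><item>Company 5</item><item>Company 6</item>
  </company_name>
  <industry>
    <item>Internet</item><item>Internet</item><item>Communication</item>
    <item>Logistics</item><item>Communication</item><item>Logistics</item>
  </industry>
</company_information>
"""


def read_fragment(xml_text, fragment_id):
    """Return the values stored in the tag block identified by fragment_id."""
    root = ET.fromstring(xml_text.strip())
    block = root.find(fragment_id)
    if block is None:
        raise KeyError(fragment_id)
    return [item.text for item in block.findall("item")]
```

- A request at data fragment granularity would then name the file and a fragment identifier such as `industry`, and only that tag block would be returned.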
- the file may also include at least one partition.
- the file is stored in the object file storage system 102, and the file path of the file is used to indicate the storage location of the file in the object file storage system 102.
- the file path of the file shown in Table 1 is "C:\windows\system32\Company information". This file path indicates the storage location of the file shown in Table 1 in the object file storage system 102.
- a first user needs to access content in a file stored in the object file storage system 102.
- the granularity with which the first user accesses the content in the file may be the first granularity or the second granularity, and the second granularity is smaller than the first granularity.
- the first granularity may be a file granularity, that is, the first user needs to access the entire content of the file; or the first granularity may be a partition granularity, that is, the first user needs to access a partition of the file.
- File granularity refers to the need to access the entire contents of the first file.
- Partition granularity refers to the need to access the entire contents of a partition in the first file.
- when the file is a structured data file, the second granularity is row-column granularity, that is, the first user needs to access at least one row and/or at least one column of the file.
- when the file is a semi-structured data file, the second granularity is data fragment granularity, that is, the first user needs to access data fragments in the file.
- the first user is a user with data access requirements, and can also be called a business user.
- the first user may be an application or the like.
- when the first user needs to access data, it sends a data access request to the computing engine 101.
- the data access request includes access requirement information.
- the access requirement information is used to indicate the content of the first file that the first user needs to access.
- the first file is a file stored in the object file storage system 102.
- the computing engine 101 is configured to receive the data access request, and determine the granularity of accessing the content in the first file based on the access requirement information.
- if the determined granularity is the first granularity, the computing engine 101 accesses the first file based on the account information of the first user and the access requirement information.
- if the determined granularity is the second granularity, the computing engine 101 accesses the first file based on the specified administrator account information and the access requirement information; the second granularity is smaller than the first granularity.
- the computing engine 101 includes an interface that the first user can call to send the data access request to the computing engine 101 through the interface.
- the interface includes a Java database connectivity (JDBC) interface or an open database connectivity (ODBC) interface.
- when the first file is a structured data file, the access requirement information includes identification information of the first file, and the access requirement information includes neither identification information of a partition in the first file nor first information (the first information indicates at least one column and/or at least one row in the first file), the granularity determined by the computing engine 101 based on the access requirement information is file granularity.
- when the first file is a semi-structured data file, the access requirement information includes the identification information of the first file, and the access requirement information includes neither identification information of a partition in the first file nor identification information of a data fragment in the first file, the granularity determined by the computing engine 101 based on the access requirement information is file granularity.
- when the first file is a structured data file, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the access requirement information does not include the first information, the granularity determined by the computing engine 101 based on the access requirement information is partition granularity.
- when the first file is a semi-structured data file, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the access requirement information does not include identification information of a data fragment in the first file, the granularity determined by the computing engine 101 based on the access requirement information is partition granularity.
- when the first file is a structured data file and the access requirement information includes identification information of the first file and first information, where the first information indicates at least one column and/or at least one row in the first file, the granularity determined by the computing engine 101 based on the access requirement information is row-column granularity.
- when the first file is a semi-structured data file and the access requirement information includes identification information of the first file and identification information of data fragments in the first file, the granularity determined by the computing engine 101 based on the access requirement information is the data fragment granularity.
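- The six cases above amount to a small decision procedure over the fields present in the access requirement information. A minimal sketch follows; the class `AccessRequirement`, its field names and the function `determine_granularity` are hypothetical, and the string labels simply reuse the granularity names from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AccessRequirement:
    file_id: str                                            # identification of the first file
    partition_id: Optional[str] = None                      # identification of a partition, if any
    columns: List[str] = field(default_factory=list)        # first information: columns
    rows: List[int] = field(default_factory=list)           # first information: rows
    fragment_ids: List[str] = field(default_factory=list)   # data fragment identifications


def determine_granularity(req, structured):
    """Classify a request; `structured` tells whether the file is a structured data file."""
    if structured and (req.columns or req.rows):
        return "row-column"
    if not structured and req.fragment_ids:
        return "data-fragment"
    if req.partition_id is not None:
        return "partition"
    return "file"
```

- The fine-grained checks come first because the row-column and data fragment cases in the text are defined only by the presence of the first information or of fragment identifiers.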
- the computing engine 101 includes a computing module 1011 and a routing module 1012.
- the computing module 1011 in the computing engine 101 receives the data access request, and the routing module 1012 in the computing engine 101 determines the granularity of accessing the content in the first file based on the access requirement information.
- the access requirement information further includes a first operation type, and the first operation type is used to indicate a first access operation for the first user to access the first file.
- the first access operation includes query, update, insert or delete, etc.
- the operation of the computing engine 101 to access the first file may be: query the content in the first file, and return the queried content to the first user.
- the access requirement information includes content to be updated, and the operation of the computing engine 101 to access the first file may be: updating all or part of the content in the first file to the content to be updated.
- the access requirement information includes content to be inserted, and the operation of the computing engine 101 to access the first file may be: inserting the content to be inserted into the first file.
- the operation of the computing engine 101 to access the first file may be: deleting all or part of the content in the first file, etc.
- the data access request also includes the first user's account information.
- the data access request may not include the first user's account information.
- the communication connection between the computing engine 101 and the first user is bound to the first user's account information, and the computing engine 101 obtains the first user's account information bound to the communication connection.
- the communication connection is a session between the first user and the computing engine 101.
- the access system 100 includes one or more computing engines 101.
- the computing engine 101 is a Hive engine or a Spark engine, that is, the access system 100 includes one or more Hive engines, and/or, one or more Spark engines, etc.
- the Hive engine is a data warehouse tool based on Hadoop (a distributed system infrastructure), which can map structured data files into a table and provide query functions.
- the Spark engine is a fast and general computing engine designed for large-scale data processing.
- the access system 100 also includes a file path authentication module 103.
- the file path authentication module 103 communicates with the computing engine 101 and the object file storage system 102 respectively.
- the computing engine 101 is configured to send an authentication request to the file path authentication module 103 when the determined granularity is the first granularity.
- the authentication request includes authentication information, and the authentication information is used to indicate the first user, the file path of the first file, and the second access operation by which the first user accesses the file path; the authentication information is obtained based on the access requirement information and the first user's account information;
- the file path authentication module 103 is configured to receive the authentication request, and authenticate the first user's permission to access the file path using the second access operation based on the first permission information and the authentication information.
- the first permission information is used to indicate the identity of the user who can access the file path and the third access operation that can access the file path.
- when the authentication passes, the file path authentication module 103 sends an authentication response to the computing engine 101 and sends storage information to the object file storage system 102.
- the authentication response includes a temporary credential.
- the storage information includes the temporary credential, the file path, and a second operation type.
- the second operation type is the operation type of the second access operation;
- the object file storage system 102 is used to receive the storage information and correspondingly save the temporary credential, the file path and the second operation type;
- the computing engine 101 is also used to access the first file based on the temporary credential, the access requirement information and the file path.
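The temporary-credential exchange can be illustrated with a short sketch. It assumes that the object file storage system checks a presented credential against the saved credential, file path and operation type; the text states that these three items are saved together but does not spell out the check. All class and function names are hypothetical.

```python
import secrets


class ObjectFileStorage:
    """Toy stand-in for the object file storage system 102."""

    def __init__(self, files):
        self._files = dict(files)
        self._grants = {}  # temporary credential -> (file path, second operation type)

    def save_storage_information(self, credential, file_path, operation_type):
        # Save the credential, the file path and the operation type together.
        self._grants[credential] = (file_path, operation_type)

    def read(self, credential, file_path):
        # Assumed check: the credential must have been saved for this path and "read".
        if self._grants.get(credential) != (file_path, "read"):
            raise PermissionError("credential does not cover this read")
        return self._files[file_path]


def issue_temporary_credential(storage, file_path, operation_type):
    """What the file path authentication module might do once authentication passes."""
    credential = secrets.token_hex(16)
    storage.save_storage_information(credential, file_path, operation_type)
    return credential  # returned to the computing engine in the authentication response
```

With this shape, the computing engine never presents the first user's long-lived account information to the storage system; it presents only the short-lived credential.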
- the second access operation is an operation obtained by mapping the first access operation, and the second access operation is an operation capable of accessing the object file storage system 102.
- the second access operation includes a read operation and/or a write operation.
- the first access operation is a query
- the second access operation mapped to the query operation is a read operation.
- the first file is read from the object file storage system 102, and the content that needs to be queried is obtained from the read first file.
- the first access operation is an update
- the second access operation mapped to the update operation includes a read operation and a write operation.
- the first file is read from the object file storage system 102
- the part of the content in the first file is updated to the content to be updated
- the updated first file is written to the object file storage system 102 to overwrite the first file saved in the object file storage system 102.
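The mapping from the first access operation to the second access operation can be sketched as a lookup table. Only the query and update mappings are given in this passage, so the sketch deliberately rejects other operation types rather than guessing; the function name is hypothetical.

```python
# Mappings stated in the text: query -> read, update -> read and write.
_FIRST_TO_SECOND = {
    "query": frozenset({"read"}),
    "update": frozenset({"read", "write"}),
}


def map_operation(first_operation_type: str) -> frozenset:
    """Map a first operation type to the storage-level operation types it needs."""
    try:
        return _FIRST_TO_SECOND[first_operation_type.lower()]
    except KeyError:
        # Insert and delete are not mapped in this passage, so they are not guessed here.
        raise ValueError(f"no mapping given for {first_operation_type!r}") from None
```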
- the authentication information includes the user identity of the first user, the file path of the first file, and the second operation type.
- the user identity of the first user is obtained by the computing engine 101 based on the first user's account information
- the second operation type is obtained by mapping the first operation type
- the file path of the first file is obtained by the computing engine 101 based on the identification information of the first file.
- the authentication information includes account information of the first user, identification information of the first file, and the first operation type.
- the user identity of the first user includes a user group to which the first user belongs and/or a role of the first user, etc.
- the first permission information includes the file path, a user identity that can access the file path, and a third operation type
- the third operation type is a type of a third access operation that can access the file path.
- the access system 100 also includes a linkage permission module 104, which communicates with the computing engine 101 and the file path authentication module 103 respectively.
- the linkage permission module 104 stores the above-mentioned first permission information.
- after receiving the authentication request, the file path authentication module 103 obtains the file path of the first file, the user identity of the first user and the second operation type of the second access operation based on the authentication information included in the authentication request.
- the file path authentication module 103 obtains, from the linkage permission module 104, the first permission information including the file path. If the user identity of the first user is the same as the user identity included in the first permission information and the second operation type of the second access operation is the same as the third operation type of the third access operation included in the first permission information, the authentication passes, which means that the first user has permission to use the second access operation to access the file path.
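The comparison performed by the file path authentication module can be sketched as follows. This is an illustration under stated assumptions: the record type and function name are hypothetical, and a single operation type is compared per call for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FirstPermission:
    # Mirrors the three items the first permission information is said to include.
    file_path: str
    user_identity: str
    third_operation_type: str


def authenticate(permissions: Iterable[FirstPermission], file_path: str,
                 user_identity: str, second_operation_type: str) -> bool:
    """Pass only if some entry for this file path matches both identity and operation."""
    return any(
        p.file_path == file_path
        and p.user_identity == user_identity
        and p.third_operation_type == second_operation_type
        for p in permissions
    )
```

An operation such as update, which maps to both a read and a write, would need one passing check per operation type under this sketch.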
- the linkage permission module 104 includes a first read-write interface.
- the file path authentication module 103 calls the first read-write interface of the linkage permission module 104, and obtains the first permission information including the file path from the linkage permission module 104 through the first read-write interface.
- the authentication information includes the user identity of the first user, the file path of the first file, and the second operation type.
- the file path authentication module 103 directly obtains the file path of the first file, the user identity of the first user and the second operation type of the second access operation from the authentication information.
- the authentication information includes the first user's account information, the first file's identification information, and the first operation type.
- the file path authentication module 103 obtains the first user's user identity based on the first user's account information, maps the first operation type to obtain the second operation type, and obtains the file path of the first file based on the identification information of the first file.
- the access system 100 also includes a data filtering engine 105, which includes specified administrator account information; the data filtering engine 105 communicates with the computing engine 101 and the object file storage system 102 respectively.
- the computing engine 101 is configured to send an access instruction to the data filtering engine 105 when the determined granularity is the second granularity, where the access instruction includes the file path of the first file and the access requirement information;
- the data filtering engine 105 is used to access the first file based on the administrator account information, the file path and the access requirement information.
- the data filtering engine 105 also communicates with the linkage permission module 104.
- the computing module 1011 of the computing engine 101 authenticates the first user's permission to access the content based on the second permission information, the first user's account information and the access requirement information.
- the second permission information is used to indicate the user identity and the fourth access operation that can access the content.
- after the authentication is passed, if the granularity determined by the routing module 1012 of the computing engine 101 is the first granularity, the routing module 1012 of the computing engine 101 sends an authentication request to the file path authentication module 103. If the granularity determined by the routing module 1012 of the computing engine 101 is the second granularity, the routing module 1012 of the computing engine 101 sends an access instruction to the data filtering engine 105.
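The routing decision can be sketched as a two-way dispatch. The sketch assumes that file and partition granularity make up the first granularity and that row-column and data fragment granularity make up the second; this passage does not state that grouping explicitly. The returned labels are hypothetical stand-ins for the two downstream components.

```python
# Assumed grouping of granularities; not stated explicitly in this passage.
FIRST_GRANULARITY = {"file", "partition"}
SECOND_GRANULARITY = {"row-column", "data fragment"}


def route(granularity: str) -> str:
    """Return which component the routing module contacts for this granularity."""
    if granularity in FIRST_GRANULARITY:
        # Authentication request goes to the file path authentication module 103.
        return "file_path_authentication_module"
    if granularity in SECOND_GRANULARITY:
        # Access instruction goes to the data filtering engine 105.
        return "data_filtering_engine"
    raise ValueError(f"unknown granularity: {granularity!r}")
```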
- the second permission information includes content identification information of the content, a user identity that can access the content, and a fourth operation type of a fourth access operation that can access the content.
- the linkage permission module 104 stores the second permission information
- the linkage permission module 104 includes a second read-write interface.
- the computing module 1011 of the computing engine 101 obtains the content identification information of the content based on the access requirement information, and obtains the user identity of the first user based on the first user's account information.
- the second read-write interface in the linkage permission module 104 is called, and the second permission information including the content identification information is obtained from the linkage permission module 104 through the second read-write interface.
- if the user identity of the first user is the same as the user identity included in the second permission information and the first operation type of the first access operation is the same as the fourth operation type of the fourth access operation included in the second permission information, the authentication of the first user's permission to access the content passes, which means that the first user has permission to access the content.
- the content identification information of the content is part of the access requirement information.
- when the content is the entire content of the first file, the content identification information of the content includes identification information of the first file.
- when the content is a partition in the first file, the content identification information of the content includes identification information of the first file and identification information of the partition.
- when the content is at least one column or at least one row in the first file, the content identification information of the content includes the identification information of the first file and the column identification of the at least one column, or the content identification information of the content includes the identification information of the first file and the row number of the at least one row.
- the content is at least one data fragment in the first file, and the content identification information of the content includes identification information of the first file and identification information of each data fragment in the at least one data fragment.
- after the computing module 1011 of the computing engine 101 receives the data access request, the authentication operation may be performed first, and then the routing module 1012 of the computing engine 101 may perform the determination operation. That is, the computing module 1011 of the computing engine 101 may first authenticate the first user's permission to access the content. After the authentication is passed, the routing module 1012 of the computing engine 101 determines the granularity of accessing the content in the first file based on the access requirement information. Or,
- the routing module 1012 of the computing engine 101 may first perform the determination operation, and then the computing module 1011 of the computing engine 101 may perform the authentication operation. That is, the routing module 1012 of the computing engine 101 may first determine the granularity of accessing the content in the first file based on the access requirement information, and then the computing module 1011 of the computing engine 101 authenticates the first user's permission to access the content. Or,
- after the computing module 1011 of the computing engine 101 receives the data access request, the computing module 1011 of the computing engine 101 performs the authentication operation, and at the same time, the routing module 1012 of the computing engine 101 performs the determination operation, that is, the authentication operation and the determination operation are executed simultaneously.
- the access system 100 also includes an identity authentication center 106, which is used to save the corresponding relationship between the user's account information and the user's identity.
- the operation of the computing module 1011 of the computing engine 101 to obtain the user identity of the first user is: the computing module 1011 of the computing engine 101 queries the user identity of the first user from the identity authentication center 106 based on the first user's account information.
- the file path authentication module 103 obtains the user identity of the first user as follows: the file path authentication module 103 queries the first user's user identity from the identity authentication center 106 based on the first user's account information.
- the access system 100 further includes a metadata center 107, which is configured to receive and save the metadata of the first file input by the second user, where the metadata includes the identification information of the first file, the type of operation that needs to be performed on the first file, and the file path of the first file.
- the operation type may be creating the first file, deleting the first file, querying the first file, or modifying the first file, etc.
- the first file is a structured data file
- the metadata of the first file also includes one or more of the following: a column identifier of each column in the first file, a column type of each column in the first file, the row separator of the first file, or the column separator of the first file, etc.
- the first file is a semi-structured data file
- the metadata of the first file also includes one or more of the following: identification information of each data fragment in the first file, the type of each data fragment, or the line delimiter of the first file, etc.
- the line delimiter is used to distinguish each line of data in any data fragment in the first file.
- the operation of the computing engine 101 to obtain the file path of the first file is: the computing engine 101 obtains metadata including the identification information of the first file from the metadata center 107, and obtains the first file from the metadata. file path.
- the operation of the file path authentication module 103 to obtain the file path of the first file is: the file path authentication module 103 obtains metadata including the identification information of the first file from the metadata center 107, and obtains the file path of the first file from the metadata.
- the metadata center 107 displays a first interface to the second user, in which the second user can input the metadata of the first file, and receives the metadata of the first file input by the second user through the first interface.
- the first interface includes a website user interface (Web UI), etc.
- when receiving the metadata of the first file, the metadata center 107 also obtains the account information of the second user, and verifies the metadata of the first file based on the account information of the second user.
- the metadata center 107 verifies the legitimacy of the second user based on the second user's account information. When the second user is verified to be legitimate, the user identity of the second user is obtained, and the operation type that the second user can operate is obtained based on the user identity of the second user. If the operation type included in the metadata, which needs to be performed on the first file, is an operation type that the second user can operate, the metadata of the first file passes the verification, and the metadata of the first file is then saved.
- the operation of the metadata center 107 to verify the legitimacy of the second user and obtain the user identity of the second user is:
- the identity authentication center 106 stores the corresponding relationship between the account information and the user's identity.
- the metadata center 107 queries the identity authentication center 106 to see whether the second user's account information is stored in the identity authentication center 106. If the identity authentication center 106 stores the second user's account information, the second user is verified to be a legitimate user, and the user identity of the second user is queried from the identity authentication center 106 based on the second user's account information.
- the metadata center 107 obtains the operation type that the second user can operate as follows:
- the metadata center 107 stores the corresponding relationship between the user identity and the operation type. Based on the user identity of the second user, the metadata center 107 obtains the corresponding operation type from the corresponding relationship between the user identity and the operation type as the operation type that the second user can operate.
- the operation of the metadata center 107 to save the metadata of the first file is: the metadata center 107 queries whether metadata including the identification information of the first file has already been saved. If such metadata has been saved, the saved metadata is updated to the metadata of the first file. If no such metadata has been saved, the metadata of the first file is saved directly.
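The verify-then-save behaviour of the metadata center can be sketched with plain dictionaries. Everything named here is hypothetical: the identity names, the allowed-operation table and the metadata keys are illustrative assumptions, and the identity authentication center is modelled as a simple account-to-identity mapping.

```python
# Hypothetical correspondence between user identity and permitted operation types.
ALLOWED_OPERATIONS = {
    "data_admin": {"create", "delete", "query", "modify"},
    "analyst": {"query"},
}


def verify_and_save(store: dict, identities: dict, account: str, metadata: dict) -> bool:
    """Verify the submitting user, then insert or update the metadata by file id."""
    identity = identities.get(account)
    if identity is None:
        return False  # account unknown to the identity authentication center
    if metadata["operation_type"] not in ALLOWED_OPERATIONS.get(identity, set()):
        return False  # the user may not perform the requested operation type
    # Same assignment covers both cases: update if already saved, else save directly.
    store[metadata["file_id"]] = metadata
    return True
```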
- the metadata center 107 includes specified administrator account information. After the metadata of the first file is verified, if the operation type included in the metadata is to create the first file, the metadata center 107 creates the file path of the first file in the object file storage system 102 based on the specified administrator account information; the storage location corresponding to the file path is used to save the first file. If the operation type included in the metadata is to delete the first file, the metadata center 107 determines the first file in the object file storage system 102 based on the specified administrator account information and the file path of the first file, and deletes the determined first file.
- if the operation type included in the metadata is to query the first file, the metadata center 107 determines the first file in the object file storage system 102 based on the specified administrator account information and the file path of the first file, obtains description information and/or attribute information of the first file, etc., and returns the obtained content to the second user. If the operation type included in the metadata is to modify the first file, the metadata center 107 determines the first file in the object file storage system 102 based on the specified administrator account information and the file path of the first file, and modifies the description information and/or attribute information of the first file.
- the linkage permission module 104 is also configured to receive second permission information configured by the permission administrator.
- the second permission information is used to indicate the identity of the user who can access the content in the first file and the fourth access operation.
- the first permission information is generated based on the second permission information, and the first permission information is used to indicate a user identity that can access the file path of the first file and a third access operation that can access the file path of the first file. The linkage permission module 104 saves the second permission information and the first permission information.
- the linkage permission module 104 is also used to obtain the metadata report of the first file from the metadata center 107, obtain at least one user identity from the identity authentication center 106, and display the second interface to the permission administrator.
- the second interface includes metadata of the first file and the at least one user identity.
- the permission administrator selects the content identification information of the content in the first file from the metadata of the first file, selects a user identity that can access the content from the at least one user identity, and inputs, in the second interface, the fourth operation type of the fourth access operation that can access the content, so that the second permission information is obtained.
- the second permission information includes content identification information of the content, the selected user identity and the input fourth operation type.
- the second interface includes Web UI, etc.
- the first file is a structured data file
- the metadata of the first file includes a file identifier of the first file and a column identifier of each column of the first file
- the content identification information of the content includes the file identifier of the first file, or the content identification information of the content includes the file identifier of the first file and the column identifier of at least one column in the first file.
- the first file is a semi-structured data file
- the metadata of the first file includes a file identification of the first file and identification information of each data fragment of the first file
- the content identification information of the content includes the file identification of the first file, or the content identification information includes the file identification of the first file and the identification information of at least one data fragment in the first file.
- the operation of the linkage permission module 104 to generate the first permission information is:
- the linkage permission module 104 obtains the file path of the first file based on the content identification information of the content in the second permission information.
- the content identification information of the content includes identification information of the first file, and metadata including the identification information of the first file is obtained from the metadata center 107.
- the metadata is the metadata of the first file, and the file path of the first file is obtained from the metadata of the first file.
- the linkage permission module 104 maps the fourth operation type included in the second permission information to obtain the third operation type.
- the linkage permission module 104 reads the user identity from the second permission information, and combines the file path of the first file, the user identity and the third operation type into the first permission information.
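The derivation of the first permission information from the second permission information can be sketched as below. The dictionary keys are hypothetical, the metadata center is modelled as a mapping from file identifier to metadata, and the mapping from the fourth operation type to the third is an assumption that reuses the query-to-read and update-to-read-and-write mapping described for access operations.

```python
# Assumed mapping from the fourth operation type to the third operation type(s).
FOURTH_TO_THIRD = {
    "query": ("read",),
    "update": ("read", "write"),
}


def generate_first_permission(second_permission: dict, metadata_center: dict) -> dict:
    """Combine file path, user identity and mapped operation type(s) into one record."""
    # Step 1: resolve the file path from the content identification information.
    file_id = second_permission["content_id"]["file_id"]
    file_path = metadata_center[file_id]["file_path"]
    # Step 2: map the fourth operation type to the third operation type(s).
    third = FOURTH_TO_THIRD[second_permission["fourth_operation_type"]]
    # Step 3: carry the user identity over unchanged.
    return {
        "file_path": file_path,
        "user_identity": second_permission["user_identity"],
        "third_operation_types": third,
    }
```

Note that a column-level second permission still yields a path-level first permission in this sketch, because the file path is the only unit the storage system understands.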
- the computing engine receives the data access request, and determines the granularity of accessing the content in the first file based on the access requirement information in the data access request.
- when the determined granularity is the first granularity, the first file is accessed based on the first user's account information and the access requirement information.
- when the determined granularity is the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information. In this way, the first-granularity access service and the second-granularity access service can be provided to the user, which enriches the access services provided to the user.
- the computing engine accesses the first file based on the first user's account information and the access requirement information, so that there is no need to borrow the administrator account information to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing the first file.
- the first user can also access other content in the first file except the content.
- the user's permissions are automatically expanded to access any content in the first file, which results in too much permission expansion and is not conducive to permission management.
- the computing engine accesses the first file based on the specified administrator account information and the access requirement information, so that the administrator account information is used in place of the first user's account information to access the first file. There is therefore no need to configure the first user with permissions to access the second granularity, which avoids expanding the first user's access permissions and facilitates permission management.
- this embodiment of the present application provides a method 400 for accessing files.
- the method 400 is applied to the access system 100 shown in Figure 1 or Figure 3.
- the method 400 includes the following steps 401 to 410.
- Step 401 The computing engine receives a data access request.
- the data access request includes access requirement information.
- the access requirement information is used to indicate the content of the first file that the first user needs to access.
- the first file is stored in the object file storage system.
- the first user is a business user who performs data access services, and the first user sends a data access request to the computing engine.
- the access requirement information is an access statement for accessing the database, for example, the access requirement information is a SQL statement, etc.
- the access requirement information includes content identification information of the content and a first operation type, and the first operation type is used to indicate a first access operation for accessing the content in the first file.
- the first access operation is querying the first file, updating the first file, or deleting the first file, etc.
- when the first access operation indicated by the first operation type is to update the first file, the access requirement information also includes content to be updated.
- the data access request may also include the first user's account information.
- the access requirement information may include the following types of information. The following types of access requirement information will be described respectively.
- Type 1: the access requirement information includes identification information of the first file and the first operation type.
- the access requirement information does not include identification information of the partition of the first file.
- the access requirement information does not include the first information, and the first information is used to indicate at least one column and/or at least one row in the first file.
- the access requirement information does not include identification information of partitions in the first file.
- the content identification information of the content is the identification information of the first file.
- the content is the entire content of the first file, indicating that the first user needs to access all the content of the first file, and the granularity of the first user's access to the content in the first file is file granularity.
- the access requirement information is: Select * From Company information.
- the access requirement information includes the identification information "Company information" of the first file as shown in Table 1 and the first operation type "Select".
- the first operation type "Select" is to query the first file.
- Type 2: the access requirement information includes identification information of the first file, identification information of the partition in the first file, and the first operation type.
- the access requirement information does not include the first information, and the first information is used to indicate at least one column and/or at least one row in the first file.
- the access requirement information does not include identification information of data fragments in the first file.
- the content identification information of the content includes identification information of the first file and identification information of the partition in the first file.
- the content is the partition of the first file, indicating that the first user needs to access the partition of the first file, and the granularity of the first user accessing the content in the first file is the partition granularity.
- Type 3: the first file is a structured data file
- the access requirement information includes the identification information of the first file, the first information and the first operation type
- the first information is used to indicate at least one column and/or at least one row in the first file.
- the content is the at least one column or the at least one row of the first file, indicating that the first user needs to access the at least one column or the at least one row of the first file, and the granularity of the first user's access to the content in the first file is row-column granularity.
- the first information includes a column identifier of the at least one column in the first file
- the content is the at least one column in the first file, indicating that the first user needs to access the at least one column of the first file.
- the content identification information of the content includes identification information of the first file and a column identification of the at least one column in the first file.
- the access requirement information is: Select Name, City From Company information.
- the access requirement information includes the identification information "Company information" of the first file as shown in Table 1, the column identification "Name" of the second column of the first file, the column identification "City" of the fourth column of the first file, and the first operation type "Select", which is to query the first file.
- the first information includes a column identifier of at least one column in the first file and row filtering information corresponding to each column in the at least one column, and the content is at least one row in the first file, indicating that the first user needs Access at least one line of the first file.
- the content identification information of the content includes identification information of the first file and a column identification of the at least one column in the first file.
- one or more rows in which the content of the column matches the row filtering information can be located in the first file.
- the content is the content of the located row or rows.
- for example, based on the first information, the first, second and fifth rows, in which the City in the fourth column is "City 1", are located in the first file shown in Table 1.
- the first information includes a line number of at least one line in the first file
- the content is the at least one line in the first file, indicating that the first user needs to access the at least one line in the first file.
- the content identification information of the content includes identification information of the first file and a line number of the at least one line in the first file.
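- The three forms of the first information described above (column identifiers, column identifiers with row filtering information, and line numbers) can be illustrated with a minimal sketch. The table contents below are a hypothetical stand-in: only the positions of the "Name" and "City" columns and the rows holding "City 1" follow the text, and the function names are assumptions made for illustration.

```python
# Hypothetical stand-in for the structured first file "Company information".
# Only the "Name" (second) and "City" (fourth) columns and the rows holding
# "City 1" follow the text; every other value is invented for illustration.
COLUMNS = ["ID", "Name", "Country", "City"]
ROWS = [
    ["1", "Name 1", "Country 1", "City 1"],
    ["2", "Name 2", "Country 1", "City 1"],
    ["3", "Name 3", "Country 2", "City 2"],
    ["4", "Name 4", "Country 2", "City 3"],
    ["5", "Name 5", "Country 1", "City 1"],
]


def select_columns(columns, rows, column_ids):
    """Return only the requested columns of every row."""
    indexes = [columns.index(column_id) for column_id in column_ids]
    return [[row[i] for i in indexes] for row in rows]


def locate_rows(columns, rows, row_filters):
    """Return the 1-based line numbers of rows that match every filter.

    row_filters maps a column identifier to the value the row must hold.
    """
    return [
        line_number
        for line_number, row in enumerate(rows, start=1)
        if all(row[columns.index(col)] == value for col, value in row_filters.items())
    ]


def select_lines(rows, line_numbers):
    """Return the rows at the given 1-based line numbers."""
    return [rows[n - 1] for n in line_numbers]
```

- With this stand-in data, `locate_rows(COLUMNS, ROWS, {"City": "City 1"})` locates lines 1, 2 and 5, and `select_columns(COLUMNS, ROWS, ["Name", "City"])` corresponds to the statement "Select Name, City From Company information".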
- the first file is a semi-structured data file
- the access requirement information includes identification information of the first file, identification information of at least one data fragment in the first file, and the first operation type.
- the content identification information of the content includes identification information of the first file and identification information of the at least one data fragment in the first file.
- the content is the at least one data fragment of the first file, indicating that the first user needs to access the at least one data fragment of the first file, and the granularity at which the first user accesses the content in the first file is the data fragment granularity.
- the data access request is a remote procedure call (RPC) request, and the RPC request includes the fourth type of access requirement information.
- the data access information in the RPC request includes the identification information "Company information" of the first file, the identification information "Name" of the first data fragment of the first file, the identification information "Country" of the third data fragment of the first file, as shown in Figure 2, and the first operation type "query the first file".
- the access requirement information at least includes the identification information of the first file and the first operation type, and may also include the first information, the identification information of the partition in the first file, the identification information of the data fragment in the first file, and so on.
- Step 402 The computing engine authenticates the first user's permission to access the content based on the second permission information, the first user's account information and the access requirement information. After the first user's permission to access the content is authenticated, step 403 is executed.
- the second permission information is used to indicate the identity of the user who can access the content and the fourth access operation that can be used to access the content.
- the computing engine is a Hive engine
- the Hive engine authenticates the first user's permission to access the content through the following operations 4021 to 4023.
- the Hive engine determines whether the content exists in the object file storage system based on the content identification information. If the content exists in the object file storage system, perform the following operation 4022.
- the content identification information includes the identification information of the first file.
- the Hive engine obtains the metadata of the first file from the metadata center based on the identification information of the first file.
- the metadata of the first file includes the identification information of the first file.
- the operation type included in the metadata of the first file is to delete the first file, which means that the metadata center has deleted the first file in the object file storage system.
- the Hive engine determines, based on the metadata of the first file, that the content does not exist in the object file storage system.
- the operation type included in the metadata of the first file is creating the first file, modifying the first file, or querying the first file, which means that the object file storage system stores the first file.
- the Hive engine determines whether the content exists in the object file storage system based on the metadata of the first file and the content identification information.
- the operation of the Hive engine to obtain the metadata of the first file is:
- the Hive engine sends a first acquisition command to the metadata center, where the first acquisition command includes identification information of the first file.
- the metadata center receives the first acquisition command, obtains from the saved metadata the metadata that includes the identification information of the first file, which is the metadata of the first file, and sends a first acquisition response to the Hive engine, where the first acquisition response includes the metadata of the first file. Alternatively,
- the Hive engine sends the first acquisition command to the metadata center.
- the metadata center receives the first acquisition command, acquires each saved metadata, and sends a first acquisition response to the Hive engine, where the first acquisition response includes each metadata.
- the Hive engine receives the first acquisition response, and acquires metadata including the identification information of the first file from each metadata, and the acquired metadata is the metadata of the first file.
- the Hive engine determines whether the content exists in the object file storage system based on the metadata of the first file and the content identification information, as follows.
- if the content identification information includes the identification information of the first file and the identification information of the partition in the first file, and the metadata of the first file also includes the identification information of the partition in the first file, it is determined that the content exists in the object file storage system.
- if the metadata of the first file does not include the identification information of the partition in the first file, it is determined that the content does not exist in the object file storage system.
- if the content identification information includes the identification information of the first file and the column identification of at least one column in the first file, and the metadata of the first file also includes the column identification of the at least one column, it is determined that the content exists in the object file storage system. If the metadata of the first file does not include the column identification of the at least one column, it is determined that the content does not exist in the object file storage system.
- if the content identification information includes the identification information of the first file and the identification information of the data fragment in the first file, and the metadata of the first file also includes the identification information of the data fragment, it is determined that the content exists in the object file storage system.
- if the metadata of the first file does not include the identification information of the data fragment, it is determined that the content does not exist in the object file storage system.
- if the content identification information includes the identification information of the first file and the line number of at least one line in the first file, then once it is determined that the first file is stored in the object file storage system, the content is considered to exist in the object file storage system.
- the metadata of the first file includes the file path of the first file, so the computing engine reads the file path of the first file from the metadata of the first file.
- operation 4021 is optional: operation 4022 may be performed directly without performing operation 4021, or operation 4021 may be performed first, followed by operation 4022.
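- The existence check of operation 4021 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the dictionary layout of the metadata and of the content identification information, including every key name, is an assumption.

```python
def content_exists(metadata, content_id):
    """Decide whether the requested content exists, per operation 4021.

    metadata:   hypothetical dict describing the first file, or None if the
                metadata center holds no metadata for it.
    content_id: hypothetical dict holding "file_id" plus at most one of
                "partition", "columns", "fragments" or "line_numbers".
    """
    # A recorded delete operation means the file is no longer stored.
    if metadata is None or metadata.get("operation_type") == "delete":
        return False
    if metadata.get("file_id") != content_id["file_id"]:
        return False
    if "partition" in content_id:
        return content_id["partition"] in metadata.get("partitions", [])
    if "columns" in content_id:
        known = metadata.get("columns", [])
        return all(column in known for column in content_id["columns"])
    if "fragments" in content_id:
        known = metadata.get("fragments", [])
        return all(fragment in known for fragment in content_id["fragments"])
    # File granularity or line numbers: a stored file is sufficient.
    return True


# Hypothetical metadata of the first file.
METADATA = {
    "file_id": "Company information",
    "operation_type": "create",
    "partitions": ["partition 1"],
    "columns": ["Name", "City"],
    "fragments": ["Name", "Country"],
}
```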
- the Hive engine obtains the second permission information including the content identification information of the content from the linkage permission module.
- the second permission information includes the content identification information of the content, the identity of the user who can access the content, and the fourth operation type of the fourth access operation that can be used to access the content.
- the linkage permission module includes a second read-write interface.
- the Hive engine reads each piece of second permission information saved in the linkage permission module through the second read-write interface and, based on the content identification information of the content, obtains the second permission information corresponding to the content from among them.
- the Hive engine sends a second acquisition command to the linkage permission module, and the second acquisition command includes the content identification information of the content.
- the linkage permission module receives the second acquisition command, obtains the second permission information corresponding to the content from each saved second permission information based on the content identification information, and sends a second acquisition response to the Hive engine.
- the second acquisition response includes the obtained second permission information.
- the obtained second permission information is the second permission information that includes the identification information of the first file.
- the obtained second permission information is the second permission information that includes the identification information of the first file and the column identification of the at least one column in the first file.
- the obtained second permission information is the second permission information that includes the identification information of the first file and the identification information of the partition in the first file.
- the obtained second permission information is the second permission information that includes the identification information of the first file and the identification information of the data fragment in the first file.
- the Hive engine authenticates the first user's permission to access the content based on the second permission information, the first user's account information and the first operation type.
- the Hive engine determines the user identity of the first user based on the account information of the first user, compares the user identity of the first user with the user identity included in the second permission information, and compares the first operation type with the fourth operation type included in the second permission information.
- if the user identity of the first user is the same as the user identity included in the second permission information, and the first operation type is the same as the fourth operation type included in the second permission information, the first user's permission to access the content is authenticated.
- if the user identity of the first user differs from the user identity included in the second permission information, and/or the first operation type differs from the fourth operation type included in the second permission information, the authentication of the first user's permission to access the content fails.
- the identity authentication center stores the corresponding relationship between the user's account information and the user's identity.
- the Hive engine queries the first user's user identity from the identity authentication center based on the first user's account information.
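- Operations 4022 and 4023 can be sketched as a lookup followed by two comparisons. The record layout, the account name and the in-memory stand-ins for the identity authentication center and the linkage permission module are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondPermission:
    """Hypothetical record of second permission information."""
    content_id: tuple        # e.g. ("Company information", "Name", "City")
    user_identity: str       # identity of the user who can access the content
    operation_type: str      # fourth operation type, e.g. "Select"


# Hypothetical identity authentication center: account information -> identity.
IDENTITY_CENTER = {"account-1": "analyst"}

# Hypothetical contents of the linkage permission module.
LINKAGE_PERMISSIONS = [
    SecondPermission(("Company information", "Name", "City"), "analyst", "Select"),
]


def authenticate_content_access(account, content_id, first_operation_type):
    """Return True if the user identity and the operation type both match."""
    identity = IDENTITY_CENTER.get(account)
    if identity is None:
        return False
    return any(
        permission.content_id == content_id
        and permission.user_identity == identity
        and permission.operation_type == first_operation_type
        for permission in LINKAGE_PERMISSIONS
    )
```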
- the computing engine is a Spark engine
- the Spark engine authenticates the first user's permission to access the content through the following operations 4121 to 4125.
- the Spark engine sends the access requirement information to the metadata center.
- the Spark engine also sends the first user's account information to the metadata center.
- the metadata center determines whether the content exists in the object file storage system based on the content identification information. If the content exists in the object file storage system, the following operation 4123 is performed.
- if the Spark engine also sends the first user's account information to the metadata center, then when the metadata center determines that the content exists in the object file storage system, it obtains the second permission information including the content identification information of the content from the linkage permission module and, based on the second permission information, the first user's account information and the first operation type, authenticates the first user's permission to access the content. After the first user's permission to access the content is authenticated, the following operation 4123 is performed.
- the metadata center sends confirmation information to the Spark engine.
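- otherwise, that is, if the content does not exist in the object file storage system or the authentication of the first user's permission to access the content fails, the metadata center sends a denial message to the Spark engine.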
- the Spark engine receives the confirmation information and obtains the second permission information including the content identification information of the content from the linkage permission module.
- the second permission information includes the content identification information of the content, the identity of the user who can access the content, and the fourth operation type of the fourth access operation that can be used to access the content.
- the Spark engine authenticates the first user's permission to access the content based on the second permission information, the first user's account information and the first operation type.
- the above operations 4121-4123 are optional, that is, the Spark engine may directly perform operations 4124-4125 without performing operations 4121-4123: the Spark engine obtains the second permission information including the content identification information of the content from the linkage permission module, and authenticates the first user's permission to access the content based on the second permission information, the first user's account information and the first operation type.
- Step 403 The computing engine determines the granularity of accessing the content of the first file based on the access requirement information. If the determined granularity is the first granularity, step 404 is executed. If the determined granularity is the second granularity, step 409 is executed.
- in step 403, the computing engine determines the granularity of accessing the content of the first file based on the content identification information of the content included in the access requirement information.
- the access requirement information includes identification information of the first file. However, the access requirement information does not include identification information of the partition of the first file. Moreover, when the first file is a structured data file, the access requirement information does not include the first information. When the first file is a semi-structured data file, the access requirement information does not include identification information of the data fragments in the first file. At this time, the content identification information of the content is the identification information of the first file, and the computing engine determines that the granularity of accessing the content of the first file is file granularity.
- the access requirement information includes identification information of the first file and identification information of the partition in the first file. Moreover, when the first file is a structured data file, the access requirement information does not include the first information. When the first file is a semi-structured data file, the access requirement information does not include identification information of the data fragments in the first file. At this time, the content identification information of the content is the identification information of the first file and the identification information of the partition in the first file, and the computing engine determines that the granularity of accessing the content of the first file is the partition granularity.
- the first granularity is file granularity or partition granularity, so if the determined granularity is the first granularity, the content identification information of the content includes the identification information of the first file, or the content identification information of the content includes the identification information of the first file and the identification information of the partition in the first file.
- the first file is a structured data file.
- the access requirement information includes identification information of the first file and first information.
- the first information is used to indicate at least one line and/or at least one column in the first file.
- the content identification information of the content is the identification information of the first file and the column identification of the at least one column in the first file, or the content identification information of the content is the identification information of the first file and the line number of the at least one line in the first file, and the computing engine determines that the granularity of accessing the content of the first file is the row-column granularity.
- the first file is a semi-structured data file
- the access requirement information includes identification information of the first file and identification information of the data fragments in the first file.
- the content identification information of the content is the identification information of the first file and the identification information of the data fragment in the first file
- the computing engine determines that the granularity of accessing the content of the first file is the data fragment granularity.
- the second granularity is row-column granularity or data fragment granularity. Therefore, if the determined granularity is the second granularity, the content identification information of the content includes the identification information of the first file and the column identification of the at least one column in the first file.
- the access requirement information may also include row filtering information corresponding to the at least one column.
- the content identification information of the content includes identification information of the first file and the line number of the at least one line in the first file.
- the content identification information of the content includes identification information of the first file and identification information of the data fragment in the first file.
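- The granularity decision of step 403, and the resulting choice between step 404 and step 409, can be sketched as follows. The dictionary keys used to represent the access requirement information are assumptions made for illustration.

```python
FIRST_GRANULARITY = {"file", "partition"}
SECOND_GRANULARITY = {"row-column", "data fragment"}


def determine_granularity(access_requirement):
    """Classify the access requirement information as described in step 403.

    access_requirement is a hypothetical dict that always holds "file_id" and
    may hold "columns", "line_numbers", "fragments" or "partition".
    """
    if access_requirement.get("columns") or access_requirement.get("line_numbers"):
        return "row-column"      # first information present (structured file)
    if access_requirement.get("fragments"):
        return "data fragment"   # semi-structured file
    if access_requirement.get("partition"):
        return "partition"
    return "file"


def next_step(access_requirement):
    """Return 404 for the first granularity and 409 for the second."""
    granularity = determine_granularity(access_requirement)
    return 404 if granularity in FIRST_GRANULARITY else 409
```

- Under these assumptions, a request that names only the file or a partition is routed to the file path authentication module (step 404), while a request that names columns, lines or data fragments is routed to the data filtering engine (step 409).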
- the computing engine includes a computing module and a routing module.
- the above step 402 is executed by the computing module, and the above step 403 is executed by the routing module.
- the computing module may first perform the above step 402, and the routing module then performs the above step 403.
- alternatively, the routing module first performs the above step 403, and the computing module then performs the above step 402.
- alternatively, the computing module performs the above step 402 while the routing module performs the above step 403.
- if the computing engine authenticates the first user's permission to access the content and the granularity determined by the routing module is the first granularity, the following step 404 is executed. If the granularity determined by the routing module is the second granularity, the following step 409 is executed.
- Step 404 The computing engine sends an authentication request to the file path authentication module.
- the authentication request includes authentication information.
- the authentication information is used to indicate the first user, the file path of the first file, and the second access operation with which the first user accesses the file path.
- the authentication information includes the file path of the first file, the user identity of the first user, and the second operation type.
- the second operation type is an operation type corresponding to the first operation type that can access the object file storage system.
- the computing engine maps the first operation type to obtain the second operation type.
- the second operation type includes read operations and/or write operations.
- the authentication information includes identification information of the first file, account information of the first user, and the first operation type.
- the authentication information may also include other information, which will not be listed here.
- in step 404, the routing module of the computing engine (Hive engine or Spark engine) sends an authentication request to the file path authentication module.
- Step 405 The file path authentication module receives the authentication request, and authenticates the first user's permission to access the file path using the second access operation based on the first permission information and the authentication information.
- the second access operation is the access operation corresponding to the second operation type.
- the linkage permission module stores the first permission information corresponding to at least one file.
- the first permission information corresponding to a file includes the file path of the file, the identity of the user who can access the file path of the file, and the third operation type of the third access operation that can be used to access the file path of the file.
- the authentication information includes the file path of the first file, the user identity of the first user, and the second operation type.
- the file path authentication module reads the first permission information including the file path from the linkage permission module based on the file path, and the read first permission information corresponds to the first file. The file path authentication module compares the user identity of the first user with the user identity included in the read first permission information, and compares the second operation type with the third operation type included in the read first permission information. If the user identity of the first user is the same as the user identity included in the read first permission information, and the second operation type is the same as the third operation type included in the read first permission information, the first user's permission to access the file path using the second access operation is authenticated.
- if the comparison shows that the user identity of the first user is different from the user identity included in the read first permission information, and/or that the second operation type is different from the third operation type included in the read first permission information, the authentication of the first user's permission to access the file path using the second access operation fails.
- the authentication information includes identification information of the first file, account information of the first user, and the first operation type.
- the file path authentication module first obtains the file path of the first file based on the identification information of the first file, obtains the user identity of the first user based on the account information of the first user, and maps the first operation type to obtain the second operation type. Then, the first user's permission to access the file path using the second access operation is authenticated.
- the file path authentication module obtains metadata including identification information of the first file from the metadata center, the metadata being the metadata of the first file, reads the file path of the first file from the metadata of the first file, and obtains the user identity of the first user from the identity authentication center based on the account information of the first user.
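- Step 405 can be sketched as a mapping from the first operation type to the second operation type, followed by a comparison against the first permission information. The mapping table, the "Update" operation name, the identity value and the record layout are assumptions; the text only states that a query maps to a read operation and that the second operation type includes read and/or write operations.

```python
# Hypothetical mapping from first operation types (SQL level) to second
# operation types (object file storage level).
OPERATION_MAP = {
    "Select": frozenset({"read"}),
    "Update": frozenset({"read", "write"}),  # "Update" is an assumed name
}

FILE_PATH = r"C:\windows\system32\Company information"

# Hypothetical first permission information held by the linkage permission module.
FIRST_PERMISSIONS = [
    {
        "file_path": FILE_PATH,
        "user_identity": "analyst",
        "operation_type": frozenset({"read"}),  # third operation type
    },
]


def map_operation(first_operation_type):
    """Map the first operation type to the second operation type."""
    return OPERATION_MAP[first_operation_type]


def authenticate_file_path(file_path, user_identity, second_operation_type):
    """Return True if a record for the path matches identity and operation type."""
    return any(
        permission["file_path"] == file_path
        and permission["user_identity"] == user_identity
        and permission["operation_type"] == second_operation_type
        for permission in FIRST_PERMISSIONS
    )
```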
- Step 406 When the file path authentication module passes the permission authentication for the first user to access the file path using the second access operation, it sends storage information to the object file storage system and an authentication response to the computing engine.
- the storage information includes the temporary credential, the file path and the second operation type, and the authentication response includes the temporary credential.
- in step 406, the file path authentication module allocates a temporary credential when the first user's permission to access the file path through the second access operation is authenticated.
- Step 407 The object file storage system receives the storage information and saves the corresponding relationship between the temporary credential, the file path and the second operation type.
- the object file storage system stores the correspondence between temporary credentials, file paths, and operation types.
- the object file storage system receives the storage information and stores the temporary credential, the file path and the second operation type as one record in the correspondence between temporary credentials, file paths and operation types.
- the object file storage system deletes the record that includes the temporary credential from the correspondence between temporary credentials, file paths and operation types.
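- Steps 406 and 407 can be sketched as follows: a temporary credential is allocated, the object file storage system records it together with the file path and the second operation type, and the credential is returned in the authentication response. The class and function names are assumptions, and generating the credential with `secrets.token_hex` is an illustrative choice rather than something the text specifies.

```python
import secrets


class CredentialTable:
    """Hypothetical correspondence between temporary credentials, file paths
    and operation types kept by the object file storage system."""

    def __init__(self):
        self._records = {}

    def save(self, credential, file_path, operation_type):
        # Step 407: store one record per temporary credential.
        self._records[credential] = (file_path, operation_type)

    def lookup(self, credential):
        # Returns (file_path, operation_type), or None if no record exists.
        return self._records.get(credential)

    def delete(self, credential):
        # Remove the record that includes the temporary credential.
        self._records.pop(credential, None)


def issue_temporary_credential(table, file_path, second_operation_type):
    """Step 406: allocate a credential, send the storage information to the
    object file storage system, and return the credential as the response."""
    credential = secrets.token_hex(16)
    table.save(credential, file_path, second_operation_type)
    return credential
```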
- Step 408 The computing engine receives the authentication response, and based on the temporary credential, the access requirement information and the file path, accesses the first file in the object file storage system.
- the access requirement information includes content identification information of the content and a first operation type of the first access operation.
- the content identification information of the content includes identification information of the first file, or the content identification information of the content includes the identification information of the first file and identification information of the partition in the first file.
- the first access operation is to query the first file
- the second access operation mapped by the first access operation includes a read operation.
- the first file is accessed according to the following process.
- the computing engine sends a read request to the object file storage system.
- the read request includes the temporary credential and the file path of the first file.
- the access requirement information listed above is: Select * From Company information.
- the first operation type "Select" is to query the first file
- the content identification information of the content is "Company information”
- the second access operation mapped by the computing engine to "Select” includes a read operation.
- the first record in this correspondence includes the temporary credential "P1", the file path "C:\windows\system32\Company information" of the first file shown in Table 1, and the second operation type "read operation" of the second access operation on the file path.
- Table 2
Temporary credential | File path | Operation type
P1 | C:\windows\system32\Company information | Read operation
... | ... | ...
- the computing engine sends a read request to the object file storage system.
- the read request includes the temporary credential "P1" and the file path of the first file "C:\windows\system32\Company information".
- the object file storage system receives the read request, and based on the temporary credentials included in the read request, obtains the corresponding file path and second operation type from the correspondence between the temporary credentials, file path, and operation type.
- the object file storage system receives the read request, which includes the temporary credential "P1" and the file path of the first file "C:\windows\system32\Company information". Based on the temporary credential "P1", it obtains the corresponding file path "C:\windows\system32\Company information" and the second operation type "read operation" from the correspondence between temporary credentials, file paths and operation types shown in Table 2.
- if the file path included in the read request is the same as the obtained file path and the second access operation corresponding to the second operation type includes a read operation, the object file storage system reads the first file based on the file path of the first file and returns the first file to the computing engine.
- the file path "C:\windows\system32\Company information" included in the read request is the same as the obtained file path "C:\windows\system32\Company information", and the second access operation corresponding to the obtained second operation type (read operation) includes a read operation, so the object file storage system reads the first file shown in Table 1 based on the file path "C:\windows\system32\Company information" of the first file, and returns the first file shown in Table 1 to the computing engine.
- the computing engine receives the first file. If the content identification information includes only the identification information of the first file, the computing engine returns the first file to the first user; if the content identification information includes the identification information of the first file and the identification information of the partition in the first file, the computing engine obtains the content of the partition from the first file and returns the content of the partition to the first user.
- the content identification information includes identification information of the first file, indicating that the first user needs to query the entire content of the first file.
- the content identification information includes identification information of the first file and identification information of the partition in the first file, indicating that the first user needs to query the content of the partition in the first file.
- the computing engine receives the first file shown in Table 1, where the content identification information includes the identification information "Company information" of the first file shown in Table 1, and returns the first file shown in Table 1 to the first user.
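- The query flow of step 408 can be sketched as follows. The object file storage system validates the temporary credential, the file path and the operation type before returning the first file, and the computing engine then returns either the whole file or one partition. The class, the partition-keyed file representation and the file contents are assumptions made for illustration.

```python
FILE_PATH = r"C:\windows\system32\Company information"


class ObjectFileStorage:
    """Hypothetical object file storage system."""

    def __init__(self, files, records):
        self.files = files      # file path -> file content
        self.records = records  # temporary credential -> (file path, operation types)

    def read(self, credential, file_path):
        record = self.records.get(credential)
        if record is None:
            raise PermissionError("unknown temporary credential")
        stored_path, operation_types = record
        # The requested path must match the recorded path, and the recorded
        # second operation type must include a read operation.
        if stored_path != file_path or "read" not in operation_types:
            raise PermissionError("credential does not permit reading this path")
        return self.files[file_path]


def query(storage, credential, file_path, partition_id=None):
    """Computing engine side: return the whole file or a single partition."""
    first_file = storage.read(credential, file_path)
    if partition_id is None:
        return first_file
    return first_file[partition_id]


# Hypothetical file content, modelled as partition identifier -> rows.
storage = ObjectFileStorage(
    files={FILE_PATH: {"partition 1": ["row a"], "partition 2": ["row b"]}},
    records={"P1": (FILE_PATH, {"read"})},
)
```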
- the second access operation mapped by the first access operation includes a read operation and a write operation, and the access requirement information includes content to be updated.
- the first file is accessed according to the following process.
- the computing engine sends a read request to the object file storage system, and the read request includes the temporary credential and the file path.
- the object file storage system receives the read request, and based on the temporary credentials included in the read request, obtains the corresponding file path and second operation type from the correspondence between the temporary credentials, file path, and operation type.
- the object file storage system reads the first file based on the file path and returns the first file to the computing engine.
- the computing engine receives the first file. If the content identification information includes only the identification information of the first file, the computing engine updates the content in the first file to the content to be updated; if the content identification information includes the identification information of the first file and the identification information of the partition in the first file, the computing engine updates the content of the partition in the first file to the content to be updated.
- the computing engine sends a write request to the object file storage system.
- the write request includes the temporary credential, the first file, and the file path.
- the object file storage system receives the write request, and based on the temporary credentials included in the write request, obtains the corresponding file path and second operation type from the correspondence between the temporary credentials, file path, and operation type.
- the object file storage system replaces the first file saved at the file path with the first file included in the write request.
- the first access operation may also be other operations, for example, the first access operation may be deleting the first file, etc., which will not be listed one by one here.
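The credential-checked read and write described above can be sketched as follows. This is a minimal illustration rather than the implementation in the application: the class `ObjectFileStore`, its methods, the function `update_file`, and the representation of a file as a dictionary of partitions are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFileStore:
    """Toy stand-in for the object file storage system (hypothetical API)."""

    files: dict = field(default_factory=dict)        # file path -> {partition id: content}
    credentials: dict = field(default_factory=dict)  # credential -> (file path, operation types)

    def register(self, credential, path, operations):
        # Save the correspondence between temporary credential, file path and operation type.
        self.credentials[credential] = (path, frozenset(operations))

    def _check(self, credential, path, operation):
        # Look up the file path and operation types recorded for this credential.
        entry = self.credentials.get(credential)
        if entry is None or entry[0] != path or operation not in entry[1]:
            raise PermissionError(f"{operation} on {path!r} is not allowed")

    def read(self, credential, path):
        self._check(credential, path, "read")
        return dict(self.files[path])

    def write(self, credential, path, content):
        self._check(credential, path, "write")
        self.files[path] = content  # replace the file saved at the file path


def update_file(store, credential, path, new_content, partition=None):
    """Read the file, update the whole file or one partition, then write it back."""
    current = store.read(credential, path)
    if partition is None:
        updated = new_content                          # file granularity
    else:
        updated = {**current, partition: new_content}  # partition granularity
    store.write(credential, path, updated)
    return updated
```

A credential registered only for the read operation would let `store.read` succeed but cause `update_file` to raise `PermissionError` at the write step.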
- Step 409 The computing engine sends an access instruction to the data filtering engine, where the access instruction includes the file path of the first file and the access requirement information.
- the routing module of the computing engine sends an access instruction to the data filtering engine.
- Step 410 The data filtering engine receives the access instruction and accesses the first file based on the administrator account information, the file path and the access requirement information.
- the data filtering engine authenticates the first user's permission to access the content based on the second permission information, the first user's account information, and the access requirement information. After the first user's permission to access the content is authenticated, the first file is accessed based on the administrator account information, the file path and the access requirement information.
- the second permission information is used to indicate the user identity that can access the content and the fourth access operation that can access the content.
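As a rough illustration of content-level authentication against the second permission information, the sketch below checks a user, a file, the requested columns and an operation type against a list of permission entries. The entry layout (`ContentPermission`) and the function name are hypothetical; the application does not specify how the second permission information is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentPermission:
    """One entry of second permission information (field names are illustrative)."""

    user: str
    file_id: str
    columns: frozenset     # column identifiers the user may access
    operations: frozenset  # content-level operation types, e.g. {"Select"}


def is_content_access_allowed(permissions, user, file_id, columns, operation):
    """Return True if some entry grants `operation` on all requested columns."""
    requested = set(columns)
    for entry in permissions:
        if (
            entry.user == user
            and entry.file_id == file_id
            and operation in entry.operations
            and requested <= entry.columns
        ):
            return True
    return False
```

Only when this check passes would the data filtering engine go on to read the file with the administrator account information.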
- the first file is a structured data file
- the access requirement information includes identification information of the first file, first information and a first operation type of the first access operation
- the first information includes a column identifier of at least one column of the first file, or the first information includes a column identifier of at least one column of the first file and row filtering information corresponding to the at least one column, or the first information includes a line number of at least one line in the first file.
- the first file is a semi-structured data file
- the content identification information of the content includes identification information of the first file and identification information of the data fragments in the first file.
- the first access operation is to query the first file
- the second access operation mapped by the first access operation includes a read operation.
- the first file is accessed according to the following process.
- the data filtering engine sends a read request to the object file storage system.
- the read request includes the administrator account information and the file path of the first file.
- the first operation type "Select" is to query the first file.
- the content identification information of the content includes "Company information" and the column identifier "City" of the fourth column.
- the row filtering information corresponding to the fourth column is "City 1".
- the second access operation mapped by the calculation engine to “Select” includes a read operation.
- the data filtering engine sends a read request to the object file storage system.
- the read request includes the administrator account information "administrators" and the file path of the first file "C:\windows\system32\Company information".
- when the object file storage system receives the read request and determines that the account information included in the read request is the administrator account information, it reads the first file based on the file path of the first file and returns the first file to the data filtering engine.
- the administrator has greater authority, so when the object file storage system determines that the account information included in the read request is the administrator's account information, it can directly read the first file based on the file path of the first file.
- the object file storage system receives the read request, reads the first file shown in Table 1 based on the file path "C:\windows\system32\Company information" of the first file, and returns the first file shown in Table 1 to the data filtering engine.
- the data filtering engine receives the first file. When the access requirement information includes the first information, it obtains the content in the first file based on the first information; when the content identification information includes identification information of a data fragment in the first file, it obtains the content from the first file, the content being the content in the data fragment.
- the first information includes a column identifier of at least one column in the first file
- the data filtering engine obtains the content of the at least one column from the first file based on the first information, and returns the content of the at least one column to the computing engine.
- the first information includes a column identifier and row filtering information of at least one column in the first file
- the data filtering engine obtains, from the first file based on the first information, one or more rows whose content in the at least one column is the row filtering information, and returns the content of the one or more rows to the computing engine.
- the data filtering engine receives the first file shown in Table 1 and, based on the column identifier "City" of the fourth column and the row filtering information "City 1" corresponding to the fourth column, obtains the three rows whose content in the fourth column is "City 1". These three rows are the first row, the second row and the fifth row in Table 1. The data filtering engine returns the content of the three rows to the computing engine, and the computing engine returns the content of the three rows to the first user.
- the first information includes a line number of at least one line in the first file
- the data filtering engine obtains the content of the at least one line from the first file based on the first information, and returns the content of the at least one line to the computing engine.
- the first information includes identification information of the data fragment in the first file
- the data filtering engine obtains the content of the data fragment from the first file based on the first information, and returns the content of the data fragment to the computing engine.
- the data filtering engine returns the content in the first file to the calculation engine.
- the calculation engine receives the content in the first file and returns the content in the first file to the first user.
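The three forms of the first information (column identifiers, column identifiers with row filtering information, and line numbers) could be handled along the following lines. This is a sketch under stated assumptions: a structured data file is modelled as a list of dictionaries, and the function name and parameters are invented for illustration.

```python
def select_content(rows, columns=None, row_filter=None, line_numbers=None):
    """Return the part of a structured file indicated by the first information.

    rows         -- the file content, one dict per row (assumed representation)
    columns      -- column identifiers to return for every row
    row_filter   -- {column identifier: required value}; matching rows are returned whole
    line_numbers -- 1-based line numbers of the rows to return
    """
    if line_numbers is not None:
        return [rows[n - 1] for n in line_numbers]
    if row_filter is not None:
        return [
            row for row in rows
            if all(row[col] == value for col, value in row_filter.items())
        ]
    if columns is not None:
        return [{col: row[col] for col in columns} for row in rows]
    return list(rows)
```

With a five-row table whose "City" column holds "City 1" in the first, second and fifth rows, `select_content(rows, row_filter={"City": "City 1"})` returns exactly those three rows, matching the example in the text.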
- the second access operation mapped by the first access operation includes a read operation and a write operation, and the access requirement information includes content to be updated.
- the first file is accessed according to the following process.
- the data filtering engine sends a read request to the object file storage system.
- the read request includes the administrator account information and the file path of the first file.
- when the object file storage system receives the read request and determines that the account information included in the read request is the administrator account information, it reads the first file based on the file path of the first file and returns the first file to the data filtering engine.
- the data filtering engine receives the first file. When the access requirement information includes the first information, it updates the at least one column or at least one row indicated by the first information in the first file to the content to be updated; when the access requirement information includes the identification information of the data fragment in the first file, it updates the content of the data fragment in the first file to the content to be updated.
- the first information includes a column identifier of at least one column in the first file
- the data filtering engine updates the at least one column in the first file to the content to be updated.
- the first information includes a column identifier and row filtering information of at least one column in the first file
- the data filtering engine determines, from the first file based on the first information, one or more rows whose content in the at least one column is the row filtering information, and updates the content of the one or more rows to the content to be updated.
- the first information includes a line number of at least one line in the first file
- the data filtering engine updates the content of the at least one line in the first file to the content to be updated based on the first information.
- the first information includes identification information of the data fragment in the first file
- the data filtering engine updates the content of the data fragment in the first file to the content to be updated based on the identification information of the data fragment.
- the data filtering engine sends a write request to the object file storage system.
- the write request includes administrator account information, the file path of the first file, and the updated first file.
- when the object file storage system receives the write request and determines that the account information included in the write request is the administrator account information, it replaces the first file saved in the file path with the first file included in the write request.
- the first access operation may also be other operations, for example, the first access operation may be deleting the first file, etc., which will not be listed one by one here.
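The read, modify, write sequence performed by the data filtering engine under the administrator account can be sketched as below. The class `AdminOnlyStore`, the function `update_rows` and the row representation are assumptions for illustration only; the account name "administrators" is taken from the example in the text.

```python
ADMIN_ACCOUNT = "administrators"  # administrator account name used in the text's example


class AdminOnlyStore:
    """Toy object file store that serves only the administrator account (hypothetical)."""

    def __init__(self, files):
        self.files = dict(files)  # file path -> list of row dicts

    def _check(self, account):
        if account != ADMIN_ACCOUNT:
            raise PermissionError("administrator account information required")

    def read(self, account, path):
        self._check(account)
        return [dict(row) for row in self.files[path]]  # return a copy of the file

    def write(self, account, path, rows):
        self._check(account)
        self.files[path] = rows  # replace the file saved at the file path


def update_rows(store, path, row_filter, new_values):
    """Update every row matching the row filtering information, then write the file back."""
    rows = store.read(ADMIN_ACCOUNT, path)
    for row in rows:
        if all(row[col] == value for col, value in row_filter.items()):
            row.update(new_values)
    store.write(ADMIN_ACCOUNT, path, rows)
    return rows
```

Because the whole file is read and rewritten, this path trades throughput for fine-grained control, which is consistent with the text reserving it for the second granularity.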
- the linkage permission module receives the second permission information configured by the permission administrator, and generates the first permission information based on the second permission information.
- the permission administrator authorizes access to the content of the file by configuring the second permission information
- the linkage permission module generates the first permission information based on the second permission information, so that the computing engine in the access system uses the second permission information for authentication.
- the file path authentication module uses the first permission information for authentication. This only requires one authorization from the authority administrator to enable two-dimensional authentication.
- the first user accesses the first file in an open source manner, that is, the first user sends authentication information to the file path authentication module through the client, and the authentication information includes the file path of the first file.
- the file path authentication module authenticates the first user's permission to access the file path using the second access operation based on the authentication information.
- After passing the authentication it sends an authentication response to the client.
- the authentication response includes the temporary credentials.
- the storage information includes the temporary credential, the file path and the second operation type.
- the client receives the authentication response, and based on the temporary credential, the file path and the access requirement information, accesses the first file in the object file storage system. This achieves transparent file access.
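One way to picture the file path authentication module is as a lookup against the first permission information followed by issuing a random temporary credential. The sketch below assumes that the first permission information is a set of (user, file path, operation type) tuples and that the storage information is kept in a dictionary; both are illustrative choices, not details from the application.

```python
import secrets


class FilePathAuthModule:
    """Hypothetical sketch of the file path authentication module."""

    def __init__(self, first_permissions):
        # First permission information: (user identity, file path, operation type) tuples.
        self.first_permissions = set(first_permissions)
        # Storage information: temporary credential -> (file path, operation type).
        self.storage_info = {}

    def authenticate(self, user, path, operation):
        """Check the permission and, on success, return a temporary credential."""
        if (user, path, operation) not in self.first_permissions:
            raise PermissionError(f"{user!r} may not {operation} {path!r}")
        credential = secrets.token_hex(16)  # random, hard-to-guess token
        self.storage_info[credential] = (path, operation)
        return credential
```

The object file storage system would then need access to the recorded storage information so that it can validate the credential carried in later read and write requests.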
- the computing engine determines the granularity of accessing the content in the first file based on the access requirement information.
- when the granularity is the first granularity, the computing engine requests the file path authentication module to authenticate the first user's permission to access the file path of the first file, obtains the temporary credential assigned by the file path authentication module, and accesses the object file storage system based on the temporary credential, the access requirement information and the file path. Since the computing engine directly accesses the object file storage system, file read and write performance is improved.
- the computing engine requests the data filtering engine to access the object file storage system.
- the data filtering engine includes the specified administrator account information, so that the first file can be read from the object file storage system.
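The routing decision summarized above can be sketched as a small classifier over the access requirement information. The dictionary keys (`partition`, `columns`, `row_filter`, `line_numbers`, `fragments`) and the returned route labels are hypothetical names chosen for this example.

```python
from enum import Enum


class Granularity(Enum):
    FILE = "file"
    PARTITION = "partition"
    COLUMN = "column"
    FRAGMENT = "data fragment"


# First granularity (coarse): handled directly by the computing engine.
FIRST_GRANULARITY = {Granularity.FILE, Granularity.PARTITION}


def determine_granularity(requirement):
    """Classify access requirement information (a dict with illustrative keys)."""
    if requirement.get("columns") or requirement.get("row_filter") or requirement.get("line_numbers"):
        return Granularity.COLUMN
    if requirement.get("fragments"):
        return Granularity.FRAGMENT
    if requirement.get("partition"):
        return Granularity.PARTITION
    return Granularity.FILE


def route(requirement):
    """Pick the access path: direct with a temporary credential, or via the data filtering engine."""
    if determine_granularity(requirement) in FIRST_GRANULARITY:
        return "direct"
    return "data_filtering_engine"
```

For instance, a request naming only the file routes to `"direct"`, while a request naming the "City" column routes to `"data_filtering_engine"`.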
- an embodiment of the present application provides a method 700 for obtaining first permission information.
- the first permission information in the above-mentioned embodiment shown in FIG. 1 or FIG. 3 , or the first permission information in the above-mentioned embodiment shown in FIG. 4 is obtained through the method 700 .
- the method 700 includes the following steps:
- Step 701 The linkage permission module receives second permission information.
- the second permission information is used to indicate the identity of the user who can access the content in the first file and the fourth access operation.
- the linkage permission module obtains the metadata of the first file from the metadata center, where the metadata of the first file is any metadata stored in the metadata center. It obtains at least one user identity from the identity authentication center and displays a second interface to the permission administrator, where the second interface includes the metadata of the first file and the at least one user identity.
- the permission administrator selects the content identification information of the content in the first file from the metadata of the first file, selects a user identity that can access the content from the at least one user identity, and inputs into the second interface the fourth operation type of the fourth access operation that can access the content, thereby obtaining the second permission information.
- the second permission information includes content identification information of the content, the selected user identity and the input fourth operation type.
- the linkage permission module reads the second permission information from the second interface.
- the first file is a structured data file
- the metadata of the first file includes a file identifier of the first file and a column identifier of each column of the first file.
- the content identification information of the content selected by the rights administrator includes the file identification of the first file
- the content identification information of the content selected by the rights administrator includes the file identification of the first file and the column identification of at least one column in the first file, or the content identification information of the content selected by the rights administrator includes the file identification of the first file and a line number of at least one line in the first file.
- the first file is a semi-structured data file
- the metadata of the first file includes a file identification of the first file and identification information of each data fragment of the first file.
- the content identification information of the content selected by the rights administrator includes the file identification of the first file, or the content identification information of the content selected by the rights administrator includes the file identification of the first file and identification information of at least one data fragment in the first file.
- Step 702 The linkage authority module generates first authority information based on the second authority information.
- step 702 the linkage permission module generates the first permission information through the following operations 7021-7023.
- the operations 7021-7023 are:
- the linkage permission module obtains the file path of the first file based on the content identification information of the content in the second permission information.
- the content identification information of the content includes the identification information of the first file
- the linkage permission module obtains metadata including the identification information of the first file from the metadata center; this metadata is the metadata of the first file.
- the linkage permission module obtains the file path of the first file from the metadata of the first file.
- the linkage permission module maps the fourth operation type included in the second permission information to obtain the third operation type.
- the access operation corresponding to the fourth operation type is the fourth access operation configured by the administrator to be able to access the content in the first file.
- the fourth access operation may be querying the first file, updating the first file, or deleting the first file, etc.
- the third operation type is an access operation corresponding to the fourth operation type that can access the object file storage system.
- the third operation type includes read operations and/or write operations, etc.
- the linkage permission module reads the user identity from the second permission information, and combines the file path of the first file, the user identity and the third operation type into the first permission information.
- Step 703 The linkage authority module saves the first authority information and the second authority information.
- the above-mentioned steps 701-703 can be repeatedly executed, so that the linkage authority module generates a large amount of first authority information and second authority information.
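Operations 7021-7023 can be sketched as a small function that derives first permission information from second permission information. The dictionary layouts, the `OPERATION_MAP` table and the operation type name "Update" are assumptions for illustration; only the mappings stated in the text (a query maps to a read, an update maps to a read and a write) are included.

```python
# Mapping from content-level (fourth) operation types to storage-level (third) operation types.
# Only the mappings stated in the text are listed; the key "Update" is an assumed name.
OPERATION_MAP = {
    "Select": {"read"},
    "Update": {"read", "write"},
}


def generate_first_permission(second_permission, metadata_center):
    """Derive first permission information from second permission information.

    second_permission -- dict with "file_id", "user" and "operation" (illustrative layout)
    metadata_center   -- dict mapping file id to metadata that contains "file_path"
    """
    # Operation 7021: obtain the file path from the metadata of the file.
    file_path = metadata_center[second_permission["file_id"]]["file_path"]
    # Operation 7022: map the fourth operation type to the third operation type.
    operations = OPERATION_MAP[second_permission["operation"]]
    # Operation 7023: combine file path, user identity and third operation type.
    return {
        "user": second_permission["user"],
        "file_path": file_path,
        "operations": sorted(operations),
    }
```

An operation type missing from `OPERATION_MAP` raises `KeyError` here, which makes unmapped operations visible rather than silently granting access.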
- the linkage permission module receives the second permission information configured by the permission administrator, and generates the first permission information based on the second permission information.
- the first permission information is used to indicate the user identity and access operation that can access the file path of the first file. Therefore, the first permission information can be generated automatically, which improves the efficiency of obtaining the first permission information and reduces the cost of obtaining it. The permission administrator only needs to configure the second permission information, and the linkage permission module automatically generates the first permission information based on it.
- the second permission information is used to authenticate the user's permission to access the content in the first file.
- the first permission information is used to authenticate the user's permission to access the file path of the first file. In this way, the permission administrator only needs to authorize once (configure the second permission information), and the access system uses the second permission information and the first permission information for two-dimensional authentication.
- an embodiment of the present application provides a device 800 for accessing files.
- the device 800 can be deployed on the computing engine in the system shown in Figure 1 or Figure 3, or deployed on the computing engine in the embodiment shown in Figure 4, Figure 5 or Figure 6.
- the device 800 includes:
- Communication unit 801 configured to receive a data access request.
- the data access request includes access requirement information.
- the access requirement information is used to indicate the content of the first file that the first user needs to access.
- the first file is stored in the object file storage system;
- the processing unit 802 is configured to access the first file based on the account information of the first user and the access requirement information when it is determined that the granularity of accessing the content in the first file is the first granularity based on the access requirement information;
- the processing unit 802 is also configured to access the first file based on the specified administrator account information and the access requirement information when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is the second granularity, where the second granularity is smaller than the first granularity.
- for the detailed implementation, please refer to the relevant content of step 401 of the embodiment shown in Figure 4, which will not be described in detail here.
- the processing unit 802 accesses the first file based on the account information of the first user and the access requirement information.
- the processing unit 802 accesses the first file based on the specified administrator account information and the access requirement information.
- the access requirement information includes identification information of the first file, and the first granularity is file granularity; or,
- the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is a partition granularity.
- the communication unit 801 is also configured to send an authentication request to the file path authentication module.
- the authentication request includes authentication information, and the authentication information is used to indicate the first user, the file path of the first file, and the access operation with which the first user accesses the file path.
- the authentication information is obtained based on the access requirement information and the first user's account information.
- the authentication request is used to trigger the file path authentication module to authenticate, based on the first permission information and the authentication information, the first user's permission to access the file path using the access operation.
- the file path is used to indicate the storage location of the first file.
- the first permission information is used to indicate the user identity and access operation that can access the file path;
- the communication unit 801 is also used to receive an authentication response sent by the file path authentication module after the permission has been authenticated.
- the authentication response includes a temporary credential.
- the temporary credential, the file path and the operation type of the access operation are stored in correspondence in the object file storage system.
- the processing unit 802 is configured to access the first file based on the temporary credential, the access requirement information and the file path.
- for the detailed implementation, please refer to the relevant content of step 404 of the embodiment shown in Figure 4, which will not be described in detail here.
- for the detailed implementation, please refer to the relevant content of step 408 of the embodiment shown in Figure 4, which will not be described in detail here.
- the processing unit 802 accesses the first file based on the temporary credential, the access requirement information and the file path.
- the first file is a structured data file, and the first file uses a list form to store data.
- the access requirement information includes identification information of the first file and first information, where the first information is used to indicate at least one column and/or at least one row of the first file, and the second granularity is the column granularity; or,
- the first file is a semi-structured data file.
- the first file includes at least one data fragment.
- the data fragment is used to save data with the same business attributes.
- the access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is the data fragment granularity.
- the communication unit 801 is also used to send an access instruction to the data filtering engine.
- the access instruction includes the access requirement information.
- the data filtering engine includes the administrator account information.
- the access instruction is used to trigger the data filtering engine to access the first file based on the administrator account information and the access requirement information.
- for the detailed implementation, please refer to the relevant content of step 409 of the embodiment shown in Figure 4, which will not be described in detail here.
- the processing unit 802 is also used to:
- authenticate, based on the second permission information, the first user's account information and the access requirement information, the first user's permission to access the content, where the second permission information is used to indicate the user identity and access operation that can access the content;
- the granularity of accessing the content in the first file is determined based on the access requirement information.
- processing unit 802 is also used to:
- the first permission information is generated based on the second permission information.
- the first permission information is used to indicate the user identity and access operation that can access the file path of the first file.
- the file path is used to indicate the storage location of the first file.
- the processing unit accesses the first file based on the first user's account information and the access requirement information, so there is no need to borrow the administrator's account information to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing it.
- the processing unit accesses the first file based on the specified administrator account information and the access requirement information, so that the administrator account information replaces the first user's account information when accessing the first file. There is therefore no need to configure permissions for the first user to access the second granularity, which avoids expanding the first user's access permissions and simplifies permission management.
- an embodiment of the present application provides a schematic diagram of a device 900 for accessing files.
- the device 900 may be the computing engine in any of the above embodiments, for example, it may be the computing engine provided by the embodiment shown in FIG. 1, FIG. 3, FIG. 4, FIG. 5 or FIG. 6.
- the device 900 includes at least one processor 901, internal connections 902, memory 903 and at least one transceiver 904.
- the device 900 is a device with a hardware structure and can be used to implement the functional modules in the device 800 described in Figure 8 .
- the processing unit 802 in the device 800 shown in Figure 8 can be implemented by the at least one processor 901 calling the code in the memory 903, and the communication unit 801 in the device 800 shown in Figure 8 can be implemented by the transceiver 904.
- the device 900 can also be used to implement the functions of the computing engine in any of the above embodiments.
- the above-mentioned processor 901 can be a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the program of this application.
- the internal connection 902 may include a path for transmitting information between the components.
- the internal connection 902 is a single board or a bus, etc.
- the above-mentioned transceiver 904 is used to communicate with other devices or communication networks.
- the above-mentioned memory 903 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
- the memory can exist independently and be connected to the processor through a bus. Memory can also be integrated with the processor.
- the memory 903 is used to store the application program code for executing the solution of the present application, and the processor 901 controls the execution.
- the processor 901 is used to execute the application program code stored in the memory 903, and cooperates with the at least one transceiver 904, so that the device 900 implements the functions of the method of this application.
- the processor 901 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 9 .
- the device 900 may include multiple processors, such as the processor 901 and the processor 907 in Figure 9 . Each of these processors may be a single-CPU processor or a multi-CPU processor.
- a processor here may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
Abstract
A method, apparatus and system for accessing a file, and a storage medium. The method comprises: receiving a data access request, wherein the data access request comprises access demand information, the access demand information is used for indicating content in a first file that a first user needs to access, and the first file is stored in an object file storage system; when it is determined, on the basis of the access demand information, that the granularity of the content in the first file to be accessed is a first granularity, accessing the first file on the basis of account information of the first user and the access demand information; and when it is determined, on the basis of the access demand information, that the granularity of the content in the first file to be accessed is a second granularity, accessing the first file on the basis of specified administrator account information and the access demand information, wherein the second granularity is less than the first granularity.
Description
This application claims priority to Chinese patent application No. 202210264898.8, entitled "An efficient fine-grained data lake permission management solution" and filed on March 17, 2022, the entire content of which is incorporated herein by reference. This application also claims priority to Chinese patent application No. 202210511098.1, entitled "Method, apparatus, system and storage medium for accessing files" and filed on May 11, 2022, the entire content of which is incorporated herein by reference.
The present application relates to the field of computers, and in particular to a method, apparatus, system and storage medium for accessing files.
For a data lake in which storage and computing are separated, the data lake includes a structured query language (SQL) engine and an object file storage system. The object file storage system includes at least one file, and each file is used to store data. The SQL engine receives an SQL statement from the user; the SQL statement indicates to the SQL engine the file that the user needs to access, and the SQL engine accesses that file in the object file storage system based on the SQL statement.
The files in the object file storage system are usually structured data files, which store data in list form. For example, the object file storage system holds a file that stores an employee data table. The file includes four columns: the first column stores the employee's name, the second column stores the employee's address, the third column stores the employee's department, and the fourth column stores the employee's position. Each row of the file stores the name, address, department and position of one employee.
At present, a user can use the SQL engine to access an entire file in the object file storage system; that is, the access granularity that the SQL engine provides to the user is the entire file. Because the SQL engine offers only file-granularity access, the access service provided to users is too limited.
Summary of the Invention
The present application provides a method, apparatus, system and storage medium for accessing a file, so as to enrich the access services provided to users. The technical solutions are as follows:
In a first aspect, an embodiment of the present application provides a method for accessing a file. In the method, a data access request is received. The data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in an object file storage system. When it is determined, based on the access requirement information, that the granularity of the content to be accessed in the first file is a first granularity, the first file is accessed based on account information of the first user and the access requirement information. When it is determined, based on the access requirement information, that the granularity of the content to be accessed in the first file is a second granularity, the first file is accessed based on specified administrator account information and the access requirement information. The second granularity is smaller than the first granularity.
Specifically, the granularity of the content to be accessed in the first file is determined based on the access requirement information. When the determined granularity is the first granularity, the first file is accessed based on the account information of the first user and the access requirement information. When the determined granularity is the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information. In this way, both first-granularity and second-granularity access services can be offered to users, which enriches the access services provided to users.
Because the first granularity is larger than the second granularity, when the determined granularity is the first granularity the first file is accessed based on the first user's account information and the access requirement information. The administrator account information therefore does not need to be borrowed to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing the first file.
If the first user were granted permission to access content at the second granularity, the first user would be able to access not only that content in the first file but also the other content in the first file. The permission granted to the first user would automatically expand to cover any content in the first file, which expands the permission too far and hinders permission management. In the present application, however, when the determined granularity is the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information. The administrator account information is used in place of the first user's account information to access the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
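As a non-limiting illustration, the choice between the two accounts can be sketched in Python as follows. The `AccessRequirement` container, its field names, and the rule that any row, column, or data fragment selector implies the second granularity are assumptions made for this sketch; the application does not prescribe a concrete data structure or parsing rule.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Granularity(Enum):
    FILE = "file"
    PARTITION = "partition"
    ROW_COLUMN = "row_column"
    FRAGMENT = "fragment"


# First (coarser) granularities: served with the requesting user's own account.
FIRST_GRANULARITIES = {Granularity.FILE, Granularity.PARTITION}


@dataclass
class AccessRequirement:
    """Hypothetical container for the access requirement information."""
    file_id: str
    partition_id: Optional[str] = None
    columns: List[str] = field(default_factory=list)
    row_filter: Optional[str] = None
    fragment_ids: List[str] = field(default_factory=list)


def determine_granularity(req: AccessRequirement) -> Granularity:
    """Classify the request (assumed rule: finer selectors take precedence)."""
    if req.columns or req.row_filter:
        return Granularity.ROW_COLUMN
    if req.fragment_ids:
        return Granularity.FRAGMENT
    if req.partition_id:
        return Granularity.PARTITION
    return Granularity.FILE


def select_account(req: AccessRequirement, user_account: str, admin_account: str) -> str:
    """Return the account under which the first file is accessed."""
    if determine_granularity(req) in FIRST_GRANULARITIES:
        return user_account   # first granularity: the user's own account
    return admin_account      # second granularity: the specified administrator account
```

In this sketch a whole-file or whole-partition request never touches the administrator account, which mirrors the efficiency argument above, while a row, column, or fragment request is always served under the administrator account so that the user's own permissions are not widened.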
In a possible implementation, the access requirement information includes identification information of the first file, and the first granularity is file granularity; or the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity. Because the first granularity can be file granularity or partition granularity, users can be offered file-granularity or partition-granularity access services, which enriches the access services provided. In addition, partition granularity is a newly defined granularity; that is, the present application can also provide a partition access service.
In a possible implementation, file granularity means that all content of the first file needs to be accessed, and partition granularity means that all content of one partition in the first file needs to be accessed.
In another possible implementation, an authentication request is sent to a file path authentication module. The authentication request includes authentication information, which indicates the first user, the file path of the first file, and the access operation by which the first user accesses the file path. The authentication information is obtained based on the access requirement information and the first user's account information. The authentication request triggers the file path authentication module to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation. The file path indicates the storage location of the first file, and the first permission information indicates the user identities that can access the file path and the permitted access operations. An authentication response, sent by the file path authentication module after the permission is successfully authenticated, is then received. The authentication response includes a temporary credential, and the temporary credential, the file path, and the operation type of the access operation are stored in association with one another in the object file storage system. The first file is accessed based on the temporary credential, the access requirement information, and the file path.
Because the temporary credential is issued by the file path authentication module only after the permission is successfully authenticated, and that temporary credential is used to access the first file in the object file storage system, the security of accessing the first file is improved.
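As a non-limiting illustration, the temporary-credential exchange can be sketched in Python as follows. The class names, the in-memory dictionaries, the path `/warehouse/company_information`, and the use of a random token as the credential are assumptions made for this sketch. The application states that the credential, the file path, and the operation type are stored in association in the object file storage system, but it does not specify which component writes that record; here the authentication module registers it.

```python
import secrets


class ObjectFileStore:
    """Toy stand-in for the object file storage system."""

    def __init__(self, files):
        self._files = dict(files)   # file path -> content
        self._credentials = {}      # credential -> (file path, operation type)

    def register_credential(self, credential, path, operation):
        self._credentials[credential] = (path, operation)

    def read(self, credential, path):
        # Serve the read only if the credential was stored for this path and operation.
        if self._credentials.get(credential) != (path, "read"):
            raise PermissionError("credential is not valid for this path and operation")
        return self._files[path]


class FilePathAuthenticator:
    """Toy stand-in for the file path authentication module."""

    def __init__(self, first_permission_info, store):
        # first permission information: (user, file path) -> set of allowed operations
        self._permissions = first_permission_info
        self._store = store

    def authenticate(self, user, path, operation):
        if operation not in self._permissions.get((user, path), set()):
            raise PermissionError(f"{user} may not {operation} {path}")
        credential = secrets.token_hex(16)   # hypothetical temporary credential
        self._store.register_credential(credential, path, operation)
        return credential
```

A caller would first obtain a credential from `authenticate` and then pass it, together with the file path, to `read`; a request from a user without path permission, or a read with an unknown credential, raises `PermissionError`.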
In another possible implementation, the first file is a structured data file that stores data in the form of a table, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column of the first file and/or at least one row of the first file, and the second granularity is row-column granularity. Alternatively, the first file is a semi-structured data file that includes at least one data fragment, each data fragment stores data having the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is data-fragment granularity. Because the second granularity can be row-column granularity or data-fragment granularity, users can be offered row-column-granularity or data-fragment-granularity access services, which enriches the access services provided.
In another possible implementation, an access instruction is sent to a data filtering engine. The access instruction includes the access requirement information, and the data filtering engine holds the administrator account information. The access instruction triggers the data filtering engine to access the first file based on the administrator account information and the access requirement information.
Because the data filtering engine holds the administrator account information, sending the access instruction to the data filtering engine causes the data filtering engine to access the first file based on the specified administrator account information and the access requirement information. The administrator account information is thus used in place of the first user's account information to access the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
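As a non-limiting illustration, a data filtering engine for row-column granularity can be sketched in Python as follows. The dictionary layout of the access instruction (`file_path`, `columns`, `equals`), the equality-only row filter, the path, and the column names are assumptions made for this sketch; the sample values follow the company data used in the embodiments.

```python
class AdminOnlyStore:
    """Toy object file storage that serves whole tables to the administrator account only."""

    def __init__(self, tables, admin_account):
        self._tables = tables          # file path -> list of row dictionaries
        self._admin_account = admin_account

    def read_table(self, path, account):
        if account != self._admin_account:
            raise PermissionError("only the administrator account may read the whole file")
        return self._tables[path]


class DataFilterEngine:
    """Holds the administrator account and returns only the requested rows and columns."""

    def __init__(self, admin_account, store):
        self._admin_account = admin_account
        self._store = store

    def execute(self, instruction):
        # Read the whole file under the administrator account, then filter.
        rows = self._store.read_table(instruction["file_path"], self._admin_account)
        conditions = instruction.get("equals", {})   # hypothetical row selector
        columns = instruction.get("columns")         # hypothetical column selector
        result = []
        for row in rows:
            if all(row.get(key) == value for key, value in conditions.items()):
                result.append({c: row[c] for c in columns} if columns else dict(row))
        return result


rows = [
    {"company": "Company 1", "industry": "Internet", "city": "City 1"},
    {"company": "Company 2", "industry": "Internet", "city": "City 1"},
    {"company": "Company 3", "industry": "Communication", "city": "City 2"},
]
store = AdminOnlyStore({"/warehouse/company_information": rows}, "admin")
engine = DataFilterEngine("admin", store)
internet_companies = engine.execute({
    "file_path": "/warehouse/company_information",
    "columns": ["company"],
    "equals": {"industry": "Internet"},
})
```

The requesting user never receives the administrator account or the unfiltered file; only the filtered result leaves the engine, which is the property the paragraph above relies on.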
In another possible implementation, the first user's permission to access the content is authenticated based on second permission information, the first user's account information, and the access requirement information. The second permission information indicates the user identities that can access the content and the permitted access operations. After the first user's permission to access the content is successfully authenticated, the granularity of the content to be accessed in the first file is determined based on the access requirement information.
Because the granularity of the content to be accessed in the first file is determined only after the first user's permission to access the content is successfully authenticated, and the first file is then accessed in different ways depending on the granularity, the security of accessing the first file is improved.
In another possible implementation, the first permission information is generated based on the second permission information. The first permission information indicates the user identities that can access the file path of the first file and the permitted access operations, and the file path indicates the storage location of the first file. The first permission information can therefore be generated automatically, which improves the efficiency of obtaining the first permission information and reduces the cost of obtaining it.
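As a non-limiting illustration, deriving path-level permissions from content-level permissions can be sketched in Python as follows. The tuple layout, the `catalog` mapping from a table name to its file path, and the path itself are assumptions made for this sketch. The entries shown are whole-table grants; how finer-grained grants are handled during generation is not covered here.

```python
def derive_path_permissions(second_permission_info, catalog):
    """Map (user, table, operation) grants to (user, file path, operation) grants.

    second_permission_info: iterable of (user, table name, operation) tuples.
    catalog: hypothetical mapping from a table name to the file path that stores it.
    """
    first_permission_info = set()
    for user, table, operation in second_permission_info:
        path = catalog.get(table)
        if path is not None:   # skip tables whose storage location is unknown
            first_permission_info.add((user, path, operation))
    return first_permission_info


catalog = {"Company information": "/warehouse/company_information"}
second = [
    ("user1", "Company information", "read"),
    ("user2", "Unknown table", "read"),
]
first = derive_path_permissions(second, catalog)
```

Generating the path-level entries from the content-level entries in this way means an administrator maintains only one set of grants, which is the efficiency benefit described above.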
In a second aspect, the present application provides an access system. The system includes a computing engine and an object file storage system.
The computing engine is configured to receive a data access request. The data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in the object file storage system.
The computing engine is further configured to access the first file based on account information of the first user and the access requirement information when it is determined, based on the access requirement information, that the granularity of the content to be accessed in the first file is a first granularity.
The computing engine is further configured to access the first file based on specified administrator account information and the access requirement information when it is determined, based on the access requirement information, that the granularity of the content to be accessed in the first file is a second granularity. The second granularity is smaller than the first granularity.
Specifically, the computing engine determines the granularity of the content to be accessed in the first file based on the access requirement information. When the determined granularity is the first granularity, the first file is accessed based on the account information of the first user and the access requirement information. When the determined granularity is the second granularity, the first file is accessed based on the specified administrator account information and the access requirement information. In this way, both first-granularity and second-granularity access services can be offered to users, which enriches the access services provided to users.
Because the first granularity is larger than the second granularity, when the determined granularity is the first granularity the computing engine accesses the first file based on the first user's account information and the access requirement information. The administrator account information therefore does not need to be borrowed to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing the first file.
If the first user were granted permission to access content at the second granularity, the first user would be able to access not only that content in the first file but also the other content in the first file. The permission granted to the first user would automatically expand to cover any content in the first file, which expands the permission too far and hinders permission management. In the present application, however, when the determined granularity is the second granularity, the computing engine accesses the first file based on the specified administrator account information and the access requirement information. The administrator account information is used in place of the first user's account information to access the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
In a possible implementation, the access requirement information includes identification information of the first file, and the first granularity is file granularity; or the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity. Because the first granularity can be file granularity or partition granularity, users can be offered file-granularity or partition-granularity access services, which enriches the access services provided. In addition, partition granularity is a newly defined granularity; that is, the present application can also provide a partition access service.
In a possible implementation, file granularity means that all content of the first file needs to be accessed, and partition granularity means that all content of one partition in the first file needs to be accessed.
In another possible implementation, the system further includes a file path authentication module.
The computing engine is configured to send an authentication request to the file path authentication module. The authentication request includes authentication information, which indicates the first user, the file path of the first file, and the access operation by which the first user accesses the file path. The authentication information is obtained based on the access requirement information and the first user's account information, and the file path indicates the storage location of the first file.
The file path authentication module is configured to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation, and to send an authentication response to the computing engine after the permission is successfully authenticated. The first permission information indicates the user identities that can access the file path and the permitted access operations, and the authentication response includes a temporary credential.
The object file storage system is configured to store the temporary credential, the file path, and the operation type of the access operation in association with one another.
The computing engine is further configured to access the first file based on the temporary credential, the access requirement information, and the file path.
Because the computing engine receives the temporary credential that the file path authentication module sends only after the permission is successfully authenticated, and uses that temporary credential to access the first file in the object file storage system, the security of accessing the first file is improved.
In another possible implementation, the first file is a structured data file that stores data in the form of a table, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column of the first file and/or at least one row of the first file, and the second granularity is row-column granularity. Alternatively, the first file is a semi-structured data file that includes at least one data fragment, each data fragment stores data having the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is data-fragment granularity. Because the second granularity can be row-column granularity or data-fragment granularity, users can be offered row-column-granularity or data-fragment-granularity access services, which enriches the access services provided.
In another possible implementation, the system further includes a data filtering engine, and the data filtering engine holds the administrator account information.
The computing engine is configured to send an access instruction to the data filtering engine. The access instruction includes the file path of the first file and the access requirement information, and the file path indicates the storage location of the first file.
The data filtering engine is configured to access the first file based on the administrator account information, the file path, and the access requirement information.
Because the data filtering engine holds the administrator account information, the computing engine sends the access instruction to the data filtering engine, and the data filtering engine accesses the first file based on the specified administrator account information and the access requirement information. The administrator account information is thus used in place of the first user's account information to access the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permission and simplifies permission management.
In another possible implementation, the computing engine is further configured to authenticate the first user's permission to access the content based on second permission information, the first user's account information, and the access requirement information. The second permission information indicates the user identities that can access the content and the permitted access operations. After the first user's permission to access the content is successfully authenticated, the computing engine determines the granularity of the content to be accessed in the first file based on the access requirement information. Because the computing engine determines the granularity only after the first user's permission to access the content is successfully authenticated, and then accesses the first file in different ways depending on the granularity, the security of accessing the first file is improved.
In another possible implementation, the system further includes a linkage permission module.
The linkage permission module is configured to generate the first permission information based on the second permission information. The first permission information indicates the user identities that can access the file path of the first file and the permitted access operations, and the file path indicates the storage location of the first file. The first permission information can therefore be generated automatically, which improves the efficiency of obtaining the first permission information and reduces the cost of obtaining it.
In a third aspect, the present application provides an apparatus for accessing a file, configured to perform the method in the first aspect or in any possible implementation of the first aspect. Specifically, the apparatus includes units for performing the method in the first aspect or in any possible implementation of the first aspect.
In a fourth aspect, the present application provides an apparatus for accessing a file. The apparatus includes a processor and a memory, which may be connected through an internal connection. The memory is configured to store a program, and the processor is configured to execute the program in the memory so that the apparatus performs the method in the first aspect or in any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product. The computer program product includes a computer program stored in a computer-readable storage medium, and the computer program is loaded by a processor to implement the method in the first aspect or in any possible implementation of the first aspect.
In a sixth aspect, the present application provides a computer-readable storage medium configured to store a computer program, and the computer program is loaded by a processor to perform the method in the first aspect or in any possible implementation of the first aspect.
In a seventh aspect, the present application provides a chip including a memory and a processor. The memory is configured to store computer instructions, and the processor is configured to call the computer instructions from the memory and run them to perform the method in the first aspect or in any possible implementation of the first aspect.
Figure 1 is a schematic structural diagram of an access system provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a file provided by an embodiment of the present application;
Figure 3 is a schematic structural diagram of another access system provided by an embodiment of the present application;
Figure 4 is a flowchart of a method for accessing a file provided by an embodiment of the present application;
Figure 5 is a schematic structural diagram of another access system provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of another access system provided by an embodiment of the present application;
Figure 7 is a flowchart of a method for obtaining first permission information provided by an embodiment of the present application;
Figure 8 is a schematic structural diagram of an apparatus for accessing a file provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of another apparatus for accessing a file provided by an embodiment of the present application.
Embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Referring to Figure 1, an embodiment of the present application provides an access system 100. The access system 100 includes a computing engine 101 and an object file storage system 102, and the computing engine 101 communicates with the object file storage system 102.
In some embodiments, the access system 100 is a database system that separates storage from computing, in which the object file storage system 102 is responsible for data storage and the computing engine 101 is responsible for data computation.
In some embodiments, the access system 100 is applied to scenarios such as data lakes that separate storage from computing, and to the field of big data processing.
The object file storage system 102 is used to store at least one file, and each file saved in the object file storage system 102 is used to store data.
In some embodiments, the file may be a structured data file. A structured data file stores data in the form of a table, so a structured data file is a data table. Each column in the file stores data having the same business attribute.
Because a structured data file is in essence a data table, the identification information of the file is the identification information of the data table. For example, if the identification information of the file is its file name, the identification information of the file is also the table name of the data table.
For example, the file shown in Table 1 below is a structured data file. The file is a data table that includes five columns of data and is used to store company information. The table name of the data table is "Company information", and the file name of the file is also "Company information"; that is, the file is a company information table whose table name and file name are the same.
Referring to Table 1 below, each column of the file stores data having the same business attribute. The data stored in the first column are all row numbers; that is, the business attribute of every data item in the first column is the row number. The data stored in the second column are all company names; that is, the business attribute of every data item in the second column is the company name. The data stored in the third column are all industry names; that is, the business attribute of every data item in the third column is the industry name. The data stored in the fourth column are all cities; that is, the business attribute of every data item in the fourth column is the city name. The data stored in the fifth column are all countries; that is, the business attribute of every data item in the fifth column is the country name.
Table 1: Company information
In some embodiments, the file is a semi-structured data file that includes at least one data fragment. Each data fragment in the file is used to save data with the same business attribute.
For example, the semi-structured data file shown in FIG. 2 includes four data fragments: a first data fragment, a second data fragment, a third data fragment and a fourth data fragment. The first data fragment stores company names, namely "Company 1", "Company 2", "Company 3", "Company 4", "Company 5" and "Company 6", so the business attribute of every value in the first data fragment is the company name. The second data fragment stores industry names, namely "Internet", "Internet", "Communication", "Logistics", "Communication" and "Logistics", so the business attribute of every value in the second data fragment is the industry name. The third data fragment stores cities, namely "City 1", "City 1", "City 2", "City 2", "City 1" and "City 3", so the business attribute of every value in the third data fragment is the city name. The fourth data fragment stores countries, namely "Country 1", "Country 2", "Country 1", "Country 1", "Country 1" and "Country 3", so the business attribute of every value in the fourth data fragment is the country name.
In some embodiments, the semi-structured data file is an extensible markup language (XML) file or the like, and the tag blocks in the XML file are the data fragments.
In some embodiments, the file may also include at least one partition.
In some embodiments, the file is stored in the object file storage system 102, and the file path of the file indicates the storage location of the file in the object file storage system 102. For example, assume that the file path of the file shown in Table 1 is "C:\windows\system32\Company information". This file path indicates the storage location of the file shown in Table 1 in the object file storage system 102.
Referring to FIG. 1, a first user needs to access content in a file, where the file is stored in the object file storage system 102. The granularity at which the first user accesses the content in the file may be a first granularity or a second granularity, and the second granularity is smaller than the first granularity.
In some embodiments, the first granularity may be file granularity, meaning that the first user needs to access the entire content of the file; or the first granularity may be partition granularity, meaning that the first user needs to access a partition of the file. File granularity means that the entire content of the first file needs to be accessed. Partition granularity means that the entire content of one partition in the first file needs to be accessed.
In some embodiments, the file is a structured data file and the second granularity is row-column granularity, meaning that the first user needs to access at least one row and/or at least one column of the file. Alternatively, the file is a semi-structured data file and the second granularity is data fragment granularity, meaning that the first user needs to access a data fragment in the file.
Therefore, users can be provided with access services at file granularity, partition granularity, row-column granularity or data fragment granularity, which enriches the access services provided to users.
The first user is a user with data access requirements and may also be called a business user. Optionally, the first user may be an application program or the like.
When the first user needs to access data, the first user sends a data access request to the computing engine 101. The data access request includes access requirement information, which indicates the content in a first file that the first user needs to access. The first file is a file stored in the object file storage system 102.
The computing engine 101 is configured to receive the data access request and determine, based on the access requirement information, the granularity of accessing the content in the first file. When the determined granularity is the first granularity, the computing engine 101 accesses the first file based on the account information of the first user and the access requirement information. When the determined granularity is the second granularity, the computing engine 101 accesses the first file based on specified administrator account information and the access requirement information. The second granularity is smaller than the first granularity.
In some embodiments, the computing engine 101 includes an interface that the first user can call in order to send the data access request to the computing engine 101. Optionally, the interface includes a Java database connectivity (JDBC) interface, an open database connectivity (ODBC) interface, or the like.
In some embodiments, the first file is a structured data file, the access requirement information includes identification information of the first file, and the access requirement information includes neither identification information of a partition in the first file nor first information, where the first information indicates at least one column and/or at least one row in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is file granularity.
In some embodiments, the first file is a semi-structured data file, the access requirement information includes identification information of the first file, and the access requirement information includes neither identification information of a partition in the first file nor identification information of a data fragment in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is file granularity.
In some embodiments, the first file is a structured data file, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the access requirement information does not include the first information. In this case, the granularity determined by the computing engine 101 based on the access requirement information is partition granularity.
In some embodiments, the first file is a semi-structured data file, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the access requirement information does not include identification information of a data fragment in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is partition granularity.
In some embodiments, the first file is a structured data file, and the access requirement information includes identification information of the first file and first information, where the first information indicates at least one column and/or at least one row in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is row-column granularity.
In some embodiments, the first file is a semi-structured data file, and the access requirement information includes identification information of the first file and identification information of a data fragment in the first file. In this case, the granularity determined by the computing engine 101 based on the access requirement information is data fragment granularity.
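The six cases above can be summarized as a small decision function. The following is a minimal sketch only, not the implementation described in this application: the class name `AccessRequirement`, its field names and the function `determine_granularity` are hypothetical, and the access requirement information is assumed to have already been parsed into separate fields.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Granularity(Enum):
    FILE = "file"                    # first granularity
    PARTITION = "partition"          # first granularity
    ROW_COLUMN = "row_column"        # second granularity (structured files)
    DATA_FRAGMENT = "data_fragment"  # second granularity (semi-structured files)


@dataclass
class AccessRequirement:
    """Hypothetical parsed form of the access requirement information."""
    file_id: str                                            # identification information of the first file
    partition_id: Optional[str] = None                      # identification information of a partition
    columns: List[str] = field(default_factory=list)        # part of the "first information"
    rows: List[int] = field(default_factory=list)           # part of the "first information"
    fragment_ids: List[str] = field(default_factory=list)   # data fragment identifiers


def determine_granularity(req: AccessRequirement) -> Granularity:
    # Row/column information present: row-column granularity.
    if req.columns or req.rows:
        return Granularity.ROW_COLUMN
    # Data fragment identifiers present: data fragment granularity.
    if req.fragment_ids:
        return Granularity.DATA_FRAGMENT
    # Only a partition identifier present: partition granularity.
    if req.partition_id is not None:
        return Granularity.PARTITION
    # Only the file identifier present: file granularity.
    return Granularity.FILE


def is_first_granularity(granularity: Granularity) -> bool:
    # File and partition granularity are the coarser "first granularity".
    return granularity in (Granularity.FILE, Granularity.PARTITION)
```

For instance, a request naming only "Company information" would be classified as file granularity, while the same request with a column list would be classified as row-column granularity.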
In some embodiments, referring to FIG. 1 and FIG. 3, the computing engine 101 includes a computing module 1011 and a routing module 1012. The computing module 1011 in the computing engine 101 receives the data access request, and the routing module 1012 in the computing engine 101 determines, based on the access requirement information, the granularity of accessing the content in the first file.
In some embodiments, the access requirement information further includes a first operation type, which indicates a first access operation by which the first user accesses the first file. Optionally, the first access operation includes query, update, insert, delete, or the like.
In some embodiments, the operation of the computing engine 101 accessing the first file may be: querying the content in the first file and returning the queried content to the first user. Alternatively, the access requirement information includes content to be updated, and the operation of the computing engine 101 accessing the first file may be: updating all or part of the content in the first file to the content to be updated. Alternatively, the access requirement information includes content to be inserted, and the operation of the computing engine 101 accessing the first file may be: inserting the content to be inserted into the first file. Alternatively, the operation of the computing engine 101 accessing the first file may be: deleting all or part of the content in the first file, and so on.
In some embodiments, the data access request further includes the account information of the first user. Alternatively, the data access request may not include the account information of the first user. In that case, the communication connection between the computing engine 101 and the first user is bound to the account information of the first user, and the computing engine 101 obtains the account information of the first user bound to the communication connection. Optionally, the communication connection is a session between the first user and the computing engine 101.
In some embodiments, the access system 100 includes one or more computing engines 101. Optionally, the computing engine 101 is a Hive engine, a Spark engine, or the like; that is, the access system 100 includes one or more Hive engines and/or one or more Spark engines, and so on.
In some embodiments, the Hive engine is a data warehouse tool based on Hadoop (a distributed system infrastructure), which can map a structured data file to a table and provide query functions.
In some embodiments, the Spark engine is a fast, general-purpose computing engine designed for large-scale data processing.
In some embodiments, referring to FIG. 3, the access system 100 further includes a file path authentication module 103, which communicates with the computing engine 101 and with the object file storage system 102.
The computing engine 101 is configured to send an authentication request to the file path authentication module 103 when the determined granularity is the first granularity, where the authentication request includes authentication information, the authentication information indicates the first user, the file path of the first file and a second access operation by which the first user accesses the file path, and the authentication information is obtained based on the access requirement information and the account information of the first user;
The file path authentication module 103 is configured to receive the authentication request and, based on first permission information and the authentication information, authenticate the permission of the first user to access the file path using the second access operation, where the first permission information indicates the user identity that is able to access the file path and a third access operation that is able to access the file path; and, after the permission is authenticated successfully, to send an authentication response including a temporary credential to the computing engine 101 and to send storage information to the object file storage system 102, where the storage information includes the temporary credential, the file path and a second operation type, and the second operation type is the operation type of the second access operation;
The object file storage system 102 is configured to receive the storage information and save the temporary credential, the file path and the second operation type in association with one another;
The computing engine 101 is further configured to access the first file based on the temporary credential, the access requirement information and the file path.
The second access operation is obtained by mapping the first access operation, and is an operation that can be performed on the object file storage system 102. The second access operation usually includes a read operation and/or a write operation.
For example, if the first access operation is a query, the second access operation obtained by mapping the query operation is a read operation. Assuming that content in the first file needs to be queried, the first file is read from the object file storage system 102, and the content to be queried is obtained from the first file that was read.
For another example, if the first access operation is an update, the second access operation obtained by mapping the update operation includes a read operation and a write operation. Assuming that part of the content in the first file needs to be updated to the content to be updated, the first file is read from the object file storage system 102, that part of the content in the first file is updated to the content to be updated, and the updated first file is written to the object file storage system 102 to overwrite the first file already saved there.
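The mapping from the first access operation to the second access operation, and the read-modify-write sequence for an update, can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `map_operation` and `update_file` are hypothetical, a plain dictionary stands in for the object file storage system, and only the two mappings given in the text (query and update) are included, since the mappings for insert and delete are not spelled out here.

```python
# Mappings stated in the text: query -> read, update -> read + write.
OPERATION_MAPPING = {
    "query": frozenset({"read"}),
    "update": frozenset({"read", "write"}),
}


def map_operation(first_operation_type: str) -> frozenset:
    """Map a first operation type to the storage-level second operation type(s)."""
    try:
        return OPERATION_MAPPING[first_operation_type]
    except KeyError:
        # Insert and delete are not mapped in this sketch (assumption: left open).
        raise ValueError(f"no mapping defined for {first_operation_type!r}")


def update_file(store: dict, file_path: str, old_part: str, new_part: str) -> None:
    """Update part of a file by reading it, modifying it, and writing it back."""
    content = store[file_path]                    # read operation
    updated = content.replace(old_part, new_part)  # apply the update in memory
    store[file_path] = updated                    # write operation, overwriting the saved file


# Usage: the dictionary is a stand-in for the object file storage system.
store = {"/data/company": "Company 1,Internet,City 1"}
update_file(store, "/data/company", "City 1", "City 2")
```

The update path needs both operation types because the whole file is read and then rewritten, which is why the permission check on the file path has to cover reading as well as writing.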
In some embodiments, the authentication information includes the user identity of the first user, the file path of the first file and the second operation type. The user identity of the first user is obtained by the computing engine 101 based on the account information of the first user, the second operation type is obtained by mapping the first operation type, and the file path of the first file is obtained by the computing engine 101 based on the identification information of the first file.
In some embodiments, the authentication information includes the account information of the first user, the identification information of the first file and the first operation type.
In some embodiments, the user identity of the first user includes the user group to which the first user belongs and/or the role of the first user, and so on.
In some embodiments, the first permission information includes the file path, the user identity that is able to access the file path, and a third operation type, where the third operation type is the type of the third access operation that is able to access the file path.
In some embodiments, referring to FIG. 3, the access system 100 further includes a linkage permission module 104, which communicates with the computing engine 101 and with the file path authentication module 103. The linkage permission module 104 stores the first permission information described above.
After receiving the authentication request, the file path authentication module 103 obtains the file path of the first file, the user identity of the first user and the second operation type of the second access operation based on the authentication information included in the authentication request, and obtains the first permission information that includes the file path from the linkage permission module 104. If the user identity of the first user is the same as the user identity included in the first permission information, and the second operation type of the second access operation is the same as the third operation type of the third access operation included in the first permission information, the permission is authenticated successfully, which means that the first user has permission to access the file path using the second access operation.
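The comparison performed by the file path authentication module, followed by issuing a temporary credential, can be sketched as below. This is a hedged illustration only: `PathPermission`, `authenticate_path` and `issue_temporary_credential` are hypothetical names, the format of the temporary credential is not specified in the text (a random hex token is assumed here), and a dictionary stands in for the record kept by the object file storage system.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PathPermission:
    """Hypothetical record of the first permission information."""
    file_path: str
    user_identity: str    # e.g. a user group or role
    operation_type: str   # the third operation type, e.g. "read" or "write"


def authenticate_path(permissions: Iterable[PathPermission], file_path: str,
                      user_identity: str, second_operation_type: str) -> bool:
    # Authentication passes when some permission record for this path has the
    # same user identity and the same operation type as the request.
    return any(
        p.file_path == file_path
        and p.user_identity == user_identity
        and p.operation_type == second_operation_type
        for p in permissions
    )


def issue_temporary_credential(credential_store: Dict[str, Tuple[str, str]],
                               file_path: str, second_operation_type: str) -> str:
    # Assumption: the credential is an opaque random token.
    credential = secrets.token_hex(16)
    # The storage side saves credential, file path and operation type together.
    credential_store[credential] = (file_path, second_operation_type)
    return credential


permissions = [PathPermission("/data/company", "analysts", "read")]
credential_store: Dict[str, Tuple[str, str]] = {}
if authenticate_path(permissions, "/data/company", "analysts", "read"):
    token = issue_temporary_credential(credential_store, "/data/company", "read")
```

Because the storage side keeps the credential together with the path and operation type, a later access presenting the credential can be checked against exactly what was authorized.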
In some embodiments, the linkage permission module 104 includes a first read-write interface. The file path authentication module 103 calls the first read-write interface of the linkage permission module 104 and, through that interface, obtains the first permission information that includes the file path from the linkage permission module 104.
In some embodiments, the authentication information includes the user identity of the first user, the file path of the first file and the second operation type, and the file path authentication module 103 obtains the file path of the first file, the user identity of the first user and the second operation type of the second access operation directly from the authentication information.
In some embodiments, the authentication information includes the account information of the first user, the identification information of the first file and the first operation type. The file path authentication module 103 obtains the user identity of the first user based on the account information of the first user, maps the first operation type to obtain the second operation type, and obtains the file path of the first file based on the identification information of the first file.
In some embodiments, referring to FIG. 3, the access system 100 further includes a data filtering engine 105, which holds the specified administrator account information. The data filtering engine 105 communicates with the computing engine 101 and with the object file storage system 102.
The computing engine 101 is configured to send an access instruction to the data filtering engine 105 when the determined granularity is the second granularity, where the access instruction includes the file path of the first file and the access requirement information;
The data filtering engine 105 is configured to access the first file based on the administrator account information, the file path and the access requirement information.
Optionally, the data filtering engine 105 also communicates with the linkage permission module 104.
Referring to FIG. 3, in some embodiments, after receiving the data access request, the computing module 1011 of the computing engine 101 authenticates the permission of the first user to access the content based on second permission information, the account information of the first user and the access requirement information. The second permission information indicates the user identity that is able to access the content and a fourth access operation.
After the authentication passes, if the granularity determined by the routing module 1012 of the computing engine 101 is the first granularity, the routing module 1012 sends an authentication request to the file path authentication module 103. If the granularity determined by the routing module 1012 is the second granularity, the routing module 1012 sends an access instruction to the data filtering engine 105.
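The routing decision described above can be sketched as a single function. This is only an illustration: the function name `route_request`, the string labels for the granularities and the return values are hypothetical, and returning "rejected" when the content-level authentication fails is an assumption, since the behavior on failure is not described here.

```python
FIRST_GRANULARITIES = {"file", "partition"}
SECOND_GRANULARITIES = {"row_column", "data_fragment"}


def route_request(content_authorized: bool, granularity: str) -> str:
    """Return the component that should handle the request next."""
    if not content_authorized:
        # Assumption: a request that fails content-level authentication is rejected.
        return "rejected"
    if granularity in FIRST_GRANULARITIES:
        # Coarse access: authenticate the file path, then access with a temporary credential.
        return "file_path_authentication_module_103"
    if granularity in SECOND_GRANULARITIES:
        # Fine-grained access: delegate to the data filtering engine,
        # which uses the specified administrator account.
        return "data_filtering_engine_105"
    raise ValueError(f"unknown granularity: {granularity!r}")
```

The split keeps the administrator account confined to the data filtering engine, while coarse-grained requests are still checked against the first user's own identity.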
In some embodiments, the second permission information includes content identification information of the content, the user identity that is able to access the content, and a fourth operation type of the fourth access operation that is able to access the content.
In some embodiments, the linkage permission module 104 stores the second permission information and includes a second read-write interface. After receiving the data access request, the computing module 1011 of the computing engine 101 obtains the content identification information of the content based on the access requirement information, and obtains the user identity of the first user based on the account information of the first user. The computing module 1011 calls the second read-write interface of the linkage permission module 104 and, through that interface, obtains the second permission information that includes the content identification information from the linkage permission module 104. If the user identity of the first user is the same as the user identity included in the second permission information, and the first operation type of the first access operation is the same as the fourth operation type of the fourth access operation included in the second permission information, the permission of the first user to access the content is authenticated successfully, which means that the first user has permission to access the content.
The content identification information of the content is part of the access requirement information.
In some embodiments, when the content is the entire content of the first file, the content identification information of the content includes the identification information of the first file. Alternatively, when the content is a partition of the first file, the content identification information includes the identification information of the first file and the identification information of the partition. Alternatively, when the content is at least one column or at least one row in the first file, the content identification information includes the identification information of the first file and the column identifier of the at least one column, or the content identification information includes the identification information of the first file and the row number of the at least one row. Alternatively, when the content is at least one data fragment in the first file, the content identification information includes the identification information of the first file and the identification information of each of the at least one data fragment.
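One way to picture how the content identification information is assembled for each kind of content is sketched below. This is a hypothetical illustration: the function name `content_identification`, its parameters and the dictionary layout are assumptions made for the example, and the cases are treated as mutually exclusive, mirroring the alternatives listed in the text.

```python
from typing import Optional, Sequence


def content_identification(file_id: str,
                           partition_id: Optional[str] = None,
                           column_ids: Sequence[str] = (),
                           row_numbers: Sequence[int] = (),
                           fragment_ids: Sequence[str] = ()) -> dict:
    """Build content identification information for the requested content."""
    if column_ids:
        # At least one column: file identifier plus column identifiers.
        return {"file": file_id, "columns": list(column_ids)}
    if row_numbers:
        # At least one row: file identifier plus row numbers.
        return {"file": file_id, "rows": list(row_numbers)}
    if fragment_ids:
        # At least one data fragment: file identifier plus each fragment identifier.
        return {"file": file_id, "fragments": list(fragment_ids)}
    if partition_id is not None:
        # A partition: file identifier plus partition identifier.
        return {"file": file_id, "partition": partition_id}
    # Entire content of the file: file identifier only.
    return {"file": file_id}
```

For example, `content_identification("Company information", column_ids=["city"])` yields a record naming the file and the single column, which could then be used as the lookup key for the second permission information.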
In some embodiments, two operations are involved: the authentication operation, which authenticates the permission of the first user to access the content, and the determination operation, which determines the granularity of accessing the content in the first file. After the computing module 1011 of the computing engine 101 receives the data access request, the authentication operation may be performed first and the determination operation performed afterwards by the routing module 1012 of the computing engine 101. That is, the computing module 1011 may first authenticate the permission of the first user to access the content and, after the authentication passes, the routing module 1012 determines the granularity of accessing the content in the first file based on the access requirement information.
Alternatively, after the computing module 1011 of the computing engine 101 receives the data access request, the routing module 1012 of the computing engine 101 may perform the determination operation first, and the computing module 1011 then performs the authentication operation. That is, the routing module 1012 may first determine the granularity of accessing the content in the first file based on the access requirement information, and the computing module 1011 then authenticates the permission of the first user to access the content.
Alternatively, after the computing module 1011 of the computing engine 101 receives the data access request, the computing module 1011 performs the authentication operation while the routing module 1012 of the computing engine 101 performs the determination operation; that is, the authentication operation and the determination operation are performed at the same time.
In some embodiments, referring to FIG. 3, the access system 100 further includes an identity authentication center 106, which is used to save the correspondence between users' account information and user identities.
In some embodiments, the computing module 1011 of the computing engine 101 obtains the user identity of the first user as follows: the computing module 1011 queries the identity authentication center 106 for the user identity of the first user based on the account information of the first user.
In some embodiments, the file path authentication module 103 obtains the user identity of the first user as follows: the file path authentication module 103 queries the identity authentication center 106 for the user identity of the first user based on the account information of the first user.
In some embodiments, referring to FIG. 3, the access system 100 further includes a metadata center 107, which is configured to receive and store metadata of the first file input by a second user. The metadata includes the identification information of the first file, the operation type of the operation to be performed on the first file, and the file path of the first file. Optionally, the operation type may be creating the first file, deleting the first file, querying the first file, modifying the first file, or the like.
In some embodiments, the first file is a structured data file, and the metadata of the first file further includes one or more of the following: the column identifier of each column in the first file, the column type of each column in the first file, the row separator of the first file, or the column separator of the first file.
In some embodiments, the first file is a semi-structured data file, and the metadata of the first file further includes one or more of the following: the identification information of each data fragment in the first file, the type of each data fragment in the first file, or the row separator of the first file. The row separator is used to distinguish the rows of data within any data fragment in the first file.
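As an illustration only, the metadata described above could be modelled as a simple record. The class name FileMetadata, its field names, the lower-case operation type strings and the file path are assumptions made for this sketch; the application does not prescribe a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileMetadata:
    """Hypothetical record for the metadata kept by the metadata center."""
    file_id: str          # identification information of the file
    operation_type: str   # "create", "delete", "query" or "modify" (assumed spelling)
    file_path: str        # path of the file in the object file storage system
    # Structured data files: column identifier -> column type.
    column_types: Dict[str, str] = field(default_factory=dict)
    # Semi-structured data files: data fragment identifier -> fragment type.
    fragment_types: Dict[str, str] = field(default_factory=dict)
    row_separator: Optional[str] = None
    column_separator: Optional[str] = None


# A structured file with two columns; the path is made up for the example.
company_meta = FileMetadata(
    file_id="Company information",
    operation_type="create",
    file_path="/lake/company_information",
    column_types={"Name": "string", "City": "string"},
    row_separator="\n",
    column_separator=",",
)
```

A semi-structured file would populate fragment_types instead of column_types and leave column_separator unset.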
In some embodiments, the computing engine 101 obtains the file path of the first file as follows: the computing engine 101 obtains, from the metadata center 107, the metadata that includes the identification information of the first file, and obtains the file path of the first file from that metadata.
In some embodiments, the file path authentication module 103 obtains the file path of the first file as follows: the file path authentication module 103 obtains, from the metadata center 107, the metadata that includes the identification information of the first file, and obtains the file path of the first file from that metadata.
In some embodiments, the metadata center 107 displays a first interface to the second user. The second user can input the metadata of the first file in the first interface, and the metadata center 107 receives the metadata of the first file through the first interface. Optionally, the first interface includes a web user interface (Web UI) or the like.
In some embodiments, when receiving the metadata of the first file, the metadata center 107 also obtains the account information of the second user and verifies the metadata of the first file based on the second user's account information. In implementation,
the metadata center 107 verifies the legitimacy of the second user based on the second user's account information. When the second user is verified as legitimate, the metadata center 107 obtains the user identity of the second user and, based on that user identity, obtains the operation types that the second user is allowed to perform. If the operation type included in the metadata, that is, the operation to be performed on the first file, is one that the second user is allowed to perform, the metadata of the first file passes verification, and the metadata center 107 then stores the metadata of the first file.
In some embodiments, the metadata center 107 verifies the legitimacy of the second user and obtains the user identity of the second user as follows:
The identity authentication center 106 stores the correspondence between account information and user identities. The metadata center 107 queries whether the identity authentication center 106 stores the account information of the second user. If it does, the second user is verified as a legitimate user, and the metadata center 107 queries the identity authentication center 106 for the user identity of the second user based on the second user's account information.
In some embodiments, the metadata center 107 obtains the operation types that the second user is allowed to perform as follows:
The metadata center 107 stores the correspondence between user identities and operation types. Based on the user identity of the second user, the metadata center 107 obtains the corresponding operation types from this correspondence and uses them as the operation types that the second user is allowed to perform.
In some embodiments, the metadata center 107 stores the metadata of the first file as follows: the metadata center 107 queries whether metadata including the identification information of the first file has already been stored. If so, the stored metadata is updated to the newly received metadata of the first file. If not, the metadata of the first file is stored directly.
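The verification and storage steps above can be sketched as follows. This is a minimal in-memory sketch: the class names IdentityCenter and MetadataCenter, the method names and the dictionary keys are assumptions for illustration and do not come from the application.

```python
from typing import Dict, Optional, Set


class IdentityCenter:
    """Stand-in for the identity authentication center (account -> identity)."""

    def __init__(self, account_to_identity: Dict[str, str]):
        self._identities = dict(account_to_identity)

    def lookup(self, account: str) -> Optional[str]:
        # None means the account is unknown, i.e. the user is not legitimate.
        return self._identities.get(account)


class MetadataCenter:
    """Stand-in for the metadata center with its identity -> operation types map."""

    def __init__(self, identity_center: IdentityCenter,
                 allowed_operations: Dict[str, Set[str]]):
        self.identity_center = identity_center
        self.allowed_operations = allowed_operations
        self.store: Dict[str, dict] = {}  # file_id -> metadata

    def submit(self, account: str, metadata: dict) -> bool:
        """Verify the submitting user, then insert or update the metadata."""
        identity = self.identity_center.lookup(account)
        if identity is None:
            return False  # account not stored: user is not legitimate
        allowed = self.allowed_operations.get(identity, set())
        if metadata["operation_type"] not in allowed:
            return False  # operation type not permitted for this identity
        # Update existing metadata for the file, or store it for the first time.
        self.store[metadata["file_id"]] = metadata
        return True
```

Storing by file identifier makes the "update if present, otherwise save" rule a single assignment.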
The metadata center 107 holds designated administrator account information. After the metadata of the first file passes verification, the metadata center 107 acts according to the operation type included in the metadata. If the operation type is creating the first file, the metadata center 107 creates the file path of the first file in the object file storage system 102 based on the designated administrator account information; the storage location corresponding to the file path is used to store the first file. If the operation type is deleting the first file, the metadata center 107 locates the first file in the object file storage system 102 based on the designated administrator account information and the file path of the first file, and deletes the located first file. If the operation type is querying the first file, the metadata center 107 locates the first file in the same way, obtains content such as the description information and/or attribute information of the first file, and returns the obtained content to the second user. If the operation type is modifying the first file, the metadata center 107 locates the first file in the same way and modifies content such as the description information and/or attribute information of the first file.
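The four cases above amount to a dispatch on the operation type, always performed with the designated administrator account. The following sketch uses a dictionary-backed stand-in for the object file storage system; InMemoryObjectStore, apply_operation and the operation type strings are hypothetical names, and only a description string is kept per file.

```python
from typing import Dict, Optional


class InMemoryObjectStore:
    """Stand-in for the object file storage system (assumption for the sketch)."""

    def __init__(self, admin_account: str):
        self.admin_account = admin_account
        self.files: Dict[str, str] = {}  # file path -> description information


def apply_operation(store: InMemoryObjectStore, account: str, metadata: dict,
                    new_description: str = "") -> Optional[str]:
    """Carry out the operation named in the metadata using the given account."""
    if account != store.admin_account:
        raise PermissionError("only the designated administrator account may act")
    path, operation = metadata["file_path"], metadata["operation_type"]
    if operation == "create":
        store.files.setdefault(path, "")  # create the path if it is not there yet
        return None
    if path not in store.files:
        raise FileNotFoundError(path)
    if operation == "delete":
        del store.files[path]
        return None
    if operation == "query":
        return store.files[path]  # description returned to the second user
    if operation == "modify":
        store.files[path] = new_description
        return None
    raise ValueError(f"unknown operation type: {operation}")
```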
Referring to FIG. 3, the linkage permission module 104 is further configured to receive second permission information configured by a permission administrator. The second permission information indicates the user identity that is allowed to access the content in the first file and the fourth access operation. The linkage permission module 104 generates first permission information based on the second permission information. The first permission information indicates the user identity that is allowed to access the file path of the first file and the third access operation that is allowed on the file path of the first file. The linkage permission module 104 stores both the second permission information and the first permission information.
In some embodiments, the linkage permission module 104 is further configured to obtain the metadata of the first file from the metadata center 107, obtain at least one user identity from the identity authentication center 106, and display a second interface to the permission administrator. The second interface includes the metadata of the first file and the at least one user identity.
The permission administrator then selects, from the metadata of the first file, the content identification information of the content in the first file, selects, from the at least one user identity, the user identity that is allowed to access the content, and inputs into the second interface the fourth operation type of the fourth access operation that is allowed on the content. The second permission information is obtained in this way and includes the content identification information of the content, the selected user identity, and the input fourth operation type. Optionally, the second interface includes a Web UI or the like.
In some embodiments, the first file is a structured data file, and the metadata of the first file includes the file identifier of the first file and the column identifier of each column of the first file. The content identification information of the content includes the file identifier of the first file; or the file identifier of the first file and the column identifier of at least one column in the first file; or the file identifier of the first file and the row number of at least one row in the first file.
In some embodiments, the first file is a semi-structured data file, and the metadata of the first file includes the file identifier of the first file and the identification information of each data fragment of the first file. The content identification information of the content includes the file identifier of the first file; or the file identifier of the first file and the identification information of at least one data fragment in the first file.
In some embodiments, the linkage permission module 104 generates the first permission information as follows:
(1): The linkage permission module 104 obtains the file path of the first file based on the content identification information of the content in the second permission information.
In some embodiments, the content identification information of the content includes the identification information of the first file. The linkage permission module 104 obtains, from the metadata center 107, the metadata that includes the identification information of the first file, which is the metadata of the first file, and obtains the file path of the first file from that metadata.
(2): The linkage permission module 104 maps the fourth operation type included in the second permission information to obtain the third operation type.
(3): The linkage permission module 104 reads the user identity from the second permission information and combines the file path of the first file, the user identity, and the third operation type into the first permission information.
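Steps (1) to (3) can be sketched as a small function. The text does not say how the fourth operation type is mapped to the third operation type, so the table OPERATION_MAP below (content-level "select", "update", "delete" mapped to path-level "read" and "write") is purely an assumption, as are the function name and dictionary keys.

```python
from typing import Dict

# Assumed mapping from content-level (fourth) to path-level (third) operation types.
OPERATION_MAP: Dict[str, str] = {
    "select": "read",
    "update": "write",
    "delete": "write",
}


def derive_path_permission(content_permission: dict,
                           metadata_by_file_id: Dict[str, dict]) -> dict:
    """Build first (path-level) permission info from second (content-level) info."""
    # Step (1): look up the file path through the file identifier.
    file_id = content_permission["content_id"]["file_id"]
    file_path = metadata_by_file_id[file_id]["file_path"]
    # Step (2): map the fourth operation type to the third operation type.
    path_operation = OPERATION_MAP[content_permission["operation_type"]]
    # Step (3): combine path, user identity and mapped operation type.
    return {
        "file_path": file_path,
        "user_identity": content_permission["user_identity"],
        "operation_type": path_operation,
    }
```

Note that the column identifiers in the content-level permission are dropped on purpose: the path-level permission only speaks about the file path.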
In the embodiments of this application, the computing engine receives a data access request and determines, based on the access requirement information in the data access request, the granularity of accessing the content in the first file. When the determined granularity is the first granularity, the computing engine accesses the first file based on the first user's account information and the access requirement information. When the determined granularity is the second granularity, the computing engine accesses the first file based on the designated administrator account information and the access requirement information. In this way, access services at both the first granularity and the second granularity can be provided to users, which enriches the access services offered. Because the first granularity is larger than the second granularity, when the determined granularity is the first granularity the computing engine accesses the first file with the first user's own account information. The administrator account information does not need to be borrowed, which improves the efficiency of accessing the first file and the performance of reading and writing it.
In addition, if the first user were granted permission to access content at the second granularity, the first user would be able to access not only that content in the first file but also the other content in the first file. The first user's permission would automatically expand to any content in the first file, which expands the permission too far and hinders permission management. In this application, however, when the determined granularity is the second granularity, the computing engine accesses the first file based on the designated administrator account information and the access requirement information. The administrator account information is used in place of the first user's account information to access the first file, so the first user does not need to be granted permission for second-granularity access. This avoids expanding the first user's access permissions and simplifies permission management.
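The credential choice described above reduces to one branch on the determined granularity. The sketch below assumes the granularity has already been determined and is passed in as the string "first" or "second"; the function name and these labels are illustrative only.

```python
def select_account(granularity: str, user_account: str, admin_account: str) -> str:
    """Pick the account used to access the object file storage system."""
    if granularity == "first":
        # Larger granularity: the user's own account is used directly.
        return user_account
    if granularity == "second":
        # Smaller granularity: the designated administrator account is used
        # instead, so the user needs no storage-level permission of their own.
        return admin_account
    raise ValueError(f"unknown granularity: {granularity}")
```

In this sketch the first user's own account never reaches the storage system for second-granularity requests, which is what keeps the user's storage-level permissions from growing.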
Referring to FIG. 4, an embodiment of this application provides a method 400 for accessing a file. The method 400 is applied to the access system 100 shown in FIG. 1 or FIG. 3 and includes the following steps 401 to 410.
Step 401: The computing engine receives a data access request. The data access request includes access requirement information, which indicates the content in a first file that the first user needs to access. The first file is stored in the object file storage system.
In some embodiments, the first user is a business user who performs a data access service, and the first user sends the data access request to the computing engine.
In some embodiments, the access requirement information is an access statement for accessing a database; for example, the access requirement information is an SQL statement.
In some embodiments, the access requirement information includes the content identification information of the content and a first operation type. The first operation type indicates the first access operation for accessing the content in the first file. Optionally, the first access operation is querying the first file, updating the first file, deleting the first file, or the like.
In some embodiments, when the first access operation indicated by the first operation type is updating the first file, the access requirement information further includes the content to be updated.
In some embodiments, the data access request may further include the first user's account information.
In some embodiments, the access requirement information may be one of the following types, each of which is described below.
Type 1: the access requirement information includes the identification information of the first file and the first operation type.
The access requirement information does not include identification information of any partition of the first file. When the first file is a structured data file, the access requirement information does not include first information, where the first information indicates at least one column and/or at least one row in the first file. When the first file is a semi-structured data file, the access requirement information does not include identification information of any data fragment in the first file.
In this case, the content identification information of the content is the identification information of the first file.
For type 1 access requirement information, the content is the entire content of the first file, which means that the first user needs to access the entire content of the first file. The granularity at which the first user accesses the content in the first file is file granularity.
For example, the access requirement information is: Select * From Company information. It includes the identification information "Company information" of the first file shown in Table 1 and the first operation type "Select", which is querying the first file.
Type 2: the access requirement information includes the identification information of the first file, the identification information of a partition in the first file, and the first operation type.
When the first file is a structured data file, the access requirement information does not include the first information, where the first information indicates at least one column and/or at least one row in the first file. When the first file is a semi-structured data file, the access requirement information does not include identification information of any data fragment in the first file.
In this case, the content identification information of the content includes the identification information of the first file and the identification information of the partition in the first file.
For type 2 access requirement information, the content is the partition of the first file, which means that the first user needs to access that partition. The granularity at which the first user accesses the content in the first file is partition granularity.
Type 3: the first file is a structured data file, and the access requirement information includes the identification information of the first file, the first information, and the first operation type. The first information indicates at least one column and/or at least one row in the first file.
For type 3 access requirement information, the content is the at least one column or the at least one row of the first file, which means that the first user needs to access the at least one column or the at least one row. The granularity at which the first user accesses the content in the first file is row-column granularity.
In some embodiments, the first information includes the column identifier of the at least one column in the first file. The content is the at least one column in the first file, which means that the first user needs to access the at least one column. In this case, the content identification information of the content includes the identification information of the first file and the column identifier of the at least one column in the first file.
For example, the access requirement information is: Select Name, City From Company information. It includes the identification information "Company information" of the first file shown in Table 1, the column identifier "Name" of the second column of the first file, the column identifier "City" of the fourth column of the first file, and the first operation type "Select", which is querying the first file.
In some embodiments, the first information includes the column identifier of at least one column in the first file and the row filtering information corresponding to each of the at least one column. The content is at least one row in the first file, which means that the first user needs to access at least one row of the first file. In this case, the content identification information of the content includes the identification information of the first file and the column identifier of the at least one column in the first file.
For any one of the at least one column, the one or more rows in which the value of that column equals the row filtering information can be located in the first file. The content is the content of the located row or rows.
For example, referring to Table 1 above, assume that the first information includes the column identifier "City" of the fourth column in Table 1 and the row filtering information "City 1" corresponding to the fourth column. Based on the first information, the first, second, and fifth rows, in which the City value in the fourth column is "City 1", can be located in the first file shown in Table 1. The access requirement information is: Select * From Company information Where City = City 1, and the first operation type "Select" is querying the first file.
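The row-filtering example can be reproduced with a short sketch. Table 1 itself is not shown here, so the rows below are made up; only the pattern stated in the text (rows 1, 2 and 5 have City equal to "City 1") is taken from the example. The function name locate_rows is hypothetical, and requiring a row to satisfy every column filter at once is an assumption for the case where several columns carry filters.

```python
from typing import Dict, List

# Made-up rows; only the "City" pattern (rows 1, 2 and 5) follows the text.
ROWS: List[Dict[str, str]] = [
    {"Name": "A", "City": "City 1"},
    {"Name": "B", "City": "City 1"},
    {"Name": "C", "City": "City 2"},
    {"Name": "D", "City": "City 3"},
    {"Name": "E", "City": "City 1"},
]


def locate_rows(rows: List[Dict[str, str]], filters: Dict[str, str]) -> List[int]:
    """Return the 1-based numbers of the rows that match every column filter."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if all(row.get(column) == value for column, value in filters.items())
    ]


# Corresponds to: Select * From Company information Where City = City 1
matching_rows = locate_rows(ROWS, {"City": "City 1"})
```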
In some embodiments, the first information includes the row number of at least one row in the first file. The content is the at least one row in the first file, which means that the first user needs to access the at least one row. In this case, the content identification information of the content includes the identification information of the first file and the row number of the at least one row in the first file.
Type 4: the first file is a semi-structured data file, and the access requirement information includes the identification information of the first file, the identification information of at least one data fragment in the first file, and the first operation type.
In this case, the content identification information of the content includes the identification information of the first file and the identification information of the at least one data fragment in the first file.
For type 4 access requirement information, the content is the at least one data fragment of the first file, which means that the first user needs to access the at least one data fragment. The granularity at which the first user accesses the content in the first file is data fragment granularity.
For type 4 access requirement information, the data access request is a remote procedure call (RPC) request, and the RPC request includes the type 4 access requirement information.
For example, the access requirement information in the RPC request includes the identification information "Company information" of the first file shown in FIG. 2, the identification information "Name" of the first data fragment of the first file, the identification information "Country" of the third data fragment of the first file, and the first operation type "query the first file".
In summary, the access requirement information includes at least the identification information of the first file and the first operation type, and may further include the first information, the identification information of a partition in the first file, the identification information of a data fragment in the first file, or the like.
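The four types summarised above can be told apart by which optional fields are present. The sketch below assumes the access requirement information has already been parsed into a dictionary; the key names (partition_id, columns, row_filters, row_numbers, fragment_ids) and the returned labels are assumptions made for illustration.

```python
def classify_granularity(requirement: dict) -> str:
    """Return the access granularity implied by parsed access requirement info."""
    # Every type carries at least the file identifier and the operation type.
    if not requirement.get("file_id") or not requirement.get("operation_type"):
        raise ValueError("file_id and operation_type are required")
    if requirement.get("fragment_ids"):
        return "data_fragment"  # type 4: semi-structured file, data fragments
    if (requirement.get("columns") or requirement.get("row_filters")
            or requirement.get("row_numbers")):
        return "row_column"     # type 3: structured file, first information
    if requirement.get("partition_id"):
        return "partition"      # type 2: a partition of the file
    return "file"               # type 1: the whole file
```

For instance, the parsed form of "Select Name, City From Company information" would carry columns ["Name", "City"] and be classified as row-column granularity.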
Step 402: The computing engine authenticates the first user's permission to access the content based on the second permission information, the first user's account information, and the access requirement information. After the authentication succeeds, step 403 is performed.
The second permission information indicates the user identity that is allowed to access the content and the fourth access operation that is allowed on the content.
Referring to FIG. 5, in some embodiments the computing engine is a Hive engine, and the Hive engine authenticates the first user's permission to access the content through the following operations 4021 to 4023.
4021: For the content identification information of the content in the access requirement information, the Hive engine determines, based on the content identification information, whether the content exists in the object file storage system. If the content exists in the object file storage system, the following operation 4022 is performed.
In 4021, the content identification information includes the identification information of the first file. Based on the identification information of the first file, the Hive engine obtains the metadata of the first file from the metadata center; the metadata of the first file includes the identification information of the first file. If the operation type included in the metadata of the first file is deleting the first file, the metadata center has already deleted the first file from the object file storage system, and the Hive engine determines, based on the metadata of the first file, that the content does not exist in the object file storage system. If the operation type included in the metadata of the first file is creating the first file, modifying the first file, or querying the first file, the object file storage system stores the first file, and the Hive engine determines, based on the metadata of the first file and the content identification information, whether the content exists in the object file storage system.
In some embodiments, the Hive engine obtains the metadata of the first file as follows:
The Hive engine sends a first acquisition command to the metadata center, where the first acquisition command includes the identification information of the first file. The metadata center receives the first acquisition command, obtains from the stored metadata the metadata that includes the identification information of the first file, which is the metadata of the first file, and sends a first acquisition response to the Hive engine. The first acquisition response includes the metadata of the first file. Or,
The Hive engine sends a first acquisition command to the metadata center. The metadata center receives the first acquisition command, obtains every stored piece of metadata, and sends a first acquisition response to the Hive engine, where the first acquisition response includes every piece of metadata. The Hive engine receives the first acquisition response and obtains, from the returned metadata, the metadata that includes the identification information of the first file, which is the metadata of the first file.
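The two alternatives differ only in where the filtering by file identifier happens: in the metadata center or in the engine. A minimal sketch of both follows; MetadataCenter, its get method and the two function names are hypothetical and do not correspond to any actual Hive or metastore API.

```python
from typing import List, Optional


class MetadataCenter:
    """Stand-in for the metadata center holding a list of metadata records."""

    def __init__(self, records: List[dict]):
        self._records = list(records)

    def get(self, file_id: Optional[str] = None) -> List[dict]:
        """Return all records, or only those matching file_id when it is given."""
        if file_id is None:
            return list(self._records)
        return [r for r in self._records if r["file_id"] == file_id]


def fetch_filtered_by_center(center: MetadataCenter, file_id: str) -> Optional[dict]:
    # First alternative: the command carries the file identifier and the
    # metadata center returns only the matching metadata.
    matches = center.get(file_id)
    return matches[0] if matches else None


def fetch_filtered_by_engine(center: MetadataCenter, file_id: str) -> Optional[dict]:
    # Second alternative: the center returns every record and the engine
    # picks out the one that includes the file identifier.
    for record in center.get():
        if record["file_id"] == file_id:
            return record
    return None
```

Both functions return the same record; the first transfers less data, while the second keeps the command itself free of parameters.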
In some embodiments, when the operation type included in the metadata of the first file is creating, modifying, or querying the first file, the Hive engine determines, based on the metadata of the first file and the content identification information, whether the content exists in the object file storage system as follows:
If the content identification information includes the identification information of the first file and the identification information of a partition in the first file: when the metadata of the first file also includes the identification information of that partition, it is determined that the content exists in the object file storage system; when the metadata of the first file does not include the identification information of that partition, it is determined that the content does not exist in the object file storage system.
If the content identification information includes the identification information of the first file and the column identifier of at least one column in the first file: when the metadata of the first file also includes the column identifier of the at least one column, it is determined that the content exists in the object file storage system; when the metadata of the first file does not include the column identifier of the at least one column, it is determined that the content does not exist in the object file storage system.
If the content identification information includes the identification information of the first file and the identification information of a data fragment in the first file: when the metadata of the first file also includes the identification information of that data fragment, it is determined that the content exists in the object file storage system; when the metadata of the first file does not include the identification information of that data fragment, it is determined that the content does not exist in the object file storage system.
If the content identification information includes the identification information of the first file and the row number of at least one row in the first file, the content is considered to exist in the object file storage system once it is determined that the object file storage system stores the first file.
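The existence check described in 4021 can be sketched as follows. This is a minimal illustration only: the class names `FileMetadata` and `ContentId`, their fields, and the operation-type strings are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileMetadata:
    """Hypothetical shape of one metadata record held by the metadata center."""
    file_id: str
    operation_type: str  # assumed values: "create", "modify", "query", "delete"
    partitions: set = field(default_factory=set)
    columns: set = field(default_factory=set)
    fragments: set = field(default_factory=set)


@dataclass
class ContentId:
    """Hypothetical shape of the content identification information."""
    file_id: str
    partition: Optional[str] = None
    columns: tuple = ()
    fragment: Optional[str] = None
    rows: tuple = ()


def content_exists(meta: FileMetadata, content: ContentId) -> bool:
    # A "delete" operation type means the file has been removed from storage.
    if meta.operation_type == "delete":
        return False
    # Partition, column, and fragment identifiers must appear in the metadata.
    if content.partition is not None:
        return content.partition in meta.partitions
    if content.columns:
        return set(content.columns) <= meta.columns
    if content.fragment is not None:
        return content.fragment in meta.fragments
    # File-only or row-number identification: the file being stored suffices.
    return True
```

Row numbers are deliberately not looked up in the metadata, which mirrors the last case above: once the file is known to be stored, the rows are treated as existing.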
The metadata of the first file includes the file path of the first file, so the computing engine reads the file path of the first file from the metadata of the first file.
Operation 4021 is optional. That is, operation 4022 below may be performed directly without performing operation 4021, or operation 4021 may be performed first and then operation 4022.
4022: The Hive engine obtains, from the linkage permission module, the second permission information that includes the content identification information of the content. The second permission information includes the content identification information of the content, the user identity that is able to access the content, and a fourth operation type of a fourth access operation that is able to access the content.
In some embodiments, the linkage permission module includes a second read-write interface. The Hive engine reads each piece of second permission information saved in the linkage permission module through the second read-write interface and, based on the content identification information of the content, obtains the second permission information corresponding to the content from among them.
In some embodiments, the Hive engine sends a second acquisition command to the linkage permission module, where the second acquisition command includes the content identification information of the content. The linkage permission module receives the second acquisition command, obtains the second permission information corresponding to the content from each saved piece of second permission information based on the content identification information, and sends a second acquisition response to the Hive engine, where the second acquisition response includes the obtained second permission information.
If the content identification information includes the identification information of the first file and no other information, the obtained second permission information is the second permission information that includes the identification information of the first file.
If the content identification information includes the identification information of the first file and the row number of at least one row in the first file, and no other information, the obtained second permission information is the second permission information that includes the identification information of the first file.
If the content identification information includes the identification information of the first file and the column identifier of at least one column in the first file, and no other information, the obtained second permission information is the second permission information that includes the identification information of the first file and the column identifier of the at least one column.
If the content identification information includes the identification information of the first file and the identification information of a partition in the first file, and no other information, the obtained second permission information is the second permission information that includes the identification information of the first file and the identification information of that partition.
If the content identification information includes the identification information of the first file and the identification information of a data fragment in the first file, and no other information, the obtained second permission information is the second permission information that includes the identification information of the first file and the identification information of that data fragment.
4023: The Hive engine authenticates the first user's permission to access the content based on the second permission information, the account information of the first user, and the first operation type.
In 4023, the Hive engine determines the user identity of the first user based on the account information of the first user, compares the user identity of the first user with the user identity included in the second permission information, and compares the first operation type with the fourth operation type included in the second permission information. If the user identity of the first user is the same as the user identity included in the second permission information, and the first operation type is the same as the fourth operation type included in the second permission information, the first user's permission to access the content passes authentication. If the user identity of the first user differs from the user identity included in the second permission information, and/or the first operation type differs from the fourth operation type included in the second permission information, the first user's permission to access the content fails authentication.
In some embodiments, the identity authentication center stores the correspondence between users' account information and user identities. The Hive engine queries the identity authentication center for the user identity of the first user based on the account information of the first user.
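The comparison performed in 4023 can be sketched as below. The dictionary standing in for the identity authentication center, the `SecondPermission` class, and all sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondPermission:
    """Hypothetical record of second permission information."""
    content_id: str
    user_identity: str
    operation_type: str  # the "fourth operation type" in the text


def authenticate_content_access(permission: SecondPermission,
                                account: str,
                                first_operation_type: str,
                                identity_center: dict) -> bool:
    # Resolve the user identity from the account information; the identity
    # authentication center is modeled here as a plain dict (an assumption).
    user_identity = identity_center.get(account)
    # Authentication passes only when both the identity and the operation
    # type match the values recorded in the second permission information.
    return (user_identity == permission.user_identity
            and first_operation_type == permission.operation_type)
```

A mismatch in either comparison, or an unknown account, results in a failed authentication, matching the "and/or" condition above.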
Referring to Figure 6, in some embodiments the computing engine is a Spark engine, and the Spark engine authenticates the first user's permission to access the content through operations 4121 to 4125 below.
4121: The Spark engine sends the access requirement information to the metadata center.
In some embodiments, the Spark engine also sends the account information of the first user to the metadata center.
4122: For the content identification information of the content in the access requirement information, the metadata center determines, based on the content identification information, whether the content exists in the object file storage system. If the content exists in the object file storage system, operation 4123 below is performed.
For the detailed process by which the metadata center determines whether the content exists in the object file storage system, refer to the process by which the Hive engine makes the same determination in operation 4021 above; it is not described again here.
If the Spark engine also sends the account information of the first user to the metadata center, then upon determining that the content exists in the object file storage system, the metadata center obtains from the linkage permission module the second permission information that includes the content identification information of the content. Based on the second permission information, the account information of the first user, and the first operation type, the metadata center authenticates the first user's permission to access the content, and performs operation 4123 below after the authentication passes.
For the process by which the metadata center obtains the second permission information and authenticates the first user's permission to access the content, refer to the corresponding process performed by the Hive engine in 4022 and 4023 above; it is not described again here.
4123: The metadata center sends confirmation information to the Spark engine.
If the content does not exist in the object file storage system, the metadata center instead sends denial information to the Spark engine. Likewise, if the first user's permission to access the content fails authentication, the metadata center sends denial information to the Spark engine.
4124: The Spark engine receives the confirmation information and obtains, from the linkage permission module, the second permission information that includes the content identification information of the content. The second permission information includes the content identification information of the content, the user identity that is able to access the content, and the fourth operation type of the fourth access operation that is able to access the content.
For the process by which the Spark engine obtains the second permission information, refer to the process by which the Hive engine obtains the second permission information in 4022 above; it is not described again here.
If the Spark engine receives the denial information, the operation ends.
4125: The Spark engine authenticates the first user's permission to access the content based on the second permission information, the account information of the first user, and the first operation type.
For the process by which the Spark engine authenticates the first user's permission to access the content, refer to the process by which the Hive engine performs this authentication in 4023 above; it is not described again here.
Operations 4121 to 4123 above are optional. That is, the Spark engine may skip operations 4121 to 4123 and directly perform operations 4124 and 4125: the Spark engine obtains from the linkage permission module the second permission information that includes the content identification information of the content, and authenticates the first user's permission to access the content based on the second permission information, the account information of the first user, and the first operation type.
Step 403: The computing engine determines, based on the access requirement information, the granularity at which the content of the first file is accessed. If the determined granularity is the first granularity, step 404 is performed; if the determined granularity is the second granularity, step 409 is performed.
In step 403, the computing engine determines the granularity at which the content of the first file is accessed based on the content identification information of the content included in the access requirement information.
As described above, there are four types of access requirement information. The process of determining the granularity is explained below for each type.
For access requirement information of type 1 above, the access requirement information includes the identification information of the first file but does not include identification information of a partition of the first file. In addition, when the first file is a structured data file, the access requirement information does not include the first information; when the first file is a semi-structured data file, the access requirement information does not include identification information of a data fragment in the first file. In this case, the content identification information of the content is the identification information of the first file, and the computing engine determines that the granularity at which the content of the first file is accessed is file granularity.
For access requirement information of type 2 above, the access requirement information includes the identification information of the first file and the identification information of a partition in the first file. In addition, when the first file is a structured data file, the access requirement information does not include the first information; when the first file is a semi-structured data file, the access requirement information does not include identification information of a data fragment in the first file. In this case, the content identification information of the content is the identification information of the first file and the identification information of the partition in the first file, and the computing engine determines that the granularity at which the content of the first file is accessed is partition granularity.
The first granularity is file granularity or partition granularity. Therefore, if the determined granularity is the first granularity, the content identification information of the content includes the identification information of the first file, or the content identification information of the content includes the identification information of the first file and the identification information of the partition in the first file.
For access requirement information of type 3 above, the first file is a structured data file, and the access requirement information includes the identification information of the first file and the first information, where the first information indicates at least one row and/or at least one column in the first file. In this case, the content identification information of the content is the identification information of the first file and the column identifier of the at least one column in the first file, or the identification information of the first file and the row number of the at least one row in the first file, and the computing engine determines that the granularity at which the content of the first file is accessed is row-column granularity.
For access requirement information of type 4 above, the first file is a semi-structured data file, and the access requirement information includes the identification information of the first file and the identification information of a data fragment in the first file. In this case, the content identification information of the content is the identification information of the first file and the identification information of the data fragment in the first file, and the computing engine determines that the granularity at which the content of the first file is accessed is data fragment granularity.
The second granularity is row-column granularity or data fragment granularity. Therefore, if the determined granularity is the second granularity, the content identification information of the content includes the identification information of the first file and the column identifier of the at least one column in the first file; optionally, the access requirement information may further include row filtering information corresponding to the at least one column. Alternatively, the content identification information of the content includes the identification information of the first file and the row number of the at least one row in the first file. Alternatively, the content identification information of the content includes the identification information of the first file and the identification information of the data fragment in the first file.
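The granularity decision of step 403 can be sketched as follows. The `AccessRequirement` class and its field names are hypothetical, and the order in which the fields are checked is an assumption; the text only states which identifiers each of the four types contains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRequirement:
    """Hypothetical view of the content identification information."""
    file_id: str
    partition: Optional[str] = None
    columns: tuple = ()
    rows: tuple = ()
    fragment: Optional[str] = None


def granularity(req: AccessRequirement) -> str:
    # Type 3: row and/or column identifiers (structured data file).
    if req.columns or req.rows:
        return "row_column"
    # Type 4: data fragment identifier (semi-structured data file).
    if req.fragment is not None:
        return "data_fragment"
    # Type 2: partition identifier only.
    if req.partition is not None:
        return "partition"
    # Type 1: file identifier only.
    return "file"


def next_step(req: AccessRequirement) -> int:
    # First granularity (file or partition) continues with step 404;
    # second granularity (row-column or data fragment) continues with step 409.
    return 404 if granularity(req) in ("file", "partition") else 409
```

For instance, a requirement naming only the file routes to step 404, while one naming a column of that file routes to step 409.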
The computing engine includes a computing module and a routing module. Step 402 above is performed by the computing module, and step 403 above is performed by the routing module. The computing module may perform step 402 first and the routing module then performs step 403; or the routing module may perform step 403 first and the computing module then performs step 402; or the routing module may perform step 403 while the computing module performs step 402. After the first user's permission to access the content passes authentication, the computing engine performs step 404 below if the granularity determined by the routing module is the first granularity, and performs step 409 below if the granularity determined by the routing module is the second granularity.
Step 404: The computing engine sends an authentication request to the file path authentication module. The authentication request includes authentication information, which indicates the first user, the file path of the first file, and a second access operation by which the first user accesses the file path.
In some embodiments, the authentication information includes the file path of the first file, the user identity of the first user, and a second operation type. The second operation type is the operation type, corresponding to the first operation type, that can be used to access the object file storage system. The computing engine obtains the second operation type by mapping the first operation type, and the second operation type includes a read operation and/or a write operation.
In some embodiments, the authentication information includes the identification information of the first file, the account information of the first user, and the first operation type. The authentication information may also include other content, which is not listed here one by one.
Referring to Figure 5 or Figure 6, in step 404, the routing module of the computing engine (the Hive engine or the Spark engine) sends the authentication request to the file path authentication module.
Step 405: The file path authentication module receives the authentication request and, based on first permission information and the authentication information, authenticates the first user's permission to access the file path using the second access operation. The second access operation is the access operation corresponding to the second operation type.
The linkage permission module stores first permission information corresponding to at least one file. For any file, the first permission information corresponding to the file includes the file path of the file, the user identity that is able to access the file path of the file, and a third operation type of a third access operation that is able to access the file path of the file.
In some embodiments, the authentication information includes the file path of the first file, the user identity of the first user, and the second operation type. In step 405, the file path authentication module reads, based on the file path, the first permission information that includes the file path from the linkage permission module; the first permission information that is read corresponds to the first file. The file path authentication module compares the user identity of the first user with the user identity included in the read first permission information, and compares the second operation type with the third operation type included in the read first permission information. If the user identity of the first user is the same as the user identity included in the read first permission information, and the second operation type is the same as the third operation type included in the read first permission information, the first user's permission to access the file path using the second access operation passes authentication. If the user identity of the first user differs from the user identity included in the read first permission information, and/or the second operation type differs from the third operation type included in the read first permission information, the first user's permission to access the file path using the second access operation fails authentication.
In some embodiments, the authentication information includes the identification information of the first file, the account information of the first user, and the first operation type. The file path authentication module first obtains the file path of the first file based on the identification information of the first file, obtains the user identity of the first user based on the account information of the first user, and maps the first operation type to obtain the second operation type. It then authenticates the first user's permission to access the file path using the second access operation.
In some embodiments, the file path authentication module obtains, from the metadata center, the metadata that includes the identification information of the first file; this metadata is the metadata of the first file, and the file path authentication module reads the file path of the first file from it. In addition, the file path authentication module obtains the user identity of the first user from the identity authentication center according to the account information of the first user.
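The file path authentication of step 405, including the variant that starts from the file identifier, account information, and first operation type, can be sketched as below. All names are hypothetical. Only the mapping from a query to a read operation is stated in the text; the other entries of `OPERATION_MAP` are assumptions, and the metadata center and identity authentication center are modeled as plain dictionaries.

```python
from dataclasses import dataclass

# Mapping from first operation type to second operation type.
OPERATION_MAP = {
    "query": frozenset({"read"}),    # stated in the text for "Select"
    "create": frozenset({"write"}),  # assumption
    "modify": frozenset({"write"}),  # assumption
    "delete": frozenset({"write"}),  # assumption
}


@dataclass(frozen=True)
class FirstPermission:
    """Hypothetical record of first permission information."""
    file_path: str
    user_identity: str
    operation_type: frozenset  # the "third operation type" in the text


def authenticate_file_path(permissions, file_path, user_identity,
                           second_operation_type) -> bool:
    # Pass if a record for this path matches both identity and operation type.
    return any(p.file_path == file_path
               and p.user_identity == user_identity
               and p.operation_type == second_operation_type
               for p in permissions)


def authenticate_from_identifiers(permissions, metadata_center, identity_center,
                                  file_id, account, first_operation_type) -> bool:
    # Resolve path, identity, and second operation type, then authenticate.
    file_path = metadata_center[file_id]["file_path"]
    user_identity = identity_center[account]
    second_operation_type = OPERATION_MAP[first_operation_type]
    return authenticate_file_path(permissions, file_path, user_identity,
                                  second_operation_type)
```

Checking with `any` tolerates several permission records for the same path, which is a small generalization of the single-record comparison described above.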
Step 406: When the first user's permission to access the file path using the second access operation passes authentication, the file path authentication module sends storage information to the object file storage system and sends an authentication response to the computing engine. The storage information includes a temporary credential, the file path, and the second operation type, and the authentication response includes the temporary credential.
In step 406, the file path authentication module allocates the temporary credential when the first user's permission to access the file path using the second access operation passes authentication.
Step 407: The object file storage system receives the storage information and saves the correspondence among the temporary credential, the file path, and the second operation type.
The object file storage system maintains a correspondence among temporary credentials, file paths, and operation types. In step 407, the object file storage system receives the storage information and saves the temporary credential, the file path, and the second operation type as a corresponding record in this correspondence.
Within the correspondence among temporary credentials, file paths, and operation types, once the temporary credential has been stored for a specified duration, the object file storage system deletes the record that includes the temporary credential from the correspondence.
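The correspondence kept by the object file storage system in step 407, together with the expiry of temporary credentials, can be sketched as follows. The class `CredentialStore`, the injectable clock, and purging expired records lazily at lookup time are all assumptions; the text only requires that a record be deleted once its storage time reaches the specified duration.

```python
import time


class CredentialStore:
    """Hypothetical store of (temporary credential -> file path, operation type)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be exercised in tests
        self._records = {}   # credential -> (file_path, operation_type, saved_at)

    def save(self, credential: str, file_path: str, operation_type: str) -> None:
        # Step 407: record the correspondence received in the storage information.
        self._records[credential] = (file_path, operation_type, self._clock())

    def lookup(self, credential: str):
        # Drop expired records first, then return (file_path, operation_type) or None.
        self._purge()
        record = self._records.get(credential)
        return None if record is None else record[:2]

    def _purge(self) -> None:
        now = self._clock()
        expired = [c for c, (_, _, saved_at) in self._records.items()
                   if now - saved_at >= self._ttl]
        for credential in expired:
            del self._records[credential]
```

A background sweep would serve equally well; lazy purging is used here only to keep the sketch short.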
Step 408: The computing engine receives the authentication response and accesses the first file in the object file storage system based on the temporary credential, the access requirement information, and the file path.
The access requirement information includes the content identification information of the content and the first operation type of the first access operation. The content identification information of the content includes the identification information of the first file, or includes the identification information of the first file and the identification information of a partition in the first file.
Assume that the first access operation is querying the first file and that the second access operation obtained by mapping the first access operation includes a read operation. In step 408, the first file is accessed according to the following procedure.
1-1: The computing engine sends a read request to the object file storage system, where the read request includes the temporary credential and the file path of the first file.
For example, consider the access requirement information listed above: Select*From Company information. In this example, the first operation type "Select" is querying the first file, the content identification information of the content is "Company information", and the second access operation that the computing engine maps from "Select" includes a read operation.
Assume that the temporary credential allocated by the file path authentication module is "P1", and that the object file storage system stores the correspondence among temporary credentials, file paths, and operation types shown in Table 2 below. The first record in this correspondence includes the temporary credential "P1", the file path "C:\windows\system32\Company information" of the first file shown in Table 1, and the second operation type "read operation" of the second access operation for accessing the file path.
Table 2
Temporary credential | File path | Operation type |
P1 | C:\windows\system32\Company information | Read operation |
… | … | … |
In 1-1, the computing engine sends a read request to the object file storage system. The read request includes the temporary credential "P1" and the file path of the first file, "C:\windows\system32\Company information".
1-2: The object file storage system receives the read request and, based on the temporary credential in the read request, obtains the corresponding file path and second operation type from the correspondence among temporary credentials, file paths, and operation types.
For example, the object file storage system receives the read request, which includes the temporary credential "P1" and the file path of the first file, "C:\windows\system32\Company information". Based on the temporary credential "P1", it obtains the corresponding file path "C:\windows\system32\Company information" and the second operation type "read operation" from the correspondence shown in Table 2.
1-3: If the file path in the read request is the same as the obtained file path, and the second access operation corresponding to the second operation type includes a read operation, the object file storage system reads the first file based on its file path and returns the first file to the computing engine.
Here, the file path "C:\windows\system32\Company information" in the read request is the same as the obtained file path "C:\windows\system32\Company information", and the second access operation corresponding to the obtained second operation type (read operation) includes a read operation. The object file storage system therefore reads the first file shown in Table 1 based on the file path "C:\windows\system32\Company information" and returns it to the computing engine.
1-4: The computing engine receives the first file. When the content identification information includes the identification information of the first file, the computing engine returns the first file to the first user. When the content identification information includes the identification information of the first file and the identification information of the partition in the first file, the computing engine obtains the content of the partition from the first file and returns that content to the first user.
When the content identification information includes the identification information of the first file, the first user needs to query the entire content of the first file. When the content identification information includes the identification information of the first file and the identification information of the partition in the first file, the first user needs to query the content of that partition in the first file.
For example, the computing engine receives the first file shown in Table 1. Because the content identification information includes the identification information "Company information" of that file, the computing engine returns the first file shown in Table 1 to the first user.
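The check performed in steps 1-1 to 1-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the class `ObjectFileStore`, the record type `Grant`, and the methods `put_file`, `register_grant`, and `read` are hypothetical names, and the store is an in-memory stand-in whose grant table plays the role of Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One record of the credential/path/operation correspondence (cf. Table 2)."""
    file_path: str
    operations: frozenset


class ObjectFileStore:
    """Hypothetical in-memory stand-in for the object file storage system."""

    def __init__(self):
        self._files = {}   # file path -> file content
        self._grants = {}  # temporary credential -> Grant

    def put_file(self, file_path, content):
        self._files[file_path] = content

    def register_grant(self, credential, file_path, operations):
        self._grants[credential] = Grant(file_path, frozenset(operations))

    def read(self, credential, file_path):
        # Step 1-2: look up the path and operation type bound to the credential.
        grant = self._grants.get(credential)
        if grant is None:
            raise PermissionError("unknown temporary credential")
        # Step 1-3: the requested path must match and reading must be allowed.
        if grant.file_path != file_path or "read" not in grant.operations:
            raise PermissionError("credential does not permit reading this path")
        return self._files[file_path]


PATH = r"C:\windows\system32\Company information"
store = ObjectFileStore()
store.put_file(PATH, "contents of the first file")
store.register_grant("P1", PATH, {"read"})

# Step 1-1: the computing engine sends the credential and the file path.
first_file = store.read("P1", PATH)
```

A request that presents "P1" with any other file path is rejected, because the path bound to the credential no longer matches the path in the request.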
Assume that the first access operation is an update of the first file, that the second access operation mapped from the first access operation includes a read operation and a write operation, and that the access requirement information includes the content to be updated. In step 408, the first file is then accessed according to the following process.
2-1: The computing engine sends a read request to the object file storage system. The read request includes the temporary credential and the file path.
2-2: The object file storage system receives the read request and, based on the temporary credential in the read request, obtains the corresponding file path and second operation type from the correspondence among temporary credentials, file paths, and operation types.
2-3: If the file path in the read request is the same as the obtained file path, and the second access operation corresponding to the second operation type includes a read operation, the object file storage system reads the first file based on the file path and returns the first file to the computing engine.
2-4: The computing engine receives the first file. When the content identification information includes the identification information of the first file, the computing engine updates the content of the first file to the content to be updated. When the content identification information includes the identification information of the first file and the identification information of the partition in the first file, the computing engine updates the content of that partition in the first file to the content to be updated.
2-5: The computing engine sends a write request to the object file storage system. The write request includes the temporary credential, the first file as updated in 2-4, and the file path.
2-6: The object file storage system receives the write request and, based on the temporary credential in the write request, obtains the corresponding file path and second operation type from the correspondence among temporary credentials, file paths, and operation types.
2-7: If the file path in the write request is the same as the obtained file path, and the second access operation corresponding to the second operation type includes a write operation, the object file storage system replaces the first file stored at the file path with the first file included in the write request.
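Steps 2-1 to 2-7 amount to a read-modify-write cycle in which the credential is checked once for the read and once for the write. The sketch below illustrates this under assumptions: `ObjectFileStore`, `update_file`, the credentials "P2" and "P3", and the path "/data/company" are hypothetical, and a file is modelled as a dictionary that maps a partition name to its content.

```python
class ObjectFileStore:
    """Hypothetical store; a file is a dict of partition name -> content."""

    def __init__(self):
        self.files = {}
        self.grants = {}  # temporary credential -> (file path, allowed operations)

    def _check(self, credential, file_path, operation):
        grant = self.grants.get(credential)
        if grant is None or grant[0] != file_path or operation not in grant[1]:
            raise PermissionError(f"{operation} denied for {file_path!r}")

    def read(self, credential, file_path):
        self._check(credential, file_path, "read")     # steps 2-2 and 2-3
        return dict(self.files[file_path])

    def write(self, credential, file_path, content):
        self._check(credential, file_path, "write")    # steps 2-6 and 2-7
        self.files[file_path] = content                # replace the stored file


def update_file(store, credential, file_path, new_content, partition=None):
    current = store.read(credential, file_path)        # step 2-1
    if partition is None:
        updated = new_content                          # step 2-4: whole file
    else:
        updated = {**current, partition: new_content}  # step 2-4: one partition
    store.write(credential, file_path, updated)        # step 2-5


store = ObjectFileStore()
store.files["/data/company"] = {"2022": "old", "2023": "old"}
store.grants["P2"] = ("/data/company", {"read", "write"})
store.grants["P3"] = ("/data/company", {"read"})

update_file(store, "P2", "/data/company", "new", partition="2023")
```

With the read-only credential "P3", the same call passes the read check in 2-3 but fails the write check in 2-7, so the stored file is left unchanged.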
The first access operation may also be another operation, such as deleting the first file; such operations are not enumerated here.
Step 409: The computing engine sends an access instruction to the data filtering engine. The access instruction includes the file path of the first file and the access requirement information.
Referring to Figure 5 or Figure 6, in step 409 the routing module of the computing engine (a Hive engine or a Spark engine) sends the access instruction to the data filtering engine.
Step 410: The data filtering engine receives the access instruction and accesses the first file based on the administrator account information, the file path, and the access requirement information.
In some embodiments, the data filtering engine authenticates the first user's permission to access the content based on the second permission information, the account information of the first user, and the access requirement information. After this authentication passes, the data filtering engine accesses the first file based on the administrator account information, the file path, and the access requirement information. The second permission information indicates the user identities that are able to access the content and the fourth access operation that is able to access the content.
For the detailed process by which the data filtering engine authenticates the first user's permission to access the content, refer to the detailed process by which the computing engine authenticates that permission in step 402 above; it is not described again here.
The first file is a structured data file, and the access requirement information includes the identification information of the first file, first information, and the first operation type of the first access operation. The first information includes the column identifier of at least one column of the first file; or the column identifier of at least one column of the first file and the row filtering information corresponding to that column; or the row number of at least one row in the first file. Alternatively, the first file is a semi-structured data file, and the content identification information of the content includes the identification information of the first file and the identification information of a data fragment in the first file.
Assume that the first access operation is a query of the first file and that the second access operation mapped from the first access operation includes a read operation. In step 410, the first file is then accessed according to the following process.
3-1: The data filtering engine sends a read request to the object file storage system. The read request includes the administrator account information and the file path of the first file.
For example, take the access requirement information given above: Select*From Company information Where City=City 1. In this example, the first operation type "Select" is a query of the first file; the content identification information includes "Company information", the column identifier "City" of the fourth column, and the row filtering information "City 1" corresponding to the fourth column; and the second access operation that the computing engine maps from "Select" includes a read operation.
The data filtering engine sends a read request to the object file storage system. The read request includes the administrator account information "administrators" and the file path of the first file, "C:\windows\system32\Company information".
3-2: The object file storage system receives the read request. When it determines that the account information in the read request is administrator account information, it reads the first file based on the file path of the first file and returns the first file to the data filtering engine.
Because the administrator has broad permissions, the object file storage system can read the first file directly from its file path once it determines that the account information in the read request is administrator account information.
For example, the object file storage system receives the read request, reads the first file shown in Table 1 based on the file path "C:\windows\system32\Company information", and returns the first file shown in Table 1 to the data filtering engine.
3-3: The data filtering engine receives the first file. When the access requirement information includes the first information, the data filtering engine obtains the content in the first file based on the first information. When the content identification information includes the identification information of a data fragment in the first file, the data filtering engine obtains from the first file the content of that data fragment.
In some embodiments, the first information includes the column identifier of at least one column in the first file. The data filtering engine obtains the content of the at least one column from the first file based on the first information and returns the content of the at least one column to the computing engine.
In some embodiments, the first information includes the column identifier of at least one column in the first file and row filtering information. Based on the first information, the data filtering engine obtains from the first file the one or more rows whose content in the at least one column equals the row filtering information, and returns the content of those rows to the computing engine.
For example, the data filtering engine receives the first file shown in Table 1. Based on the column identifier "City" of the fourth column and the row filtering information "City 1" corresponding to the fourth column, it obtains from the first file the three rows whose fourth column contains "City 1", namely the first, second, and fifth rows of Table 1. It returns the content of these three rows to the computing engine, and the computing engine returns the content of the three rows to the first user.
In some embodiments, the first information includes the row number of at least one row in the first file. The data filtering engine obtains the content of the at least one row from the first file based on the first information and returns the content of the at least one row to the computing engine.
In some embodiments, the first information includes the identification information of a data fragment in the first file. The data filtering engine obtains the content of the data fragment from the first file based on the first information and returns the content of the data fragment to the computing engine.
3-4: The data filtering engine returns the content in the first file to the computing engine.
3-5: The computing engine receives the content in the first file and returns the content in the first file to the first user.
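The selection performed by the data filtering engine in step 3-3 can be sketched as a small function over rows. This is an illustration only: the function `select_content`, its parameter names, and the five sample rows are assumptions (Table 1 is not reproduced here; the sample merely places "City 1" in the first, second, and fifth rows, as in the example above).

```python
def select_content(rows, columns=None, row_filter=None, row_numbers=None):
    """Return the part of a structured file selected by the first information.

    rows:        list of dicts, one per row of the file
    columns:     column identifiers to keep, or None for all columns
    row_filter:  (column identifier, value) pair, or None for no filtering
    row_numbers: 1-based row numbers to keep, or None for all rows
    """
    selected = rows
    if row_numbers is not None:
        selected = [rows[n - 1] for n in row_numbers]
    if row_filter is not None:
        column, value = row_filter
        selected = [row for row in selected if row[column] == value]
    if columns is not None:
        selected = [{c: row[c] for c in columns} for row in selected]
    return selected


# Hypothetical sample data standing in for the first file.
first_file = [
    {"Name": "E1", "City": "City 1"},
    {"Name": "E2", "City": "City 1"},
    {"Name": "E3", "City": "City 2"},
    {"Name": "E4", "City": "City 3"},
    {"Name": "E5", "City": "City 1"},
]

# Column identifier "City" with row filtering information "City 1".
matching = select_content(first_file, row_filter=("City", "City 1"))

# Column identifier "Name" restricted to row number 3.
names_only = select_content(first_file, columns=["Name"], row_numbers=[3])
```

Because the whole file is read with the administrator account before filtering, only the selected rows or columns leave the data filtering engine; the computing engine and the first user never receive the remainder of the file.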
Assume that the first access operation is an update of the first file, that the second access operation mapped from the first access operation includes a read operation and a write operation, and that the access requirement information includes the content to be updated. In step 410, the first file is then accessed according to the following process.
4-1: The data filtering engine sends a read request to the object file storage system. The read request includes the administrator account information and the file path of the first file.
4-2: The object file storage system receives the read request. When it determines that the account information in the read request is administrator account information, it reads the first file based on the file path of the first file and returns the first file to the data filtering engine.
4-3: The data filtering engine receives the first file. When the access requirement information includes the first information, the data filtering engine updates the at least one column or at least one row of the first file indicated by the first information to the content to be updated. When the access requirement information includes the identification information of a data fragment in the first file, the data filtering engine updates the content of that data fragment in the first file to the content to be updated.
In some embodiments, the first information includes the column identifier of at least one column in the first file, and the data filtering engine updates the at least one column in the first file to the content to be updated.
In some embodiments, the first information includes the column identifier of at least one column in the first file and row filtering information. Based on the first information, the data filtering engine determines the one or more rows of the first file whose content in the at least one column equals the row filtering information, and updates the content of those rows to the content to be updated.
In some embodiments, the first information includes the row number of at least one row in the first file, and the data filtering engine updates the content of the at least one row in the first file to the content to be updated based on the first information.
In some embodiments, the first information includes the identification information of a data fragment in the first file, and the data filtering engine updates the content of that data fragment in the first file to the content to be updated based on the identification information of the data fragment.
4-4: The data filtering engine sends a write request to the object file storage system. The write request includes the administrator account information, the file path of the first file, and the updated first file.
4-5: The object file storage system receives the write request. When it determines that the account information in the write request is administrator account information, it replaces the first file stored at the file path with the first file included in the write request.
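Steps 4-1 to 4-5 can be sketched for the case in which the first information is a column identifier plus row filtering information. The names below (`ObjectFileStore`, `update_matching_rows`, the path "/data/company", and the sample rows) are hypothetical; only the account name "administrators" comes from the example in the text.

```python
ADMIN_ACCOUNT = "administrators"


class ObjectFileStore:
    """Hypothetical store that serves requests carrying the administrator account."""

    def __init__(self, files):
        self._files = files  # file path -> list of row dicts

    def _require_admin(self, account):
        if account != ADMIN_ACCOUNT:
            raise PermissionError("administrator account required")

    def read(self, account, file_path):
        self._require_admin(account)                   # step 4-2
        return [dict(row) for row in self._files[file_path]]

    def write(self, account, file_path, rows):
        self._require_admin(account)                   # step 4-5
        self._files[file_path] = rows                  # replace the stored file


def update_matching_rows(store, file_path, column, value, new_values):
    rows = store.read(ADMIN_ACCOUNT, file_path)        # step 4-1
    for row in rows:                                   # step 4-3
        if row[column] == value:
            row.update(new_values)
    store.write(ADMIN_ACCOUNT, file_path, rows)        # step 4-4
    return rows


store = ObjectFileStore({
    "/data/company": [
        {"Name": "E1", "City": "City 1"},
        {"Name": "E3", "City": "City 2"},
    ]
})
updated = update_matching_rows(
    store, "/data/company", "City", "City 1", {"City": "City 9"}
)
```

The data filtering engine writes back the complete updated file, which matches step 4-5, where the stored file is replaced as a whole rather than patched in place.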
The first access operation may also be another operation, such as deleting the first file; such operations are not enumerated here.
In some embodiments, for the first permission information and the second permission information in the linkage permission module, the linkage permission module receives the second permission information configured by the permission administrator and generates the first permission information based on the second permission information.
In this way, the permission administrator grants permission to access the content of a file by configuring the second permission information, and the linkage permission module generates the first permission information from it. The computing engine in the access system authenticates with the second permission information, and the file path authentication module authenticates with the first permission information. A single authorization by the permission administrator is therefore enough to enable two-dimensional authentication.
In some embodiments, the first user accesses the first file in an open-source manner. That is, the first user sends authentication information to the file path authentication module through a client, and the authentication information includes the file path of the first file, the user identity of the first user, and the second operation type of the second access operation. Based on the authentication information, the file path authentication module authenticates the first user's permission to access the file path using the second access operation. After the authentication passes, the file path authentication module sends an authentication response, which includes a temporary credential, to the client, and sends storage information to the object file storage system; the storage information includes the temporary credential, the file path, and the second operation type. The client receives the authentication response and accesses the first file in the object file storage system based on the temporary credential, the file path, and the access requirement information. This achieves transparent file access.
In this embodiment of the present application, the computing engine determines the granularity of access to the content in the first file based on the access requirement information. When the determined granularity is the first granularity, the computing engine requests the file path authentication module to authenticate the first user's permission to access the file path of the first file. After the authentication passes, the computing engine obtains the temporary credential assigned by the file path authentication module and accesses the object file storage system based on the temporary credential, the access requirement information, and the file path. Because the computing engine accesses the object file storage system directly, file read and write performance is improved. When the determined granularity is the second granularity, the computing engine requests the data filtering engine to access the object file storage system. The data filtering engine holds the specified administrator account information, so it can read the first file from the object file storage system and, based on the second granularity, extract from the first file the data that the first user needs to access. Users are thereby offered access at a granularity finer than a whole file, which enriches the access services provided to users.
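The routing decision summarized above can be sketched as follows. This is an assumption-laden illustration: the dictionary layout of the access requirement information, the key names in `FINE_GRAINED_KEYS`, and the functions `access_granularity` and `choose_access_path` are hypothetical. The only rule taken from the text is that file-level or partition-level content is the first granularity, while column, row, or data-fragment content is the second.

```python
# Hypothetical keys that mark content finer than a file or a partition.
FINE_GRAINED_KEYS = ("columns", "row_filter", "row_numbers", "fragment")


def access_granularity(requirement):
    """Classify access requirement information as 'first' or 'second' granularity."""
    if any(requirement.get(key) for key in FINE_GRAINED_KEYS):
        return "second"
    return "first"


def choose_access_path(requirement):
    """Name the component that talks to the object file storage system."""
    if access_granularity(requirement) == "first":
        # Path of steps 405-408: authenticate the file path, obtain a
        # temporary credential, then read or write the file directly.
        return "computing engine with temporary credential"
    # Path of steps 409-410: hand the request to the data filtering engine,
    # which uses the specified administrator account.
    return "data filtering engine with administrator account"


whole_file = {"operation": "query", "file": "Company information"}
one_partition = {
    "operation": "query",
    "file": "Company information",
    "partition": "p1",
}
filtered_rows = {
    "operation": "query",
    "file": "Company information",
    "row_filter": ("City", "City 1"),
}
```

Keeping the classification separate from the routing makes it straightforward to add further fine-grained selectors without touching the two access paths.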
Referring to Figure 7, an embodiment of the present application provides a method 700 for obtaining first permission information. The first permission information in the embodiment shown in Figure 1 or Figure 3, or the first permission information in the embodiment shown in Figure 4, is obtained through method 700. The method 700 includes the following steps:
Step 701: The linkage permission module receives second permission information. The second permission information indicates the user identities that are able to access the content in the first file and the fourth access operation.
The linkage permission module obtains the metadata of the first file from the metadata center; the metadata of the first file is any one piece of metadata stored in the metadata center. The linkage permission module also obtains at least one user identity from the identity authentication center and displays a second interface to the permission administrator. The second interface includes the metadata of the first file and the at least one user identity.
The permission administrator then selects the content identification information of the content in the first file from the metadata of the first file, selects from the at least one user identity the user identity that is able to access the content, and enters into the second interface the fourth operation type of the fourth access operation that is able to access the content. This yields the second permission information, which includes the content identification information of the content, the selected user identity, and the entered fourth operation type. The linkage permission module reads the second permission information from the second interface.
In some embodiments, the first file is a structured data file, and the metadata of the first file includes the file identifier of the first file and the column identifier of each column of the first file. Optionally, the content identification information selected by the permission administrator includes the file identifier of the first file; or the file identifier of the first file and the column identifier of at least one column in the first file; or the file identifier of the first file and the row number of at least one row in the first file.
In some embodiments, the first file is a semi-structured data file, and the metadata of the first file includes the file identifier of the first file and the identification information of each data fragment of the first file. Optionally, the content identification information selected by the permission administrator includes the file identifier of the first file, or it includes the file identifier of the first file and the identification information of at least one data fragment in the first file.
Step 702: The linkage permission module generates the first permission information based on the second permission information.
In step 702, the linkage permission module generates the first permission information through the following operations 7021 to 7023:
7021: The linkage permission module obtains the file path of the first file based on the content identification information of the content in the second permission information.
In some embodiments, the content identification information of the content includes the identification information of the first file. The linkage permission module obtains from the metadata center the metadata that includes the identification information of the first file, which is the metadata of the first file, and obtains the file path of the first file from that metadata.
7022: The linkage permission module maps the fourth operation type included in the second permission information to obtain a third operation type.
The access operation corresponding to the fourth operation type is the fourth access operation, configured by the administrator, that is able to access the content in the first file. The fourth access operation may be querying the first file, updating the first file, deleting the first file, and so on. The third operation type is the access operation on the object file storage system that corresponds to the fourth operation type. The third operation type includes a read operation and/or a write operation, among others.
7023: The linkage permission module reads the user identity from the second permission information and combines the file path of the first file, the user identity, and the third operation type into the first permission information.
Step 703: The linkage permission module saves the first permission information and the second permission information.
Steps 701 to 703 can be performed repeatedly, so that the linkage permission module generates a large amount of first permission information and second permission information.
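Operations 7021 to 7023 can be sketched as a single function. This is a hedged illustration: the dictionary layout of the second permission information and of the metadata center, the names `OPERATION_MAP`, `FirstPermission`, and `generate_first_permission`, and the user identity "user-1" are assumptions. The mapping covers only query (read) and update (read and write), which are the two mappings stated in the text.

```python
from dataclasses import dataclass

# Fourth operation type -> third operation type (operations on the object store).
OPERATION_MAP = {
    "query": frozenset({"read"}),
    "update": frozenset({"read", "write"}),
}


@dataclass(frozen=True)
class FirstPermission:
    file_path: str
    user: str
    operations: frozenset


def generate_first_permission(second_permission, metadata_center):
    # 7021: resolve the file path from the file's metadata.
    file_id = second_permission["content"]["file"]
    file_path = metadata_center[file_id]["path"]
    # 7022: map the fourth operation type to the third operation type.
    operations = OPERATION_MAP[second_permission["operation"]]
    # 7023: combine file path, user identity, and third operation type.
    return FirstPermission(file_path, second_permission["user"], operations)


# Hypothetical metadata center content and administrator input.
metadata_center = {
    "Company information": {
        "path": r"C:\windows\system32\Company information",
        "columns": ["Name", "City"],
    }
}
second_permission = {
    "content": {"file": "Company information", "columns": ["City"]},
    "user": "user-1",
    "operation": "update",
}

first_permission = generate_first_permission(second_permission, metadata_center)
```

The column selection in the second permission information does not appear in the result: the first permission information is expressed purely in terms of the file path, which is why the two kinds of permission information are checked by different components.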
In this embodiment of the present application, the linkage permission module receives the second permission information configured by the permission administrator and generates the first permission information based on it. The first permission information indicates the user identities and access operations that are able to access the file path of the first file. The first permission information can therefore be generated automatically, which improves the efficiency and reduces the cost of obtaining it. The permission administrator only needs to configure the second permission information, and the linkage permission module automatically generates the first permission information from it. The second permission information is used to authenticate a user's permission to access the content in the first file, and the first permission information is used to authenticate a user's permission to access the file path of the first file. The permission administrator thus needs to authorize only once (by configuring the second permission information), and the access system performs two-dimensional authentication using the second permission information and the first permission information.
Referring to Figure 8, an embodiment of this application provides a device 800 for accessing files. The device 800 can be deployed on the computing engine in the system shown in Figure 1 or Figure 3, or on the computing engine in the embodiment shown in Figure 4, Figure 5 or Figure 6. The device 800 includes:
A communication unit 801, configured to receive a data access request, where the data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in an object file storage system;
A processing unit 802, configured to access the first file based on account information of the first user and the access requirement information when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a first granularity;
The processing unit 802 is further configured to access the first file based on specified administrator account information and the access requirement information when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a second granularity, where the second granularity is smaller than the first granularity.
Optionally, for the detailed implementation process by which the communication unit 801 receives the data access request, see the description of step 401 of the embodiment shown in Figure 4; details are not repeated here.
Optionally, for the detailed implementation process by which the processing unit 802 accesses the first file based on the account information of the first user and the access requirement information, see the description of steps 405-408 of the embodiment shown in Figure 4; details are not repeated here.
Optionally, for the detailed implementation process by which the processing unit 802 accesses the first file based on the specified administrator account information and the access requirement information, see the description of steps 409-410 of the embodiment shown in Figure 4; details are not repeated here.
Optionally, the access requirement information includes identification information of the first file, and the first granularity is file granularity; or,
Optionally, the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity.
Optionally, the communication unit 801 is further configured to send an authentication request to a file path authentication module. The authentication request includes authentication information, which indicates the first user, the file path of the first file, and the access operation by which the first user accesses the file path. The authentication information is obtained based on the access requirement information and the account information of the first user. The authentication request is used to trigger the file path authentication module to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation. The file path indicates the storage location of the first file, and the first permission information indicates the user identities and access operations that are allowed to access the file path;
The communication unit 801 is further configured to receive an authentication response that the file path authentication module sends after the permission passes authentication. The authentication response includes a temporary credential, and the temporary credential, the file path and the operation type of the access operation are stored in correspondence with one another in the object file storage system;
The processing unit 802 is configured to access the first file based on the temporary credential, the access requirement information and the file path.
Optionally, for the detailed implementation process by which the communication unit 801 sends the authentication request to the file path authentication module, see the description of step 404 of the embodiment shown in Figure 4; details are not repeated here.
Optionally, for the detailed implementation process by which the communication unit 801 receives the authentication response, see the description of step 408 of the embodiment shown in Figure 4; details are not repeated here.
Optionally, for the detailed implementation process by which the processing unit 802 accesses the first file based on the temporary credential, the access requirement information and the file path, see the description of step 408 of the embodiment shown in Figure 4; details are not repeated here.
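The temporary-credential exchange can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the modules of this application: `ObjectStore`, `FilePathAuthenticator` and their methods are hypothetical names, the credential is a random token, and the sketch assumes that the authentication module is the party that registers the credential with the object file storage system, which the text does not specify.

```python
import secrets


class ObjectStore:
    """Toy object file storage system that checks temporary credentials."""

    def __init__(self, files):
        self._files = files   # file path -> content
        self._grants = {}     # credential -> (file path, operation type)

    def register_grant(self, credential, file_path, op_type):
        # Store the credential, file path and operation type together.
        self._grants[credential] = (file_path, op_type)

    def read(self, credential, file_path):
        # Serve the read only if the credential was issued for this path.
        if self._grants.get(credential) != (file_path, "read"):
            raise PermissionError("credential does not cover this path")
        return self._files[file_path]


class FilePathAuthenticator:
    """Toy file path authentication module backed by path-level permissions."""

    def __init__(self, first_permissions, store):
        self._first_permissions = first_permissions  # {(user, path, op_type)}
        self._store = store

    def authenticate(self, user, file_path, op_type):
        # Return a temporary credential on success, or None on failure.
        if (user, file_path, op_type) not in self._first_permissions:
            return None
        credential = secrets.token_hex(16)
        self._store.register_grant(credential, file_path, op_type)
        return credential


store = ObjectStore({"/lake/hr/employee": "rows..."})
auth = FilePathAuthenticator({("alice", "/lake/hr/employee", "read")}, store)
credential = auth.authenticate("alice", "/lake/hr/employee", "read")
data = store.read(credential, "/lake/hr/employee")
denied = auth.authenticate("bob", "/lake/hr/employee", "read")
```

Here the computing engine never presents the user's long-lived account to the storage system. It presents only the short-lived credential bound to one path and one operation type.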
Optionally, the first file is a structured data file that stores data in list form. The access requirement information includes identification information of the first file and first information, where the first information indicates at least one column of the first file and/or at least one row of the first file, and the second granularity is row-and-column granularity; or,
The first file is a semi-structured data file that includes at least one data fragment, where a data fragment stores data having the same business attribute. The access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is data fragment granularity.
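How the granularity might be read off the access requirement information can be sketched as follows. The dictionary keys (`partition`, `columns`, `rows`, `fragments`) and the function names are assumptions made for this example; the application does not prescribe a concrete encoding.

```python
from enum import Enum


class Granularity(Enum):
    FILE = "file"
    PARTITION = "partition"
    ROW_COLUMN = "row_column"
    DATA_FRAGMENT = "data_fragment"


# File and partition granularity are the coarser "first granularity" cases.
FIRST_GRANULARITIES = {Granularity.FILE, Granularity.PARTITION}


def classify(requirement):
    """Infer the access granularity from hypothetical requirement fields."""
    if requirement.get("columns") or requirement.get("rows"):
        return Granularity.ROW_COLUMN
    if requirement.get("fragments"):
        return Granularity.DATA_FRAGMENT
    if requirement.get("partition"):
        return Granularity.PARTITION
    return Granularity.FILE


def is_first_granularity(granularity):
    """True for file or partition granularity, False for the finer cases."""
    return granularity in FIRST_GRANULARITIES
```

For example, a requirement that names only the file is classified as file granularity, while one that also names columns is classified as row-and-column granularity.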
Optionally, the communication unit 801 is further configured to send an access instruction to a data filtering engine. The access instruction includes the access requirement information, and the data filtering engine includes the administrator account information. The access instruction is used to trigger the data filtering engine to access the first file based on the administrator account information and the access requirement information.
Optionally, for the detailed implementation process by which the communication unit 801 sends the access instruction to the data filtering engine, see the description of step 409 of the embodiment shown in Figure 4; details are not repeated here.
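A data filtering engine of this kind can be sketched in Python as follows. The sketch assumes that the engine reads the whole file under the administrator account and then returns only the requested rows and columns. The class, the `read_file` callback, the `admin` account name and the sample rows are hypothetical.

```python
class DataFilteringEngine:
    """Toy engine that reads with an administrator account, then filters."""

    def __init__(self, admin_account, read_file):
        self._admin_account = admin_account
        self._read_file = read_file  # callable(account, file_id) -> list of dicts

    def access(self, file_id, columns=None, row_filter=None):
        # Read the whole file under the administrator account.
        rows = self._read_file(self._admin_account, file_id)
        # Keep only the rows named by the access requirement information.
        if row_filter is not None:
            rows = [row for row in rows if row_filter(row)]
        # Keep only the requested columns.
        if columns is not None:
            rows = [{col: row[col] for col in columns} for row in rows]
        return rows


EMPLOYEES = [
    {"name": "Ann", "address": "1 Main St", "department": "HR", "position": "Manager"},
    {"name": "Bo", "address": "2 Oak Ave", "department": "IT", "position": "Engineer"},
]


def read_file(account, file_id):
    # Stand-in for the object file storage system: only "admin" may read.
    if account != "admin":
        raise PermissionError(account)
    return EMPLOYEES


engine = DataFilteringEngine("admin", read_file)
result = engine.access(
    "employee",
    columns=["name", "department"],
    row_filter=lambda row: row["department"] == "IT",
)
```

The caller receives only the filtered view, so the first user's own account never needs file-level read permission in this arrangement.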
Optionally, the processing unit 802 is further configured to:
Authenticate, based on second permission information, the account information of the first user and the access requirement information, the first user's permission to access the content, where the second permission information indicates the user identities and access operations that are allowed to access the content;
After the first user's permission to access the content passes authentication, determine, based on the access requirement information, the granularity of accessing the content in the first file.
Optionally, the processing unit 802 is further configured to:
Generate first permission information based on the second permission information, where the first permission information indicates the user identities and access operations that are allowed to access the file path of the first file, and the file path indicates the storage location of the first file.
Optionally, for the detailed implementation process by which the processing unit 802 generates the first permission information, see the description of step 702 of the embodiment shown in Figure 7; details are not repeated here.
In the embodiments of this application, the first granularity is larger than the second granularity. When the determined granularity is the first granularity, the processing unit accesses the first file based on the account information of the first user and the access requirement information. The administrator account information does not need to be borrowed to access the first file, which improves the efficiency of accessing the first file and the performance of reading and writing it. When the determined granularity is the second granularity, the processing unit accesses the first file based on the specified administrator account information and the access requirement information. The administrator account information is used in place of the first user's account information to access the first file, so the first user does not need to be granted permission to access at the second granularity. This avoids expanding the first user's access permissions and simplifies permission management.
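The choice between the two accounts can be summarized in a short Python sketch. It assumes that granularities are passed as plain strings and that the second permission information is a set of `(user, file, operation)` tuples; the function name and the `lake-admin` account are hypothetical.

```python
FIRST_GRANULARITIES = {"file", "partition"}
SECOND_GRANULARITIES = {"row_column", "data_fragment"}


def select_account(user, file_id, operation, granularity,
                   second_permissions, admin_account):
    """Return the account under which the first file should be accessed."""
    # Content-level authentication against the second permission information.
    if (user, file_id, operation) not in second_permissions:
        raise PermissionError(f"{user} may not {operation} {file_id}")
    # Coarse access runs under the user's own account.
    if granularity in FIRST_GRANULARITIES:
        return user
    # Fine-grained access is delegated to the administrator account.
    if granularity in SECOND_GRANULARITIES:
        return admin_account
    raise ValueError(f"unknown granularity: {granularity}")


perms = {("alice", "employee", "read")}
coarse = select_account("alice", "employee", "read", "partition", perms, "lake-admin")
fine = select_account("alice", "employee", "read", "row_column", perms, "lake-admin")
```

In this sketch `coarse` is the user's own account and `fine` is the administrator account, so the user never needs a standing permission at the finer granularity.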
Referring to Figure 9, an embodiment of this application provides a schematic diagram of a device 900 for accessing files. The device 900 may be the computing engine in any of the foregoing embodiments, for example the computing engine provided by the embodiment shown in Figure 1, Figure 3, Figure 4, Figure 5 or Figure 6. The device 900 includes at least one processor 901, an internal connection 902, a memory 903 and at least one transceiver 904.
The device 900 is a hardware device that can be used to implement the functional modules of the device 800 shown in Figure 8. For example, a person skilled in the art will appreciate that the processing unit 802 of the device 800 shown in Figure 8 can be implemented by the at least one processor 901 calling code in the memory 903, and that the communication unit 801 of the device 800 shown in Figure 8 can be implemented by the transceiver 904.
Optionally, the device 900 can also be used to implement the functions of the computing engine in any of the foregoing embodiments.
Optionally, the processor 901 may be a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the programs of the solutions of this application.
The internal connection 902 may include a path for transferring information between the foregoing components. Optionally, the internal connection 902 is a board, a bus, or the like.
The transceiver 904 is used to communicate with other devices or a communication network.
The memory 903 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a bus. The memory may also be integrated with the processor.
The memory 903 is used to store the application program code that executes the solutions of this application, and execution is controlled by the processor 901. The processor 901 is used to execute the application program code stored in the memory 903 and to cooperate with the at least one transceiver 904, so that the device 900 implements the functions of the methods of this application.
In a specific implementation, as an embodiment, the processor 901 may include one or more CPUs, such as CPU0 and CPU1 in Figure 9.
In a specific implementation, as an embodiment, the device 900 may include multiple processors, such as the processor 901 and the processor 907 in Figure 9. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only optional embodiments of this application and is not intended to limit this application. Any modification, equivalent replacement, improvement or the like made within the principles of this application shall fall within the protection scope of this application.
Claims (23)
- A method for accessing files, characterized in that the method comprises:
  receiving a data access request, wherein the data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in an object file storage system;
  when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a first granularity, accessing the first file based on account information of the first user and the access requirement information;
  when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a second granularity, accessing the first file based on specified administrator account information and the access requirement information, wherein the second granularity is smaller than the first granularity.
- The method of claim 1, characterized in that the access requirement information includes identification information of the first file, and the first granularity is file granularity; or,
  the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity.
- The method of claim 2, characterized in that accessing the first file based on the account information of the first user and the access requirement information comprises:
  sending an authentication request to a file path authentication module, wherein the authentication request includes authentication information, the authentication information indicates the first user, the file path of the first file and the access operation by which the first user accesses the file path, the authentication information is obtained based on the access requirement information and the account information of the first user, the authentication request is used to trigger the file path authentication module to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation, the file path indicates the storage location of the first file, and the first permission information indicates the user identities and access operations that are allowed to access the file path;
  receiving an authentication response that the file path authentication module sends after the permission passes authentication, wherein the authentication response includes a temporary credential, and the temporary credential, the file path and the operation type of the access operation are stored in correspondence with one another in the object file storage system;
  accessing the first file based on the temporary credential, the access requirement information and the file path.
- The method of claim 1, characterized in that the first file is a structured data file that stores data in list form, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column of the first file and/or at least one row of the first file, and the second granularity is row-and-column granularity; or,
  the first file is a semi-structured data file that includes at least one data fragment, the data fragment stores data having the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is data fragment granularity.
- The method of claim 4, characterized in that accessing the first file based on the specified administrator account information and the access requirement information comprises:
  sending an access instruction to a data filtering engine, wherein the access instruction includes the access requirement information, the data filtering engine includes the administrator account information, and the access instruction is used to trigger the data filtering engine to access the first file based on the administrator account information and the access requirement information.
- The method of any one of claims 1-5, characterized in that the method further comprises:
  authenticating, based on second permission information, the account information of the first user and the access requirement information, the first user's permission to access the content, wherein the second permission information indicates the user identities and access operations that are allowed to access the content;
  after the first user's permission to access the content passes authentication, determining, based on the access requirement information, the granularity of accessing the content in the first file.
- The method of claim 6, characterized in that the method further comprises:
  generating first permission information based on the second permission information, wherein the first permission information indicates the user identities and access operations that are allowed to access the file path of the first file, and the file path indicates the storage location of the first file.
- An access system, characterized in that the system comprises a computing engine and an object file storage system;
  the computing engine is configured to receive a data access request, wherein the data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in the object file storage system;
  the computing engine is further configured to, when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a first granularity, access the first file based on account information of the first user and the access requirement information;
  the computing engine is further configured to, when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a second granularity, access the first file based on specified administrator account information and the access requirement information, wherein the second granularity is smaller than the first granularity.
- The system of claim 8, characterized in that the access requirement information includes identification information of the first file, and the first granularity is file granularity; or,
  the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity.
- The system of claim 9, characterized in that the system further comprises a file path authentication module,
  the computing engine is configured to send an authentication request to the file path authentication module, wherein the authentication request includes authentication information, the authentication information indicates the first user, the file path of the first file and the access operation by which the first user accesses the file path, the authentication information is obtained based on the access requirement information and the account information of the first user, and the file path indicates the storage location of the first file;
  the file path authentication module is configured to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation, wherein the first permission information indicates the user identities and access operations that are allowed to access the file path, and to send an authentication response to the computing engine after the permission passes authentication, wherein the authentication response includes a temporary credential;
  the object file storage system is configured to store the temporary credential, the file path and the operation type of the access operation in correspondence with one another;
  the computing engine is further configured to access the first file based on the temporary credential, the access requirement information and the file path.
- The system of claim 8, characterized in that the first file is a structured data file that stores data in list form, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column of the first file and/or at least one row of the first file, and the second granularity is row-and-column granularity; or,
  the first file is a semi-structured data file that includes at least one data fragment, the data fragment stores data having the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is data fragment granularity.
- The system of claim 11, characterized in that the system further comprises a data filtering engine, and the data filtering engine includes the administrator account information;
  the computing engine is configured to send an access instruction to the data filtering engine, wherein the access instruction includes the file path of the first file and the access requirement information, and the file path indicates the storage location of the first file;
  the data filtering engine is configured to access the first file based on the administrator account information, the file path and the access requirement information.
- The system of any one of claims 8-12, characterized in that the computing engine is further configured to:
  authenticate, based on second permission information, the account information of the first user and the access requirement information, the first user's permission to access the content, wherein the second permission information indicates the user identities and access operations that are allowed to access the content;
  after the first user's permission to access the content passes authentication, determine, based on the access requirement information, the granularity of accessing the content in the first file.
- The system of claim 13, characterized in that the system further comprises a linkage permission module,
  the linkage permission module is configured to generate first permission information based on the second permission information, wherein the first permission information indicates the user identities and access operations that are allowed to access the file path of the first file, and the file path indicates the storage location of the first file.
- A device for accessing files, characterized in that the device comprises:
  a communication unit, configured to receive a data access request, wherein the data access request includes access requirement information, the access requirement information indicates the content in a first file that a first user needs to access, and the first file is stored in an object file storage system;
  a processing unit, configured to, when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a first granularity, access the first file based on account information of the first user and the access requirement information;
  the processing unit is further configured to, when it is determined, based on the access requirement information, that the granularity of accessing the content in the first file is a second granularity, access the first file based on specified administrator account information and the access requirement information, wherein the second granularity is smaller than the first granularity.
- The device of claim 15, characterized in that the access requirement information includes identification information of the first file, and the first granularity is file granularity; or,
  the access requirement information includes identification information of the first file and identification information of a partition in the first file, and the first granularity is partition granularity.
- The device according to claim 16, wherein: the communication unit is further configured to send an authentication request to a file path authentication module, wherein the authentication request includes authentication information, the authentication information indicates the first user, the file path of the first file, and the access operation by which the first user accesses the file path, the authentication information is obtained based on the access requirement information and the account information of the first user, the authentication request is used to trigger the file path authentication module to authenticate, based on first permission information and the authentication information, the first user's permission to access the file path using the access operation, the file path indicates the storage location of the first file, and the first permission information indicates the user identities and access operations permitted to access the file path; the communication unit is further configured to receive an authentication response sent by the file path authentication module after the permission authentication succeeds, wherein the authentication response includes a temporary credential, and the temporary credential, the file path, and the operation type of the access operation are stored in association with one another in the object file storage system; and the processing unit is configured to access the first file based on the temporary credential, the access requirement information, and the file path.
- The device according to claim 15, wherein the first file is a structured data file that stores data in list form, the access requirement information includes identification information of the first file and first information, the first information indicates at least one column of the first file and/or at least one row of the first file, and the second granularity is row-and-column granularity; or the first file is a semi-structured data file that includes at least one data fragment, each data fragment is used to store data having the same business attribute, the access requirement information includes identification information of the first file and identification information of one or more data fragments in the first file, and the second granularity is data fragment granularity.
- The device according to claim 18, wherein the communication unit is further configured to send an access instruction to a data filtering engine, wherein the access instruction includes the access requirement information, the data filtering engine includes the administrator account information, and the access instruction is used to trigger the data filtering engine to access the first file based on the administrator account information and the access requirement information.
- The device according to any one of claims 15 to 19, wherein the processing module is further configured to: authenticate the first user's permission to access the content based on second permission information, the account information of the first user, and the access requirement information, wherein the second permission information indicates the user identities and access operations permitted to access the content; and, after the first user's permission to access the content has been authenticated successfully, determine the granularity of accessing the content in the first file based on the access requirement information.
- The device according to claim 20, wherein the processing module is further configured to generate first permission information based on the second permission information, wherein the first permission information indicates the user identities and access operations permitted to access the file path of the first file, and the file path indicates the storage location of the first file.
- A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a computer, the method according to any one of claims 1 to 7 is implemented.
- A computer program product, wherein the computer program product comprises a computer program stored in a computer-readable storage medium, and the computer program is loaded by a processor to implement the method according to any one of claims 1 to 7.
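
The granularity-based account selection recited in claims 15, 16 and 18 can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the names `AccessRequirement`, `Granularity`, `determine_granularity` and `select_account` are hypothetical, and the order in which the identifiers are checked is an assumption, since the claims only state which identifiers correspond to which granularity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Granularity(Enum):
    FILE = "file"              # a "first granularity" (claim 16)
    PARTITION = "partition"    # a "first granularity" (claim 16)
    ROW_COLUMN = "row_column"  # a "second granularity" (claim 18)
    FRAGMENT = "fragment"      # a "second granularity" (claim 18)


# Coarse granularities are accessed with the requesting user's own account.
FIRST_GRANULARITIES = {Granularity.FILE, Granularity.PARTITION}


@dataclass(frozen=True)
class AccessRequirement:
    """Hypothetical container for the access requirement information."""
    file_id: str
    partition_id: Optional[str] = None
    columns: Tuple[str, ...] = ()
    rows: Tuple[int, ...] = ()
    fragment_ids: Tuple[str, ...] = ()


def determine_granularity(req: AccessRequirement) -> Granularity:
    # Assumption: the finest identifier present in the request wins.
    if req.columns or req.rows:
        return Granularity.ROW_COLUMN
    if req.fragment_ids:
        return Granularity.FRAGMENT
    if req.partition_id is not None:
        return Granularity.PARTITION
    return Granularity.FILE


def select_account(req: AccessRequirement, user_account: str,
                   admin_account: str) -> str:
    """Return the account under which the first file would be accessed."""
    if determine_granularity(req) in FIRST_GRANULARITIES:
        return user_account   # first granularity: the user's own account
    return admin_account      # second granularity: the administrator account


if __name__ == "__main__":
    whole_file = AccessRequirement(file_id="employees")
    two_columns = AccessRequirement(file_id="employees",
                                    columns=("name", "department"))
    print(select_account(whole_file, "alice", "lake-admin"))
    print(select_account(two_columns, "alice", "lake-admin"))
```

In this sketch a request that names only the file, or the file and a partition, is served under the user's account, while a request that names rows, columns or data fragments is served under the administrator account, which in claim 19 is held by the data filtering engine.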
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210264898 | 2022-03-17 | ||
CN202210264898.8 | 2022-03-17 | ||
CN202210511098.1A CN116821921A (en) | 2022-03-17 | 2022-05-11 | Method, device, system and storage medium for accessing file |
CN202210511098.1 | 2022-05-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023173908A1 (en) | 2023-09-21 |
Family
ID=88022179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/070167 WO2023173908A1 (en) | 2022-03-17 | 2023-01-03 | Method, apparatus and system for accessing file, and storage medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023173908A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170286650A1 (en) * | 2016-03-30 | 2017-10-05 | International Business Machines Corporation | Tiered code obfuscation in a development environment |
CN107895123A (en) * | 2017-11-13 | 2018-04-10 | 医渡云(北京)技术有限公司 | Data access authority control method and device, method for managing user right |
CN108737371A (en) * | 2018-04-08 | 2018-11-02 | 努比亚技术有限公司 | Hive data access control methods, server and computer storage media |
CN110188573A (en) * | 2019-05-27 | 2019-08-30 | 深圳前海微众银行股份有限公司 | Subregion authorization method, device, equipment and computer readable storage medium |
CN112559994A (en) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Access control method, device, equipment and storage medium |
CN112651001A (en) * | 2020-12-30 | 2021-04-13 | 中国平安财产保险股份有限公司 | Access request authentication method, device, equipment and readable storage medium |
CN114647825A (en) * | 2020-12-17 | 2022-06-21 | 中移(苏州)软件技术有限公司 | Access right control method, device, electronic equipment and computer storage medium |
- 2023-01-03 WO PCT/CN2023/070167 patent/WO2023173908A1/en unknown
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170286650A1 (en) * | 2016-03-30 | 2017-10-05 | International Business Machines Corporation | Tiered code obfuscation in a development environment |
CN107895123A (en) * | 2017-11-13 | 2018-04-10 | 医渡云(北京)技术有限公司 | Data access authority control method and device, method for managing user right |
CN108737371A (en) * | 2018-04-08 | 2018-11-02 | 努比亚技术有限公司 | Hive data access control methods, server and computer storage media |
CN110188573A (en) * | 2019-05-27 | 2019-08-30 | 深圳前海微众银行股份有限公司 | Subregion authorization method, device, equipment and computer readable storage medium |
CN114647825A (en) * | 2020-12-17 | 2022-06-21 | 中移(苏州)软件技术有限公司 | Access right control method, device, electronic equipment and computer storage medium |
CN112559994A (en) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Access control method, device, equipment and storage medium |
CN112651001A (en) * | 2020-12-30 | 2021-04-13 | 中国平安财产保险股份有限公司 | Access request authentication method, device, equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11762970B2 (en) | Fine-grained structured data store access using federated identity management | |
CN111698228B (en) | System access authority granting method, device, server and storage medium | |
CN111338766B (en) | Transaction processing method and device, computer equipment and storage medium | |
US10757106B2 (en) | Resource access control method and device | |
US11574070B2 (en) | Application specific schema extensions for a hierarchical data structure | |
US20220053028A1 (en) | Data access policies | |
US11044257B1 (en) | One-time access to protected resources | |
CN110543545B (en) | File management method, device and storage medium based on block chain | |
US20120290592A1 (en) | Federated search apparatus, federated search system, and federated search method | |
CN103067463B (en) | user root authority centralized management system and management method | |
US7548918B2 (en) | Techniques for maintaining consistency for different requestors of files in a database management system | |
US20070011136A1 (en) | Employing an identifier for an account of one domain in another domain to facilitate access of data on shared storage media | |
JP2008524707A (en) | Infrastructure for performing file operations by database server | |
JP2023532959A (en) | A privacy-preserving architecture for permissioned blockchains | |
AU2005317196A1 (en) | Techniques for providing locks for file operations in a database management system | |
TW201439792A (en) | System and method for accessing database | |
JP6578356B2 (en) | Access control for objects with attributes defined for a hierarchically organized domain containing a fixed number of values | |
US11409781B1 (en) | Direct storage loading for adding data to a database | |
WO2024103714A1 (en) | Data processing method and system, apparatus, and related device | |
WO2023173908A1 (en) | Method, apparatus and system for accessing file, and storage medium | |
US10997160B1 (en) | Streaming committed transaction updates to a data store | |
WO2024021417A1 (en) | Data account creation method and apparatus | |
US11106667B1 (en) | Transactional scanning of portions of a database | |
CN111104408A (en) | Data exchange method and device based on map data and storage medium | |
US12034712B2 (en) | Communication between server systems in different network regions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23769419; Country of ref document: EP; Kind code of ref document: A1 |