CN112182023B - Big data access control method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112182023B
Authority
CN
China
Prior art keywords
data
data processing
access control
determining
processing logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011025582.0A
Other languages
Chinese (zh)
Other versions
CN112182023A (en
Inventor
文雨
薛涛
张博洋
郑阳
杨纯
张东雪
杜莹莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202011025582.0A priority Critical patent/CN112182023B/en
Publication of CN112182023A publication Critical patent/CN112182023A/en
Application granted granted Critical
Publication of CN112182023B publication Critical patent/CN112182023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present invention provide a big data access control method, apparatus, electronic device, and storage medium, which can automatically determine a data processing purpose in a data processing logic through a data processing optimizer, and make an access decision based on the data processing purpose, so that not only can safe big data sharing analysis be supported, but also privacy awareness can be provided, and a data protection function is provided for a data provider.

Description

Big data access control method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of big data processing technologies, and in particular, to a big data access control method and apparatus, an electronic device, and a storage medium.
Background
In the big data era, vast amounts of data are collected, stored, and used by a variety of platforms and applications. This calls for big data processing platforms that can handle massive data, facilitate data analysis, and help explore the value of data. Generally, a big data processing platform can access various data sources, support hybrid data analysis engines such as Structured Query Language (SQL), machine learning, and graph engines, and process large amounts of data efficiently. However, data security and privacy were not fully considered when such platforms were designed, so the data protection functions provided by the platforms themselves are very limited. For example, because existing platforms lack fine-grained access control and cannot meet complex access control requirements, they typically have to grant users full access to sensitive raw data in order to meet their data analysis needs, even when the users only need to perform aggregate or statistical queries. Furthermore, in large-scale cross-organizational data sharing environments, data is more vulnerable to attack: a user may perform correlation analysis across multiple data sources or platforms to extract further sensitive information.
In data management and sharing application scenarios, access control is an important data protection mechanism that prevents data users from accessing sensitive data without authorization. A simple access control scheme is for the big data platform to rely directly on the underlying access mechanisms, that is, the access control functions provided by the underlying data source or operating system. However, such solutions typically do not meet users' access control needs: 1) The underlying mechanisms typically do not support the fine-grained (attribute-level, record-level, or element-level) access control required by advanced data management applications. For example, the file-level access control provided by the Hadoop Distributed File System (HDFS) is often insufficient to meet users' fine-grained security requirements. 2) The security models and mechanisms of the various data sources are heterogeneous, which leads to incompatible access control features and inconsistent access control functions. For example, when a user aggregates data from multiple data sources that all contain sensitive attributes, data security is determined by the access control mechanism with the weakest data protection (the data access policy of each data source is set by its data provider).
The Purpose-Based Access Control (PBAC) model is used for privacy-preserving access control. PBAC originates from conventional Relational Database Management Systems (RDBMS). PBAC defines specific data usage purposes, and these purposes can be expressed as SQL queries, so that authorizing a particular SQL query is equivalent to authorizing a particular data usage purpose. For example, the "shipping" purpose means that the query "shipping address for the active order" can be authorized. However, in big data analytics applications, it is difficult to link abstract data usage purposes such as "shipping" directly to system-level data processing logic and database operations. For example, to analyze sales trends over a large amount of data, a wide variety of algorithms may be applied to the data, including regression analysis, time series analysis, stochastic models, and the like; for an abstract purpose such as "analyzing sales trends", it is almost impossible to associate a static set of database operations with it. In addition, in practice the same data operation may serve multiple data usage purposes. It is difficult for the database engine to recognize the correct purpose of a data operation automatically, and therefore also difficult to allow or deny the operation according to its purpose.
Disclosure of Invention
The embodiment of the invention provides a big data access control method and device, electronic equipment and a storage medium, which are used for overcoming the defects in the prior art.
The embodiment of the invention provides a big data access control method, which comprises the following steps:
after receiving a data access request of a data user, determining a data processing purpose in the data processing logic based on a data processing optimizer;
making an access decision corresponding to the data access request based on the data processing purpose.
According to the big data access control method of one embodiment of the present invention, the determining a data processing purpose in a data processing logic based on a data processing optimizer specifically includes:
in the data processing optimizer, determining the data processing purpose in the data processing logic based on a purpose analysis algorithm.
According to an embodiment of the big data access control method, the determining of the data processing purpose in the data processing logic based on the purpose analysis algorithm specifically includes:
determining data operation purposes in the data processing logic based on the purpose analysis algorithm, and marking the data operation purpose with the highest importance among them as the data processing purpose.
According to the big data access control method of one embodiment of the present invention, the data operation purposes specifically include: a calculation operation purpose, an assistance operation purpose, a retrieval operation purpose, a carrying operation purpose, and an output operation purpose, wherein the importance of the calculation operation purpose, the assistance operation purpose, the retrieval operation purpose, and the carrying operation purpose decreases in that order.
According to the big data access control method of an embodiment of the present invention, the determining the data operation purpose in the data processing logic based on the purpose analysis algorithm specifically includes:
based on the purpose analysis algorithm, regarding the life cycle of each data object in the data access request as a data operation pipeline in which each data object is operated on in sequence by data operation operators, and taking the data operation purposes corresponding to those data operations as the data operation purposes in the data processing logic.
According to the big data access control method of one embodiment of the present invention, after the determining of the data processing purpose in the data processing logic based on the purpose analysis algorithm, the method further includes:
performing consistency detection between the data processing purpose and the data processing purposes allowed by the data provider, and determining the data that each data processing purpose in the data processing logic may use.
An embodiment of the present invention further provides a big data access control device, including:
the data processing purpose determining module is used for determining the data processing purpose in the data processing logic based on the data processing optimizer after receiving the data access request of the data user;
and the access decision determining module is used for making an access decision corresponding to the data access request based on the data processing purpose.
According to the big data access control device of an embodiment of the present invention, the data processing purpose determining module is specifically configured to:
in the data processing optimizer, determine the data processing purpose in the data processing logic based on a purpose analysis algorithm.
The embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the program, the steps of the big data access control method as described in any of the above are implemented.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the big data access control method as described in any one of the above.
According to the big data access control method and device, the electronic equipment and the storage medium, the data processing purpose in the data processing logic can be automatically determined through the data processing optimizer, and the access decision can be made according to the data processing purpose, so that not only can safe big data sharing analysis be supported, but also privacy awareness can be achieved, and a data protection function is provided for a data provider.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a big data access control method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a big data access control device according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a big data access control method provided in an embodiment of the present invention, and as shown in fig. 1, the big data access control method includes:
S1, after receiving a data access request of a data user, determining a data processing purpose in data processing logic based on a data processing optimizer;
S2, making an access decision corresponding to the data access request based on the data processing purpose.
Specifically, in the big data access control method provided in the embodiment of the present invention, the execution subject is a big data processing platform, which may be a general-purpose SQL-based big data processing platform or a Spark-based big data processing platform. The method relies on the optimizer of the big data processing platform; the data processing optimizer is used as the example in the following description.
First, step S1 is executed: a data access request from a data user is received, and the data processing purpose in the data processing logic is then determined by the data processing optimizer. The data processing purpose is the term in which a data provider expresses its privacy awareness, and it serves as the basis for the data protection function.
Step S2 is then executed: an access decision corresponding to the data access request received in step S1 is made according to the data processing purpose.
According to the big data access control method provided by the embodiment of the invention, the data processing purpose in the data processing logic can be automatically determined through the data processing optimizer, and the access decision can be made according to the data processing purpose, so that not only can safe big data sharing analysis be supported, but also privacy awareness can be realized, and a data protection function is provided for a data provider.
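The two steps S1 and S2 can be pictured with a minimal sketch. Everything named here (`AccessRequest`, `decide`, `toy_purpose`, and the purpose labels "aggregate" and "raw-read") is hypothetical and chosen only for illustration; in particular, `toy_purpose` is a toy stand-in for the optimizer-based analysis and is not the purpose analysis algorithm of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class AccessRequest:
    user: str   # the data user issuing the request
    logic: str  # stand-in for the data processing logic, e.g. a query string


def decide(request: AccessRequest,
           determine_purpose: Callable[[str], str],
           allowed_purposes: Set[str]) -> bool:
    """S1: derive the purpose from the logic; S2: decide based on that purpose."""
    purpose = determine_purpose(request.logic)  # step S1
    return purpose in allowed_purposes          # step S2


def toy_purpose(logic: str) -> str:
    # Toy stand-in for the optimizer's analysis (an assumption for illustration).
    return "aggregate" if "sum(" in logic.lower() else "raw-read"


print(decide(AccessRequest("alice", "SELECT sum(x) FROM t"), toy_purpose, {"aggregate"}))
print(decide(AccessRequest("alice", "SELECT x FROM t"), toy_purpose, {"aggregate"}))
```

Under this toy policy the aggregate query is permitted and the raw read is refused; the point is only that the decision is keyed on the derived purpose rather than on the identity of the table being read.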
On the basis of the foregoing embodiment, in the big data access control method provided in the embodiment of the present invention, the determining of a data processing purpose in the data processing logic based on a data processing optimizer specifically includes:
in the data processing optimizer, determining the data processing purpose in the data processing logic based on a purpose analysis algorithm.
Specifically, in the embodiment of the present invention, in the data processing optimizer, the analysis of the data processing logic is specifically realized through a purpose analysis algorithm, so as to determine the data processing purpose in the data processing logic.
On the basis of the foregoing embodiment, the big data access control method provided in the embodiment of the present invention, where the determining a data processing purpose in the data processing logic based on the purpose analysis algorithm specifically includes:
determining data operation purposes in the data processing logic based on the purpose analysis algorithm, and marking the data operation purpose with the highest importance among them as the data processing purpose.
Specifically, in the embodiment of the present invention, when determining the data processing purpose, the data operation purposes in the data processing logic are first determined by the purpose analysis algorithm, and the data operation purpose with the highest importance among them is then marked as the data processing purpose.
The big data access control method provided in the embodiment of the present invention may be understood as introducing the concepts of data processing purpose and data operation purpose into the conventional Purpose-Based Access Control (PBAC) model to obtain a Purpose-Aware Access Control (PAAC) model. The PAAC model automatically identifies the data processing purpose from the data processing logic and makes access decisions accordingly. The PAAC model is applied to the optimizer of a big data processing platform by adding an access control enforcement phase between the analysis phase and the optimization phase of the optimizer pipeline. Data operation purposes (DOPs) define the purpose of each data operation in the data processing logic, from which a DOP sequence is extracted. The data processing purpose of the data processing logic is identified as the most important DOP in that sequence. A PAAC policy specifies which data processing purposes a data access agent (user or application) may carry out on which data.
On the basis of the foregoing embodiment, in the big data access control method provided in the embodiment of the present invention, the data operation purposes specifically include: a calculation operation purpose, an assistance operation purpose, a retrieval operation purpose, a carrying operation purpose, and an output operation purpose, wherein the priorities of the calculation operation purpose, the assistance operation purpose, the retrieval operation purpose, and the carrying operation purpose decrease in that order.
In particular, data operation purposes are defined by the data management and sharing platform and provided to data providers, who are expected to use them to specify access control policies. At the same time, the platform must automatically identify the predefined purposes from the query plan in order to enforce the corresponding access policy. Five example data operation purposes are used below to illustrate the functionality of PAAC:
1) Retrieval operation purpose (DOP-R). If a data object is retrieved from a data source or processed by a selection (filtering) function, the data operation purpose is DOP-R. DOP-R is a prerequisite for many other purposes, since data must pass through DOP-R before it can be used.
2) Calculation operation purpose (DOP-C). If a data object is an operand of a computational operation and the operation transforms the data object, the data operation purpose is DOP-C. For example, the sum function in SQL accepts an operand and performs an aggregate operation on it, so the corresponding data operation purpose is DOP-C.
3) Assistance operation purpose (DOP-A). If a data object takes part in a data operation but its value is not changed, the data operation purpose is DOP-A. For example, a "groupBy" operation in SQL groups data based on data object a, but the value of a does not change.
4) Carrying operation purpose (DOP-Ca). If a data object is carried along during an operation but is not involved in it, the data operation purpose is DOP-Ca. For example, the "groupBy" operation in 3) is based on data object a, but other data objects are carried along by the groupBy operation. Note that DOP-Ca does not affect data operations or access decisions, so omitting DOP-Ca when implementing access control does not affect data security.
5) Output operation purpose (DOP-O). When a data object is returned to the user or the application, the data operation purpose is DOP-O. Furthermore, in big data applications, whether DOP-O is allowed often depends on the previous operations on the data object. For example, the sequence DOP-R-O may be rejected, while DOP-R-C-O may be allowed.
Among the five data operation purposes, DOP-C has the highest priority, DOP-A and DOP-R have lower priorities, and DOP-Ca has a priority of NULL. Note that DOP-O is handled differently from the other purposes in a data operation sequence. A data processing purpose is represented by a pair: { DPP, DOP-O | NULL }, where DPP denotes the data operation purpose with the highest priority in the sequence of data operation purposes, and "DOP-O | NULL" denotes whether or not the data object is output to the user after the series of data operations.
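The pair { DPP, DOP-O | NULL } can be derived from a purpose sequence as sketched below. The numeric ranks are an assumption: the text only gives the ordering (DOP-C above DOP-A above DOP-R, with DOP-Ca carrying no priority), and the names `DOP`, `PRIORITY`, and `processing_purpose` are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Tuple


class DOP(Enum):
    R = "DOP-R"    # retrieval
    A = "DOP-A"    # assistance
    C = "DOP-C"    # calculation
    CA = "DOP-Ca"  # carrying (priority NULL, so it never becomes the DPP)
    O = "DOP-O"    # output (tracked separately from the priority ranking)


# Assumed numeric ranks; only their relative order is taken from the text.
PRIORITY = {DOP.C: 3, DOP.A: 2, DOP.R: 1}


def processing_purpose(seq: List[DOP]) -> Tuple[Optional[DOP], Optional[DOP]]:
    """Return (DPP, DOP-O or None) for one data object's purpose sequence."""
    ranked = [d for d in seq if d in PRIORITY]
    dpp = max(ranked, key=PRIORITY.__getitem__) if ranked else None
    output = DOP.O if DOP.O in seq else None
    return dpp, output


print(processing_purpose([DOP.R, DOP.C, DOP.O]))
print(processing_purpose([DOP.R, DOP.A]))
```

The first call corresponds to the sequence DOP-R-C-O and yields the pair with DOP-C as DPP and DOP-O as the output flag; the second has no output step, so the second element of the pair is `None` (the NULL case).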
On the basis of the foregoing embodiment, the big data access control method provided in the embodiment of the present invention, where the determining a data operation purpose in the data processing logic based on the purpose analysis algorithm specifically includes:
based on the purpose analysis algorithm, regarding the life cycle of each data object in the data access request as a data operation pipeline in which each data object is operated on in sequence by data operation operators, and taking the data operation purposes corresponding to those data operations as the data operation purposes in the data processing logic.
Specifically, in the embodiment of the present invention, the purpose analysis algorithm inspects the data processing logic and automatically extracts the data operation purposes and the data processing purpose. The purpose analysis algorithm treats the life cycle of a data object in the application as a pipeline of data operations. The operation pipeline can be obtained directly, because both the application code and the logical plan in the data processing optimizer contain all of the data operation information. Each data object is operated on in turn by the data operations in the pipeline. For example, although different data objects in an SQL operation may share the same data operation pipeline, their data operation purposes may differ under the same data operation, so each data object has its own sequence of data operation purposes.
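The point that data objects sharing one pipeline can still end up with different purpose sequences can be sketched as follows. The pipeline encoding (a list of operator names with per-column roles) and the function name `purpose_sequences` are assumptions made for illustration; a real implementation would read the roles from the optimizer's logical plan.

```python
from typing import Dict, List, Tuple

# One pipeline step: (operator name, {column: purpose the operator has for it}).
Step = Tuple[str, Dict[str, str]]


def purpose_sequences(columns: List[str], pipeline: List[Step]) -> Dict[str, List[str]]:
    """Walk the shared pipeline once and record a purpose per column per step.

    A column that an operator does not touch is treated as merely carried
    through that step (DOP-Ca).
    """
    sequences: Dict[str, List[str]] = {col: [] for col in columns}
    for _operator, roles in pipeline:
        for col in columns:
            sequences[col].append(roles.get(col, "DOP-Ca"))
    return sequences


# Hypothetical query: SELECT region, sum(amount) FROM sales GROUP BY region
pipeline: List[Step] = [
    ("scan",    {"region": "DOP-R", "amount": "DOP-R"}),
    ("groupBy", {"region": "DOP-A"}),
    ("sum",     {"amount": "DOP-C"}),
    ("output",  {"region": "DOP-O", "amount": "DOP-O"}),
]

for column, seq in purpose_sequences(["region", "amount"], pipeline).items():
    print(column, "-".join(seq))
```

Both columns pass through the same four operators, yet `region` is only assisted by the grouping while `amount` is transformed by the aggregation, so their sequences differ.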
On the basis of the foregoing embodiment, in the big data access control method provided in the embodiment of the present invention, after the determining of the data processing purpose in the data processing logic based on the purpose analysis algorithm, the method further includes:
performing consistency detection between the data processing purpose and the data processing purposes allowed by the data provider, and determining the data that each data processing purpose in the data processing logic may use.
Specifically, in the embodiment of the present invention, the data processing optimizer includes not only the four stages of conversion, analysis, optimization, and materialization, but also an additional stage between the analysis stage and the optimization stage, namely a secure logical plan generation stage. The added stage implements PAAC by comparing the purposes extracted from the logical plan with all allowed purposes. In this way, the analyzed logical plan is converted into a secure logical plan (one that conforms to the PAAC policy specified by the data owner); the secure logical plan may subsequently be further optimized to eliminate potential overhead incurred by access control. The embodiment may also use the secure logical plan as the predefined logical template supplied to the Structured Streaming engine (which would otherwise use the analyzed logical plan as its predefined logical template).
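A toy model of the stage ordering may help. The sketch below represents each optimizer stage as a function over a dictionary-based plan and splices a secure-plan stage in after analysis. The plan representation, the helper names, and the choice to drop disallowed columns (rather than reject the whole request) are all assumptions for illustration and are not taken from the embodiment or from Spark's optimizer API.

```python
from typing import Callable, Dict, List, Set, Tuple

Plan = dict
Stage = Tuple[str, Callable[[Plan], Plan]]


def passthrough(name: str) -> Stage:
    # Placeholder for a real optimizer stage: it only records that it ran.
    def fn(plan: Plan) -> Plan:
        return {**plan, "trace": plan["trace"] + [name]}
    return name, fn


def secure_plan_stage(allowed: Dict[str, Set[str]]) -> Stage:
    # Keep only columns whose extracted purpose is allowed for that column.
    def fn(plan: Plan) -> Plan:
        kept = {col: purpose for col, purpose in plan["purposes"].items()
                if purpose in allowed.get(col, set())}
        return {"purposes": kept, "trace": plan["trace"] + ["secure_plan"]}
    return "secure_plan", fn


def insert_after(stages: List[Stage], anchor: str, new_stage: Stage) -> List[Stage]:
    index = [name for name, _ in stages].index(anchor)
    return stages[:index + 1] + [new_stage] + stages[index + 1:]


def run(stages: List[Stage], plan: Plan) -> Plan:
    for _name, fn in stages:
        plan = fn(plan)
    return plan


base = [passthrough(n) for n in ("conversion", "analysis", "optimization", "materialization")]
policy = {"amount": {"DOP-C"}, "region": {"DOP-A"}}
stages = insert_after(base, "analysis", secure_plan_stage(policy))
result = run(stages, {"purposes": {"amount": "DOP-C", "ssn": "DOP-R"}, "trace": []})
print(result["trace"])
print(result["purposes"])
```

Because the secure-plan stage runs before optimization, the later stages only ever see the already-restricted plan, which is what allows the optimizer to remove overhead introduced by the access check.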
In general, an access control model contains the following core components: { subject, action, object, [ context ], allow | deny }. Different access control models specify these components in different ways; for example, an Attribute-Based Access Control (ABAC) policy specifies authorization by combining attributes. In the purpose-aware access control (PAAC) model, an object is any data object that can be referenced in a structured data model. In a common big data processing platform, an object may be a data object from any structured data source that the platform can access, including: 1) tables in a relational database; 2) structured files that are treated as tables in a distributed storage system (e.g., HDFS) or a local file system; 3) data streams that are treated as tables; and 4) other data for which a column structure is explicitly defined. Protected data objects may be fine-grained at the column level, row level, or cell level. The data owner may define the protected object using any attribute, such as owner or source.
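The five-component rule can be written down as a small data structure. In this sketch the `action` field holds a purpose label, following the statement that a PAAC policy names the purposes an agent may carry out on given data; the `Rule` class, the `evaluate` function, the dotted object names, and the default-deny behaviour are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    subject: str                   # user or application
    action: str                    # here: a data processing purpose label
    obj: str                       # e.g. "sales.amount" for a column-level object
    effect: str                    # "allow" or "deny"
    context: Optional[str] = None  # optional; None matches any context


def evaluate(rules: List[Rule], subject: str, action: str, obj: str,
             context: Optional[str] = None) -> bool:
    """Return the effect of the first matching rule; deny when nothing matches."""
    for rule in rules:
        same_target = (rule.subject, rule.action, rule.obj) == (subject, action, obj)
        if same_target and rule.context in (None, context):
            return rule.effect == "allow"
    return False  # default deny is an assumption of this sketch


rules = [
    Rule("analyst", "DOP-C", "sales.amount", "allow"),
    Rule("analyst", "DOP-R", "sales.amount", "deny"),
]
print(evaluate(rules, "analyst", "DOP-C", "sales.amount"))
print(evaluate(rules, "analyst", "DOP-R", "sales.amount"))
```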
A submitted query or data analysis algorithm is represented inside the system by a data processing plan, which the system typically organizes as a query tree. The leaf nodes of the query tree encapsulate data objects, the path from a leaf node to the root node represents the data processing logic applied to that data object, and the data operation purposes of the data object are concatenated in order along the path. Ideally, the data owner could specify every acceptable (or denied) pattern of data operation purposes, for example a pattern such as "DOP-R-C-O" for a given data object, where a wildcard may be used in the pattern; the big data platform could then enforce the policy by comparing the specified patterns with the pattern identified for each path in the query tree. In practice, however, data owners may not wish to define every permitted or denied pattern, or may not be able to specify patterns at all, so usability is a concern. The embodiments of the present invention therefore present a simplified model in which the key purpose in each piece of data processing logic is selected as the data processing purpose that dominates that logic. The reasoning is as follows: 1) all data operations are applied to the data object in turn, and only some of them, e.g., DOP-C, can modify the data; 2) the overall modification made to the data object is no less than the most significant modification in the operation sequence; 3) from a data protection perspective, the data owner may specify how much a data object must be modified before it is output, or whether it may be output directly. Accordingly, a new attribute, priority, is added to each predefined data operation purpose. The priority represents the extent to which the data operation modifies the data.
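The pattern-based variant described above can be sketched by walking from a leaf of the query tree to the root and matching the resulting purpose string against an owner-specified pattern. The `Node` class and `path_pattern` function are hypothetical, and the use of `*` as the wildcard (matched with Python's standard `fnmatch.fnmatchcase`) is an assumption, since the wildcard symbol itself is not given in the text.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, Optional


@dataclass
class Node:
    op: str
    roles: Dict[str, str] = field(default_factory=dict)  # data object -> purpose suffix
    parent: Optional["Node"] = None


def path_pattern(leaf: Node, obj: str) -> str:
    """Concatenate the purposes recorded for `obj` from the leaf up to the root."""
    parts = []
    node: Optional[Node] = leaf
    while node is not None:
        if obj in node.roles:
            parts.append(node.roles[obj])
        node = node.parent
    return "DOP-" + "-".join(parts)


# Hypothetical tree for: SELECT sum(amount) FROM sales
root = Node("output", {"amount": "O"})
agg = Node("sum", {"amount": "C"}, parent=root)
leaf = Node("scan", {"amount": "R"}, parent=agg)

observed = path_pattern(leaf, "amount")
print(observed)
print(fnmatchcase(observed, "DOP-R-*-O"))   # some operation must occur before output
print(fnmatchcase("DOP-R-O", "DOP-R-*-O"))  # direct output after retrieval
```

With this assumed pattern, a path that transforms the data before output matches, while a path that outputs retrieved data directly does not, which mirrors the DOP-R-O versus DOP-R-C-O example given for DOP-O.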
The overhead and scalability of the embodiments were tested with Spark as the baseline. The platform's identification of data operation purposes and data processing purposes was evaluated through case studies covering 5 data sources and 4 structured data analysis engines.
(1) Experimental setup
Software and hardware configuration: the experiments were performed on a 7-node cluster (1 master node and 6 worker nodes). Each node has 32 Intel Xeon E5-2630 v3 CPUs @ 2.40 GHz, 130 GB of memory, 4 TB of disk capacity, and 64-bit CentOS. The data was stored in HDFS (v2.6.0) with a replication factor of 2. The embodiment was built on Spark v2.4.0.
Data sets and benchmarks: the efficiency and scalability of the platform were tested with the TPC-DS benchmark, which covers the various query types found in decision support systems and has been used for Spark SQL performance testing since Spark 2.2. In addition, the ML Pipelines engine was selected for the case studies, which use the widely used Iris data set and BigDataBench.
(2) Experimental results
The TPC-DS data generator was used to generate retail data sets of the following sizes: 2 GB, 4 GB, 8 GB, 16 GB, 32 GB, 64 GB, and 128 GB, and these data sets were used to evaluate the efficiency of the platform. From the TPC-DS suite, the following queries were selected because they contain a variety of data operations and their data processing logic is sufficiently complex: Query02, Query27, Query35, and Query93. Query93, Query27, and Query02 have the fewest, an intermediate number of, and the most expressions (which specify the operation purposes), respectively, and Query35 has three sub-queries. These queries involve the following data processing purposes: { DOP-R, DOP-O }, { DOP-C, DOP-O }, and { DOP-A, NULL }. On top of Spark, access control policies were customized according to the operation information of each query so that the size of the data set stays consistent. For example, if the WHERE clause of a query requires that the value in column 5 of a certain table equals 2, and that column is retrieved and output, the policy may be stated as: for the data processing purpose { DOP-R, DOP-O }, values greater than 0 are allowed.
The overhead introduced is as follows: 8.44% (2 GB), 6.40% (4 GB), 4.69% (8 GB), 3.52% (16 GB), 2.17% (32 GB), 1.65% (64 GB), and 0.89% (128 GB). All relative overheads are below 10%, and the overhead decreases steadily as the data set grows. This is because query execution time increases with data set size, while the time needed to generate the secure logical plan remains stable. The results show that this embodiment is highly scalable and can be used in big data sharing scenarios and with larger data sets.
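The trend in these figures can be checked mechanically, and the stated explanation can be illustrated with a simple ratio. The percentages below are the reported values; the formula in `relative_overhead` (plan generation time divided by query execution time) is an assumed reading of "relative overhead", and the timings passed to it in the test are arbitrary illustrative numbers, not measurements.

```python
# Reported relative overhead (percent) keyed by data set size in GB.
overhead_pct = {2: 8.44, 4: 6.40, 8: 4.69, 16: 3.52, 32: 2.17, 64: 1.65, 128: 0.89}

sizes = sorted(overhead_pct)
values = [overhead_pct[size] for size in sizes]

all_below_10 = all(v < 10 for v in values)
strictly_decreasing = all(a > b for a, b in zip(values, values[1:]))
print(all_below_10, strictly_decreasing)


def relative_overhead(plan_time: float, query_time: float) -> float:
    """Assumed model: fixed plan-generation cost as a percentage of query time."""
    return 100.0 * plan_time / query_time
```

Under this model a constant `plan_time` divided by a growing `query_time` necessarily shrinks, which is consistent with the explanation that plan generation time stays stable while execution time grows.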
In the embodiment of the invention, the K-means clustering algorithm on the ML Pipelines engine was selected for the case studies. The Within Set Sum of Squared Errors (WSSSE) is used as the metric. 1) The access control policy for the Iris data set is: the value in column 1 must be greater than 5.5, with no restriction on the use of the other columns. Three sets of WSSSE values were obtained for the same K values (2, 3, 4, 5, and 6): one by running the embodiment, which enforces the policy automatically; one by manually deleting the corresponding data from the data set; and one by running Spark without enforcing the policy, which serves as the baseline for evaluation. 2) A test was performed in a similar manner using a 4 GB data set generated by BigDataBench. The experimental results show that the embodiment can constrain the data used by ML algorithms. Applying access control to the ML engine is crucial because, when data providers share their data, they may customize policies to restrict the sensitive data available to an ML algorithm (which may otherwise indirectly expose more private information about the data owner).
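For reference, WSSSE is the sum, over all points, of the squared distance from each point to its nearest cluster center. The sketch below computes it in plain Python and applies a row-level filter of the kind used in the Iris case study. The function names, the tiny sample rows, and the reading of "column 1" as index 0 are assumptions for illustration; no K-means fitting is performed here.

```python
from typing import List, Sequence

Point = Sequence[float]


def wssse(points: List[Point], centers: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for point in points:
        total += min(sum((a - b) ** 2 for a, b in zip(point, center))
                     for center in centers)
    return total


def apply_row_policy(rows: List[Point], column: int, threshold: float) -> List[Point]:
    """Keep only rows whose value in `column` is greater than `threshold`."""
    return [row for row in rows if row[column] > threshold]


# Made-up rows, not Iris data; column index 0 stands in for "column 1".
rows = [(5.0, 1.0), (6.0, 1.0), (7.0, 3.0)]
allowed = apply_row_policy(rows, column=0, threshold=5.5)
print(allowed)
print(wssse(allowed, [(6.0, 1.0)]))
```

Comparing the WSSSE obtained on policy-filtered rows with the WSSSE obtained after manually deleting the same rows is the kind of check the case study describes: if enforcement is correct, the two should agree for each K.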
Fig. 2 is a schematic structural diagram of a big data access control device provided in an embodiment of the present invention. As shown in Fig. 2, the big data access control device includes a data processing purpose determining module 21 and an access decision determination module 22, wherein:
the data processing purpose determining module 21 is configured to determine a data processing purpose in the data processing logic based on the data processing optimizer after receiving a data access request of a data user;
the access decision determination module 22 is configured to make an access decision corresponding to the data access request based on the data processing purpose.
Specifically, the functions of the modules in the big data access control device provided in the embodiment of the present invention correspond one to one to the processing flows of the steps in the method embodiments, and the effects achieved are the same.
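The two-module structure described above can be sketched as follows. This is a hedged illustration only: the class names, the `analyze` callable that stands in for the optimizer-based analysis, and the set of allowed purposes are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class PurposeDeterminingModule:
    """Stand-in for module 21: maps a data access request to a purpose."""
    analyze: Callable[[str], str]  # hypothetical placeholder for the optimizer-based analysis

    def determine(self, request: str) -> str:
        return self.analyze(request)


@dataclass
class AccessDecisionModule:
    """Stand-in for module 22: decides based on the determined purpose."""
    allowed_purposes: Set[str]  # hypothetical: purposes the data provider permits

    def decide(self, purpose: str) -> bool:
        return purpose in self.allowed_purposes


@dataclass
class BigDataAccessControlDevice:
    purpose_module: PurposeDeterminingModule
    decision_module: AccessDecisionModule

    def handle(self, request: str) -> bool:
        """Determine the purpose first, then make the access decision."""
        purpose = self.purpose_module.determine(request)
        return self.decision_module.decide(purpose)


# Hypothetical wiring: a toy analyzer that labels every request as retrieval.
device = BigDataAccessControlDevice(
    PurposeDeterminingModule(analyze=lambda request: "DOP-R"),
    AccessDecisionModule(allowed_purposes={"DOP-R"}),
)
granted = device.handle("SELECT c5 FROM t WHERE c5 = 2")
```

Keeping purpose determination and decision making in separate objects mirrors the one-to-one correspondence between modules and method steps noted above.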
On the basis of the foregoing embodiment, in the big data access control apparatus provided in the embodiment of the present invention, the data processing purpose determining module is specifically configured to:
determine, in the data processing optimizer, a data processing purpose in the data processing logic based on a purpose analysis algorithm.
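As the claims describe it, the purpose analysis algorithm treats the life cycle of each data object as a pipeline of operators, assigns each operation a data operation purpose, and marks the most important one as the data processing purpose. A minimal sketch of that idea follows. Several parts are assumptions: the operator names and their mapping to purposes are hypothetical, and returning a pair whose second element records whether the object is output is only one reading of the {DOP-R, DOP-O} and {DOP-A, NULL} notation.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class OpPurpose(IntEnum):
    """Ranked operation purposes; a higher value means higher importance,
    following the order calculation > assisting > retrieval > carrying."""
    CARRY = 1
    RETRIEVE = 2
    ASSIST = 3
    COMPUTE = 4


OUTPUT = "OUTPUT"  # assumption: output is tracked separately from the ranking

# Hypothetical mapping from operator names to operation purposes.
OPERATOR_PURPOSE: Dict[str, OpPurpose] = {
    "project": OpPurpose.CARRY,
    "filter": OpPurpose.RETRIEVE,
    "join_key": OpPurpose.ASSIST,
    "aggregate": OpPurpose.COMPUTE,
}


def processing_purpose(pipeline: List[str]) -> Tuple[Optional[OpPurpose], Optional[str]]:
    """Walk one data object's operator pipeline in order and return
    (most important operation purpose, OUTPUT or None)."""
    ranked = [OPERATOR_PURPOSE[op] for op in pipeline if op in OPERATOR_PURPOSE]
    main = max(ranked) if ranked else None
    out = OUTPUT if "output" in pipeline else None
    return main, out


# A column that is filtered on, carried along, and finally output.
retrieved_and_output = processing_purpose(["filter", "project", "output"])

# A column used only as a join key and never output.
assist_only = processing_purpose(["join_key"])
```

Using an ordered enumeration keeps the importance ranking in one place, so selecting the data processing purpose reduces to taking the maximum over the purposes seen in the pipeline.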
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform a big data access control method comprising: after receiving a data access request of a data user, determining a data processing purpose in the data processing logic based on a data processing optimizer; and making an access decision corresponding to the data access request based on the data processing purpose.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the big data access control method provided by the above-mentioned method embodiments, where the method includes: after receiving a data access request of a data user, determining a data processing purpose in the data processing logic based on a data processing optimizer; and making an access decision corresponding to the data access request based on the data processing purpose.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the big data access control method provided by the foregoing embodiments, where the method includes: after receiving a data access request of a data user, determining a data processing purpose in the data processing logic based on a data processing optimizer; and making an access decision corresponding to the data access request based on the data processing purpose.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (5)

1. A big data access control method is characterized by comprising the following steps:
after receiving a data access request of a data user, determining a data processing purpose in the data processing logic based on a data processing optimizer;
making an access decision corresponding to the data access request based on the data processing purpose;
determining, in the data processing optimizer, a data processing purpose in the data processing logic based on a purpose analysis algorithm;
the determining a data processing purpose in the data processing logic based on the purpose analysis algorithm specifically includes:
determining data operation purposes in the data processing logic based on the purpose analysis algorithm, and marking the data operation purpose with the highest importance in the data operation purposes as the data processing purpose;
the data operation purpose specifically includes: a calculation operation purpose, an assisting operation purpose, a retrieval operation purpose, a carrying operation purpose and an output operation purpose, wherein the importance of the calculation operation purpose, the assisting operation purpose, the retrieval operation purpose and the carrying operation purpose decreases in that order;
the determining a data operation purpose in the data processing logic based on the purpose analysis algorithm specifically includes:
based on the purpose analysis algorithm, regarding the life cycle of each data object in the data access request as a data operation pipeline in which each data object is operated on by data operation operators in sequence, and taking the data operation purpose corresponding to each data operation as the data operation purpose in the data processing logic.
2. The big data access control method according to claim 1, wherein, after determining the data processing purpose in the data processing logic based on the purpose analysis algorithm, the method further comprises:
performing consistency detection between the data processing purpose and the data processing purposes allowed by a data provider, and determining the data that each data processing purpose in the data processing logic may use.
3. A big data access control apparatus, comprising:
the data processing purpose determining module is used for determining the data processing purpose in the data processing logic based on the data processing optimizer after receiving the data access request of the data user;
an access decision determination module for making an access decision corresponding to the data access request based on the data processing purpose;
the data processing purpose determining module is specifically configured to:
determining, in the data processing optimizer, a data processing purpose in the data processing logic based on a purpose analysis algorithm;
determining data operation purposes in the data processing logic based on the purpose analysis algorithm, and marking the data operation purpose with the highest importance in the data operation purposes as the data processing purpose;
the data operation purpose specifically includes: a calculation operation purpose, an assisting operation purpose, a retrieval operation purpose, a carrying operation purpose and an output operation purpose, wherein the importance of the calculation operation purpose, the assisting operation purpose, the retrieval operation purpose and the carrying operation purpose decreases in that order;
based on the purpose analysis algorithm, regarding the life cycle of each data object in the data access request as a data operation pipeline in which each data object is operated on by data operation operators in sequence, and taking the data operation purpose corresponding to each data operation as the data operation purpose in the data processing logic.
4. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the big data access control method according to any of claims 1 to 2 when executing the program.
5. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the big data access control method according to any of claims 1 to 2.
CN202011025582.0A 2020-09-25 2020-09-25 Big data access control method and device, electronic equipment and storage medium Active CN112182023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011025582.0A CN112182023B (en) 2020-09-25 2020-09-25 Big data access control method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011025582.0A CN112182023B (en) 2020-09-25 2020-09-25 Big data access control method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112182023A CN112182023A (en) 2021-01-05
CN112182023B true CN112182023B (en) 2023-04-11

Family

ID=73944006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011025582.0A Active CN112182023B (en) 2020-09-25 2020-09-25 Big data access control method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112182023B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10254751A (en) * 1997-03-14 1998-09-25 Hitachi Inf Syst Ltd Database managing system
CN106407832A (en) * 2015-08-03 2017-02-15 阿里巴巴集团控股有限公司 A method and an apparatus for data access control
CN106610991A (en) * 2015-10-23 2017-05-03 北京国双科技有限公司 Data processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11281667B2 (en) * 2015-01-08 2022-03-22 Microsoft Technology Licensing, Llc Distributed storage and distributed processing policy enforcement utilizing virtual identifiers

Also Published As

Publication number Publication date
CN112182023A (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant