CN112639786B - Intelligent landmark - Google Patents


Info

Publication number
CN112639786B
Authority
CN
China
Prior art keywords
data field
derived
data
level
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880097090.7A
Other languages
Chinese (zh)
Other versions
CN112639786A (en)
Inventor
李东
朱怀宇
陈璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/036,865 (US10922430B2)
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Publication of CN112639786A
Application granted
Publication of CN112639786B
Active
Anticipated expiration


Abstract

Systems and methods for data security classification are provided. An exemplary computer-implemented method for data security classification may include: receiving a request to access a query data field; searching for the query data field in a security level table; in response to finding the query data field in the security level table, obtaining a security level corresponding to the query data field from the security level table; and in response to not finding the query data field in the security level table, determining a security level corresponding to the query data field based at least on a lineage tree and the security level table. The lineage tree can trace the query data field back to one or more source data fields, and the security level table can include one or more security levels corresponding to the one or more source data fields.

Description

Intelligent landmark
RELATED APPLICATIONS
The present application claims the benefit of U.S. non-provisional application No. 16/036,865, entitled "System and Method for Data Security Grading," filed on July 16, 2018, the contents of which are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates generally to methods and apparatus for data security classification.
Background
Data security is critical for most large-scale online operations, where new data may be generated continuously. As the volume of data expands rapidly, assigning an appropriate security level to each data field becomes extremely challenging. Conventional rule-based approaches cannot keep up with the ever-growing number of new data sets generated daily. Among many other challenges, existing tables and data fields have names created by different users, and those names often do not intuitively reflect their content. Further, the content of a data set may change over time, which requires rewriting the rules already created. Thus, current technology cannot assign data fields to appropriate security levels in a timely manner, which exposes online data to significant risk.
Disclosure of Invention
Various embodiments of the present disclosure may include systems, methods, and non-transitory computer-readable media for data security classification. According to one aspect, an exemplary computer-implemented method for data security classification may include: receiving a request to access a query data field; searching for the query data field in a security level table; in response to finding the query data field in the security level table, obtaining a security level corresponding to the query data field from the security level table; and in response to not finding the query data field in the security level table, determining a security level corresponding to the query data field based at least on a lineage tree and the security level table. The lineage tree can trace the query data field back to one or more source data fields, and the security level table can include one or more security levels corresponding to the one or more source data fields.
In some embodiments, the lineage tree can trace a derived data field back to one or more first-level parent data fields, each first-level parent data field being a source data field or a first-level derived data field, and the derived data field being derived from the one or more first-level parent data fields based on a derivation function; for each first-level derived data field, the lineage tree can trace the first-level derived data field back to one or more second-level parent data fields, each second-level parent data field being a source data field or a second-level derived data field, and the first-level derived data field being derived from the one or more second-level parent data fields based on another derivation function; the trace-back may be repeated for any derived data field until one or more of the source data fields are reached; and the lineage tree includes the derivation functions.
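The recursive trace-back described above can be pictured with a small data structure. The following Python sketch is illustrative only: the `LineageNode` class, the `trace_to_sources` helper, and the derivation labels are hypothetical, and the sketch assumes the lineage is acyclic. The example edges restate the relationships described for FIG. 3A.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class LineageNode:
    """One data field in a lineage tree (hypothetical representation)."""
    name: str
    parents: List[str] = field(default_factory=list)  # first-level (direct) parents
    derivation: Optional[str] = None  # derivation function; None for source fields

    @property
    def is_source(self) -> bool:
        # Source data fields are the roots of the tree: they have no parents.
        return not self.parents


def trace_to_sources(tree: Dict[str, LineageNode], name: str) -> Set[str]:
    """Repeat the trace-back level by level until only source data fields remain."""
    node = tree[name]
    if node.is_source:
        return {name}
    sources: Set[str] = set()
    for parent in node.parents:
        sources |= trace_to_sources(tree, parent)
    return sources


# Edges as described for FIG. 3A; "f_D2A" and "f_D0" are placeholder labels.
EXAMPLE_TREE = {
    node.name: node
    for node in [
        LineageNode("O1"),
        LineageNode("O2"),
        LineageNode("O3"),
        LineageNode("D2A", ["O3", "O1"], "f_D2A"),
        LineageNode("D1A", ["D2A"], "sum"),
        LineageNode("D1B", ["D2A", "O2"], "add"),
        LineageNode("D0", ["D1A", "D1B", "O1"], "f_D0"),
    ]
}

print(sorted(trace_to_sources(EXAMPLE_TREE, "D0")))  # ['O1', 'O2', 'O3']
```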
In some embodiments, determining the security level corresponding to the query data field based at least on the lineage tree and the security level table may include: for each derived data field at any level, obtaining a security level of the derived data field based at least on (1) the derivation function of the derived data field and (2) the security levels of the first-level parent data fields of the derived data field; determining which of the derived data fields corresponds to the query data field; and using the determined security level of that derived data field as the security level of the query data field.
In some embodiments, the obtained security level of the derived data field is the highest of the security levels of the first-level parent data fields of the derived data field.
According to another aspect, a computer-implemented method for data security classification may comprise: classifying one or more source data fields in a data space into one or more security levels according to one or more rules, the data space further comprising one or more derived data fields, each derived data field containing data derived from one or more parent data fields, wherein each parent data field is a source data field or another derived data field; for a query data field, recording an SQL (Structured Query Language) statement used to generate the query data field; parsing the SQL statement; constructing a lineage tree based on the parsed SQL statement, the lineage tree tracing the query data field back to one or more parent data fields; and determining a security level of the query data field based at least on the lineage tree and one or more security levels of the one or more parent data fields of the query data field.
In some embodiments, determining the security level of the query data field based at least on the lineage tree and the one or more security levels of the one or more parent data fields of the query data field may include: (1) determining one or more Nth security levels corresponding to one or more Nth derived data fields that are derived from the one or more source data fields in the lineage tree, based on one or more corresponding Nth derivation functions and the security levels of the one or more source data fields; (2) determining one or more (N-1)th security levels corresponding to one or more (N-1)th derived data fields that are derived from the one or more source data fields and/or the Nth derived data fields in the lineage tree, based on one or more corresponding (N-1)th derivation functions and the security levels of the one or more source data fields and/or the one or more Nth security levels; and repeating step (2) level by level in the lineage tree toward the query data field to determine the security level of the query data field.
In some embodiments, parsing the SQL statement may include converting the SQL statement into an abstract syntax tree, and constructing the lineage tree based on the parsed SQL statement may include traversing the abstract syntax tree to identify derivation relationships between the query data field and one or more source data fields and/or derived data fields.
In some embodiments, the security levels of the query data field and of the derived data fields may be determined based on their direct parent data fields.
According to another aspect, a system for data security classification may include a processor configured to: receive a request to access a query data field; search for the query data field in a security level table; in response to finding the query data field in the security level table, obtain a security level corresponding to the query data field from the security level table; and in response to not finding the query data field in the security level table, determine a security level corresponding to the query data field based at least on a lineage tree and the security level table. The lineage tree can trace the query data field back to one or more source data fields, and the security level table can include one or more security levels corresponding to the one or more source data fields.
According to another aspect, a system for data security classification may include a processor configured to: classify one or more source data fields in a data space into one or more security levels according to one or more rules, the data space further comprising one or more derived data fields, each derived data field containing data derived from one or more parent data fields, wherein each parent data field is a source data field or another derived data field; for a query data field, record an SQL (Structured Query Language) statement used to generate the query data field; parse the SQL statement; construct a lineage tree based on the parsed SQL statement, the lineage tree tracing the query data field back to one or more parent data fields; and determine a security level of the query data field based at least on the lineage tree and one or more security levels of the one or more parent data fields of the query data field.
According to another aspect, a non-transitory computer-readable storage medium coupled to a processor may include instructions that, when executed by the processor, cause the processor to perform a method for data security classification. The method may include: receiving a request to access a query data field; searching for the query data field in a security level table; in response to finding the query data field in the security level table, obtaining a security level corresponding to the query data field from the security level table; and in response to not finding the query data field in the security level table, determining a security level corresponding to the query data field based at least on a lineage tree and the security level table. The lineage tree can trace the query data field back to one or more source data fields, and the security level table can include one or more security levels corresponding to the one or more source data fields.
According to another aspect, a non-transitory computer-readable storage medium coupled to a processor may include instructions that, when executed by the processor, cause the processor to perform a method for data security classification. The method may include: classifying one or more source data fields in a data space into one or more security levels according to one or more rules, the data space further comprising one or more derived data fields, each derived data field containing data derived from one or more parent data fields, wherein each parent data field is a source data field or another derived data field; for a query data field, recording an SQL (Structured Query Language) statement used to generate the query data field; parsing the SQL statement; constructing a lineage tree based on the parsed SQL statement, the lineage tree tracing the query data field back to one or more parent data fields; and determining a security level of the query data field based at least on the lineage tree and one or more security levels of the one or more parent data fields of the query data field.
According to another aspect, a computer-implemented method for data security classification may comprise: (1) receiving a query associated with a query data field; and (2) determining a security level of the query data field based on one or more respective security levels of one or more direct parent data fields of the query data field and on a derivation function that derives the data in the query data field from the data in the one or more direct parent data fields.
In some embodiments, step (2) may comprise: (3) for each of the one or more direct parent data fields that is a source data field, applying the security level associated with that source data field to the determination of step (2); and (4) for each of the one or more direct parent data fields that is not a source data field, treating that direct parent data field as a query data field and repeating step (2), until the direct parent data fields obtained are all source data fields, and then applying step (3).
These and other features of the systems, methods, and non-transitory computer readable media disclosed herein, as well as methods of operation and function of related elements of structure and combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention.
Drawings
Certain features of various embodiments of the inventive technique are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
FIG. 1 illustrates an exemplary environment for data security classification in accordance with various embodiments.
FIG. 2 illustrates an exemplary system for data security classification in accordance with various embodiments.
FIG. 3A illustrates an exemplary lineage tree for data security ranking according to various embodiments.
FIG. 3B illustrates another exemplary lineage tree for data security ranking according to various embodiments.
FIG. 4A illustrates a flowchart of an exemplary method for data security classification in accordance with various embodiments.
FIG. 4B illustrates a flowchart of another exemplary method for data security classification in accordance with various embodiments.
FIG. 4C illustrates a flowchart of another exemplary method for data security classification in accordance with various embodiments.
FIG. 5 illustrates a block diagram of an exemplary computer system in which any of the embodiments described herein may be implemented.
Detailed Description
Existing rule-based data security classification methods cannot adequately protect large volumes of data that are subject to change. Traditional rules, which are based strictly on the characteristics or properties of data fields, often lose their effectiveness when the data content changes. Further, the table or field names that the rules typically rely on may be incorrect or insufficient to describe the corresponding data content. Thus, existing data security classification techniques require rules to be updated continually, yet still leave stored data at potential risk.
The disclosed systems and methods for data security classification at least mitigate the above-mentioned disadvantages of the prior art. In various embodiments, derived data sets including derived data fields may be created from a number of source data sets including source data fields, and further derived data sets may be created from one or more existing derived data sets and/or source data sets. An exemplary data space may include thousands of source data fields and millions of derived data fields. To classify these derived data fields effectively, the disclosed methods may incorporate data lineage into the classification process, where the data lineage may trace each derived data field back, either directly or via one or more intermediate derived data fields, to one or more source data fields whose security levels have already been determined. With the disclosed methods, the security levels assigned to source data fields can be automatically propagated through intermediate derived data fields at various levels, according to their derivation relationships, to reach any queried derived data field, and can be dynamically adjusted in response to any change in derivation relationships or data content. Thus, the disclosed methods avoid having to keep creating conventional classification rules for newly added or newly changed derived data fields, and can prevent misclassification caused by reliance on table or field names. Overall, data security can be significantly improved.
FIG. 1 illustrates an exemplary environment 100 for data security classification in accordance with various embodiments. As shown in FIG. 1, the exemplary environment 100 may include at least one computing system 102 that includes one or more processors 104 and memory 106. The memory 106 may be non-transitory and computer-readable. The memory 106 may store instructions that, when executed by the one or more processors 104, cause the one or more processors 104 to perform various operations described herein. The system 102 may be implemented as or on various devices such as a mobile phone, tablet, server, computer, wearable device (e.g., smart watch), and the like. The system 102 may be installed with suitable software (e.g., data transfer programs) and/or hardware (e.g., wired or wireless connections) to access other devices of the environment 100.
The environment 100 may include one or more data stores (e.g., data store 108) and one or more computing devices (e.g., computing device 109) accessible to the system 102. In some embodiments, the system 102 may be configured to obtain data from the data store 108 (e.g., a cloud database) and/or the computing device 109 (e.g., a server, mobile phone, or vehicle computer).
The environment 100 may further include one or more computing devices (e.g., computing device 107) coupled to the system 102. In some embodiments, the computing device 107 may be implemented as a server, mobile phone, vehicle computer, tablet, wearable device (e.g., smart watch), or the like.
In some embodiments, the system 102 and one or more of the computing devices (e.g., computing device 109) may be integrated in a single device or system. Alternatively, the system 102 and the one or more computing devices may operate as separate devices. The one or more data stores (e.g., data store 108) may be located anywhere accessible to the system 102, e.g., in the memory 106, in the computing device 109, in another device coupled to the system 102 (e.g., a network storage device), or in another storage location (e.g., a cloud-based storage system or a network file system). The system 102 may be implemented as a single system or as multiple systems coupled to one another. The data space comprising the various source data sets and derived data sets may be provided by a single system or distributed across multiple systems. In general, the system 102, the computing device 109, the data store 108, and the computing device 107 may communicate with one another over one or more wired or wireless networks (e.g., the Internet).
FIG. 2 illustrates an exemplary system 200 for data security classification in accordance with various embodiments. The operations shown in FIG. 2 and presented below are intended to be illustrative. In various embodiments, the system 102 may be configured to implement a data space (e.g., a data warehouse). The data space may include data collected from systems external to the data warehouse (e.g., from the data store 108 or the computing device 109) and organized into source data fields (e.g., employee names, salaries, or other data fields stored in a table). In some embodiments, a data field may be regarded as a column in a data table storing various data entries. The system 102 may be configured to implement the various data security classification steps and methods described herein.
In various embodiments, the computing device 107 may query the system 102 for the security level of certain data or of a certain data field in the data space. Alternatively, the computing device 107 may query the system 102 for certain data in the data space. The system 102 may authenticate a current user of the computing device 107, for example, based on login information, and determine whether the current user's authorization meets the security level of the queried data. Accordingly, the system 102 may either return the data or deny access, and return the corresponding result.
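A minimal sketch of the grant-or-deny decision, assuming that security levels and user clearances are integers on the same scale, with a larger number meaning more sensitive data. The function name `answer_query` is hypothetical, and authentication itself is not shown.

```python
from typing import Any, Optional, Tuple


def answer_query(user_clearance: int, field_level: int, data: Any) -> Tuple[str, Optional[Any]]:
    """Return the queried data only if the user's clearance meets the field's level.

    Assumes integer levels where a larger number means more sensitive data.
    """
    if user_clearance >= field_level:
        return ("granted", data)
    return ("denied", None)  # deny access and return no data


print(answer_query(3, 2, ["John Doe", "Luke Webb"]))  # ('granted', ['John Doe', 'Luke Webb'])
print(answer_query(1, 2, ["John Doe", "Luke Webb"]))  # ('denied', None)
```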
In some embodiments, the system 102 may receive a request to access a certain data field (the "query data field"). Data fields may be associated with categories of data items in a data space (e.g., a data warehouse) stored in the memory 106 and/or various other storage spaces. The data space may store data in various formats (e.g., tabular data sets) and organize the data by data field. For example, for the data field "employee name," the corresponding data may include "John Doe," "Luke Webb," and the like. When a user queries some or all of the data in a data field, the security level of the data field is needed to determine whether the authenticated user may access the data. To obtain it, the system 102 may search for the query data field in the security level table. The security level table may be stored in the memory 106 or otherwise be accessible to the processor 104. The security level table may include the security levels of various source data fields. For example, raw data collected from systems external to the data warehouse may be treated as source data and assigned to corresponding source data fields. The security level of a source data field may be determined based on rules. For example, one rule for classifying source data fields may be that any data field whose field name contains the word "ID" is assigned the highest security level. Those of ordinary skill in the art will appreciate that various other methods may be employed to determine the security levels of source data fields.
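The "ID" rule above could be applied to build the security level table for source data fields roughly as follows. This is a sketch under assumptions: the four-level scale, the default level, the tokenization (splitting names on non-alphanumeric characters, so a camelCase name such as `employeeID` would not match), and all function names are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

HIGHEST_LEVEL = 4  # hypothetical scale: 1 (least sensitive) to 4 (most sensitive)
DEFAULT_LEVEL = 1


def contains_id_word(field_name: str) -> bool:
    """True if 'ID' appears as a separate word, e.g. 'employee_id' or 'ID-number'."""
    tokens = re.split(r"[^0-9A-Za-z]+", field_name)
    return any(token.upper() == "ID" for token in tokens)


# Ordered (predicate, level) pairs; the first matching rule wins.
RULES: List[Tuple[Callable[[str], bool], int]] = [(contains_id_word, HIGHEST_LEVEL)]


def build_security_level_table(
    source_fields: Iterable[str], rules=RULES, default: int = DEFAULT_LEVEL
) -> Dict[str, int]:
    """Classify each source data field by the first rule it matches."""
    table = {}
    for name in source_fields:
        table[name] = next((level for pred, level in rules if pred(name)), default)
    return table


print(build_security_level_table(["employee_id", "employee_name", "salary"]))
# {'employee_id': 4, 'employee_name': 1, 'salary': 1}
```

Keeping the rules as an ordered list makes it straightforward to add further rules for source fields without touching the derived fields, which are handled through the lineage tree instead.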
After searching the security level table, the system 102 may, in response to finding the query data field in the security level table, obtain the security level corresponding to the query data field from the security level table. In response to not finding the query data field in the security level table, the system 102 may determine the security level corresponding to the query data field based at least on the lineage tree and the security level table. The lineage tree can trace the query data field back to one or more source data fields, and the security level table can include one or more security levels corresponding to the one or more source data fields. As described, the data space may include source data fields and derived data fields. Some derived data fields may each be obtained from one or more source data fields, and further derived data fields may each be obtained from one or more existing derived data fields and/or one or more source data fields. Thus, any derived data field may be traced back, through one or more layers of intermediate derived data fields if necessary, to ultimately reach one or more source data fields in its lineage tree. Further, each derivation may correspond to a derivation function, and the derivation function may be associated with a conversion of the one or more security levels of one or more parent data fields into the security level of a child data field. Thus, the security levels of one or more source data fields may be used to obtain the security levels of the next layer of derived data fields, and the process may be applied recursively, layer by layer, to obtain the security levels of all derived data fields.
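The lookup-then-lineage logic can be sketched as a short recursive function. The sketch assumes the lineage tree is given as a mapping from each derived field to its direct parents, and it combines parent levels with `max` by default, following the embodiment in which a derived field takes the highest level of its parents. The function and field names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def get_security_level(
    query_field: str,
    level_table: Dict[str, int],
    parents: Dict[str, List[str]],
    combine: Callable[[Iterable[int]], int] = max,
) -> int:
    """Look the field up in the security level table; fall back to the lineage tree."""
    # Found in the security level table: use the stored level directly.
    if query_field in level_table:
        return level_table[query_field]
    # Not found: the field must be derived, so resolve each parent recursively.
    if query_field not in parents:
        raise KeyError(f"{query_field!r} is in neither the table nor the lineage tree")
    return combine(
        get_security_level(p, level_table, parents, combine) for p in parents[query_field]
    )


table = {"O1": 1, "O2": 3}  # hypothetical source-field levels
lineage = {"D1": ["O1"], "D0": ["D1", "O2"]}
print(get_security_level("D0", table, lineage))  # 3
```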
Referring to FIG. 3A, FIG. 3A illustrates an exemplary lineage tree for data security ranking according to various embodiments. The operations shown in FIG. 3A and presented below are intended to be illustrative. The figure shows the lineage tree of derived data field D0, in which source data fields are represented by circles and derived data fields by squares. The source data fields may be considered the roots of the lineage tree, and the derived data fields may be considered branches or leaves. In some embodiments, the lineage tree can trace a derived data field (e.g., data field D0) back to one or more first-level parent data fields immediately above it (e.g., data fields D1A, D1B, and O1). Each first-level parent data field may be a source data field (e.g., O1) or a first-level derived data field (e.g., data fields D1A and D1B), and the derived data field may be derived from the one or more first-level parent data fields based on a derivation function. For each first-level derived data field, the lineage tree can trace it back to one or more second-level parent data fields (e.g., data field D1A traces back to D2A, and data field D1B traces back to D2A and O2); each second-level parent data field is a source data field or a second-level derived data field, and the first-level derived data field is derived from the one or more second-level parent data fields based on another derivation function. The trace-back may be repeated for any derived data field until one or more source data fields are reached (e.g., data field D2A may be traced further back to O3 and O1, and no further trace-back from a source data field is required). The levels of the derived data fields are defined relative to data field D0; that is, the level of a data field may be the number of arrows along the path required to reach data field D0.
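Under the reading that a field's level is the number of arrows between it and D0, the levels in FIG. 3A can be computed with a breadth-first walk from D0 toward the sources. This sketch assumes that, where a field reaches D0 along several paths (as O1 does here), the shortest path is used; the edge list restates the relationships described above, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List

# Edges as described for FIG. 3A: child -> direct parents.
FIG_3A_PARENTS: Dict[str, List[str]] = {
    "D0": ["D1A", "D1B", "O1"],
    "D1A": ["D2A"],
    "D1B": ["D2A", "O2"],
    "D2A": ["O3", "O1"],
}


def levels_relative_to(root: str, parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Level of each field = number of arrows on the path from it down to `root`."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        child = queue.popleft()
        for parent in parents.get(child, []):
            if parent not in level:  # breadth-first: the first visit is a shortest path
                level[parent] = level[child] + 1
                queue.append(parent)
    return level


print(levels_relative_to("D0", FIG_3A_PARENTS))
# {'D0': 0, 'D1A': 1, 'D1B': 1, 'O1': 1, 'D2A': 2, 'O2': 2, 'O3': 3}
```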
Each arrow may represent a derivation corresponding to a derivation function and points from a parent data field to a child data field (any source or derived data field may be a parent data field, but only derived data fields may be child data fields). The derivation function may be any algorithm or equivalent specified by the user, such as summation, addition, multiplication, counting, etc. For example, data field D1A may be the sum of the data entries in data field D2A. As another example, data fields D2A and O2 may have the same number of entries, and data field D1B may be the entry-by-entry addition of the data entries of D2A and those of O2.
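The example derivation functions can be written out directly. This sketch models a data field as a Python list of numeric entries, which is an assumption, and the function names are hypothetical.

```python
from typing import List

Field = List[float]  # a data field modeled as a column of data entries


def sum_entries(parent: Field) -> Field:
    """e.g. D1A = sum of the data entries in D2A (a single-entry field)."""
    return [sum(parent)]


def add_entrywise(a: Field, b: Field) -> Field:
    """e.g. D1B = entry-by-entry addition of D2A and O2 (same number of entries)."""
    if len(a) != len(b):
        raise ValueError("parent fields must have the same number of entries")
    return [x + y for x, y in zip(a, b)]


def count_entries(parent: Field) -> List[int]:
    """A count derivation: the number of data entries in the parent field."""
    return [len(parent)]


d2a, o2 = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
print(sum_entries(d2a), add_entrywise(d2a, o2), count_entries(d2a))
# [6.0] [11.0, 22.0, 33.0] [3]
```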
Further, the lineage tree can include the derivation functions, each associated with a security level transition from the corresponding one or more parent data fields to a child data field. Such correspondences may be stored in a table. For example, for a summation function (summing entries in two or more parent data fields into a child data field), the highest security level among the parent data fields is passed on as the security level of the child data field. That is, the obtained security level of the derived data field may be the highest of the security levels of its first-level parent data fields. As another example, for a count function (counting the data entries in a parent data field to obtain a child data field), the security level of the child data field may be set to any suitably low level. Accordingly, determining the security level corresponding to the query data field based at least on the lineage tree and the security level table may include: for each derived data field at any level, obtaining the security level of the derived data field based at least on (1) the derivation function of the derived data field and (2) the security levels of the first-level parent data fields of the derived data field. Such a determination may be applied recursively, layer by layer, until it links to the known security levels of the source data fields in the security level table, which sit at the roots of the lineage tree; the security level of data field D0 can therefore be obtained by substituting in the security levels of the source data fields. Further, the security level of any derived data field may be updated automatically and dynamically as the data changes (e.g., a change in a data entry, a change in a derivation function, or the addition of a data field).
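A function-aware version of the recursion might keep the transition table as a dictionary keyed by derivation function. In this sketch, the two rules follow the examples above (summation passes on the highest parent level; counting yields a low level), while the concrete value `LOWEST_LEVEL = 1`, the field names, and the table layout are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

LOWEST_LEVEL = 1  # hypothetical "low level" assigned to count results

# Hypothetical transition table: derivation function -> rule mapping parent levels to child level.
TRANSITIONS: Dict[str, Callable[[List[int]], int]] = {
    "sum": max,  # the highest parent level is passed on
    "count": lambda parent_levels: LOWEST_LEVEL,  # a count is set to a low level
}


def resolve_level(
    field: str,
    lineage: Dict[str, Tuple[str, List[str]]],
    level_table: Dict[str, int],
) -> int:
    """Recursively apply each derivation's transition until known source levels are reached."""
    if field in level_table:
        return level_table[field]
    func, parent_fields = lineage[field]
    parent_levels = [resolve_level(p, lineage, level_table) for p in parent_fields]
    return TRANSITIONS[func](parent_levels)


levels = {"salary": 3, "bonus": 2, "employee_name": 4}  # hypothetical source levels
lineage = {
    "total_pay": ("sum", ["salary", "bonus"]),
    "headcount": ("count", ["employee_name"]),
    "D0": ("sum", ["total_pay", "headcount"]),
}
print(resolve_level("total_pay", lineage, levels),
      resolve_level("headcount", lineage, levels),
      resolve_level("D0", lineage, levels))  # 3 1 3
```

Here `headcount` comes out at level 1 even though its parent is at level 4, which is the behavior the count example describes.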
In some embodiments, determining the security level corresponding to the query data field based at least on the lineage tree and the security level table may further include: determining which of the derived data fields (e.g., data field D0) corresponds to the query data field, and using the determined security level of that derived data field as the security level of the query data field. That is, if the query data field is D0, the security level of D0 may be used to process the query. In some other embodiments, as described above, if the query data field matches a data field stored in the security level table, the security level corresponding to that stored data field may be obtained directly from the security level table as the security level of the query data field.
Referring back to FIG. 2, to return the security level of the query data field accurately, the system 102 may be configured to perform certain steps in advance. In some embodiments, the system 102 can classify one or more source data fields in the data space into one or more security levels according to one or more rules. As described above, in addition to the source data fields, the data space may further include one or more derived data fields, each containing data derived from one or more parent data fields, where a parent data field is a source data field or another derived data field. Upon receiving a query for a query data field, the system 102 can record the SQL (Structured Query Language) statement used to generate the query data field. For example, an SQL statement may count the number of data entries in an existing derived data field, and the resulting count may become a new derived data field. The recording may be performed in real time. The system 102 can then parse the SQL statement and construct, based on the parsed statement, a lineage tree that traces the query data field back to one or more parent data fields. For example, to parse the SQL statement, the system 102 can convert it into an abstract syntax tree, and to construct the lineage tree, the system 102 can traverse the abstract syntax tree to identify derivation relationships between the query data field and one or more source data fields and/or derived data fields. The system 102 may then determine the security level of the query data field based at least on the lineage tree and the one or more security levels of the one or more parent data fields of the query data field.
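As a rough illustration of the record, parse, and build steps, the sketch below extracts derivation edges from one narrow statement shape. It uses regular expressions as a toy stand-in for converting the SQL into an abstract syntax tree, so it only understands single-table statements of the form `SELECT FUNC(col, ...) AS alias, ... FROM table`; a production system would use a full SQL parser. The function name, the patterns, and the in-memory log are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Toy patterns for statements like "SELECT SUM(a) AS x, COUNT(b) AS y FROM t".
SELECT_ITEM = re.compile(r"(\w+)\s*\(\s*([\w\s,*]*)\)\s+AS\s+(\w+)", re.IGNORECASE)
FROM_CLAUSE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)

recorded_statements: List[str] = []  # hypothetical log of SQL that generated derived fields


def record_and_parse(sql: str) -> Dict[str, Tuple[str, List[str]]]:
    """Record a statement, then extract child -> (derivation function, parent fields) edges."""
    recorded_statements.append(sql)
    from_match = FROM_CLAUSE.search(sql)
    if from_match is None:
        raise ValueError("statement has no FROM clause")
    table = from_match.group(1)
    edges = {}
    # Only the select list (text before FROM) describes the derived columns.
    for func, args, alias in SELECT_ITEM.findall(sql[: from_match.start()]):
        columns = [a.strip() for a in args.split(",") if a.strip()]
        edges[alias] = (func.lower(), [f"{table}.{c}" for c in columns])
    return edges


sql = "SELECT SUM(salary) AS total_salary, COUNT(employee_name) AS headcount FROM employees"
print(record_and_parse(sql))
# {'total_salary': ('sum', ['employees.salary']), 'headcount': ('count', ['employees.employee_name'])}
```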
Referring to fig. 3B, fig. 3B illustrates another exemplary lineage tree for data security classification according to various embodiments. The operations shown in fig. 3B and presented below are intended to be illustrative. The symbols in fig. 3B are similar to those in fig. 3A described above, and the lineage tree shown in fig. 3B can be considered a generalization of the lineage tree shown in fig. 3A. In some embodiments, to determine the security level of the query data field based at least on the lineage tree and the security levels of the one or more parent data fields of the query data field, the system 102 may: (1) determine, based on one or more corresponding N-th derivation functions and the security levels of the one or more source data fields, one or more N-th security levels corresponding to one or more N-th derived data fields derived from the one or more source data fields in the lineage tree; (2) determine, based on one or more corresponding (N-1)-th derivation functions and the security levels of the one or more source data fields and/or the one or more N-th security levels, one or more (N-1)-th security levels corresponding to one or more (N-1)-th derived data fields derived from the one or more source data fields and/or the N-th derived data fields in the lineage tree; and then repeat step (2) in the lineage tree, moving toward the query data field, to determine the security level of the query data field. Here, the security levels of the query data field and of each derived data field may be determined based on their direct parent data fields (the parent data fields that point directly to the child data field). For example, the security level of an N-th data field may be obtained from the security levels of source data fields, the security level of an (N-1)-th data field may be obtained from the security levels of N-th data fields (and possibly source data fields), and so on, until the security level of D0 is obtained.
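The level-by-level procedure of steps (1) and (2) can be sketched as a wave-by-wave traversal of the lineage tree, starting from the source data fields. This is a minimal sketch under stated assumptions: security levels are integers where a larger value means more sensitive, and every derivation function is taken to give a derived field the highest level among its direct parents (the rule recited in claim 3). The function name `propagate_levels` and the field names OA, OB, and DNA in the usage example are hypothetical; OC, DNB, D(N-1)B, and D0 follow the labels used for fig. 3B.

```python
from collections import defaultdict
from typing import Dict, List


def propagate_levels(parents: Dict[str, List[str]], source_levels: Dict[str, int]) -> Dict[str, int]:
    """Resolve security levels wave by wave, from the source fields toward the query field.

    parents: maps each derived field to its direct parent fields (the lineage tree).
    source_levels: pre-classified levels of the source fields (larger = more sensitive).
    Assumed rule: a derived field takes the highest level among its direct parents.
    The first wave resolves the N-th derived fields, the next the (N-1)-th, and so on.
    """
    levels = dict(source_levels)
    children: Dict[str, List[str]] = defaultdict(list)
    pending: Dict[str, int] = {}  # derived field -> number of parents not yet resolved
    for child, ps in parents.items():
        pending[child] = len(ps)
        for p in ps:
            children[p].append(child)

    wave = list(source_levels)  # fields whose levels are already known
    while wave:
        next_wave = []
        for field in wave:
            for child in children[field]:
                pending[child] -= 1
                if pending[child] == 0:  # every direct parent now has a level
                    levels[child] = max(levels[p] for p in parents[child])
                    next_wave.append(child)
        wave = next_wave
    return levels


lineage = {
    "DNA": ["OA", "OB"],
    "DNB": ["OB"],
    "D(N-1)B": ["OC", "DNB"],
    "D0": ["DNA", "D(N-1)B"],
}
levels = propagate_levels(lineage, {"OA": 1, "OB": 2, "OC": 3})
```

With these inputs, DNA and DNB are resolved in the first wave (level 2), D(N-1)B in the second (level 3, inherited from OC), and D0 in the third (level 3).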
In this figure, the level of a child data field is one level below the lowest level among its parent data fields. For example, data field D(N-1)B is derived from source data field OC and derived data field DNB. Because the N-th-level data field DNB is the lowest among all parent data fields of D(N-1)B (here, original level > N-th level > (N-1)-th level > … > 1st level), data field D(N-1)B is at the (N-1)-th level.
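The level rule in this paragraph can be written as a small recursive function. In this sketch the "original" level of the source data fields is encoded as the integer N + 1, so that the ordering original > N-th > (N-1)-th > … > 1st becomes an ordinary integer comparison; that encoding and the function name `figure_levels` are assumptions made for illustration.

```python
from typing import Dict, List, Set


def figure_levels(parents: Dict[str, List[str]], sources: Set[str], n: int) -> Dict[str, int]:
    """Assign each derived field its level label, where source fields sit at n + 1.

    A child field is one level below the lowest level among its parent fields.
    """
    memo: Dict[str, int] = {}

    def level(field: str) -> int:
        if field in sources:
            return n + 1  # the "original" level, encoded as n + 1
        if field not in memo:
            memo[field] = min(level(p) for p in parents[field]) - 1
        return memo[field]

    return {f: level(f) for f in parents}


labels = figure_levels({"DNB": ["OB"], "D(N-1)B": ["OC", "DNB"]}, {"OB", "OC"}, n=5)
```

With N = 5, DNB (derived only from source data) is at level 5, the N-th level, and D(N-1)B, whose lowest parent is DNB, is at level 4, the (N-1)-th level.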
In this manner, the security level of any derived data field may be determined from its lineage tree, which traces the derived data field back to one or more source data fields, together with the security levels of those source data fields. In the data space, each lineage tree can capture dynamic changes in the corresponding derived data field. Because source data fields may be more stable and fewer in number than derived data fields, the source data fields may be classified in advance, and their security levels may then be propagated through the lineage tree, applying the derivation function at each level, to obtain the security levels of the derived data fields.
Fig. 4A illustrates a flow chart of an exemplary method 400 according to various embodiments of the present disclosure. The method 400 may be implemented in a variety of environments including, for example, the environment 100 of fig. 1. The example method 400 may be implemented by one or more components of the system 102 (e.g., the processor 104, the memory 106). The exemplary method 400 may be implemented by a plurality of systems (e.g., computers) similar to the system 102. The operations of method 400 presented below are intended to be illustrative. Depending on the implementation, the example method 400 may include additional, fewer, or alternative steps performed in various orders or in parallel.
At block 402, a request to access a query data field may be received. At block 404, the security level table may be searched for the query data field. In response to finding the query data field in the security level table, a security level corresponding to the query data field may be obtained from the security level table at block 406. In response to not finding the query data field in the security level table, a security level corresponding to the query data field may be determined at block 408 based at least on the lineage tree and the security level table. The lineage tree can trace the query data field back to one or more source data fields, and the security level table can include one or more security levels corresponding to the one or more source data fields.
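Blocks 404 to 408 can be sketched as a table lookup with a lineage fallback. The sketch assumes integer levels where larger means more sensitive, and the "highest parent level wins" rule; under that rule, the level of a derived field equals the highest level among the stored fields it traces back to, so the sketch collects those fields with an iterative depth-first search rather than evaluating each intermediate field. The function name `resolve_security_level` and the dictionary representation of the table and lineage tree are hypothetical.

```python
from typing import Dict, List


def resolve_security_level(
    query_field: str,
    level_table: Dict[str, int],
    parents: Dict[str, List[str]],
) -> int:
    """Look the field up in the security level table first; on a miss, use the lineage tree."""
    if query_field in level_table:  # block 406: the table already holds the level
        return level_table[query_field]

    # Block 408: trace back through the lineage tree until fields with stored
    # levels (source fields, or derived fields already in the table) are reached.
    stack, seen, stored = [query_field], set(), set()
    while stack:
        field = stack.pop()
        if field in seen:
            continue
        seen.add(field)
        if field in level_table:
            stored.add(field)
        elif field in parents:
            stack.extend(parents[field])
        else:
            raise KeyError(f"{field!r} has no stored level and no lineage")
    return max(level_table[f] for f in stored)


table = {"OA": 1, "OB": 2, "OC": 3}
lineage = {
    "DNA": ["OA", "OB"],
    "DNB": ["OB"],
    "D(N-1)B": ["OC", "DNB"],
    "D0": ["DNA", "D(N-1)B"],
}
```

For example, `resolve_security_level("D0", table, lineage)` traces D0 back to OA, OB, and OC and returns 3, while `resolve_security_level("OB", table, lineage)` is answered directly from the table.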
Fig. 4B illustrates a flowchart of an exemplary method 410 according to various embodiments of the present disclosure. The method 410 may be implemented in a variety of environments including, for example, the environment 100 of fig. 1. The example method 410 may be implemented by one or more components of the system 102 (e.g., the processor 104, the memory 106). The exemplary method 410 may be implemented by a plurality of systems (e.g., computers) similar to the system 102. The operations of method 410 presented below are intended to be illustrative. Depending on the implementation, the exemplary method 410 may include additional, fewer, or alternative steps performed in various orders or in parallel.
At block 412, one or more source data fields in the data space may be classified into one or more security levels according to one or more rules, the data space further including one or more derived data fields, each containing data derived from one or more parent data fields, where a parent data field is a source data field or another derived data field. At block 414, the SQL (structured query language) statement used to generate the query data field may be recorded. At block 416, the SQL statement may be parsed. At block 418, a lineage tree that traces the query data field back to one or more parent data fields can be constructed based on the parsed SQL statement. At block 419, a security level for the query data field may be determined based at least on the lineage tree and the security levels of the one or more parent data fields of the query data field.
Fig. 4C illustrates a flowchart of an exemplary method 420 according to various embodiments of the present disclosure. Method 420 may be implemented in a variety of environments including, for example, environment 100 of fig. 1. The example method 420 may be implemented by one or more components of the system 102 (e.g., the processor 104, the memory 106). The example method 420 may be implemented by a plurality of systems (e.g., computers) similar to the system 102. The operations of method 420 presented below are intended to be illustrative. Depending on the implementation, the example method 420 may include additional, fewer, or alternative steps performed in various orders or in parallel.
At block 422, a query associated with a query data field may be received. For example, the query may request certain data in the query data field, a certain attribute of the query data field, and so on. At block 424, the security level of the query data field may be determined based on: the security levels of one or more direct upper data fields of the query data field, and the derivation function that derives the data in the query data field from the data in the one or more direct upper data fields. As described above, the query data field may be traced back to its first-level parent data fields (the one or more direct upper data fields), which may include source data fields and/or first-level derived data fields. Each first-level derived data field may similarly be traced back to its own one or more direct upper data fields, and each such backward trace ends at source data fields. Here, the security levels of intermediate data fields (data fields in the lineage tree of the query data field that are neither the query data field nor source data fields) may be generated dynamically when the query is received, rather than being stored in advance and looked up. In some embodiments, for each direct upper data field that is a source data field, the security level associated with that source data field may be used in the determination at block 424. For each direct upper data field that is not a source data field, block 424 may be repeated with that direct upper data field treated as the query data field, until only source data fields are reached; block 424 may then be applied using the security levels associated with those source data fields.
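Block 424, applied recursively, can be sketched as follows. The lineage map records, for each derived field, the name of its derivation function and its direct upper data fields; intermediate levels are computed on demand and never stored, as described above. The `DERIVATION_RULES` table and all field names are purely hypothetical: `max` reproduces the highest-parent-level rule of claim 3, while the `count` rule, which lowers the level by one for aggregated data, only illustrates how a derivation function could influence the result and is not taken from the disclosure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical policy: how each derivation function maps the direct upper
# fields' levels to the derived field's level (larger integer = more sensitive).
DERIVATION_RULES: Dict[str, Callable[[Sequence[int]], int]] = {
    "copy": max,                                       # data passed through unchanged
    "join": max,                                       # most sensitive input dominates
    "count": lambda levels: max(1, max(levels) - 1),   # aggregate assumed to reveal less
}


def security_level(
    field: str,
    lineage: Dict[str, Tuple[str, List[str]]],
    source_levels: Dict[str, int],
    rules: Dict[str, Callable[[Sequence[int]], int]] = DERIVATION_RULES,
) -> int:
    """Determine a field's level from its direct upper fields and its derivation function."""
    if field in source_levels:  # recursion ends at pre-classified source fields
        return source_levels[field]
    func, uppers = lineage[field]
    # Repeat block 424 with each direct upper field treated as the query field.
    upper_levels = [security_level(u, lineage, source_levels, rules) for u in uppers]
    return rules[func](upper_levels)


lineage = {
    "order_cnt": ("count", ["orders.id"]),
    "report": ("join", ["order_cnt", "users.phone"]),
}
sources = {"orders.id": 3, "users.phone": 4}
```

Here `order_cnt` evaluates to 2 under the hypothetical `count` rule and `report` to 4. The recursion assumes the lineage is acyclic, which holds for a lineage tree; a memoization cache could be added if the same intermediate field is reached along many paths.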
The techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices, such as one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques according to program instructions in firmware, memory, other storage, or a combination thereof. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to implement the techniques. The special-purpose computing devices may be desktop computer systems, server computer systems, portable computer systems, handheld devices, networking devices, or any other device or combination of devices that incorporate hard-wired and/or program logic to implement the techniques. The computing devices are generally controlled and coordinated by operating system software. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide user interface functionality, such as a graphical user interface ("GUI"), among other things.
FIG. 5 is a block diagram illustrating a computer system 500 upon which any of the embodiments described herein may be implemented. System 500 may correspond to system 102 as described above. Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and one or more hardware processors 504 coupled with bus 502 for processing information. For example, the one or more hardware processors 504 may be one or more general purpose microprocessors. The one or more processors 504 may correspond to the processors 104 as described above.
Computer system 500 also includes a main memory 506, such as a Random Access Memory (RAM), cache memory, and/or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. When stored in a storage medium accessible to the processor 504, the instructions make the computer system 500 a special purpose machine that is customized to perform the operations specified in the instructions. Computer system 500 further includes a Read Only Memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (flash drive), is provided and coupled to bus 502 for storing information and instructions. The main memory 506, ROM 508, and/or storage 510 may correspond to the memory 106 described above.
Computer system 500 may implement the techniques described herein using custom hard-wired logic, one or more ASICs or FPGAs, firmware, and/or program logic which, in combination with the computer system, causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 500 in response to one or more processors 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes one or more processors 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
Main memory 506, ROM 508, and/or storage 510 may include non-transitory storage media. The term "non-transitory medium" and similar terms as used herein refer to any medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such non-transitory media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Common forms of non-transitory media include, for example: a floppy disk, a flexible disk, a hard disk, a solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
Computer system 500 also includes a network interface 518 coupled to bus 502. Network interface 518 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, network interface 518 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem that provides a data communication connection to a corresponding type of telephone line. As another example, network interface 518 may be a Local Area Network (LAN) card that provides a data communication connection to a compatible LAN (or a WAN component that communicates with a WAN). Wireless links may also be used. In any such implementation, network interface 518 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
Computer system 500 can send messages and receive data, including program code, through the network(s), network link, and network interface 518. In the Internet example, a server might transmit requested code for an application program through the Internet, an ISP, a local network, and network interface 518.
The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.
Each of the flows, methods, and algorithms described in the foregoing sections may be implemented by code modules executed by one or more computer systems or computer processors, including computer hardware, and may be fully or partially automated. The processes and algorithms may be implemented in part or in whole in dedicated circuitry.
The various features and processes described above may be used independently of one another or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto may be performed in other sequences as appropriate. For example, the described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed serially, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.
Various operations of the exemplary methods described herein may be performed, at least in part, by an algorithm. The algorithm may be included in program code or instructions stored in a memory (e.g., the non-transitory computer-readable storage medium described above). Such an algorithm may include a machine learning algorithm. In some embodiments, a machine learning algorithm may not explicitly program a computer to perform a function but may instead learn from training data to build a prediction model that performs the function.
Various operations of the example methods described herein may be performed, at least in part, by one or more processors that are temporarily configured (e.g., via software configuration) or permanently configured to perform the relevant operations. Such a processor, whether configured temporarily or permanently, may constitute a processor-implemented engine that operates to perform one or more of the operations or functions described herein.
Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented engines. Moreover, the one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., Application Program Interfaces (APIs)).
The performance of certain operations may be distributed among the processors, not only residing within a single machine but deployed across a number of machines. In some example embodiments, the processors or processor-implemented engines may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented engines may be distributed across a number of geographic locations.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Although an overview of the subject matter has been described with reference to specific exemplary embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the disclosure. These embodiments of the inventive subject matter may be referred to, individually or collectively, by the term "application" merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or concept if more than one is in fact disclosed.
The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the disclosed teachings. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The detailed description is, therefore, not to be taken in a limiting sense, and the scope of the various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
Any process descriptions, elements, or blocks in the flowcharts described herein and/or depicted in the figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternative implementations, as will be appreciated by those skilled in the art, are included within the scope of the embodiments described herein, in which elements or functions may be deleted, or executed out of order from that shown or discussed (including substantially concurrently or in reverse order), depending on the functionality involved.
As used herein, the term "or" may be construed in either an inclusive or an exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within the scope of embodiments of the disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Conditional language, such as "can," "could," "might," or "may," unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment.

Claims (16)

1. A computer-implementable method for data security classification, the method comprising:
receiving a request to access a query data field;
searching a security level table for the query data field;
in response to finding the query data field in the security level table, obtaining a security level corresponding to the query data field from the security level table; and
in response to not finding the query data field in the security level table, determining a security level corresponding to the query data field based at least on a lineage tree and the security level table, wherein:
the lineage tree traces the query data field back to one or more source data fields; and
the security level table includes one or more security levels corresponding to the one or more source data fields;
wherein the lineage tree traces a derived data field back to one or more first-level parent data fields, each first-level parent data field being the source data field or a first-level derived data field, and the derived data field is derived from the one or more first-level parent data fields based on a derived function;
for the first-level derived data field, the lineage tree traces the first-level derived data field back to one or more second-level parent data fields, each second-level parent data field being the source data field or a second-level derived data field, and the first-level derived data field is derived from the one or more second-level parent data fields based on another derived function;
the tracing back is repeated for any derived data field until the one or more source data fields are reached; and
the lineage tree includes the derived functions.
2. The method of claim 1, wherein determining a security level corresponding to the query data field based at least on a lineage tree and the security level table comprises:
for each derived data field of any level, obtaining a security level of the derived data field based at least on (1) the derived function of the derived data field and (2) the security levels of the first-level parent data fields of the derived data field; and
determining the one of the derived data fields that corresponds to the query data field, and taking the obtained security level of that derived data field as the security level of the query data field.
3. The method according to claim 2, wherein:
the determined security level of the derived data field is the highest of the security levels of the first-level parent data fields of the derived data field.
4. A computer-implementable method for data security classification, the method comprising:
classifying one or more source data fields in a data space into one or more security levels according to one or more rules, the data space further comprising one or more derived data fields, each derived data field containing data derived from one or more parent data fields, wherein each parent data field is the source data field or another derived data field;
for a query data field, recording an SQL (structured query language) statement used to generate the query data field;
parsing the SQL statement;
constructing a lineage tree based on the parsed SQL statement, the lineage tree tracing the query data field back to one or more parent data fields; and
determining a security level of the query data field based at least on the lineage tree and one or more security levels of the one or more parent data fields of the query data field;
wherein parsing the SQL statement includes converting the SQL statement into an abstract syntax tree; and
constructing the lineage tree based on the parsed SQL statement includes traversing the abstract syntax tree to identify a derivation relationship between the query data field and one or more of the source data fields and/or derived data fields.
5. The method of claim 4, wherein determining a security level of the query data field based at least on the lineage tree and the one or more security levels of the one or more parent data fields of the query data field comprises:
(1) determining one or more N-th security levels corresponding to one or more N-th derived data fields derived from the one or more source data fields in the lineage tree, based on one or more corresponding N-th derived functions and the security levels of the one or more source data fields;
(2) determining one or more (N-1)-th security levels corresponding to one or more (N-1)-th derived data fields derived from the one or more source data fields and/or the N-th derived data fields in the lineage tree, based on one or more corresponding (N-1)-th derived functions and the security levels of the one or more source data fields and/or the one or more N-th security levels; and
repeating step (2) in the lineage tree toward the query data field to determine the security level of the query data field.
6. The method according to claim 4, wherein:
the security levels of the query data field and of the derived data fields are determined based on their direct parent data fields.
7. A system for data security classification, comprising a processor configured to:
receive a request to access a query data field;
search a security level table for the query data field;
in response to finding the query data field in the security level table, obtain a security level corresponding to the query data field from the security level table; and
in response to not finding the query data field in the security level table, determine a security level corresponding to the query data field based at least on a lineage tree and the security level table, wherein:
the lineage tree traces the query data field back to one or more source data fields; and
the security level table includes one or more security levels corresponding to the one or more source data fields;
wherein the lineage tree traces a derived data field back to one or more first-level parent data fields, each first-level parent data field being the source data field or a first-level derived data field, and the derived data field is derived from the one or more first-level parent data fields based on a derived function;
for the first-level derived data field, the lineage tree traces the first-level derived data field back to one or more second-level parent data fields, each second-level parent data field being the source data field or a second-level derived data field, and the first-level derived data field is derived from the one or more second-level parent data fields based on another derived function;
the tracing back is repeated for any derived data field until the one or more source data fields are reached; and
the lineage tree includes the derived functions.
8. The system of claim 7, wherein to determine a security level corresponding to the query data field based at least on the lineage tree and the security level table, the processor is configured to:
for each derived data field of any level, obtain a security level of the derived data field based at least on (1) the derived function of the derived data field and (2) the security levels of the first-level parent data fields of the derived data field; and
determine the one of the derived data fields that corresponds to the query data field, and take the obtained security level of that derived data field as the security level of the query data field.
9. The system of claim 8, wherein:
the determined security level of the derived data field is the highest of the security levels of the first-level parent data fields of the derived data field.
10. A system for data security classification, comprising a processor configured to:
classify one or more source data fields in a data space into one or more security levels according to one or more rules, the data space further comprising one or more derived data fields, each derived data field containing data derived from one or more parent data fields, wherein each parent data field is the source data field or another derived data field;
for a query data field, record an SQL (structured query language) statement used to generate the query data field;
parse the SQL statement;
construct a lineage tree based on the parsed SQL statement, the lineage tree tracing the query data field back to one or more parent data fields; and
determine a security level of the query data field based at least on the lineage tree and one or more security levels of the one or more parent data fields of the query data field;
wherein, to parse the SQL statement, the processor is configured to convert the SQL statement into an abstract syntax tree; and
to construct the lineage tree based on the parsed SQL statement, the processor is configured to traverse the abstract syntax tree to identify a derivation relationship between the query data field and one or more of the source data fields and/or derived data fields.
11. The system of claim 10, wherein to determine the security level of the query data field based at least on the lineage tree and the one or more security levels of the one or more parent data fields of the query data field, the processor is configured to:
(1) Determining one or more N-th security levels corresponding to one or more N-th derived data fields derived from the one or more source data fields in the lineage tree, based on one or more corresponding N-th derivation functions and the security levels of the one or more source data fields;
(2) Determining one or more (N-1)-th security levels corresponding to one or more (N-1)-th derived data fields derived from the one or more source data fields and/or the N-th derived data fields in the lineage tree, based on one or more corresponding (N-1)-th derivation functions and the security levels of the one or more source data fields and/or the one or more N-th security levels; and
Repeating step (2) up the lineage tree toward the query data field to determine the security level of the query data field.
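Claim 11 resolves levels bottom-up: first the fields derived directly from sources (the N-th layer), then the (N-1)-th layer, and so on until the query field is reached. A topological sort of the lineage graph produces exactly that order. The sketch assumes integer levels, a dictionary mapping each derived field to a hypothetical `(function name, parents)` pair, and a default rule of taking the maximum parent level; the optional `rules` mapping is an illustrative hook for function-specific behavior, not something the claims define.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_levels(source_levels, lineage, rules=None):
    """Assign security levels bottom-up, from the source fields toward the query field.

    source_levels: dict mapping each source field to an integer level.
    lineage: dict mapping each derived field to a hypothetical
             (function name, list of first-level parent fields) pair.
    rules: optional dict mapping a function name to a callable that turns the
           list of parent levels into a level; functions without an entry fall
           back to max(), an assumed default.
    """
    rules = rules or {}
    # TopologicalSorter expects node -> set of predecessors (here: parents).
    graph = {child: set(parents) for child, (_, parents) in lineage.items()}
    levels = dict(source_levels)
    # static_order() emits every field after all of its parents, so the layer
    # nearest the sources is resolved first, then the next layer up.
    for name in TopologicalSorter(graph).static_order():
        if name in levels:
            continue  # a source field that is already classified
        func, parents = lineage[name]
        levels[name] = rules.get(func, max)([levels[p] for p in parents])
    return levels


# Illustrative data; the rule that "mask" lowers the level by one is hypothetical.
sources = {"phone": 3, "city": 1, "age": 2}
lineage = {
    "masked": ("mask", ["phone"]),
    "profile": ("concat", ["masked", "city"]),
    "report": ("concat", ["profile", "age"]),
}
plain = propagate_levels(sources, lineage)  # plain["report"] == 3
masked_rule = {"mask": lambda lv: max(1, max(lv) - 1)}
adjusted = propagate_levels(sources, lineage, masked_rule)  # adjusted["report"] == 2
```

Because each field is visited once, the iterative form computes every intermediate level in a single pass, which suits tables where many query fields share ancestors.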
12. The system of claim 10, wherein:
The security levels of the query data field and of each derived data field are determined based on their direct parent data fields.
13. A computer-implementable method for data security classification, the method comprising:
(1) Receiving a query associated with a query data field; and
(2) Determining the security level of the query data field based on: one or more corresponding security levels of one or more direct upper-level data fields of the query data field, and a derivation function that derives the data in the query data field from the data in the one or more direct upper-level data fields;
Wherein each direct upper-level data field is a first-level parent data field, and step (2) includes:
Tracing, based on a lineage tree, a derived data field corresponding to the query data field to one or more first-level parent data fields, each first-level parent data field being a source data field or a first-level derived data field, and the derived data field being derived from the one or more first-level parent data fields by the derivation function;
For the first-level derived data field, the lineage tree traces the first-level derived data field to one or more second-level parent data fields, each second-level parent data field being a source data field or a second-level derived data field, and the first-level derived data field being derived from the one or more second-level parent data fields by another derivation function; and
The lineage tree includes the derivation functions.
14. The method of claim 13, wherein step (2) comprises:
(3) For each of the one or more direct upper-level data fields that is a source data field, applying the security level associated with that source data field to the determination of step (2); and
(4) For each of the one or more direct upper-level data fields that is not a source data field, repeating step (2) with that direct upper-level data field taken as the query data field, until every direct upper-level data field obtained is a source data field, and then applying step (3).
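Claims 13 and 14 (and their system counterparts, claims 15 and 16) state the computation recursively: a source parent contributes its stored level (step 3), and a derived parent is itself treated as a query field (step 4). The sketch stores each derivation function on a hypothetical `LineageNode`, so the tree includes the derivation functions as claim 13 requires; the `rules` table and the `max()` default are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LineageNode:
    """Hypothetical lineage-tree node; source fields carry a level and no parents."""
    name: str
    level: Optional[int] = None        # set only for source data fields
    function: Optional[str] = None     # derivation function of a derived field
    parents: List["LineageNode"] = field(default_factory=list)  # direct upper-level fields


def determine_level(node: LineageNode,
                    rules: Dict[str, Callable[[List[int]], int]]) -> int:
    if not node.parents:
        # Step (3): a source data field contributes its own classified level.
        if node.level is None:
            raise ValueError(f"source field {node.name!r} has no level")
        return node.level
    # Step (4): each derived upper-level field is treated as a new query field
    # and step (2) is repeated on it, until only source fields remain.
    parent_levels = [determine_level(p, rules) for p in node.parents]
    # Step (2): combine the parent levels through the derivation function's
    # rule; max() is an assumed default when no rule is registered.
    return rules.get(node.function, max)(parent_levels)


phone = LineageNode("phone", level=3)
city = LineageNode("city", level=1)
masked = LineageNode("masked_phone", function="mask", parents=[phone])
query = LineageNode("contact_summary", function="concat", parents=[masked, city])
default_level = determine_level(query, {})  # 3
mask_rule = {"mask": lambda lv: max(1, max(lv) - 1)}  # purely illustrative rule
ruled_level = determine_level(query, mask_rule)  # 2
```

The recursion mirrors the claim wording directly but recomputes shared ancestors; a memoized or bottom-up variant avoids that when the lineage is large.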
15. A system for data security classification, comprising a processor configured to:
(1) Receiving a query associated with a query data field; and
(2) Determining the security level of the query data field based on: one or more corresponding security levels of one or more direct upper-level data fields of the query data field, and a derivation function that derives the data in the query data field from the data in the one or more direct upper-level data fields;
Wherein each direct upper-level data field is a first-level parent data field, and the processor is configured to:
Trace, based on a lineage tree, a derived data field corresponding to the query data field to one or more first-level parent data fields, each first-level parent data field being a source data field or a first-level derived data field, and the derived data field being derived from the one or more first-level parent data fields by the derivation function;
For the first-level derived data field, the lineage tree traces the first-level derived data field to one or more second-level parent data fields, each second-level parent data field being a source data field or a second-level derived data field, and the first-level derived data field being derived from the one or more second-level parent data fields by another derivation function; and
The lineage tree includes the derivation functions.
16. The system of claim 15, wherein to determine the security level of the query data field, the processor is configured to:
(3) For each of the one or more direct upper-level data fields that is a source data field, apply the security level associated with that source data field to the determination of step (2); and
(4) For each of the one or more direct upper-level data fields that is not a source data field, repeat step (2) with that direct upper-level data field taken as the query data field, until every direct upper-level data field obtained is a source data field, and then apply step (3).
CN201880097090.7A 2018-07-16 2018-12-29 Intelligent landmark Active CN112639786B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/036,865 US10922430B2 (en) 2018-07-16 2018-07-16 System and method for data security grading
US16/036,865 2018-07-16
PCT/US2018/068075 WO2020018144A1 (en) 2018-07-16 2018-12-29 System and method for data security grading

Publications (2)

Publication Number Publication Date
CN112639786A CN112639786A (en) 2021-04-09
CN112639786B true CN112639786B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
US20200050968A1 (en) Interactive interfaces for machine learning model evaluations
US10387801B2 (en) Method of and system for generating a prediction model and determining an accuracy of a prediction model
US9454599B2 (en) Automatic definition of entity collections
EP3161635B1 (en) Machine learning service
US10318882B2 (en) Optimized training of linear machine learning models
Poorthuis et al. Making big data small: strategies to expand urban and geographical research using social media
US10452702B2 (en) Data clustering
WO2021114810A1 (en) Graph structure-based official document recommendation method, apparatus, computer device, and medium
US20150100605A1 (en) Determining collection membership in a data graph
WO2018121198A1 (en) Topic based intelligent electronic file searching
Gupta et al. Faster as well as early measurements from big data predictive analytics model
US10311093B2 (en) Entity resolution from documents
CN112070550A (en) Keyword determination method, device and equipment based on search platform and storage medium
US11449628B2 (en) System and method for data security grading
CN112639786B (en) Intelligent landmark
CN112348041A (en) Log classification and log classification training method and device, equipment and storage medium
US20150286700A1 (en) Recording medium having stored thereon database access control program, method for controlling database access, and information processing apparatus
Singh NoSQL: A new horizon in big data
US11436220B1 (en) Automated, configurable and extensible digital asset curation tool
US11500933B2 (en) Techniques to generate and store graph models from structured and unstructured data in a cloud-based graph database system
Glumenko et al. Characteristics of the development of an it project for creation of open educational resource
US20240012827A1 (en) Cleaning and organizing schemaless semi-structured data for extract, transform, and load processing
Guo Distributed Time Series Analytics
Albawi et al. Review Previous Solutions To Query Based Uncertain Object Determining
Bagwari et al. Indexing optimizations on Hadoop

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant