CN112989412B - Data desensitization method and device based on SQL statement analysis - Google Patents


Info

Publication number: CN112989412B
Authority: CN (China)
Prior art keywords: data, query, field, desensitization, fields
Legal status: Active
Application number: CN202110291401.7A
Other languages: Chinese (zh)
Other versions: CN112989412A
Inventors: 李圣权, 彭大蒙, 毛云青, 郁强
Current assignee: CCI China Co Ltd
Original assignee: CCI China Co Ltd
Application filed by CCI China Co Ltd
Priority to CN202110291401.7A
Publication of CN112989412A
Application granted
Publication of CN112989412B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data desensitization method and device based on SQL statement analysis. The method acquires a user's query SQL statement and transfer field list; executes the query SQL statement to obtain the data to be desensitized for the corresponding query fields; parses the query SQL statement into a query data block statement tree, recursively traverses the statement blocks of the tree, and acquires data table information and field information for the query fields; analyzes the data table information and field information to obtain the source data table and source field of each query field; encrypts the fields to be transferred; and obtains the desensitization rule of each source field from a preset desensitization information configuration table and desensitizes the data accordingly. Data queried from the database can thus be desensitized effectively and in real time according to actual usage requirements, while the transfer of desensitized data is still supported.

Description

Data desensitization method and device based on SQL statement analysis
Technical Field
The invention relates to the field of computer data processing, in particular to a data desensitization method and device based on SQL statement analysis.
Background
Data desensitization is a data processing technique that transforms sensitive private information according to desensitization rules so that the sensitive data is reliably protected. For example, an ID card number can be replaced with a random group of 18 digits, and the middle digits of a mobile phone number can be masked with asterisks for display, thereby protecting users' personal information.
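As a minimal sketch of the two example transformations above, the Python below replaces an ID card number with 18 random digits and masks the middle digits of a mobile phone number with asterisks. The function names and the 3/4 split of kept digits are illustrative assumptions; the random replacement also ignores the check digit of a real ID card number.

```python
import random


def replace_id_card_randomly(rng: random.Random) -> str:
    """Return a random group of 18 digits to stand in for an ID card number."""
    return "".join(str(rng.randint(0, 9)) for _ in range(18))


def mask_phone_middle(phone: str, keep_head: int = 3, keep_tail: int = 4) -> str:
    """Replace the middle digits of a phone number with '*' for display."""
    if len(phone) <= keep_head + keep_tail:
        return "*" * len(phone)  # too short to keep both ends; mask everything
    hidden = len(phone) - keep_head - keep_tail
    return phone[:keep_head] + "*" * hidden + phone[-keep_tail:]


print(replace_id_card_randomly(random.Random(42)))  # 18 random digits
print(mask_phone_middle("13812345678"))             # 138****5678
```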
When a relational database displays data, the data is queried with SQL (Structured Query Language). If the queried data needs to be desensitized, the desensitization must take into account both the user's usage requirements and the different desensitization rules that apply to the data tables and fields involved in the query; at the same time, the transfer of sensitive information between pages must be considered. For example, when a person's name in a person list is a hyperlink that opens that person's detail page, the ID card number must be passed to the detail page without exposing the sensitive information. Different users viewing the same record may also require different desensitization rules, and users whose business requirements call for the real content of a sensitive field need no desensitization at all.
To meet increasingly stringent data desensitization requirements, two types of desensitization method are currently available: static data desensitization and dynamic data desensitization. Static data desensitization generally applies algorithms such as transformation, substitution, masking, and format-preserving encryption, stores the desensitized data in a dedicated data table, and provides that table to users for querying and display. However, a separate data table must be generated for data processed with each different desensitization rule, which greatly increases the cost of data processing and storage and makes the data less timely; the approach suits only scenarios with small data volumes in which data is distributed and shared outside the production environment. Dynamic data desensitization parses the SQL statement, detects the sensitive data tables and fields in the query to match desensitization conditions, and, after a successful match, returns desensitized data to the application by rewriting the query SQL or by intercepting the results. However, this method considers neither where a field originates nor fields processed by functions, so many sensitive fields are easily missed.
Disclosure of Invention
The invention aims to provide a data desensitization method and device based on SQL statement analysis that can effectively desensitize data queried from a database in real time according to actual usage requirements, while also supporting the transfer of desensitized data.
To achieve the above object, in a first aspect, the present technical solution provides a data desensitization method based on SQL statement parsing, comprising the following steps: acquiring a data query request of a user, the data query request comprising a query SQL statement containing query fields; executing the query SQL statement to acquire the data to be desensitized corresponding to the query fields; parsing the query SQL statement into a query data block statement tree, recursively traversing the statement blocks of the tree, and acquiring data table information and field information according to the query fields; analyzing the data table information and field information to obtain the source data table and source field corresponding to each query field; and acquiring the desensitization rules corresponding to the source fields based on a preset desensitization information configuration table and desensitizing the data to be desensitized, wherein the preset desensitization information configuration table stores the desensitization rules of users for different source data tables and source fields.
In a second aspect, the present technical solution provides a data desensitization apparatus based on SQL statement parsing, comprising:
a request acquisition unit, configured to acquire a data query request of a user, the data query request comprising a query SQL statement containing query fields;
a query execution unit, configured to execute the query SQL statement to acquire the data to be desensitized corresponding to the query fields;
a parsing unit, configured to parse the query SQL statement into a query data block statement tree, recursively traverse the statement blocks of the tree, and acquire data table information and field information according to the query fields;
an analysis unit, configured to analyze the data table information and field information to obtain the source field and source data table corresponding to each query field;
a matching unit, configured to acquire the desensitization rules corresponding to the source fields based on the preset desensitization information configuration table, which stores the desensitization rules of users for different source data tables and source fields;
and a desensitization unit, configured to desensitize the data to be desensitized.
In a third aspect, the present disclosure provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the data desensitization method based on SQL statement analysis.
Compared with the prior art, the technical scheme has the following characteristics and beneficial effects:
compared with static data desensitization, queried data is desensitized effectively and in real time according to the user's actual usage requirements, which improves the timeliness of desensitization; and because no large set of pre-generated data tables is needed, data storage pressure is reduced;
compared with dynamic data desensitization, the requirement to transfer data through hyperlinks and similar mechanisms is met, and both the field transfer relationships in the query statement and fields processed by functions are taken into account during desensitization, so that no field is missed.
Drawings
Fig. 1 is a schematic flow chart of a data desensitization method based on SQL statement parsing according to an embodiment of the present solution.
Fig. 2 is a schematic flow chart of a data desensitization method based on SQL statement parsing according to another embodiment of the present solution.
Fig. 3 is a schematic block diagram of a data desensitization device based on SQL statement parsing according to the embodiment of Fig. 1.
Fig. 4 is a schematic block diagram of a data desensitization device based on SQL statement parsing according to the embodiment of Fig. 2.
Fig. 5 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art can derive from the embodiments given herein fall within the scope of the present invention.
To desensitize the data returned by SQL queries effectively and in real time, and to support the transfer of desensitized data, this solution provides a data desensitization method and system based on SQL statement analysis.
In this solution, a data administrator first writes each user's desensitization rules for different data tables and fields into a background database, forming the preset desensitization information configuration table. When a user queries data through the data desensitization system and method based on SQL statement analysis, the desensitization rules for the query fields in the data to be desensitized are matched according to the current user's identity, and the matched rules are applied to the data to produce desensitized data, achieving effective real-time desensitization. At the same time, data that must be passed on by value is encrypted into an additional field, so that desensitized data can still be transferred.
Specifically, the data desensitization method based on SQL statement analysis comprises the following steps:
acquiring a data query request of a user, the data query request comprising a query SQL statement containing query fields;
executing the query SQL statement to acquire the data to be desensitized corresponding to the query fields;
parsing the query SQL statement into a query data block statement tree, recursively traversing the statement blocks of the tree, and acquiring data table information and field information according to the query fields;
analyzing the data table information and field information to obtain the source data table and source field corresponding to each query field;
and acquiring the desensitization rules corresponding to the source fields based on the preset desensitization information configuration table and desensitizing the data to be desensitized, wherein the configuration table stores the user's desensitization rules for different source data tables and source fields.
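The five steps can be wired together as in the following minimal Python sketch. It assumes hypothetical helpers passed in as callables: `execute` stands in for the database query, and `resolve_sources` stands in for the parsing and analysis steps, returning each query field's (source data table, source field) pairs. The rule names such as "rule2" and the `(user id, table, field)` key of the configuration table are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Source = Tuple[str, str]                  # (source data table, source field)
Config = Dict[Tuple[int, str, str], str]  # (user id, table, field) -> rule name


def desensitize_query(
    user_id: int,
    sql: str,
    execute: Callable[[str], List[Row]],
    resolve_sources: Callable[[str], Dict[str, List[Source]]],
    config: Config,
    rules: Dict[str, Callable[[str], str]],
) -> List[Row]:
    """Run the steps for one query request from user `user_id`."""
    rows = execute(sql)             # execute the query SQL statement
    sources = resolve_sources(sql)  # parse the SQL, trace each query field to its sources
    field_rule: Dict[str, str] = {}
    for query_field, srcs in sources.items():
        for table, column in srcs:
            # Missing configuration entries are treated as "no rule" in this sketch.
            rule = config.get((user_id, table, column), "none")
            if rule != "none":
                field_rule[query_field] = rule  # if several sources have rules, the last wins
    # Apply each field's rule; fields without a rule pass through unchanged.
    return [
        {f: (rules[field_rule[f]](v) if f in field_rule else v) for f, v in row.items()}
        for row in rows
    ]


rows = [{"name": "Zhang San", "phone": "13812345678"}]
result = desensitize_query(
    1,
    "select ...",
    execute=lambda sql: rows,
    resolve_sources=lambda sql: {"name": [("user_info", "user_name")],
                                 "phone": [("user_info", "phone")]},
    config={(1, "user_info", "phone"): "rule2"},
    rules={"rule2": lambda v: v[:3] + "****" + v[-4:]},
)
print(result)  # [{'name': 'Zhang San', 'phone': '138****5678'}]
```

Passing the query and parsing steps in as callables keeps the sketch independent of any particular database driver or SQL parser.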
It is worth noting that, in this solution, a query SQL statement used for data query generally includes selected fields, processing functions, and sub-queries (a sub-query is a smaller query nested inside a complete query statement; several sub-queries with different purposes together form a complex query). When a user queries data, the query SQL statement is first run against the target database to obtain the data to be desensitized, and the field information of that data is recorded. To ensure that no field is missed, the desensitization rule of every field under the user's identity must be obtained. To this end, the query SQL statement is parsed, and the transfer relationships between fields and the processing functions applied to them are recorded during parsing; preset desensitization information configuration tables are maintained for different user identities. When the user queries data, the preset desensitization rules are obtained from the user information and the configuration table, the rule that each query field should use is determined, and the data of each field is desensitized in the program with its own rule. Field content that must be passed on with its real value is encrypted with a symmetric encryption scheme into an additional encrypted field, which effectively prevents sensitive information from leaking during data transfer.
In the step of executing the query SQL statement to acquire the data to be desensitized corresponding to the query fields, a data query function is executed to obtain multiple rows of data to be desensitized. Each row contains at least the fields corresponding to the query fields and their contents; the fields that need desensitization are then processed accordingly so that the output data meets the usage requirements.
In the step of parsing the query SQL statement into a query data block statement tree, the SQL statement is parsed into a query data block (SQLSelectStatement) statement tree using the statement parsing function of the Alibaba Druid open-source library. The tree contains at least one layer of statement blocks; when it contains multiple layers, the layers are ordered from the outside in.
In the step of recursively traversing the statement blocks of the query data block statement tree and acquiring data table information and field information according to the query fields: because the layers of statement blocks are ordered from the outside in and each layer contains its own query fields, the data table information and field information for the query fields of each layer are obtained recursively and recorded from top to bottom in outside-in order. Note that the outermost statement block corresponds to the outermost query fields.
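The statement tree and its outside-in recursion can be illustrated with a small Python sketch. The `TableRef`, `Join`, and `SelectBlock` classes are hypothetical, simplified stand-ins for a parser's syntax tree (they are not the Druid API). The example rebuilds the blocks AA, BB, and CC used in the embodiment later in this description, and `walk` yields one record per node from the outermost block inwards.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union


@dataclass
class TableRef:
    """A physical data table, e.g. `user_info u`."""
    name: str
    alias: Optional[str] = None


@dataclass
class Join:
    """A join of two table sources, e.g. `user_info u LEFT JOIN user_address a`."""
    label: str
    left: "Node"
    right: "Node"


@dataclass
class SelectBlock:
    """One SELECT block: its select-list items and the source it reads from."""
    label: str
    items: List[str]
    source: "Node"
    alias: Optional[str] = None  # alias when the block is used as a sub-query


Node = Union[TableRef, Join, SelectBlock]


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, str, str, object]]:
    """Yield (depth, kind, label, detail) records from the outermost block inwards."""
    if isinstance(node, SelectBlock):
        yield depth, "block", node.label, node.items
        yield from walk(node.source, depth + 1)
    elif isinstance(node, Join):
        yield depth, "join", node.label, None
        yield from walk(node.left, depth + 1)
        yield from walk(node.right, depth + 1)
    else:
        yield depth, "table", node.name, node.alias


bb = SelectBlock(
    "BB",
    ["u.user_name as name", "u.id_card as card", "u.phone", "u.birth", "u.education",
     "concat(a.city_name,'',a.detail) as address"],
    Join("CC", TableRef("user_info", "u"), TableRef("user_address", "a")),
    alias="info",
)
aa = SelectBlock("AA", ["name", "card", "phone", "birth", "education", "address"], bb)

for depth, kind, label, detail in walk(aa):
    print("  " * depth + f"{kind} {label} {detail}")
```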
A query field may be a function field (a field processed by a processing function), and some query fields have different names in the data tables they come from; this solution therefore obtains all field information corresponding to a query field by traversing the data table information and field information. The field information includes at least the field name, the field alias, whether the field is a function field, and the fields contained in the function field; the data table information includes at least the data table name and the data table alias.
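To show what the field information records, the sketch below extracts the field name, alias, function-field flag, and contained fields from a single select-list item. It relies on hypothetical regular expressions that only handle simple items such as `u.id_card as card` or `concat(a.city_name,'',a.detail) as address`; a real implementation would read these values from the parser's syntax tree instead.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Toy patterns for illustration; they are NOT a general SQL parser.
_ITEM = re.compile(r"^(?P<expr>.+?)(?:\s*\bas\s+(?P<alias>\w+))?$", re.IGNORECASE)
_LITERAL = re.compile(r"'[^']*'")  # single-quoted string literals
# An optionally qualified column reference that is not immediately followed by "(".
_COLUMN = re.compile(r"\b(?:(\w+)\.)?(\w+)\b(?!\s*\()")


@dataclass
class FieldInfo:
    expr: str              # the expression as written
    name: str              # column name, or the whole expression for a function field
    alias: Optional[str]   # AS alias, if any
    is_function: bool      # True when a processing function wraps the columns
    contained: List[Tuple[Optional[str], str]] = field(default_factory=list)  # (table alias, column)


def parse_select_item(item: str) -> FieldInfo:
    m = _ITEM.match(item.strip())
    if m is None:
        raise ValueError(f"cannot parse select item: {item!r}")
    expr, alias = m.group("expr").strip(), m.group("alias")
    # Drop string literals first so that '' inside concat(...) is not read as a column.
    columns = [(q or None, c) for q, c in _COLUMN.findall(_LITERAL.sub("", expr))]
    is_function = "(" in expr
    name = expr if is_function else columns[0][1]
    return FieldInfo(expr, name, alias, is_function, columns)


print(parse_select_item("u.id_card as card"))
print(parse_select_item("concat(a.city_name,'',a.detail)as address"))
```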
In the step of analyzing the data table information and field information to obtain the source data table and source field corresponding to a query field: because the records are sorted from top to bottom in the outside-in order of the statement blocks, the rows corresponding to the query field can be followed from top to bottom until a row is reached whose data table appears in the list of data table names (that is, a physical table rather than a statement block). The data table in that row is taken as the source data table, and the field name in that row as the source field.
Through these steps, the field transfer relationships in the query statement and the fields processed by functions are both taken into account, which ensures that no field is missed during desensitization and that the underlying fields of function-processed fields are also obtained.
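One way to carry out this top-to-bottom search is to follow each field name through the blocks until it reaches a physical table, as in this Python sketch. The `Block` structure is an assumption: each block maps its output fields to the (table alias, column) pairs they are computed from, and maps the aliases visible in its FROM clause to either a physical table or an inner block. The data reproduces the personnel example of the embodiment, with the aliases of join CC folded into block BB.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Ref = Tuple[Optional[str], str]  # (qualifier or None, column name)


@dataclass
class Block:
    # output field name -> columns it is computed from (several for a function field)
    items: Dict[str, List[Ref]]
    # alias visible in FROM -> physical table name, or the label of an inner block
    sources: Dict[str, str]


def resolve(blocks: Dict[str, Block], label: str, name: str) -> List[Tuple[str, str]]:
    """Follow field `name` from block `label` downwards until physical tables are reached."""
    block = blocks[label]
    found: List[Tuple[str, str]] = []
    for qualifier, column in block.items[name]:
        if qualifier is None:
            if len(block.sources) != 1:
                raise ValueError(f"unqualified column {column!r} is ambiguous in {label}")
            qualifier = next(iter(block.sources))
        target = block.sources[qualifier]
        if target in blocks:   # a statement block: keep searching inwards
            found.extend(resolve(blocks, target, column))
        else:                  # a physical data table: this is the source
            found.append((target, column))
    return found


blocks = {
    "AA": Block(
        items={f: [(None, f)] for f in ["name", "card", "phone", "birth", "education", "address"]},
        sources={"info": "BB"},
    ),
    "BB": Block(
        items={
            "name": [("u", "user_name")],
            "card": [("u", "id_card")],
            "phone": [("u", "phone")],
            "birth": [("u", "birth")],
            "education": [("u", "education")],
            "address": [("a", "city_name"), ("a", "detail")],
        },
        sources={"u": "user_info", "a": "user_address"},
    ),
}
print(resolve(blocks, "AA", "card"))     # [('user_info', 'id_card')]
print(resolve(blocks, "AA", "address"))  # [('user_address', 'city_name'), ('user_address', 'detail')]
```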
In the step of acquiring the desensitization rule corresponding to the source field based on the preset desensitization information configuration table, the desensitization rule of each source field of the query field is looked up in the configuration table. In some embodiments, a query field corresponds to at least one source field. Because the desensitization rule of every output field is stored in advance in the configuration table, a rule can be matched to each source field, ensuring that desensitization is comprehensive. In particular, when select * (query all fields) appears in the query SQL statement, all field information of the data table must be known; the preset configuration table therefore contains every field of each data table, and for some fields the rule is simply that no desensitization is required.
Meanwhile, when a query field has multiple source fields, a single desensitization rule can be derived for it according to the priorities of the desensitization rules or a specified merge policy.
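A minimal Python sketch of a priority-based merge policy follows. The rule names and the priority numbers (a larger number meaning a stricter rule) are hypothetical, since the text leaves the concrete priorities and merge policy open. The configuration table is modeled as a dictionary keyed by (user id, table, field).

```python
from typing import Dict, List, Tuple

NO_RULE = "none"
# Hypothetical priorities: a larger number means a stricter rule.
RULE_PRIORITY: Dict[str, int] = {NO_RULE: 0, "rule3": 1, "rule2": 2, "rule1": 3}


def rule_for_query_field(
    user_id: int,
    sources: List[Tuple[str, str]],
    config: Dict[Tuple[int, str, str], str],
    priority: Dict[str, int] = RULE_PRIORITY,
) -> str:
    """Pick one rule for a query field from the rules of all its source fields."""
    # The configuration is expected to cover every column, so a missing entry raises
    # KeyError instead of silently letting unmasked data through.
    candidates = [config[(user_id, table, column)] for table, column in sources]
    # Merge policy used here: keep the highest-priority (strictest) rule.
    return max(candidates, key=lambda rule: priority[rule], default=NO_RULE)


cfg = {
    (1, "user_address", "city_name"): "none", (1, "user_address", "detail"): "rule3",
    (2, "user_address", "city_name"): "none", (2, "user_address", "detail"): "none",
}
address_sources = [("user_address", "city_name"), ("user_address", "detail")]
print(rule_for_query_field(1, address_sources, cfg))  # rule3
print(rule_for_query_field(2, address_sources, cfg))  # none
```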
Desensitizing the data to be desensitized comprises the following: the desensitization rules are defined in advance according to usage requirements, and how each rule transforms data is predefined in the program; each field of the queried data is traversed and processed with that field's desensitization rule, completing the desensitization of the data.
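The traversal described above can be sketched as follows. The concrete behavior of rules 1 to 3 is not specified in the text, so the implementations below (keep the first 6 and last 4 characters of an ID card number, the first 3 and last 4 of a phone number, and mask an address detail completely) are assumptions for illustration.

```python
from typing import Callable, Dict, List


def keep_ends(value: str, head: int, tail: int) -> str:
    """Keep `head` leading and `tail` trailing characters and mask the rest with '*'."""
    if len(value) <= head + tail:
        return "*" * len(value)
    return value[:head] + "*" * (len(value) - head - tail) + value[-tail:]


# Hypothetical implementations of rules 1-3; the text does not define them.
RULES: Dict[str, Callable[[str], str]] = {
    "none": lambda v: v,
    "rule1": lambda v: keep_ends(v, 6, 4),  # ID card number
    "rule2": lambda v: keep_ends(v, 3, 4),  # mobile phone number
    "rule3": lambda v: "*" * len(v),        # address detail
}


def desensitize_rows(
    rows: List[Dict[str, str]],
    field_rules: Dict[str, str],
    rules: Dict[str, Callable[[str], str]] = RULES,
) -> List[Dict[str, str]]:
    """Return new rows in which each field is transformed by that field's rule."""
    # field_rules must name a rule for every field; a KeyError exposes gaps.
    return [{f: rules[field_rules[f]](v) for f, v in row.items()} for row in rows]


data = [{"name": "Zhang San", "card": "110101199001011234", "phone": "13812345678"}]
print(desensitize_rows(data, {"name": "none", "card": "rule1", "phone": "rule2"}))
# [{'name': 'Zhang San', 'card': '110101********1234', 'phone': '138****5678'}]
```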
In another embodiment, to support the transfer of desensitized data, the data query request further includes a transfer field list that records the fields whose values need to be transferred. Correspondingly, after the data table information and field information are analyzed to obtain the source data table and source field of each query field, the solution further comprises: "generating encrypted transfer data, according to the transfer field list, by encrypting the fields whose values need to be transferred".
Specifically, the source field of each field requiring value transfer is encrypted to produce additional encrypted transfer data; the encrypted transfer data is passed on according to the transfer rule and is decrypted when the value is actually needed. In this embodiment, symmetric encryption is used to encrypt the source field.
In some embodiments, if a field requiring value transfer also requires desensitization, the field is both desensitized and encrypted.
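The following Python sketch shows the transfer-field flow: the real value of each field in the transfer field list is encrypted into an extra column before the displayed value is masked, and the receiver decrypts it with the shared key. Because Python's standard library has no block cipher, the sketch builds a symmetric cipher from HMAC-SHA256 in counter mode with an encrypt-then-MAC tag; it is for illustration only, and a production system would use a vetted AES-GCM implementation instead. The `<field>_enc` column name is a hypothetical convention.

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Iterable, Tuple

NONCE_LEN, TAG_LEN = 16, 32


def _subkeys(key: bytes) -> Tuple[bytes, bytes]:
    # Derive separate encryption and MAC keys from one shared secret.
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # Counter mode with HMAC-SHA256 as the pseudorandom function.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_value(key: bytes, value: str) -> str:
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(NONCE_LEN)
    data = value.encode("utf-8")
    ct = bytes(a ^ b for a, b in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return base64.urlsafe_b64encode(nonce + ct + tag).decode("ascii")


def decrypt_value(key: bytes, token: str) -> str:
    enc_key, mac_key = _subkeys(key)
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    nonce, ct, tag = raw[:NONCE_LEN], raw[NONCE_LEN:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("transfer token was modified or uses a different key")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct)))).decode("utf-8")


def add_transfer_fields(row: Dict[str, str], transfer_fields: Iterable[str], key: bytes) -> Dict[str, str]:
    """Add an encrypted '<field>_enc' column holding the real value of each transfer field.

    Run this on the raw row before desensitization, so that the masked display value
    and the encrypted real value can coexist in the output.
    """
    out = dict(row)
    for f in transfer_fields:
        out[f + "_enc"] = encrypt_value(key, row[f])
    return out


key = os.urandom(32)
row = add_transfer_fields({"name": "Zhang San", "card": "110101199001011234"}, ["card"], key)
row["card"] = row["card"][:6] + "*" * 8 + row["card"][-4:]  # display value masked afterwards
print(row["card"], decrypt_value(key, row["card_enc"]))
```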
In a specific application scenario in which data is desensitized and transferred at the same time, the data to be displayed is personnel information. For ordinary users: the name, ID card number, mobile phone number, birthday, education, and address of each person are displayed; the ID card number and mobile phone number must be desensitized before display, and the ID card number must also be transferred. For advanced users: the ID card number and mobile phone number need no desensitization.
The personnel information is stored in a user information table user_info and a user address table user_address according to its content: the name user_name, ID card number id_card, mobile phone number phone, birthday birth, and education education are stored in user_info, and the address is stored in user_address.
Correspondingly, the data desensitization method based on SQL statement analysis comprises the following steps:
For this scenario, the preset desensitization information configuration table records at least the following: ordinary users apply rule 1 to the ID card number id_card and rule 2 to the mobile phone number phone in user_info; advanced users apply no desensitization rule to id_card or phone in user_info. The configuration table is shown in Table 1, where the ordinary user's id is 1 and the advanced user's id is 2:
Table 1: Desensitization information configuration table
User id   Table name     Field name   Desensitization rule
1         user_info      user_name    No desensitization
1         user_info      id_card      Rule 1
1         user_info      phone        Rule 2
1         user_info      birth        No desensitization
1         user_info      education    No desensitization
1         user_address   city_name    No desensitization
1         user_address   detail       Rule 3
2         user_info      user_name    No desensitization
2         user_info      id_card      No desensitization
2         user_info      phone        No desensitization
2         user_info      birth        No desensitization
2         user_info      education    No desensitization
2         user_address   city_name    No desensitization
2         user_address   detail       No desensitization
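Table 1 can be held as a lookup keyed by (user id, table name, field name). The sketch below also checks the property the embodiment relies on for select * queries, namely that every column of every table has an entry. The `SCHEMA` dictionary is an assumption that lists only the columns appearing in Table 1.

```python
from typing import Dict, Iterable, List, Tuple

# Table 1 as (user id, table name, field name, rule); "none" means no desensitization.
TABLE_1: List[Tuple[int, str, str, str]] = [
    (1, "user_info", "user_name", "none"),
    (1, "user_info", "id_card", "rule1"),
    (1, "user_info", "phone", "rule2"),
    (1, "user_info", "birth", "none"),
    (1, "user_info", "education", "none"),
    (1, "user_address", "city_name", "none"),
    (1, "user_address", "detail", "rule3"),
    (2, "user_info", "user_name", "none"),
    (2, "user_info", "id_card", "none"),
    (2, "user_info", "phone", "none"),
    (2, "user_info", "birth", "none"),
    (2, "user_info", "education", "none"),
    (2, "user_address", "city_name", "none"),
    (2, "user_address", "detail", "none"),
]
CONFIG: Dict[Tuple[int, str, str], str] = {(u, t, f): r for u, t, f, r in TABLE_1}

# Hypothetical schema, assumed to contain exactly the columns listed in Table 1.
SCHEMA: Dict[str, List[str]] = {
    "user_info": ["user_name", "id_card", "phone", "birth", "education"],
    "user_address": ["city_name", "detail"],
}


def missing_entries(config, schema: Dict[str, List[str]], user_ids: Iterable[int]):
    """Return (user, table, column) combinations that have no configured rule."""
    return [(u, t, c) for u in user_ids for t, cols in schema.items() for c in cols
            if (u, t, c) not in config]


def sensitive_fields(config, user_id: int):
    """Return the (table, field, rule) entries that actually mask data for a user."""
    return sorted((t, f, r) for (u, t, f), r in config.items() if u == user_id and r != "none")


print(missing_entries(CONFIG, SCHEMA, [1, 2]))  # []
print(sensitive_fields(CONFIG, 1))
print(sensitive_fields(CONFIG, 2))              # []
```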
"Acquiring a data query request of a user, the data query request comprising at least a query SQL statement containing at least one query field, and a transfer field list": in this scenario, the query fields in the query SQL statement are name, card (ID card number), phone (mobile phone number), birth (birthday), education, and address;
For example, the query SQL statement is:
select name, card, phone, birth, education, address from (
    select u.user_name as name, u.id_card as card,
    u.phone, u.birth, u.education, concat(a.city_name, '', a.detail) as address
    from user_info u
    left join user_address a on u.id_card = a.id_card
) info
The outermost fields are name, card (ID card number), phone (mobile phone number), birth (birthday), education, and address; the SQL is written with a sub-query to illustrate how fields are transferred between layers.
"Executing the query SQL statement to acquire the data to be desensitized corresponding to the query fields": in this scenario, the SQL statement is executed to obtain multiple rows of data to be desensitized, each consisting of the fields corresponding to the query fields and their contents.
"Parsing the query SQL statement into a query data block statement tree, recursively traversing its statement blocks, and acquiring data table information and field information according to the query fields": in this scenario, the statement parsing function of the Alibaba Druid open-source library parses the SQL statement into a query data block statement tree that contains at least one statement block, each of which contains query fields and data table information. The data table information and field information for the query fields of each layer of the tree are obtained recursively, and the query fields and data table information of each layer are recorded. The field information includes the field name, the field alias, the statement block it belongs to, whether it is a function field, and the fields contained in the function field; the data table information includes the data table name and data table alias. The records are ordered from top to bottom in recursion order, as shown for example in Table 2:
Table 2: Example of data table information and field information
[Table 2 appears as an image in the original publication and is not reproduced here.]
The SQL statement corresponding to the statement block labeled AA in Table 2 is:
{select name, card, phone, birth, education, address from (
    select u.user_name as name, u.id_card as card,
    u.phone, u.birth, u.education, concat(a.city_name, '', a.detail) as address
    from user_info u
    left join user_address a on u.id_card = a.id_card
) info
}
The SQL statement corresponding to the statement block labeled BB is:
{select u.user_name as name, u.id_card as card,
u.phone, u.birth, u.education, concat(a.city_name, '', a.detail) as address
from user_info u
left join user_address a on u.id_card = a.id_card
}
The SQL content corresponding to the statement block labeled CC is:
{user_info u LEFT JOIN user_address a ON u.id_card = a.id_card
}
"Analyzing the data table information and field information to find the source data table and source field corresponding to each query field": in this scenario, AA, BB, and CC are SQL statement blocks rather than data tables. For each outermost field, the final source field is found by searching from top to bottom: locate the row containing the field name or field alias, then continue searching downwards until reaching a row whose data table is the corresponding physical table, which yields the source field of the query field and its source data table.
Taking the outermost card and address fields as examples, their origins are analyzed as follows.
The outermost card field comes from block BB. In the rows of block BB, the field whose alias is card is id_card, qualified by the data table alias u, which refers into CC. Searching down to CC, which contains two data tables, user_info (alias u) and user_address (alias a), the alias u identifies the source table as user_info. The outermost card field is therefore derived from the id_card field of the user_info data table: the source field is "id_card" and the source data table is "user_info".
The outermost address field is a function field containing two fields, a.city_name and a.detail, whose source is CC. Searching downwards, CC contains two data tables, user_info (alias u) and user_address (alias a); through the table alias a, the city_name and detail fields contained in address are determined to come from the user_address data table. The source fields are therefore "city_name" and "detail", and the source data table is "user_address". This example shows how a function field is handled.
Note that if select * appears in the SQL statement (meaning all fields are retrieved), it is treated as a function field whose source fields are all the fields of the data table. The final source fields and source data table are then found for the query field in the same way.
Sorting the results gives the example of source fields and source data tables shown in Table 3:
Table 3: Example of source fields and source data tables
[Table 3 appears as an image in the original publication and is not reproduced here.]
"Acquiring the desensitization rule corresponding to the source field based on the preset desensitization information configuration table": in this scenario, combining the source field of each query field with the desensitization rules configured in the desensitization information configuration table yields Table 4, for example:
Table 4: Desensitization rules of the query fields
[Table 4 appears as an image in the original publication and is not reproduced here.]
SELECT * (querying all fields) may appear in the SQL statement, so information about all fields in the data table must be available. In this embodiment, every field of the data table is therefore covered by the desensitization rule settings. For fields that need no desensitization, the rule simply specifies that no desensitization processing is applied.
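A lookup against the desensitization information configuration table might look like the sketch below. The key layout `(user, table, field)`, the user name `analyst` and the rule names are assumptions made for illustration. As the paragraph above requires, every field of each table has an entry, and `none` marks fields that are returned unchanged.

```python
from __future__ import annotations

# Hypothetical configuration table: (user, source table, source field) -> rule name.
CONFIG: dict[tuple[str, str, str], str] = {
    ("analyst", "user_info", "name"): "none",
    ("analyst", "user_info", "id_card"): "mask_id",
    ("analyst", "user_info", "phone"): "mask_phone",
    ("analyst", "user_info", "birth_date"): "none",
    ("analyst", "user_address", "city_name"): "none",
    ("analyst", "user_address", "detail"): "drop",
}


def rules_for_sources(config: dict[tuple[str, str, str], str], user: str,
                      sources: list[tuple[str, str]]) -> list[str]:
    """Return the configured rule for each (table, field) source of one query field.

    A missing entry raises KeyError, because every field is expected to be configured.
    """
    return [config[(user, table, field)] for table, field in sources]


print(rules_for_sources(CONFIG, "analyst", [("user_info", "id_card")]))  # ['mask_id']
```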
In some embodiments, a query field may be derived from several source fields. A single desensitization rule can then be derived for it according to the priority of the source fields' desensitization rules or a merge policy specified for the field.
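The priority option could be realized by keeping the strictest rule when a query field has several source fields. The ranking in `RULE_PRIORITY` is hypothetical; the text leaves both the priorities and any alternative merge policy to configuration. When two rules tie, `max` keeps the first one it encounters.

```python
from __future__ import annotations

# Hypothetical priorities: a higher number means a stricter rule.
RULE_PRIORITY: dict[str, int] = {"none": 0, "mask_phone": 1, "mask_id": 1, "drop": 2}


def merge_by_priority(rules: list[str]) -> str:
    """Pick the highest-priority rule; a field with no rules falls back to 'none'."""
    return max(rules, key=RULE_PRIORITY.__getitem__, default="none")


print(merge_by_priority(["none", "drop"]))  # drop
```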
If SELECT * (querying all fields) appears in the SQL statement, the query field corresponds to one or more fields of the result. The desensitization rules obtained from the data table information cover all fields in the data table. The name of each output field can therefore be determined by comparing the output result fields with the outermost query fields one by one. Its desensitization rule can then be looked up among the rules already obtained. In this way, a desensitization rule is found for every field of the query result.
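The columns of a SELECT * result can be aligned with their rules positionally. Explicit items keep their place. `*` expands to the table's columns in schema order, which is the order in which SQL returns them for SELECT *. The function and argument names are hypothetical, and the sketch assumes a single source table. Before pairing, it checks that the expanded list matches the width of the result.

```python
from __future__ import annotations


def result_column_rules(select_list: list[str], table: str,
                        schema: dict[str, list[str]],
                        field_rules: dict[tuple[str, str], str],
                        result_width: int) -> list[str]:
    """Return one rule per result column, in result-column order."""
    expanded: list[str] = []
    for item in select_list:
        # '*' contributes every column of the table, in schema order.
        expanded.extend(schema[table] if item == "*" else [item])
    if len(expanded) != result_width:
        raise ValueError("select list does not line up with the result columns")
    return [field_rules[(table, column)] for column in expanded]


schema = {"user_info": ["name", "id_card", "phone"]}
field_rules = {("user_info", "name"): "none",
               ("user_info", "id_card"): "mask_id",
               ("user_info", "phone"): "mask_phone"}
print(result_column_rules(["id_card", "*"], "user_info", schema, field_rules, 4))
# ['mask_id', 'none', 'mask_id', 'mask_phone']
```

Positional matching avoids ambiguity when the same column name appears twice in the result, as `id_card` does in the usage line.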
In this specific application scenario, the field requiring value transfer is the card field. For the current user, the card field is desensitized with desensitization rule 1. Its value can also be encrypted with a symmetric encryption technique to produce encrypted transfer data, which is decrypted when the value needs to be used.
"Desensitize the data to be desensitized according to the desensitization rule corresponding to the query field": the preceding steps have yielded the desensitization rules of all fields that require desensitization. For desensitization processing, the rules are defined in advance according to usage requirements, and the way each rule transforms the data is predefined in the program. Each field of the queried data is traversed and processed with that field's desensitization rule, which completes the desensitization of the data.
"Display the desensitized data": in this specific application scenario, the desensitized data is returned to the query requester, which displays it as needed. The result is shown in Table 5, where the mobile phone numbers, ID card numbers and addresses of Zhang San, Li Si and Wang Wu are displayed in desensitized form.
Table 5: Display result
No. | Name | ID card number | Mobile phone number | Address | Date of birth
1 | Zhang San | 110******* | 189****1234 | Zhang District, Chao Jing City | 2001-01-03
2 | Li Si | 120******* | 139****1234 | Zhang District, Chao Jing City | 2001-01-03
3 | Wang Wu | 130******* | 138****1234 | Binjiang District, Hangzhou City | 2001-01-03
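The per-field traversal in the desensitization step described before Table 5 can be sketched as a registry of predefined rule functions. The rule names and masking formats are hypothetical examples; the text does not specify how any rule transforms its data. The phone mask happens to produce the same format as the mobile phone numbers in Table 5. `column_rules` is assumed to already hold one rule per result column from the earlier lookup steps.

```python
from __future__ import annotations

from typing import Callable


def mask_phone(value: object) -> str:
    """Keep the first 3 and last 4 characters, e.g. 18912341234 -> 189****1234."""
    text = str(value)
    if len(text) <= 7:
        return "*" * len(text)
    return text[:3] + "*" * (len(text) - 7) + text[-4:]


def mask_id(value: object) -> str:
    """Keep the first 3 characters and mask the rest."""
    text = str(value)
    return text[:3] + "*" * max(len(text) - 3, 0)


# Hypothetical rule registry: how each rule transforms a value is predefined here.
RULES: dict[str, Callable[[object], object]] = {
    "none": lambda value: value,
    "mask_phone": mask_phone,
    "mask_id": mask_id,
}


def desensitize_rows(rows: list[dict[str, object]],
                     column_rules: dict[str, str]) -> list[dict[str, object]]:
    """Traverse every field of every row and apply that field's rule."""
    return [
        {column: RULES[column_rules[column]](value) for column, value in row.items()}
        for row in rows
    ]


rows = [{"name": "Zhang San", "phone": "18912341234", "id_card": "110101200101031234"}]
result = desensitize_rows(rows, {"name": "none", "phone": "mask_phone", "id_card": "mask_id"})
print(result[0]["phone"])  # 189****1234
```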
In a third aspect, the present disclosure provides a data desensitization device based on SQL statement analysis. The device includes:
a request acquisition unit, configured to acquire a data query request of a user, the data query request comprising a query SQL statement containing a query field;
a query execution unit, configured to execute the query SQL statement to acquire the data to be desensitized corresponding to the query field;
a parsing unit, configured to parse the query SQL statement into a query data block statement tree, recurse through the statement blocks of the tree, and acquire data table information and field information according to the query fields;
an analysis unit, configured to analyze the data table information and field information to obtain the source data table and source field corresponding to the query field;
a matching unit, configured to acquire the desensitization rule corresponding to the source field based on the preset desensitization information configuration table, wherein the preset desensitization information configuration table stores the desensitization rules of users for different source data tables and source fields;
and a desensitization unit, configured to desensitize the data to be desensitized.
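The parsing unit recursively collects rows ordered from the outermost statement block inward, which might be sketched as below. The nested-dict tree is a hypothetical stand-in for the output of a real SQL parser, which the sketch does not include. Blocks are visited recursively and each row records the depth of its block. A stable sort on depth then puts outer blocks first while preserving the order within each level.

```python
from __future__ import annotations


def collect_rows(block: dict, depth: int = 0) -> list[dict]:
    """Recursively gather one row per field, tagged with its block and nesting depth."""
    rows = [
        {"depth": depth, "block": block["name"],
         "field": f["field"], "alias": f.get("alias", f["field"]),
         # data table information: resolve the field's table alias within this block
         "table": block["tables"].get(f.get("table_alias"))}
        for f in block["fields"]
    ]
    for child in block.get("children", []):
        rows.extend(collect_rows(child, depth + 1))
    return rows


def ordered_rows(root: dict) -> list[dict]:
    """Order rows from the outermost block inward; sorted() is stable within a level."""
    return sorted(collect_rows(root), key=lambda row: row["depth"])


# Hypothetical block tree: OUT has sub-blocks A and B, and A has sub-block A1.
TREE = {
    "name": "OUT", "tables": {"A": "A", "B": "B"},
    "fields": [{"field": "card", "table_alias": "A"},
               {"field": "detail", "table_alias": "B"}],
    "children": [
        {"name": "A", "tables": {"A1": "A1"},
         "fields": [{"field": "card", "table_alias": "A1"}],
         "children": [{"name": "A1", "tables": {"u": "user_info"},
                       "fields": [{"field": "id_card", "alias": "card", "table_alias": "u"}]}]},
        {"name": "B", "tables": {"a": "user_address"},
         "fields": [{"field": "detail", "table_alias": "a"}]},
    ],
}
print([row["block"] for row in ordered_rows(TREE)])  # ['OUT', 'OUT', 'A', 'B', 'A1']
```

A plain depth-first walk would list A1 before B. Sorting on depth keeps every outer block's rows above every inner block's rows, which is the top-to-bottom order the tracing step relies on.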
In some embodiments, the data query request acquired by the request acquisition unit includes a transfer field list that records the fields requiring value transfer. In this case, the data desensitization device based on SQL statement parsing further comprises at least an encryption unit. The encryption unit encrypts the fields requiring value transfer, according to the transfer field list, to generate encrypted transfer data.
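The text only says that a symmetric encryption technique is used for value transfer. The sketch below substitutes a construction that needs only the Python standard library. A keystream produced by HMAC-SHA256 in counter mode is XORed with the value, and an HMAC tag over the nonce and ciphertext follows (encrypt-then-MAC). It is for illustration only; a production system would more likely use a vetted authenticated cipher such as AES-GCM from an established cryptography library. The function names and the token layout (16-byte nonce, ciphertext, 32-byte tag) are assumptions.

```python
from __future__ import annotations

import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32  # SHA-256 digest size


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from HMAC-SHA256(key, nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_value(enc_key: bytes, mac_key: bytes, value: str) -> bytes:
    """Encrypt one transfer value; returns nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_LEN)
    data = value.encode("utf-8")
    ciphertext = bytes(p ^ k for p, k in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt_value(enc_key: bytes, mac_key: bytes, token: bytes) -> str:
    """Verify the tag, then decrypt; raises ValueError if the token was altered."""
    if len(token) < NONCE_LEN + TAG_LEN:
        raise ValueError("token too short")
    nonce, ciphertext, tag = token[:NONCE_LEN], token[NONCE_LEN:-TAG_LEN], token[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    data = bytes(c ^ k for c, k in zip(ciphertext, _keystream(enc_key, nonce, len(ciphertext))))
    return data.decode("utf-8")


def encrypt_transfer_fields(row: dict[str, object], transfer_fields: list[str],
                            enc_key: bytes, mac_key: bytes) -> dict[str, bytes]:
    """Encrypt the raw value of every field named in the transfer field list."""
    return {name: encrypt_value(enc_key, mac_key, str(row[name])) for name in transfer_fields}


enc_key, mac_key = os.urandom(32), os.urandom(32)
tokens = encrypt_transfer_fields({"card": "example-card-value", "name": "Zhang San"},
                                 ["card"], enc_key, mac_key)
print(decrypt_value(enc_key, mac_key, tokens["card"]))  # example-card-value
```

Separate keys are used for encryption and authentication, and a fresh random nonce is drawn for every value, so the same card value never produces the same token twice.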
It should be noted that the data desensitization device based on SQL statement analysis provided in the present disclosure desensitizes data according to the data desensitization method based on SQL statement analysis described in the first aspect. The details are not repeated here.
Referring to Fig. 5, this embodiment further provides an electronic apparatus comprising a memory 304 and a processor 302. The memory 304 stores a computer program, and the processor 302 is configured to run the computer program to perform the steps of any of the above embodiments of the data desensitization method based on SQL statement parsing.
Specifically, the processor 302 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 304 may include mass storage for data or instructions. By way of example and not limitation, memory 304 may include a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 304 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, memory 304 is non-volatile memory. In particular embodiments, memory 304 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. The RAM may be static random access memory (SRAM) or dynamic random access memory (DRAM). The DRAM may be fast page mode DRAM (FPMDRAM), extended data out DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
Memory 304 may be used to store or cache various data files for processing and/or communication purposes, as well as possibly computer program instructions for execution by processor 302.
The processor 302 implements any of the data desensitization methods based on SQL statement parsing in the above embodiments by reading and executing the computer program instructions stored in the memory 304.
Optionally, the electronic apparatus may further include a transmission device 306 and an input/output device 308, where the transmission device 306 is connected to the processor 302, and the input/output device 308 is connected to the processor 302.
The transmission device 306 may be used to receive or transmit data via a network. Specific examples of the network include a wired or wireless network provided by the communication provider of the electronic apparatus. In one example, the transmission device includes a network interface controller (NIC) that can connect to other network devices through a base station to communicate with the Internet. In another example, the transmission device 306 may be a radio frequency (RF) module that communicates with the Internet wirelessly.
The input/output device 308 is used to input or output information. For example, it may be a display screen, a mouse, a keyboard, or another device. In this embodiment, the input information may be a user's query request, and the output information may be data desensitized according to the desensitization rules.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A data desensitization method based on SQL statement parsing, characterized by comprising the following steps:
acquiring a data query request of a user, wherein the data query request comprises a query SQL statement containing a query field;
executing the query SQL statement to acquire the data to be desensitized corresponding to the query field;
parsing the query SQL statement into a query data block statement tree, wherein, when the query data block statement tree comprises multiple layers of statement blocks, the layers are ordered sequentially from outside to inside;
recursively acquiring the data table information and field information corresponding to the query fields of each layer of statement blocks, and ordering them from top to bottom in the outside-to-inside order of the statement blocks; searching from top to bottom for the data rows corresponding to the query field until the data table name in a data row corresponds to an actual (physical) data table, taking that data table as the source data table and the field name in that data row as the source field;
obtaining the desensitization rule corresponding to the source field based on a preset desensitization information configuration table, wherein the preset desensitization information configuration table stores the desensitization rules of users for different source data tables and source fields; and desensitizing the data to be desensitized, wherein the desensitization rules are defined in advance according to usage requirements, how each rule transforms the data to be desensitized is predefined in program processing, and each field of the queried data to be desensitized is traversed and processed using the desensitization rule of that field.
2. The method of claim 1, wherein the query field corresponds to at least one source field.
3. The data desensitization method based on SQL statement parsing of claim 2, characterized in that, when the query field corresponds to at least two source fields, the corresponding desensitization rule is obtained according to the priority of the desensitization rules of the source fields or a specified merging strategy for desensitization rules.
4. The SQL statement parsing-based data desensitization method according to claim 1, wherein the data query request includes a list of transfer fields for recording fields that require value transfer.
5. The data desensitization method based on SQL statement parsing of claim 4, wherein analyzing the data table information and the field information to obtain the source data table and source field corresponding to the query field comprises: encrypting, according to the transfer field list, the fields in the data to be desensitized that require value transfer, to generate encrypted transfer data.
6. The method for data desensitization based on SQL statement parsing of claim 1, wherein the field information comprises at least a field name, a field alias, whether it is a function field, and the fields contained within the function field; the data table information at least comprises a data table name and a data table alias.
7. A data desensitization device based on SQL statement parsing is characterized by comprising:
a request acquisition unit, configured to acquire a data query request of a user, the data query request comprising a query SQL statement containing a query field;
a query execution unit, configured to execute the query SQL statement to acquire the data to be desensitized corresponding to the query field;
a parsing unit, configured to parse the query SQL statement into a query data block statement tree, wherein, when the query data block statement tree comprises multiple layers of statement blocks, the layers are ordered sequentially from outside to inside, the data table information and field information corresponding to the query fields of each layer of statement blocks are acquired recursively, and the data table information and field information are ordered from top to bottom in the outside-to-inside order of the statement blocks;
an analysis unit, configured to search from top to bottom for the data rows corresponding to the query field until the data table name in a data row corresponds to an actual (physical) data table, and to take that data table as the source data table and the field name in that data row as the source field;
a matching unit, configured to acquire the desensitization rule corresponding to the source field based on a preset desensitization information configuration table, wherein the preset desensitization information configuration table stores the desensitization rules of users for different source data tables and source fields;
and a desensitization unit, configured to desensitize the data to be desensitized, wherein the desensitization rules are defined in advance according to usage requirements, how each rule transforms the data to be desensitized is predefined in program processing, and each field of the queried data to be desensitized is traversed and processed using the desensitization rule of that field.
8. The data desensitization device based on SQL statement parsing of claim 7, further comprising an encryption unit, wherein the data query request includes a transfer field list recording the fields that require value transfer, and the encryption unit is configured to encrypt the fields requiring value transfer according to the transfer field list to generate encrypted transfer data.
9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the data desensitization method based on SQL statement parsing of any one of claims 1 to 6.
CN202110291401.7A 2021-03-18 2021-03-18 Data desensitization method and device based on SQL statement analysis Active CN112989412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110291401.7A CN112989412B (en) 2021-03-18 2021-03-18 Data desensitization method and device based on SQL statement analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110291401.7A CN112989412B (en) 2021-03-18 2021-03-18 Data desensitization method and device based on SQL statement analysis

Publications (2)

Publication Number Publication Date
CN112989412A CN112989412A (en) 2021-06-18
CN112989412B true CN112989412B (en) 2022-09-20

Family

ID=76334393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110291401.7A Active CN112989412B (en) 2021-03-18 2021-03-18 Data desensitization method and device based on SQL statement analysis

Country Status (1)

Country Link
CN (1) CN112989412B (en)

Also Published As

Publication number Publication date
CN112989412A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant