CN113779616B - Method and device for identifying data - Google Patents

Info

Publication number
CN113779616B
Authority
CN
China
Prior art keywords
data
application
codes
service
sensitive data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110180902.8A
Other languages
Chinese (zh)
Other versions
CN113779616A (en)
Inventor
李长伟
方城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110180902.8A
Publication of CN113779616A
Application granted
Publication of CN113779616B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Storage Device Security (AREA)

Abstract

The application discloses a method and a device for identifying data. The specific implementation scheme is as follows: in response to receiving a user access request sent by a client, the user access request is parsed to obtain the user identifier and access address corresponding to the request and the service applications provided for the request; a link tracking identifier is generated based on the access address and the service applications; application information of each service application is obtained from a database based on the link tracking identifier, wherein the application information comprises interface information and execution code, and the database has been updated in advance using the link tracking identifier; an authority resource code is generated for each service application based on the user identifier and the interface information of each service application; and the authority resource codes of the service applications are identified to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data. The scheme provides a data identification method that automatically identifies the sensitive data, and the sensitive data types, represented by authority resource codes.

Description

Method and device for identifying data
Technical Field
Embodiments of the present application relate to the field of computer technology, in particular to the field of data processing technology, and more particularly to a method and apparatus for identifying data.
Background
The authority platform is a standard role-based access control (RBAC) authority management platform. Services that require authority control register with the authority platform, apply for authority resource codes, and manage the authority of those resource codes through roles.
At present, the authority platform cannot identify which sensitive data lie behind an authority resource code. An authority approver therefore has no objective basis for approval and can judge only from the reason written by the applicant and from personal experience, which makes authority management and review difficult. In addition, HTTP message traffic data alone does not reveal which back-end applications are associated with a URL (that is, an access address) in the traffic, or which authority resource codes correspond to those back-end applications. As a result, assets are hard to locate, the sensitive data involved remain unclear, and focused protection is difficult to achieve.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for identifying data.
According to a first aspect of the present application, there is provided a method for identifying data, the method comprising: in response to receiving a user access request sent by a client, parsing the user access request to obtain the user identifier and access address corresponding to the request and the service applications provided for the request; generating, based on the access address and the service applications, a link tracking identifier corresponding to the access address and the service applications, wherein the link tracking identifier characterizes the association between the access address and each service application; obtaining application information of each service application from a database based on the link tracking identifier, wherein the application information comprises interface information and execution code, and the database has been updated in advance using the link tracking identifier; generating, based on the user identifier and the interface information of each service application, an authority resource code of each service application corresponding to each item of interface information, wherein the authority resource code characterizes the resource information used to perform authority verification on the request; and identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the identifying comprises comparing the execution code of each service application with the classification data for sensitive data in the metadata of the database.
In some embodiments, the database is updated as follows: the link tracking identifier is bound to the acquired application information corresponding to each service application, generating bound interface information and bound execution code for each service application; and the database is updated based on the bound interface information and the bound execution code of each service application.
In some embodiments, binding the link tracking identifier to the acquired application information corresponding to each service application, and generating the bound interface information and bound execution code for each service application, comprises: binding, based on a link tracking technique, the link tracking identifier to the application information corresponding to each service application, and generating the bound interface information and bound execution code for each service application, wherein the link tracking technique inserts instrumentation points (tracking points) at the corresponding positions of the application information of each service application.
In some embodiments, identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data comprises: extracting features from the execution code of each service application to obtain a feature data set corresponding to the execution code of each service application; and identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the identifying comprises comparing the feature data in each feature data set with the classification data for sensitive data in the metadata of the database.
In some embodiments, identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data comprises: inputting the authority resource codes of the service applications into a trained data identification model to generate the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the data identification model characterizes whether sensitive data exist in the data represented by an authority resource code and, if so, the data type of that sensitive data.
In some embodiments, the method further comprises: sending the link tracking identifier to the client.
In some embodiments, the method further comprises: optimizing the authority review policy based on the correlation between the sensitive data represented by each authority resource code and authority review.
According to a second aspect of the present application, there is provided an apparatus for identifying data, the apparatus comprising: a first acquisition unit configured to, in response to receiving a user access request sent by a client, parse the user access request to obtain the user identifier and access address corresponding to the request and the service applications provided for the request; a first generation unit configured to generate, based on the access address and the service applications, a link tracking identifier corresponding to the access address and the service applications, wherein the link tracking identifier characterizes the association between the access address and each service application; a second acquisition unit configured to obtain application information of each service application from a database based on the link tracking identifier, wherein the application information comprises interface information and execution code, and the database has been updated in advance using the link tracking identifier; a second generation unit configured to generate, based on the user identifier and the interface information of each service application, an authority resource code of each service application corresponding to each item of interface information, wherein the authority resource code characterizes the resource information used to perform authority verification on the request; and a data identification unit configured to identify the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the identifying comprises comparing the execution code of each service application with the classification data for sensitive data in the metadata of the database.
In some embodiments, the database update is performed by the following modules: a generation module configured to bind the link tracking identifier to the acquired application information corresponding to each service application, and to generate bound interface information and bound execution code for each service application; and an update module configured to update the database based on the bound interface information and the bound execution code of each service application.
In some embodiments, the generation module is further configured to bind, based on a link tracking technique, the link tracking identifier to the application information corresponding to each service application, wherein the link tracking technique inserts instrumentation points (tracking points) at the corresponding positions of the application information of each service application.
In some embodiments, the data identification unit comprises: an extraction module configured to extract features from the execution code of each service application to obtain a feature data set corresponding to the execution code of each service application; and an identification module configured to identify the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the identifying comprises comparing the feature data in each feature data set with the classification data for sensitive data in the metadata of the database.
In some embodiments, the data identification unit is further configured to input the authority resource codes of the service applications into a trained data identification model to generate the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the data identification model characterizes whether sensitive data exist in the data represented by an authority resource code and, if so, the data type of that sensitive data.
In some embodiments, the apparatus further comprises: a sending unit configured to send the link tracking identifier to the client.
In some embodiments, the apparatus further comprises: an optimizing unit configured to optimize the authority review policy based on the correlation between the sensitive data represented by each authority resource code and authority review.
According to a third aspect of the present application, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present application there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any of the implementations of the first aspect.
According to the technology of the present application, a user access request sent by a client is received and parsed to obtain the user identifier and access address corresponding to the request and the service applications provided for the request; a link tracking identifier corresponding to the access address and the service applications is generated based on the access address and the service applications; application information of each service application, comprising interface information and execution code, is obtained based on the link tracking identifier from a database that has been updated in advance using the link tracking identifier; an authority resource code of each service application corresponding to each item of interface information is generated based on the user identifier and the interface information of each service application; and the authority resource codes are identified, by comparing the execution code of each service application with the classification data for sensitive data in the metadata of the database, to obtain the sensitive data represented by each authority resource code and the corresponding data types. This solves the prior-art problems that the sensitive data behind an authority resource code cannot be identified, that authority approvers have no basis for approval, and that authority management and review are difficult. It also avoids the problems that HTTP message traffic data alone does not reveal which back-end applications are associated with an access address in the traffic or which authority resource codes correspond to those back-end applications, so that assets are hard to locate, the sensitive data remain unclear, and focused protection is difficult to achieve.
By generating the link tracking identifier and updating the corresponding application information in the database, and by combining the traffic data perspective, the data lineage (blood-edge) relationship of the complete data chain is obtained: from the user to the front-end application, from the front-end application to the back-end application, between back-end services, and from the back-end application to the database. A data identification method is thus realized that automatically identifies, according to this data lineage relationship, the sensitive data and sensitive data types represented by authority resource codes.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application.
FIG. 1 is a schematic diagram of a first embodiment of a method for identifying data according to the present application;
FIG. 2 is a scenario diagram in which the method for identifying data of an embodiment of the present application may be implemented;
FIG. 3 is a schematic diagram of a second embodiment of a method for identifying data according to the present application;
FIG. 4 is a schematic structural view of one embodiment of an apparatus for identifying data according to the present application;
FIG. 5 is a block diagram of an electronic device for implementing a method for identifying data according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a schematic diagram 100 of a first embodiment of a method for identifying data according to the present application. The method for identifying data comprises the following steps:
Step 101, in response to receiving a user access request sent by a client, parsing the user access request to obtain the user identifier and access address corresponding to the request and the service applications provided for the request.
In this embodiment, when the execution body (for example, an authority management platform) receives a user access request sent by a client, it may parse the request to obtain the user identifier and access address corresponding to the request and the service applications provided for the request. The access address may be access information such as a URL. A service application is the service software provided for the request, that is, the application of each service that handles the request.
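The parsing in step 101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `X-User-Id` header, the `ROUTES` table that maps URL path prefixes to service applications, and the function name `parse_access_request` are all hypothetical assumptions introduced for the example.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: URL path prefix -> service applications serving it.
ROUTES = {
    "/api/orders": ["order-service", "user-service"],
    "/api/payments": ["payment-service"],
}


def parse_access_request(request):
    """Extract the user identifier, access address and service applications.

    `request` is assumed to be a dict with "url" and "headers" keys.
    """
    url = request["url"]
    path = urlsplit(url).path
    # Pick the service applications of the first matching path prefix.
    apps = next(
        (apps for prefix, apps in ROUTES.items() if path.startswith(prefix)),
        [],
    )
    return {
        "user_id": request["headers"].get("X-User-Id"),  # assumed header name
        "access_address": url,
        "service_apps": list(apps),
    }
```

In a real deployment the mapping from an access address to its service applications would come from gateway or traffic data rather than a static table.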
Step 102, based on the access address and each service application, generating a link tracking identifier corresponding to the access address and each service application.
In this embodiment, the execution body may randomly generate a unique link tracking identifier corresponding to the access address and the service applications, according to the access address and the service applications obtained in step 101. The link tracking identifier characterizes the association between the access address and each service application; that is, through the link tracking identifier, the client can see which back-end service applications are called.
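One way to picture step 102 is shown below. The text only requires that the identifier be random and unique; using a version-4 UUID and the `TraceLink` record are assumptions made for this sketch.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceLink:
    """Hypothetical record tying a link tracking identifier to its request."""
    trace_id: str
    access_address: str
    service_apps: tuple


def generate_trace_link(access_address, service_apps):
    # uuid4 is random, which matches "randomly generate a unique identifier".
    return TraceLink(
        trace_id=uuid.uuid4().hex,
        access_address=access_address,
        service_apps=tuple(service_apps),
    )
```

Keeping the access address and service applications next to the identifier is what lets the identifier stand for their association.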
Step 103, acquiring application information of each service application from the database based on the link tracking identifier.
In this embodiment, the execution body may acquire the application information of each service application from the updated database based on the link tracking identifier generated in step 102. The application information comprises at least interface information and execution code, and the database has been updated in advance using the link tracking identifier. The execution code may be SQL code.
Further, the database is updated as follows: the link tracking identifier is bound to the acquired application information corresponding to each service application, generating bound interface information and bound execution code for each service application; and the database is updated based on the bound interface information and the bound execution code of each service application. Querying the corresponding application information through the link tracking identifier establishes the data lineage (blood-edge) relationship from the back-end application to the database, which completes the data lineage relationship of the whole data chain.
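The update and lookup described above can be sketched with an in-memory SQLite database. The table name `app_info`, its columns, and the function names are hypothetical; the source does not specify a schema.

```python
import sqlite3


def open_store():
    """Create an in-memory store for application information (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_info ("
        "trace_id TEXT, app_name TEXT, interface TEXT, exec_code TEXT)"
    )
    return conn


def bind_and_update(conn, trace_id, app_infos):
    """Bind the link tracking identifier to each application's info and store it."""
    rows = [
        (trace_id, info["app_name"], info["interface"], info["exec_code"])
        for info in app_infos
    ]
    conn.executemany("INSERT INTO app_info VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def fetch_app_info(conn, trace_id):
    """Step 103: look up interface information and execution code by identifier."""
    cur = conn.execute(
        "SELECT app_name, interface, exec_code FROM app_info "
        "WHERE trace_id = ? ORDER BY app_name",
        (trace_id,),
    )
    keys = ("app_name", "interface", "exec_code")
    return [dict(zip(keys, row)) for row in cur.fetchall()]
```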
Step 104, generating authority resource codes of the service applications corresponding to the interface information based on the user identifier and the interface information of the service applications.
In this embodiment, the execution body may generate, according to a resource code generation rule, the authority resource code of each service application corresponding to each item of interface information, based on the user identifier obtained in step 101 and the interface information of each service application obtained in step 103. The authority resource code characterizes the resource information used to perform authority verification on the request. Through role information, the resource codes associate the user with the back-end service applications.
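The resource code generation rule itself is not given in the text. The sketch below therefore uses a purely hypothetical rule, `application:METHOD:dotted.path`, only to show how a code can be derived from interface information and recorded against a user identifier.

```python
def make_resource_code(app_name, method, path):
    """Hypothetical rule: application name, HTTP method, and dotted path."""
    normalized = path.strip("/").replace("/", ".")
    return f"{app_name}:{method.upper()}:{normalized}"


def generate_resource_codes(user_id, interfaces):
    """Produce one authority resource code per interface for the given user."""
    return [
        {
            "user_id": user_id,
            "app_name": item["app_name"],
            "resource_code": make_resource_code(
                item["app_name"], item["method"], item["path"]
            ),
        }
        for item in interfaces
    ]
```

In an RBAC platform the link from user to resource code would normally pass through roles; the direct `user_id` field here is a simplification.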
Step 105, identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data.
In this embodiment, the execution body may compare the execution code of each service application with the classification data for sensitive data in the metadata of the database to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data. The sensitive data may include preset information such as mobile phone numbers, mailboxes (e-mail addresses), identity card numbers, addresses, and bank card numbers, and the data types of the sensitive data may be obtained by classifying the sensitive data in various ways.
In some optional implementations of this embodiment, identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data comprises: extracting features from the execution code of each service application to obtain a feature data set corresponding to the execution code of each service application; and identifying the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, wherein the identifying comprises comparing the feature data in each feature data set with the classification data for sensitive data in the metadata of the database. The features may include SQL-related information such as fields, tables, and database information. This enables accurate and comprehensive data identification.
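The feature extraction and comparison can be illustrated with a deliberately simple sketch. It assumes the execution code is a plain single-table `SELECT` statement and that the classification metadata is a dictionary keyed by (table, column); the names `SENSITIVE_METADATA`, `extract_features`, and `identify_sensitive` are hypothetical. A production system would need a real SQL parser.

```python
import re

# Assumed classification metadata: (table, column) -> sensitive data type.
SENSITIVE_METADATA = {
    ("users", "phone"): "mobile phone number",
    ("users", "email"): "mailbox",
    ("users", "id_card"): "identity card",
}

# Matches only simple "SELECT col, col FROM table" statements.
SELECT_RE = re.compile(
    r"select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_features(sql):
    """Return the set of (table, column) features found in the SQL text."""
    match = SELECT_RE.search(sql)
    if match is None:
        return set()
    table = match.group("table").lower()
    columns = [c.strip().lower() for c in match.group("cols").split(",")]
    return {(table, c) for c in columns if c}


def identify_sensitive(sql):
    """Compare extracted features with the classification metadata."""
    return {
        f"{table}.{column}": SENSITIVE_METADATA[(table, column)]
        for (table, column) in extract_features(sql)
        if (table, column) in SENSITIVE_METADATA
    }
```

Because the execution code was stored per service application and each application's interface maps to an authority resource code, the result can be attributed back to that resource code.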
With continued reference to FIG. 2, the method 200 for identifying data of this embodiment runs in an electronic device 201. When the electronic device 201 receives a user access request sent by a client, it parses the request to obtain the user identifier and access address corresponding to the request and the service applications 202 provided for the request. The electronic device 201 then generates, based on the access address and the service applications, a link tracking identifier 203 corresponding to the access address and the service applications, and obtains application information 204 of each service application from a database based on the link tracking identifier. Next, the electronic device 201 generates authority resource codes 205 of the service applications corresponding to the interface information based on the user identifier and the interface information of each service application. Finally, the electronic device 201 identifies the authority resource codes of the service applications to obtain the sensitive data represented by each authority resource code and the data types 206 corresponding to the sensitive data. The link tracking identifier characterizes the association between the access address and each service application; the application information comprises interface information and execution code; the database has been updated in advance using the link tracking identifier; and the identifying comprises comparing the execution code of each service application with the classification data for sensitive data in the metadata of the database.
In the method for identifying data provided by the foregoing embodiment of the present application, in response to receiving a user access request sent by a client, the request is parsed to obtain the user identifier and access address corresponding to the request and the service applications provided for the request; a link tracking identifier corresponding to the access address and the service applications is generated based on the access address and the service applications; application information of each service application, comprising interface information and execution code, is obtained based on the link tracking identifier from a database that has been updated in advance using the link tracking identifier; an authority resource code of each service application corresponding to each item of interface information is generated based on the user identifier and the interface information of each service application; and the authority resource codes are identified, by comparing the execution code of each service application with the classification data for sensitive data in the metadata of the database, to obtain the sensitive data represented by each authority resource code and the corresponding data types. This solves the prior-art problems that the sensitive data behind an authority resource code cannot be identified, that authority approvers have no basis for approval, and that authority management and review are difficult. It also avoids the problems that HTTP message traffic data alone does not reveal which back-end applications are associated with an access address in the traffic or which authority resource codes correspond to those back-end applications, so that assets are hard to locate, the sensitive data remain unclear, and focused protection is difficult to achieve.
By generating the link tracking identifier and updating the corresponding application information in the database, and by combining the traffic data perspective, the data lineage (blood-edge) relationship of the complete data chain is obtained: from the user to the front-end application, from the front-end application to the back-end application, between back-end services, and from the back-end application to the database. A data identification method is thus realized that automatically identifies, according to this data lineage relationship, the sensitive data and sensitive data types represented by authority resource codes.
With further reference to fig. 3, a schematic diagram 300 of a second embodiment of a method for identifying data is shown. The flow of the method comprises the following steps:
Step 301, in response to receiving a user access request sent by a client, parsing the user access request to obtain the user identifier and access address corresponding to the request and the service applications provided for the request.
Step 302, based on the access address and each service application, generating a link tracking identifier corresponding to the access address and each service application, and sending the link tracking identifier to the client.
In this embodiment, the execution body may randomly generate a unique link tracking identifier corresponding to the access address and the service applications, according to the access address and the service applications obtained in step 301, and send the link tracking identifier to the client. The link tracking identifier characterizes the association between the access address and each service application; that is, through the link tracking identifier, the client can see which back-end service applications are called.
Step 303, acquiring application information of each service application from the database based on the link tracking identifier.
In this embodiment, the executing body may acquire the application information of each service application from the updated database based on the link tracking identifier generated in step 302. The application information includes at least interface information and execution code, and the database is updated in advance using the link tracking identifier. The database is updated as follows: the link tracking identifier is bound to the acquired application information corresponding to each service application, generating the bound interface information and the bound execution code corresponding to each service application; the database is then updated based on the bound interface information and the bound execution code corresponding to each service application. Querying the corresponding application information through the link tracking identifier establishes the data lineage (blood-edge) relationship from the back-end application to the database, which completes the data lineage relationship of the whole data chain.
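The update-then-query flow described above might look like the following sketch. It assumes an in-memory SQLite table as a stand-in for the database; the table name, column names, and sample rows are hypothetical, since the application does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE app_info (
           trace_id TEXT, service_app TEXT, interface_info TEXT, execution_code TEXT,
           PRIMARY KEY (trace_id, service_app))"""
)


def update_database(conn, trace_id, collected):
    """Bind trace_id to each service's application info and upsert the rows."""
    rows = [
        (trace_id, app, info["interface"], info["code"])
        for app, info in collected.items()
    ]
    conn.executemany("INSERT OR REPLACE INTO app_info VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def get_application_info(conn, trace_id):
    """Step 303 sketch: look up interface info and execution code by trace ID."""
    cur = conn.execute(
        "SELECT service_app, interface_info, execution_code "
        "FROM app_info WHERE trace_id = ? ORDER BY service_app",
        (trace_id,),
    )
    return cur.fetchall()


update_database(conn, "trace-001", {
    "order-service": {"interface": "OrderApi.getDetail",
                      "code": "SELECT amount FROM orders WHERE id = ?"},
    "user-service": {"interface": "UserApi.getPhone",
                     "code": "SELECT phone FROM users WHERE id = ?"},
})
print(get_application_info(conn, "trace-001"))
```

Keying the table on the pair (trace ID, service application) means a repeated update for the same trace overwrites the earlier row rather than duplicating it; this is a design choice of the sketch, not something the application prescribes.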
In some optional implementations of this embodiment, binding the link tracking identifier to the acquired application information corresponding to each service application, and generating the bound interface information and the bound execution code corresponding to each service application, includes: binding, based on a link tracking technique, the link tracking identifier to the application information corresponding to each service application, and generating the bound interface information and the bound execution code corresponding to each service application, where the link tracking technique places instrumentation points (embedding points) at the corresponding positions of the application information of each service application. The link tracking technique may be, for example, a link tracking technique that integrates an SDK and is invoked through a Java Agent. This makes the information binding simple and fast.
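The idea of placing instrumentation points can be illustrated with a decorator that records which interface ran under which trace. This is only a Python analogue under stated assumptions: the application itself mentions a Java Agent, and the decorator, the `SPANS` list, and the function names below are hypothetical.

```python
import contextvars
import functools

# Trace ID of the request currently being handled (hypothetical propagation mechanism).
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)
# Collected (trace_id, interface_name) records; a real agent would report these to a collector.
SPANS = []


def instrumented(interface_name):
    """Instrumentation point: record which interface ran under which trace ID."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            SPANS.append((current_trace_id.get(), interface_name))
            return func(*args, **kwargs)
        return wrapper
    return decorator


@instrumented("UserApi.getPhone")
def get_phone(user_id):
    return f"phone-of-{user_id}"


@instrumented("OrderApi.getDetail")
def get_order_detail(order_id, user_id):
    return {"order": order_id, "phone": get_phone(user_id)}


token = current_trace_id.set("trace-001")
try:
    get_order_detail(42, "u1001")
finally:
    current_trace_id.reset(token)
print(SPANS)
```

Because the trace ID travels in a context variable, the nested call to `get_phone` is recorded under the same trace as the outer call, which is the binding between the link tracking identifier and the interface information that the paragraph describes.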
Step 304, generating authority resource codes of each service application corresponding to each interface information based on the user identification and the interface information of each service application.
Step 305, inputting the authority resource codes of each service application into the trained data identification model, and generating each item of sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data.
In this embodiment, the executing body may input the authority resource codes of each service application into the trained data identification model to obtain each item of sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data. The data identification model is used to determine whether sensitive data exists in the data represented by an authority resource code and, if so, the data type of that sensitive data. The data identification model is trained in advance on historical data and may be constructed based on a convolutional neural network.
In some optional implementations of the present embodiment, the method further includes: optimizing the authority examination strategy based on the correlation between each item of sensitive data represented by each authority resource code and the authority examination. Using the data represented by each authority as the basis for authority examination enables better examination; traffic anomaly detection can further be performed on URLs that involve sensitive data, which reduces the false alarm rate.
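One possible reading of "optimizing the authority examination strategy" is to let the most sensitive data type behind an authority resource code decide how strict the examination is. The sketch below assumes this reading; the sensitivity ranks, examination level names, and sample codes are all hypothetical.

```python
# Hypothetical sensitivity ranking per data type; higher means stricter examination.
SENSITIVITY = {"phone_number": 2, "id_card": 3, "order_amount": 1}
# Hypothetical examination levels keyed by the highest sensitivity found.
REVIEW_LEVEL = {
    0: "auto-approve",
    1: "owner-approval",
    2: "security-review",
    3: "security-review-plus-audit",
}


def review_level(data_types):
    """Pick the examination level from the most sensitive data type behind a code."""
    highest = max((SENSITIVITY.get(t, 0) for t in data_types), default=0)
    return REVIEW_LEVEL[highest]


identified = {
    "u1001:order-service:OrderApi.getDetail": ["order_amount", "phone_number"],
    "u1001:health-service:HealthApi.ping": [],
}
for code, types in identified.items():
    print(code, "->", review_level(types))
```

A code that exposes no identified sensitive data falls through to the lowest level, so approvers spend their attention on the codes that actually guard sensitive data.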
It should be noted that the structure of convolutional neural networks and the process of training such models are well-known techniques that are widely studied and applied, and they are not described in detail here.
In this embodiment, the specific operations of steps 301 and 304 are substantially the same as those of steps 101 and 104 in the embodiment shown in fig. 1, and will not be described herein.
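Steps 301 and 304 can be sketched together as follows. The sketch assumes that the user identifier arrives in a hypothetical `X-User-Id` header, that a hypothetical routing table maps an access address to its service applications, and that an authority resource code takes the form `user:service:interface`; none of these details are specified by the application.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: which back-end service applications serve each access address.
ROUTES = {"/api/orders/detail": ["order-service", "user-service"]}


def parse_access_request(request):
    """Step 301 sketch: extract user ID, access address, and serving applications."""
    address = urlsplit(request["url"]).path
    user_id = request["headers"]["X-User-Id"]
    return user_id, address, ROUTES.get(address, [])


def authority_resource_codes(user_id, interfaces):
    """Step 304 sketch: one code per interface, in an assumed 'user:service:interface' format."""
    return {
        app: f"{user_id}:{app}:{interface}"
        for app, interface in interfaces.items()
    }


request = {"url": "https://shop.example.com/api/orders/detail?id=42",
           "headers": {"X-User-Id": "u1001"}}
user_id, address, apps = parse_access_request(request)
codes = authority_resource_codes(
    user_id, {"order-service": "OrderApi.getDetail", "user-service": "UserApi.getPhone"}
)
print(address, apps, codes)
```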
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 1, the schematic diagram 300 of the method for identifying data in this embodiment generates the link tracking identifier corresponding to the access address and each service application based on the access address and each service application, and sends the link tracking identifier to the client. This, in combination with traffic data, exposes the data lineage (blood-edge) relationship of the entire data chain: from the user to the front-end application, from the front-end application to the back-end application, between back-end services, and from the back-end application to the database. The authority resource codes of each service application are input into the trained data identification model to generate each item of sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, which achieves accurate and comprehensive data identification with a wider range of application.
With further reference to fig. 4, as an implementation of the method shown in fig. 1-3 described above, the present application provides an embodiment of an apparatus for identifying data, which corresponds to the method embodiment shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for identifying data of the present embodiment includes: a first acquiring unit 401, a first generating unit 402, a second acquiring unit 403, a second generating unit 404, and a data identifying unit 405. The first acquiring unit is configured to, in response to receiving a user access request sent by a client, acquire the user identifier corresponding to the request, the access address, and each service application that serves the request; the first generating unit is configured to generate, based on the access address and each service application, a link tracking identifier corresponding to the access address and each service application, where the link tracking identifier represents the association between the access address and each service application; the second acquiring unit is configured to acquire application information of each service application from the database based on the link tracking identifier, where the application information includes interface information and execution code, and the database is updated in advance using the link tracking identifier; the second generating unit is configured to generate, based on the user identifier and the interface information of each service application, authority resource codes of each service application corresponding to each piece of interface information, where the authority resource codes represent the resource information used to perform authority verification on the request; and the data identifying unit is configured to identify the authority resource codes of each service application to obtain each item of sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, where the identification compares the execution code of each service application with the classified data for sensitive data in the database metadata.
In this embodiment, the specific processes of the first obtaining unit 401, the first generating unit 402, the second obtaining unit 403, the second generating unit 404, and the data identifying unit 405 of the apparatus 400 for identifying data and the technical effects thereof may refer to the relevant descriptions of steps 101 to 105 in the corresponding embodiment of fig. 1, and are not repeated herein.
In some alternative implementations of the present embodiment, the updating of the database is accomplished by the following modules: the generation module is configured to bind the link tracking identification with the acquired application information corresponding to each service application, and generate interface information corresponding to each bound service application and execution codes corresponding to each bound service application; and the updating module is configured to update the database based on the interface information corresponding to each service application after binding and the execution code corresponding to each service application after binding.
In some optional implementations of this embodiment, the generating module is further configured to bind the link tracking identifier to the application information corresponding to each service application based on a link tracking technique, where the link tracking technique places instrumentation points (embedding points) at the corresponding positions of the application information of each service application.
In some optional implementations of the present embodiment, the data identifying unit includes: the extraction module is configured to extract the execution codes of the service applications to obtain feature data sets corresponding to the execution codes of the service applications; the identification module is configured to identify authority resource codes of all service applications to obtain all sensitive data represented by all authority resource codes and data types corresponding to all the sensitive data, wherein the identification module is used for comparing the characteristic data in all the characteristic data sets with the classified data of the database metadata aiming at the sensitive data.
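The extraction and comparison performed by these two modules might be sketched as follows. The sketch assumes that the execution code contains simple single-table SQL `SELECT` statements and that the database metadata classifies sensitive data per (table, column) pair; the regular expression, the metadata dictionary, and the data type labels are hypothetical simplifications.

```python
import re

# Hypothetical classification of sensitive columns held in the database metadata.
CLASSIFIED_METADATA = {
    ("users", "phone"): "phone_number",
    ("users", "id_card"): "id_card",
}

# Matches simple single-table statements of the form "SELECT a, b FROM table".
SELECT_RE = re.compile(r"SELECT\s+(.+?)\s+FROM\s+(\w+)", re.IGNORECASE)


def extract_features(execution_code):
    """Extract (table, column) pairs referenced by simple SELECT statements."""
    features = set()
    for columns, table in SELECT_RE.findall(execution_code):
        for column in columns.split(","):
            features.add((table.lower(), column.strip().lower()))
    return features


def identify_sensitive(execution_code):
    """Compare extracted features with the classified metadata."""
    return {
        f"{table}.{column}": CLASSIFIED_METADATA[(table, column)]
        for table, column in extract_features(execution_code)
        if (table, column) in CLASSIFIED_METADATA
    }


code = 'stmt = "SELECT name, phone FROM users WHERE id = ?"'
print(identify_sensitive(code))
```

Real execution code would call for a proper SQL or bytecode parser rather than a regular expression; the point of the sketch is only the two-stage shape, in which features are extracted first and then compared with the classified metadata.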
In some optional implementations of this embodiment, the data identifying unit is further configured to input the authority resource codes of each service application into the trained data identification model, and generate each item of sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, where the data identification model is used to determine whether sensitive data exists in the data represented by the authority resource codes and to determine the data type of the sensitive data.
In some optional implementations of this embodiment, the apparatus further includes: and the sending unit is configured to send the link tracking identification to the client.
In some optional implementations of this embodiment, the apparatus further includes: and the optimizing unit is configured to optimize the authority examination strategy based on the correlation between each sensitive data characterized by each authority resource code and the authority examination.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for the method for identifying data according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories and multiple types of memory. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). Fig. 5 takes one processor 501 as an example.
Memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the methods for identifying data provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for identifying data provided by the present application.
The memory 502 is a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and a module, such as program instructions/modules (e.g., the first acquisition unit 401, the first generation unit 402, the second acquisition unit 403, the second generation unit 404, and the data identification unit 405 shown in fig. 4) corresponding to a method for identifying data in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing, i.e., implements the method for identifying data in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device for identifying data, or the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected to the electronic device for identifying data via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method for identifying data may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for identifying data. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, in response to receiving a user access request sent by a client, the user access request is parsed to obtain the user identifier corresponding to the request, the access address, and each service application that serves the request. A link tracking identifier corresponding to the access address and each service application is generated based on the access address and each service application, and application information of each service application is obtained from a database based on the link tracking identifier, where the application information includes interface information and execution code, and the database is updated in advance using the link tracking identifier. Authority resource codes of each service application, corresponding to each piece of interface information, are generated based on the user identifier and the interface information of each service application. The authority resource codes of each service application are then identified to obtain each item of sensitive data represented by each authority resource code and the data type corresponding to each item of sensitive data, where the identification compares the execution code of each service application with the classified data for sensitive data in the database metadata. This solves the problems in the prior art that the sensitive data behind an authority resource code cannot be identified, that authority approvers have no basis for approval, and that authority management and examination are difficult. It also avoids the problem that HTTP message traffic data alone does not reveal which back-end applications are associated with an access address in the traffic, or which authority resource codes correspond to those back-end applications, so that assets are located only vaguely and key protection is difficult to achieve.
By generating the link tracking identifier and updating the corresponding application information in the database, the technical solution combines traffic data to expose the data lineage (blood-edge) relationship across the entire data chain: from the user to the front-end application, from the front-end application to the back-end application, between back-end services, and from the back-end application to the database. Based on this data lineage relationship, the sensitive data and the sensitive data types represented by the authority resource codes are identified automatically.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A method for identifying data, the method comprising:
responding to a user access request sent by a client, and analyzing the user access request to obtain a user identifier, an access address and various service applications provided for the request corresponding to the request;
generating a link tracking identifier corresponding to the access address and each service application based on the access address and each service application, wherein the link tracking identifier is used for representing the association relationship between the access address and each service application;
based on the link tracking identification, acquiring application information of each service application from a database, wherein the application information comprises: interface information and execution codes, wherein the database is updated in advance by utilizing the link tracking identification;
generating authority resource codes of all service applications corresponding to all interface information based on the user identification and the interface information of all service applications, wherein the authority resource codes are used for representing resource information used for performing authority verification on the request;
and identifying the authority resource codes of the service applications to obtain sensitive data represented by the authority resource codes and data types corresponding to the sensitive data, wherein the identification is used for comparing the execution codes of the service applications with the classified data aiming at the sensitive data in the database metadata.
2. The method of claim 1, wherein the database update procedure is as follows:
binding the link tracking identification with the acquired application information corresponding to each service application, and generating interface information corresponding to each service application after binding and execution codes corresponding to each service application after binding;
and updating the database based on the interface information corresponding to each service application after binding and the execution code corresponding to each service application after binding.
3. The method of claim 2, wherein the binding the link tracking identifier with the acquired application information corresponding to the service applications, and generating the bound interface information corresponding to the service applications and the bound execution codes corresponding to the service applications, includes:
and binding the link tracking identification with the application information corresponding to each service application based on a link tracking technology, and generating interface information corresponding to each service application after binding and execution codes corresponding to each service application after binding, wherein the link tracking technology is used for representing embedding points at corresponding positions of the application information corresponding to each service application by utilizing an embedding point technology.
4. The method of claim 1, wherein the identifying the rights resource codes of the service applications to obtain each sensitive data represented by each rights resource code and a data type corresponding to each sensitive data includes:
extracting the execution codes of the service applications to obtain feature data sets corresponding to the execution codes of the service applications;
and identifying the authority resource codes of each service application to obtain each sensitive data represented by each authority resource code and the data type corresponding to each sensitive data, wherein the identification is used for comparing the characteristic data in each characteristic data set with the classified data of the database metadata aiming at the sensitive data.
5. The method of claim 1, wherein the identifying the rights resource codes of the service applications to obtain each sensitive data represented by each rights resource code and a data type corresponding to each sensitive data includes:
and inputting the authority resource codes of the service applications into a data identification model obtained through training, and generating all sensitive data represented by the authority resource codes and data types corresponding to the sensitive data, wherein the data identification model is used for representing whether the sensitive data exist in the data represented by the authority resource codes or not and judging the data types of the sensitive data.
6. The method of claim 1, further comprising:
and sending the link tracking identification to the client.
7. The method of claim 1, further comprising:
and optimizing the authority examination strategy based on the correlation between each sensitive data represented by each authority resource code and the authority examination.
8. An apparatus for identifying data, the apparatus comprising:
the first acquisition unit is configured to respond to receiving a user access request sent by a client and acquire a user identifier, an access address and various service applications provided for the request corresponding to the request;
a first generation unit configured to generate, based on the access address and the respective service applications, link tracking identifications corresponding to the access address and the respective service applications, wherein the link tracking identifications are used for characterizing association relations between the access address and the respective service applications;
a second obtaining unit, configured to obtain application information of the service applications from a database based on the link tracking identifier, where the application information includes: interface information and execution codes, wherein the database is updated in advance by utilizing the link tracking identification;
a second generating unit configured to generate, based on the user identifier and interface information of each service application, rights resource codes of each service application corresponding to each interface information, where the rights resource codes are used to characterize resource information used for performing rights verification on the request;
the data identification unit is configured to identify the authority resource codes of the service applications to obtain sensitive data represented by the authority resource codes and data types corresponding to the sensitive data, wherein the identification is used for comparing the execution codes of the service applications with classified data aiming at the sensitive data in the database metadata.
9. The apparatus of claim 8, wherein the database update process is accomplished by:
the generation module is configured to bind the link tracking identification with the acquired application information corresponding to each service application, and generate interface information corresponding to each service application after binding and execution codes corresponding to each service application after binding;
and the updating module is configured to update the database based on the interface information corresponding to each service application after binding and the execution code corresponding to each service application after binding.
10. The apparatus of claim 9, wherein the generation module is further configured to bind the link tracking identification with the application information corresponding to the respective service application based on a link tracking technique that characterizes embedding points at respective locations of the application information corresponding to the respective service application using a point embedding technique.
11. The apparatus of claim 8, wherein the data identification unit comprises:
the extraction module is configured to extract the execution codes of the service applications to obtain feature data sets corresponding to the execution codes of the service applications;
the identification module is configured to identify the authority resource codes of all service applications to obtain all the sensitive data represented by the authority resource codes and the data types corresponding to the sensitive data, wherein the identification is used for comparing the characteristic data in all the characteristic data sets with the classified data of the database metadata aiming at the sensitive data.
12. The apparatus according to claim 8, wherein the data identification unit is further configured to input rights resource codes of the respective service applications into a trained data identification model, and generate respective sensitive data characterized by the rights resource codes and data types corresponding to the respective sensitive data, wherein the data identification model is used to determine whether sensitive data exists in the data characterized by the rights resource codes and to determine the data types of the sensitive data.
13. The apparatus of claim 8, further comprising:
and the sending unit is configured to send the link tracking identification to the client.
14. The apparatus of claim 8, further comprising:
and the optimizing unit is configured to optimize the authority examination strategy based on the correlation between each sensitive data characterized by each authority resource code and the authority examination.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
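
The following is an illustrative sketch only and forms no part of the claims. It shows, in Python, one way the binding of a link tracking identification (claim 10) and the comparison of extracted feature data with classified database metadata (claim 11) might look. Every name in it (`SENSITIVE_METADATA`, `ServiceApplication`, `bind_trace_id`, `extract_features`, `identify_sensitive`) is hypothetical, the authority resource code is modeled as plain text, and instrumentation is simplified to a direct field assignment.

```python
import re
import uuid
from dataclasses import dataclass

# Hypothetical stand-in for the "classified data of the database metadata":
# field names mapped to the data type of the sensitive data they hold.
SENSITIVE_METADATA = {
    "phone": "contact",
    "email": "contact",
    "id_card": "identity",
    "bank_account": "financial",
}


@dataclass
class ServiceApplication:
    name: str
    resource_code: str  # authority resource code, modeled here as plain text
    trace_id: str = ""  # link tracking identification, empty until bound


def bind_trace_id(apps, trace_id=None):
    """Bind one link tracking identification to every application on the chain."""
    trace_id = trace_id or uuid.uuid4().hex
    for app in apps:
        app.trace_id = trace_id
    return trace_id


def extract_features(code):
    """Extract a feature data set: the lowercase identifiers found in the code."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", code.lower()))


def identify_sensitive(app):
    """Compare the extracted features with the metadata classifications."""
    features = extract_features(app.resource_code)
    return {f: SENSITIVE_METADATA[f] for f in features if f in SENSITIVE_METADATA}


apps = [
    ServiceApplication("order-service", "SELECT phone, address FROM orders"),
    ServiceApplication("pay-service", "SELECT bank_account, amount FROM payments"),
]
trace_id = bind_trace_id(apps, trace_id="trace-001")
report = {app.name: identify_sensitive(app) for app in apps}
print(trace_id, report)
```

In this sketch the shared `trace_id` is what allows the per-application findings in `report` to be attributed to a single request chain. A real system would presumably propagate the identification through instrumented call sites rather than assign it directly.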
CN202110180902.8A 2021-02-08 2021-02-08 Method and device for identifying data Active CN113779616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110180902.8A CN113779616B (en) 2021-02-08 2021-02-08 Method and device for identifying data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110180902.8A CN113779616B (en) 2021-02-08 2021-02-08 Method and device for identifying data

Publications (2)

Publication Number Publication Date
CN113779616A (en) 2021-12-10
CN113779616B (en) 2024-04-05

Family

ID=78835697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110180902.8A Active CN113779616B (en) 2021-02-08 2021-02-08 Method and device for identifying data

Country Status (1)

Country Link
CN (1) CN113779616B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114726596A (en) * 2022-03-25 2022-07-08 北京沃东天骏信息技术有限公司 Sensitive data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103078859A (en) * 2012-12-31 2013-05-01 普天新能源有限责任公司 Service system authority management method, equipment and system
CN110602046A (en) * 2019-08-13 2019-12-20 上海陆家嘴国际金融资产交易市场股份有限公司 Data monitoring processing method and device, computer equipment and storage medium
US10515212B1 (en) * 2016-06-22 2019-12-24 Amazon Technologies, Inc. Tracking sensitive data in a distributed computing environment
CN111367983A (en) * 2020-03-10 2020-07-03 中国联合网络通信集团有限公司 Database access method, system, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
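Material information service system based on a dynamic tracking mechanism; 陈士沣; 廖泰安; 计算机工程与设计 (Computer Engineering and Design); 20110316(03); full text *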

Also Published As

Publication number Publication date
CN113779616A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN111666546B (en) Application login method and device
CN112084366B (en) Method, apparatus, device and storage medium for retrieving image
CN111752843A (en) Method, device, electronic equipment and readable storage medium for determining influence surface
CN110572399B (en) Vulnerability detection processing method, device, equipment and storage medium
CN112487973B (en) Updating method and device for user image recognition model
CN111724069A (en) Method, apparatus, device and storage medium for processing data
US20210350805A1 (en) Method, apparatus, device and computer storage medium for processing voices
CN113779616B (en) Method and device for identifying data
CN112491617A (en) Link tracking method, device, electronic equipment and medium
CN111552829B (en) Method and apparatus for analyzing image material
CN109905366A (en) Terminal device safe verification method, device, readable storage medium storing program for executing and terminal device
CN112115334A (en) Method, device, equipment and storage medium for distinguishing hot content of network community
KR101566363B1 (en) Apparatus for analyzing connections about security events based on rule and method thereof
CN112069137A (en) Method and device for generating information, electronic equipment and computer readable storage medium
WO2023077940A1 (en) Recognition method and apparatus, processing method and apparatus, and intelligent detection system
CN111753330B (en) Determination method, apparatus, device and readable storage medium for data leakage main body
US11838294B2 (en) Method for identifying user, storage medium, and electronic device
CN115858345A (en) Application service module verification method and device, electronic equipment and storage medium
CN111553283B (en) Method and device for generating model
CN114661274A (en) Method and device for generating intelligent contract
CN111506787B (en) Method, device, electronic equipment and computer readable storage medium for web page update
CN114219420A (en) Processing method and device for handling guaranteed housing based on AI and RPA
CN112614479A (en) Training data processing method and device and electronic equipment
US9674160B2 (en) Methods for anti-fraud masking of a universal resource indentifier (“URI”)
CN111581071B (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant