CN113986956B - Data exception query analysis method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN113986956B (application number CN202111631123.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- personal privacy
- application system
- interface
- subunit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Storage Device Security (AREA)
Abstract
The embodiments of the invention disclose a data abnormal query analysis method and device, computer equipment, and a storage medium. The method comprises: acquiring a database query request and application system interface access information to obtain query request data; judging, according to the query request data, whether an interface of the application system contains user account information; if it does not, acquiring the data content returned by the database; judging whether the data content contains personal privacy data; and if it does, generating abnormal query risk information for account-free personal privacy data. The method can accurately and effectively identify abnormal queries of personal privacy data made by an application system in an account-free state.
Description
Technical Field
The invention relates to the field of personal privacy data, and in particular to a data abnormal query analysis method, device, computer equipment, and storage medium.
Background
When an application system calls the database, the database sees only the application system ID. Once that ID is verified, the application system can call the database through its interface to query personal privacy data, but the database cannot tell whether the call was initiated by a user. The application system accesses the database through the interface, yet it also cannot judge whether such a call is risky, because the access may be triggered by a user's normal business use or may be triggered abnormally by a person acting directly on the host where the application system runs.
At present, database-side means of querying and detecting personal privacy data can obtain only the application system ID, not user information; when the host where the application runs accesses the database interface directly, the application system cannot judge whether the access is normal; and the queried data content cannot be inspected quickly enough to judge whether the query behavior is normal.
Therefore, a new method is needed that accurately and effectively identifies abnormal queries of personal privacy data made by an application system in an account-free state.
Disclosure of Invention
The object of the invention is to overcome the defects of the prior art by providing a data abnormal query analysis method, device, computer equipment, and storage medium.
To achieve this object, the invention adopts the following technical solution: a data abnormal query analysis method, comprising:
acquiring a database query request and application system interface access information to obtain query request data;
judging, according to the query request data, whether an interface of the application system contains user account information;
if the interface of the application system does not contain user account information, acquiring the data content returned by the database;
judging whether the data content contains personal privacy data;
and if the data content contains personal privacy data, generating abnormal query risk information for account-free personal privacy data.
In a further technical solution, after the abnormal query risk information for account-free personal privacy data is generated, the method further comprises:
analyzing the personal privacy data in the data content, and issuing a risk early warning in combination with early warning rules.
In a further technical solution, analyzing the personal privacy data in the data content and issuing a risk early warning in combination with early warning rules comprises:
sorting out the number of items and the categories of personal privacy data in the data content, and determining a risk level in combination with an early warning rule;
and issuing a risk early warning according to the risk level.
In a further technical solution, judging, according to the query request data, whether the interface of the application system contains user account information comprises:
judging whether the user ID and token fields in a request header of the interface of the application system are all empty;
if at least one of the user ID and token fields in the request header of the interface of the application system is not empty, determining that the interface of the application system contains user account information;
and if the user ID and token fields in the request header of the interface of the application system are all empty, determining that the interface of the application system does not contain user account information.
In a further technical solution, after judging, according to the query request data, whether the interface of the application system contains user account information, the method further comprises:
if the interface of the application system contains user account information, proceeding to the end step.
The invention also provides a data abnormal query analysis device, comprising:
a data acquisition unit configured to acquire a database query request and application system interface access information to obtain query request data;
an information judgment unit configured to judge, according to the query request data, whether an interface of the application system contains user account information;
a content acquisition unit configured to acquire the data content returned by the database if the interface of the application system does not contain user account information;
a content judgment unit configured to judge whether the data content contains personal privacy data;
and a risk generation unit configured to generate abnormal query risk information for account-free personal privacy data if the data content contains personal privacy data.
In a further technical solution, the device further comprises:
an analysis unit configured to analyze the personal privacy data in the data content and issue a risk early warning in combination with early warning rules.
In a further technical solution, the analysis unit comprises:
a sorting subunit configured to sort out the number of items and the categories of personal privacy data in the data content and determine a risk level in combination with an early warning rule;
and an early warning subunit configured to issue a risk early warning according to the risk level.
The invention also provides computer equipment comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the above method when executing the computer program.
The invention also provides a storage medium storing a computer program which, when executed by a processor, implements the method described above.
Compared with the prior art, the invention has the following beneficial effects: by acquiring the database query request and the application system interface access information, judging whether the application system interface contains user account information, and analyzing the returned data content when it does not, the invention can accurately and effectively identify abnormal queries of personal privacy data made by the application system in an account-free state.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of a data abnormal query analysis method according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of a data abnormal query analysis method according to an embodiment of the invention;
FIG. 3 is a schematic sub-flow chart of a data abnormal query analysis method according to an embodiment of the invention;
FIG. 4 is a schematic flow chart of a data abnormal query analysis method according to another embodiment of the invention;
FIG. 5 is a schematic sub-flow chart of a data abnormal query analysis method according to another embodiment of the invention;
FIG. 6 is a schematic block diagram of a data abnormal query analysis device according to an embodiment of the invention;
FIG. 7 is a schematic block diagram of the information judgment unit of the data abnormal query analysis device according to an embodiment of the invention;
FIG. 8 is a schematic block diagram of a data abnormal query analysis device according to another embodiment of the invention;
FIG. 9 is a schematic block diagram of the analysis unit of the data abnormal query analysis device according to an embodiment of the invention;
FIG. 10 is a schematic block diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art derives from these embodiments without creative effort fall within the protection scope of the invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of an application scenario of a data abnormal query analysis method according to an embodiment of the invention, and FIG. 2 is a schematic flow chart of the method. The data abnormal query analysis method is applied to a server. The server exchanges data with a terminal, acquires the database query request and the application system interface access information through the terminal, analyzes whether user account information is present, and, when it is not, judges whether there is an abnormal query risk for personal privacy data and issues an early warning.
As shown in FIG. 2, the method includes the following steps S110 to S150.
S110, acquiring a query request of the database and interface access information of the application system to obtain query request data.
In this embodiment, the query request data consists of the database query request and the application system interface access information.
Because the analysis targets the application system interface, the interface access information must be collected. The database query request and the interface access information can be obtained through mainstream collection techniques such as code instrumentation or network traffic mirroring. Because a request and its response are handled on the same thread, the thread can be used to locate which interface of the application system is calling the database.
S120, judging, according to the query request data, whether the interface of the application system contains user account information.
When the application system receives a user request, it authenticates the user. Common authentication is based on the user ID, i.e., the SessionID, or on the token, i.e., the Token field in the HTTP request header. When a user first accesses the application through a browser or an app, the user negotiates with the application system, which generates a user ID or token after authentication; the user then carries these fields in every subsequent access, and authentication is performed based on their values.
In this embodiment, the user account information refers to information such as an ID or a token of the user.
In an embodiment, referring to FIG. 3, step S120 may include steps S121 to S123.
S121, judging whether the user ID and token fields in a request header of the interface of the application system are all empty;
S122, if at least one of the user ID and token fields in the request header of the interface of the application system is not empty, determining that the interface of the application system contains user account information;
S123, if the user ID and token fields in the request header of the interface of the application system are all empty, determining that the interface of the application system does not contain user account information.
The request parameters of the application system interface are analyzed: for example, it is detected whether a SessionID (or similar ID) or Token field exists in the interface request header, or whether the values of both fields are null, to determine whether the request was initiated by a user. If the SessionID (or similar ID) and Token field values in the application system interface request are null, it can be determined that the request was not initiated by a user.
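Steps S121 to S123 reduce to a small predicate. The sketch below assumes the user ID and token arrive as plain request headers named `SessionID` and `Token`, matched case-insensitively; those header names and the function name are assumptions, and a real deployment may also need to read a session ID out of the `Cookie` header.

```python
# Assumed header names for the user ID (session) and token fields; adjust to the
# application system being monitored.
ACCOUNT_FIELDS = ("sessionid", "token")


def has_user_account_info(headers, account_fields=ACCOUNT_FIELDS):
    """S121-S123: True if at least one user ID / token field is present and non-empty."""
    normalized = {str(k).lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in account_fields:
        value = normalized.get(name)
        if value is not None and str(value).strip():
            return True   # S122: account information present
    return False          # S123: all fields missing or empty
```

For example, `has_user_account_info({"SessionID": "a1b2"})` is `True`, while a request with no such headers, or with only blank values, is treated as account-free.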
S130, if the interface of the application system does not contain user account information, acquiring the data content returned by the database.
In this embodiment, the data content refers to the data returned by the database in response to the query request.
S140, judging whether the data content contains personal privacy data.
For example, if the application system interface request was not initiated by a user but directly by the host where the application system runs, it is necessary to analyze whether the data returned by the database (the values and data tables returned after the application system queries the database) contain personal privacy data, such as names, phone numbers, addresses, and ID card numbers. This scenario often arises when the operating-system privileges of the host are illegally compromised or misused, so analyzing the returned data supports further risk judgment.
Specifically, content identification is performed on the data returned by the database to judge whether it contains personal privacy data, through the following steps:
inputting the data content into a recognition model for personal privacy data recognition to obtain a recognition result.
In this embodiment, the recognition result is the probability, i.e., the similarity score, that the data content belongs to a given specific type of personal privacy data.
All collected data are transmitted to the recognition model for computation and identification. During transmission, to ensure data integrity, either asynchronous transmission or a local-cache-plus-transmission-queue control strategy is selected automatically based on monitoring of WEB application system resources and network bandwidth.
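The transmission-strategy selection can be sketched as follows, assuming the collector already has current CPU-load and bandwidth readings. The thresholds `CPU_BUSY` and `MIN_BANDWIDTH_MBPS`, the class name `Transmitter`, and the `send` callback are hypothetical; the text does not say how the choice is made. Asynchronous sends go through a thread pool, and the fallback path buffers records in a bounded local queue that is drained later, so no record is discarded.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical thresholds; the text only says the choice depends on monitored
# application resources and network bandwidth.
CPU_BUSY = 0.8             # CPU fraction at or above which extra send threads are avoided
MIN_BANDWIDTH_MBPS = 10.0  # below this, immediate sending is considered unsafe


def choose_transmission_strategy(cpu_load, bandwidth_mbps):
    """Return "async" when resources allow, otherwise "cache_and_queue"."""
    if cpu_load < CPU_BUSY and bandwidth_mbps >= MIN_BANDWIDTH_MBPS:
        return "async"
    return "cache_and_queue"


class Transmitter:
    """Sends collected records to the recognition model without dropping any."""

    def __init__(self, send, max_cached=10_000):
        self._send = send                              # callback that delivers one record
        self._cache = queue.Queue(maxsize=max_cached)  # local cache + transmission queue
        self._pool = ThreadPoolExecutor(max_workers=2)

    def submit(self, record, cpu_load, bandwidth_mbps):
        strategy = choose_transmission_strategy(cpu_load, bandwidth_mbps)
        if strategy == "async":
            self._pool.submit(self._send, record)      # non-blocking send
        else:
            self._cache.put(record)                    # blocks if full instead of dropping
        return strategy

    def flush(self):
        """Drain cached records in arrival order; return how many were sent."""
        sent = 0
        while True:
            try:
                record = self._cache.get_nowait()
            except queue.Empty:
                return sent
            self._send(record)
            sent += 1

    def close(self):
        self.flush()
        self._pool.shutdown(wait=True)
```

A caller would invoke `flush()` whenever monitoring shows resources have recovered, and `close()` on shutdown so that both paths have delivered everything.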
In this embodiment, the recognition model is obtained by training a deep learning network on a sample set of data labeled with specific types of personal privacy data, using a multi-GPU parallel computing framework.
In an embodiment, obtaining the recognition model in this way may include:
acquiring a plurality of data labeled with specific types of personal privacy data to obtain a sample set, and dividing the sample set into training sets.
In this embodiment, the sample set is a set of data labeled with specific types of personal privacy data.
The training sets are data sets divided from the sample set and used to train the model.
constructing a ResNet deep learning network;
loading the training sets onto a plurality of GPU nodes to compute gradients, obtaining the gradient results of all nodes;
taking a weighted average of the gradient results of all nodes, updating the network parameters of the ResNet deep learning network, and synchronizing the update to all GPU nodes;
judging whether the ResNet deep learning network has converged;
if the ResNet deep learning network has converged, taking it as the recognition model;
if the ResNet deep learning network has not converged, returning to the step of loading the training sets onto the GPU nodes to compute gradients.
Specifically, a ResNet deep learning algorithm is adopted and the depth of the improved network is optimized, increasing gradually from about a dozen layers to about a hundred layers, so that the model can effectively learn and extract the features of personal privacy data, including character types (digits, letters, Chinese characters, and so on) as well as specific character lengths, symbols, and formats.
Raw data from in-house business systems, such as ID card numbers, mobile phone numbers, addresses, office documents containing personal privacy data, and pictures containing personal privacy data, are assembled into a sample set of about 100,000 samples. The model is trained with a data-parallel multi-GPU computing framework: the samples are randomly divided into several training sets, which are loaded onto several GPU nodes to compute gradients; the gradient results of all nodes are then weighted-averaged, the network parameters are updated, and all GPU nodes are updated synchronously; training continues step by step until the model converges. The specific types of personal privacy data the model then outputs include ID card number, mobile phone number, home address, e-mail address, license plate number, bank account number, social security number, and housing provident fund number. Each GPU performs forward propagation to compute predictions and backward propagation to compute updates to the model parameters.
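The synchronous data-parallel loop described above (split the sample set, compute per-node gradients, take a weighted average, apply one synchronized update, check convergence) can be shown without GPUs. This sketch swaps the ResNet for a one-feature linear model trained with mean squared error and simulates the GPU nodes as plain Python shards. Weighting each node's gradient by its shard size is an assumption about what the weighted average uses, and the learning rate, tolerance, and function names are illustrative.

```python
import random


def make_shards(samples, n_nodes, seed=0):
    """Randomly split the sample set into per-node training sets (one per simulated GPU)."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    return [s for s in (data[i::n_nodes] for i in range(n_nodes)) if s]


def local_gradient(params, shard):
    """Mean-squared-error gradient for y = w*x + b on one node's shard."""
    w, b = params
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def mse(params, samples):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train_data_parallel(samples, n_nodes=4, lr=0.05, tol=1e-10, max_steps=10_000):
    """Synchronous data-parallel gradient descent; returns (params, steps_taken)."""
    shards = make_shards(samples, n_nodes)
    sizes = [len(s) for s in shards]
    total = sum(sizes)
    params = (0.0, 0.0)                      # replicated initial parameters on every node
    prev_loss = mse(params, samples)
    for step in range(1, max_steps + 1):
        grads = [local_gradient(params, s) for s in shards]        # per-node gradients
        gw = sum(g[0] * n for g, n in zip(grads, sizes)) / total   # size-weighted average
        gb = sum(g[1] * n for g, n in zip(grads, sizes)) / total
        params = (params[0] - lr * gw, params[1] - lr * gb)         # synchronized update
        loss = mse(params, samples)
        if abs(prev_loss - loss) < tol:                              # convergence check
            return params, step
        prev_loss = loss
    return params, max_steps
```

With size-weighted averaging, the averaged gradient equals the full-batch gradient, so every simulated node applies exactly the update that single-device training would compute.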
matching the data content against preset data identification strategies to obtain a matching score.
In this embodiment, the matching score is the score obtained by matching the data content against the various data identification strategies.
In one embodiment, this may include:
defining regular-expression, dictionary, and keyword recognition rules for specific types of personal privacy data, and combining these rules in multiple modes to form various data identification strategies.
In this embodiment, the data identification strategies include a combined keyword and regular-expression strategy, a combined regular-expression and dictionary strategy, a combined dictionary and keyword strategy, and the like.
Specifically, recognition rules such as regular expressions, dictionaries, and keywords are defined for specific types of personal privacy data, such as ID card numbers, mobile phone numbers, home addresses, e-mail addresses, license plate numbers, bank account numbers, social security numbers, and housing provident fund numbers, and these rules are combined in multiple modes to form various data identification strategies.
matching the data content against the preset data identification strategies to obtain a matching score.
Specifically, the data content is sequentially matched with various data identification strategies to obtain strategy matching scores.
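A sketch of rule-based strategies combined pairwise, as listed above, for three of the named types. The regular expressions, dictionary terms, keywords, and the scoring scheme (a strategy scores the fraction of its two rules that hit, and each type keeps its best strategy score) are all assumptions made for illustration; the patterns check shape only and skip, for example, ID card checksum validation.

```python
import re

# Hypothetical rules for three of the listed types. Patterns check shape only.
RULES = {
    "id_card": {
        "regex": re.compile(r"\d{17}[\dXx]"),
        "dictionary": set(),
        "keywords": {"id_card", "idcard", "identity", "身份证"},
    },
    "mobile_phone": {
        "regex": re.compile(r"1[3-9]\d{9}"),
        "dictionary": set(),
        "keywords": {"phone", "mobile", "手机"},
    },
    "email": {
        "regex": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
        "dictionary": {"qq.com", "163.com", "gmail.com"},
        "keywords": {"email", "mail", "邮箱"},
    },
}

# Pairwise combinations named in the text.
STRATEGIES = (("keywords", "regex"), ("regex", "dictionary"), ("dictionary", "keywords"))


def rule_hit(kind, rule, value, field_name):
    """Regex on the value, dictionary terms inside the value, keywords in the field name."""
    if kind == "regex":
        return rule.fullmatch(value) is not None
    if kind == "dictionary":
        return any(term in value for term in rule)
    if kind == "keywords":
        name = field_name.lower()
        return any(word in name for word in rule)
    raise ValueError(f"unknown rule kind: {kind}")


def match_scores(value, field_name=""):
    """Match a value against every strategy in turn; return each type's best strategy score."""
    scores = {}
    for data_type, rules in RULES.items():
        strategy_scores = []
        for kinds in STRATEGIES:
            hits = [rule_hit(kind, rules[kind], value, field_name) for kind in kinds]
            strategy_scores.append(sum(hits) / len(hits))
        scores[data_type] = max(strategy_scores)
    return scores
```

For instance, a bare 11-digit mobile number with no field name hits only the mobile regex, so its `mobile_phone` score is 0.5; the same value in a column named `mobile` would reach 1.0.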
determining the data category according to the matching score and the recognition result.
In this embodiment, the data category indicates which specific type of personal privacy data the data belongs to, or that it is not personal privacy data.
In one embodiment, this may include:
taking a weighted average of the matching scores and the recognition results to obtain a score for each specific type;
and selecting the specific type with the highest score as the data category.
The ResNet deep learning algorithm automatically extracts features of the data content, such as length, character type, symbols, and format, and computes the similarity score between the data content and each specific type of personal privacy data. Meanwhile, the data content is matched against each data identification strategy in turn to obtain strategy matching scores. Finally, the scores are combined by weighted averaging, and the highest-scoring class indicates whether, and as which type, the data content belongs to personal privacy data.
judging whether the data category is personal privacy data;
if the data category is personal privacy data, outputting the data category;
if the data category is not personal privacy data, outputting information that the data content does not belong to personal privacy data.
For example, data used by a WEB application system contain a user's ID card number. When the automatic data collection plug-in collects the data, it transmits them to the recognition model. The model extracts features of the ID card number such as character length, character type, and fixed format through the ResNet deep learning algorithm and compares them with each specific type of personal privacy data, obtaining similarity scores for ID card number, mobile phone number, home address, e-mail address, license plate number, bank account number, social security number, housing provident fund number, and so on. The ID card number is also matched against each strategy in the data identification strategy set in turn, yielding similarity scores for the same types. Finally, all scores are weighted-averaged; the ID card type scores highest, so the data are identified as an ID card number.
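The fusion and decision steps (weighted average of model and rule scores, highest-scoring type, privacy or not) can be sketched as below. The weights `MODEL_WEIGHT` and `RULE_WEIGHT`, and the `PRIVACY_THRESHOLD` used here to represent "not personal privacy data", are hypothetical: the text gives no values, and an implementation could instead model "non-privacy" as a class of its own.

```python
# Hypothetical weights and threshold; the text specifies a weighted average but no values.
MODEL_WEIGHT = 0.6
RULE_WEIGHT = 0.4
PRIVACY_THRESHOLD = 0.5


def classify(model_scores, rule_scores):
    """Fuse per-type model similarity scores with rule matching scores.

    Returns (category, fused_scores); category is None when no type reaches the
    threshold, i.e. the content is treated as not personal privacy data.
    """
    types = sorted(set(model_scores) | set(rule_scores))  # sorted for deterministic ties
    fused = {
        t: MODEL_WEIGHT * model_scores.get(t, 0.0) + RULE_WEIGHT * rule_scores.get(t, 0.0)
        for t in types
    }
    if not fused:
        return None, fused
    best = max(types, key=lambda t: fused[t])
    if fused[best] < PRIVACY_THRESHOLD:
        return None, fused
    return best, fused
```

With illustrative inputs mirroring the ID card example, a model score of 0.92 and a rule score of 1.0 for `id_card` fuse to 0.6 × 0.92 + 0.4 × 1.0 = 0.952, so `id_card` is returned as the category.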
The method is more efficient, collecting and identifying data automatically without manual participation; more accurate, effectively identifying the structured and unstructured personal privacy data used by WEB applications; and more comprehensive, covering WEB applications together with their service and function interfaces so that errors and omissions are avoided.
S150, if the data content contains personal privacy data, generating abnormal query risk information for account-free personal privacy data.
When personal privacy data are present, risk information is generated to serve as a reminder.
If the interface of the application system contains user account information, the process proceeds to the end step.
If the data content contains no personal privacy data, the process proceeds to the end step.
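The overall control flow of steps S120 to S150, including both end branches, fits in a few lines. The record type `QueryRequestData` and the two injected callables (`has_account` and `detect_privacy`) are hypothetical stand-ins for the account check and the privacy identification described above, so the sketch stays independent of how either is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRequestData:
    """Hypothetical shape of one record collected in S110."""
    interface: str
    headers: dict
    returned_rows: list = field(default_factory=list)


def analyze(record, has_account, detect_privacy):
    """Run S120-S150 for one record; return risk information, or None at an end step."""
    if has_account(record.headers):      # S120: request carries account info -> end
        return None
    content = record.returned_rows        # S130: data content returned by the database
    findings = detect_privacy(content)    # S140: e.g. {"mobile_phone": 2}; empty if none
    if not findings:                      # no personal privacy data -> end
        return None
    return {                              # S150: account-free privacy query risk
        "risk": "abnormal query of personal privacy data without account",
        "interface": record.interface,
        "findings": findings,
    }
```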
In the data abnormal query analysis method described above, the database query request and the application system interface access information are acquired, it is judged whether the application system interface contains user account information, and, when it does not, the returned data content is analyzed, so that abnormal queries of personal privacy data made by the application system in an account-free state can be accurately and effectively identified.
FIG. 4 is a schematic flow chart of a data abnormal query analysis method according to another embodiment of the invention. As shown in FIG. 4, the method of this embodiment includes steps S210 to S260. Steps S210 to S250 are similar to steps S110 to S150 of the above embodiment and are not repeated here. Step S260, added in this embodiment, is described in detail below.
S260, analyzing the personal privacy data in the data content, and issuing a risk early warning in combination with early warning rules.
In an embodiment, referring to FIG. 5, step S260 may include steps S261 and S262.
S261, sorting out the number of items and the categories of personal privacy data in the data content, and determining the risk level in combination with an early warning rule.
In this embodiment, the risk level is the level of abnormal query risk determined from the number of items and the categories of personal privacy data.
S262, issuing a risk early warning according to the risk level.
Specifically, if the data content contains personal privacy data, the number of items (the number of data items queried) and the categories (the types of data items queried) of the personal privacy data are sorted out, and early warning rules are then specified. For example, a query is high risk if it covers more than 5 categories of personal privacy data or more than 100 data items, medium risk if it covers more than 3 categories or more than 50 items, and low risk if it covers more than 1 category or more than 10 items. Finally, early warnings of different levels are issued for the application system's account-free queries of personal privacy data.
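The example early-warning rule can be written as a cascade checked from the highest level down, so a query meeting several thresholds receives the highest applicable level. The function names and the way the item count is derived (rows containing at least one privacy type) are assumptions; the thresholds are the example values from the text, read literally as strict "more than" comparisons.

```python
def risk_level(category_count, item_count):
    """Example early-warning rule from S261, checked from highest to lowest level."""
    if category_count > 5 or item_count > 100:
        return "high"
    if category_count > 3 or item_count > 50:
        return "medium"
    if category_count > 1 or item_count > 10:
        return "low"
    return None  # below every example threshold


def summarize(row_types):
    """Given the set of privacy types detected in each returned row, return
    (number of distinct categories, number of rows containing any privacy data)."""
    categories = set()
    items = 0
    for types in row_types:
        if types:
            items += 1
            categories |= set(types)
    return len(categories), items
```

Read literally, a query returning a single category and at most 10 items triggers no warning; whether the boundaries should be inclusive is a policy choice the text leaves open.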
FIG. 6 is a schematic block diagram of a data abnormal query analysis device 300 according to an embodiment of the invention. As shown in FIG. 6, corresponding to the above data abnormal query analysis method, the invention further provides a data abnormal query analysis device 300, which includes units for executing the method and may be configured in a server. Specifically, referring to FIG. 6, the device 300 includes a data acquisition unit 301, an information judgment unit 302, a content acquisition unit 303, a content judgment unit 304, and a risk generation unit 305.
The data acquisition unit 301 is configured to acquire a database query request and application system interface access information to obtain query request data; the information judgment unit 302 is configured to judge, according to the query request data, whether an interface of the application system contains user account information; the content acquisition unit 303 is configured to acquire the data content returned by the database if the interface of the application system does not contain user account information; the content judgment unit 304 is configured to judge whether the data content contains personal privacy data; and the risk generation unit 305 is configured to generate abnormal query risk information for account-free personal privacy data if the data content contains personal privacy data.
In an embodiment, as shown in FIG. 7, the information judgment unit 302 includes a field judgment subunit 3021, a first determining subunit 3022, and a second determining subunit 3023.
The field judgment subunit 3021 is configured to judge whether the user ID and token fields in a request header of the interface of the application system are all empty; the first determining subunit 3022 is configured to determine that the interface of the application system contains user account information if at least one of the user ID and token fields in the request header is not empty; and the second determining subunit 3023 is configured to determine that the interface of the application system does not contain user account information if the user ID and token fields in the request header are all empty.
Specifically, the content judgment unit 304 may include a model identification unit, a matching unit, a weighted average unit, a judgment unit, and an output unit.
The model identification unit is configured to input the data content into a recognition model for personal privacy data identification to obtain a recognition result; the matching unit is configured to match the data content against preset data identification strategies to obtain a matching score; the weighted average unit is configured to determine the data category according to the matching score and the recognition result; the judgment unit is configured to judge whether the data category is personal privacy data; and the output unit is configured to output the data category if it is personal privacy data.
In an embodiment, the content judgment unit 304 further includes a model generation unit, configured to train a deep learning network on a multi-GPU parallel computing framework, using a plurality of data items labeled with specific personal privacy data types as the sample set, to obtain the recognition model.
In an embodiment, the model generation unit includes a sample set acquisition subunit, a network construction subunit, a derivation subunit, a parameter updating subunit, and a judgment subunit.
The sample set acquisition subunit is configured to acquire a plurality of data items labeled with specific personal privacy data types to obtain a sample set, and to divide the sample set to obtain a training set; the network construction subunit is configured to construct a ResNet deep learning network; the derivation subunit is configured to load the training set onto multiple GPU nodes for gradient derivation to obtain the derivation results of all the nodes; the parameter updating subunit is configured to compute a weighted average of the derivation results of all the nodes, update the network parameters of the ResNet deep learning network, and synchronize the update to all the GPU nodes; and the judgment subunit is configured to judge whether the ResNet deep learning network has converged, to determine the ResNet deep learning network as the recognition model if it has converged, and, if it has not converged, to return to loading the training set onto the multiple GPU nodes for gradient derivation to obtain the derivation results of all the nodes.
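The training loop described above is synchronous data-parallel gradient descent. The pure-Python sketch below simulates it under heavy simplifications: a two-parameter linear model stands in for the ResNet, the "GPU nodes" are in-process data shards, and convergence is taken to mean that the loss changes by less than a tolerance between iterations. All function names are hypothetical; a real deployment would typically rely on a framework facility such as PyTorch `DistributedDataParallel`, which averages gradients across devices with an all-reduce.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature x, label y)
Params = Tuple[float, float]  # (weight w, bias b) of y_hat = w * x + b


def shard(data: Sequence[Sample], n_nodes: int) -> List[List[Sample]]:
    # Split the training set round-robin across simulated nodes; drop empty shards.
    parts = [list(data[i::n_nodes]) for i in range(n_nodes)]
    return [p for p in parts if p]


def local_gradient(params: Params, samples: Sequence[Sample]) -> Params:
    # Gradient of the mean squared error on one node's shard ("gradient derivation").
    w, b = params
    gw = gb = 0.0
    for x, y in samples:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw / len(samples), gb / len(samples)


def mse(params: Params, data: Sequence[Sample]) -> float:
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


def train_data_parallel(data: Sequence[Sample], n_nodes: int = 4, lr: float = 0.1,
                        tol: float = 1e-10, max_steps: int = 10_000) -> Tuple[Params, int]:
    shards = shard(data, n_nodes)
    total = sum(len(s) for s in shards)
    params: Params = (0.0, 0.0)  # every node starts from the same parameters
    prev_loss = mse(params, data)
    for step in range(1, max_steps + 1):
        # Each node derives a gradient on its own shard.
        grads = [local_gradient(params, s) for s in shards]
        # Weighted average of the node results, weighted by shard size.
        gw = sum(len(s) * g[0] for s, g in zip(shards, grads)) / total
        gb = sum(len(s) * g[1] for s, g in zip(shards, grads)) / total
        # One update, after which all nodes share the new parameters.
        params = (params[0] - lr * gw, params[1] - lr * gb)
        # Convergence check; otherwise loop back to the gradient step.
        loss = mse(params, data)
        if abs(prev_loss - loss) < tol:
            return params, step
        prev_loss = loss
    return params, max_steps
```

Weighting each node's gradient by its shard size makes the averaged update identical to a full-batch gradient step, which is why the synchronized result does not depend on how the data was split.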
In one embodiment, the matching unit includes a policy definition subunit and a policy matching subunit.
The policy definition subunit is configured to define regular-expression, dictionary, and keyword identification rules for specific types of personal privacy data, and to combine these rules in multiple modes to form a plurality of data identification strategies; the policy matching subunit is configured to match the data content against the preset data identification strategies to obtain a matching score.
In one embodiment, the weighted average unit includes a type score calculation subunit and a screening subunit.
The type score calculation subunit is configured to compute a weighted average of the matching scores and the recognition results to obtain a score for each specific type; the screening subunit is configured to select the specific type with the highest score as the data category.
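As an illustration of multi-mode rule combination, the following Python sketch defines a hypothetical `Strategy` type that bundles regular-expression, dictionary, and keyword rules for one privacy type. Scoring a strategy as the fraction of its configured rule modes that fire is an assumption made here; the source does not state how the matching score is computed. The mainland-China mobile number pattern and the city dictionary are likewise only examples.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Strategy:
    """Hypothetical data identification strategy for one privacy type."""
    category: str
    regexes: List[re.Pattern] = field(default_factory=list)
    dictionary: Set[str] = field(default_factory=set)
    keywords: List[str] = field(default_factory=list)

    def score(self, text: str) -> float:
        # Evaluate every configured rule mode; each one that fires counts equally.
        fired = []
        if self.regexes:
            fired.append(any(r.search(text) for r in self.regexes))
        if self.dictionary:
            fired.append(any(tok in self.dictionary for tok in re.findall(r"\w+", text)))
        if self.keywords:
            lowered = text.lower()
            fired.append(any(k.lower() in lowered for k in self.keywords))
        return sum(fired) / len(fired) if fired else 0.0


def match_scores(text: str, strategies: List[Strategy]) -> Dict[str, float]:
    # Matching score of the data content against every preset strategy.
    return {s.category: s.score(text) for s in strategies}


PHONE = Strategy(
    category="phone_number",
    regexes=[re.compile(r"\b1[3-9]\d{9}\b")],  # example: mainland-China mobile number
    keywords=["phone", "mobile"],
)
CITY = Strategy(category="address", dictionary={"Beijing", "Shanghai", "Shenzhen"},
                keywords=["address"])
```

Keeping each mode optional lets one strategy be regex-only while another combines a dictionary with keywords, which is one reading of "multi-mode combination".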
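The type-score calculation and screening can be sketched as follows. The 0.4/0.6 weights, the dictionary inputs, and the `non_private` label used for the "not personal privacy data" outcome are assumptions of this sketch; the source only says that the matching scores and recognition results are combined by a weighted average and the highest-scoring type is kept.

```python
from typing import Dict, Tuple

NON_PRIVATE = "non_private"  # hypothetical label for content that is not privacy data


def fuse_scores(match_scores: Dict[str, float], model_probs: Dict[str, float],
                rule_weight: float = 0.4, model_weight: float = 0.6) -> Dict[str, float]:
    # Weighted average per specific type; a type missing from one source scores 0 there.
    total = rule_weight + model_weight
    categories = set(match_scores) | set(model_probs)
    return {
        c: (rule_weight * match_scores.get(c, 0.0) + model_weight * model_probs.get(c, 0.0)) / total
        for c in categories
    }


def classify(match_scores: Dict[str, float], model_probs: Dict[str, float]) -> Tuple[str, bool]:
    fused = fuse_scores(match_scores, model_probs)
    # Screening: keep the highest-scoring type (sorted first so ties break deterministically).
    category = max(sorted(fused), key=lambda c: fused[c])
    # Judgment: is the selected data category personal privacy data?
    return category, category != NON_PRIVATE
```

For instance, a rule score of 1.0 and a model probability of 0.7 for `phone_number` fuse to 0.82, which outranks a model-only `non_private` probability of 0.2 (fused 0.12).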
Fig. 8 is a schematic block diagram of a data anomaly query analysis apparatus 300 according to another embodiment of the present invention. As shown in fig. 8, the data anomaly query analysis apparatus 300 of this embodiment adds an analysis unit 306 to the apparatus of the above embodiment.
The analysis unit 306 is configured to analyze the personal privacy data in the data content and perform risk early warning in combination with early warning rules.
In one embodiment, referring to fig. 9, the analysis unit 306 includes a tallying subunit 3061 and an early warning subunit 3062.
The tallying subunit 3061 is configured to tally the number of items and the categories of personal privacy data in the data content and determine the risk level in combination with the early warning rules; the early warning subunit 3062 is configured to perform risk early warning according to the risk level.
It should be noted that, as those skilled in the art will clearly understand, for the specific implementation of the data anomaly query analysis apparatus 300 and each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity, the details are not repeated here.
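The tallying and early-warning steps can be illustrated with a small rule table. Everything concrete in this Python sketch is hypothetical: the per-category sensitivity weights, the score thresholds for the `low`, `medium`, and `high` levels, and the choice to raise a warning only at `medium` or above. The source does not specify the content of the early warning rules.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical early warning rule: sensitivity weight per privacy category
# (unlisted categories default to 1) and minimum score per risk level.
SENSITIVITY = {"id_card": 3, "bank_card": 3, "phone_number": 2, "address": 1}
LEVELS = ((50, "high"), (10, "medium"), (1, "low"))  # checked from the top down


def risk_level(categories: Iterable[str]) -> Tuple[str, Counter]:
    # Tally the number of items per privacy category in the returned data content.
    counts = Counter(categories)
    score = sum(SENSITIVITY.get(cat, 1) * n for cat, n in counts.items())
    for threshold, level in LEVELS:
        if score >= threshold:
            return level, counts
    return "none", counts


def early_warning(categories: Iterable[str]) -> Optional[str]:
    # Issue a warning message only for medium or high risk levels.
    level, counts = risk_level(categories)
    if level in ("medium", "high"):
        return f"[{level.upper()}] {sum(counts.values())} privacy items: {dict(counts)}"
    return None
```

Weighting by category rather than counting raw items lets a handful of ID card numbers outrank a larger number of less sensitive fields.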
The data anomaly query analysis apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 10.
Referring to fig. 10, fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 10, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform the data anomaly query analysis method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it causes the processor 502 to perform the data anomaly query analysis method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the structure shown in fig. 10 is a block diagram of only the part of the structure relevant to the present solution and does not limit the computer device 500 to which the solution is applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
acquiring a database query request and the interface access information of an application system to obtain query request data; judging, according to the query request data, whether the interface of the application system contains user account information; if the interface of the application system does not contain user account information, acquiring the data content returned by the database; judging whether the data content contains personal privacy data; and if the data content contains personal privacy data, generating abnormal query risk information for account-free access to personal privacy data.
In an embodiment, after the step of generating the abnormal query risk information for account-free access to personal privacy data, the processor 502 further performs the following step:
analyzing the personal privacy data in the data content, and performing risk early warning in combination with early warning rules.
In an embodiment, when implementing the step of analyzing the personal privacy data in the data content and performing risk early warning in combination with the early warning rules, the processor 502 specifically implements the following steps:
tallying the number of items and the categories of personal privacy data in the data content, and determining the risk level in combination with the early warning rules; and performing risk early warning according to the risk level.
In an embodiment, when implementing the step of judging, according to the query request data, whether the interface of the application system contains user account information, the processor 502 specifically implements the following steps:
judging whether the user ID and token fields in the request header of the interface of the application system are all empty; if at least one of the user ID and token fields in the request header is not empty, determining that the interface of the application system contains user account information; and if the user ID and token fields in the request header are all empty, determining that the interface of the application system does not contain user account information.
In an embodiment, after implementing the step of judging, according to the query request data, whether the interface of the application system contains user account information, the processor 502 further implements the following step:
if the interface of the application system contains user account information, ending the process.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
acquiring a database query request and the interface access information of an application system to obtain query request data; judging, according to the query request data, whether the interface of the application system contains user account information; if the interface of the application system does not contain user account information, acquiring the data content returned by the database; judging whether the data content contains personal privacy data; and if the data content contains personal privacy data, generating abnormal query risk information for account-free access to personal privacy data.
In an embodiment, after executing the computer program to implement the step of generating the abnormal query risk information for account-free access to personal privacy data, the processor further implements the following step:
analyzing the personal privacy data in the data content, and performing risk early warning in combination with early warning rules.
In an embodiment, when executing the computer program to implement the step of analyzing the personal privacy data in the data content and performing risk early warning in combination with the early warning rules, the processor specifically implements the following steps:
tallying the number of items and the categories of personal privacy data in the data content, and determining the risk level in combination with the early warning rules; and performing risk early warning according to the risk level.
In an embodiment, when executing the computer program to implement the step of judging, according to the query request data, whether the interface of the application system contains user account information, the processor specifically implements the following steps:
judging whether the user ID and token fields in the request header of the interface of the application system are all empty; if at least one of the user ID and token fields in the request header is not empty, determining that the interface of the application system contains user account information; and if the user ID and token fields in the request header are all empty, determining that the interface of the application system does not contain user account information.
In an embodiment, after executing the computer program to implement the step of judging, according to the query request data, whether the interface of the application system contains user account information, the processor further implements the following step:
if the interface of the application system contains user account information, ending the process.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other computer-readable storage medium capable of storing program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of each example have been described above in general functional terms. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The order of the steps in the method of the embodiments of the invention may be adjusted, and steps may be combined or deleted, according to actual needs. The units in the device of the embodiments of the invention may be merged, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a storage medium. Based on such an understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A data anomaly query analysis method, characterized by comprising the following steps:
acquiring a query request of a database and interface access information of an application system to obtain query request data;
judging whether an interface of the application system contains user account information or not according to the query request data;
if the interface of the application system does not contain the user account information, acquiring data content returned by the database;
judging whether the data content contains personal privacy data, specifically: inputting the data content into a recognition model for personal privacy data identification to obtain a recognition result; matching the data content against preset data identification strategies to obtain a matching score; determining the data category according to the matching score and the recognition result; judging whether the data category is personal privacy data; if the data category is personal privacy data, outputting the data category; and if the data category is not personal privacy data, outputting information indicating that the data content is not personal privacy data;
wherein the recognition model is obtained by training a deep learning network on a multi-GPU parallel computing framework, using a plurality of data items labeled with specific personal privacy data types as the sample set, and obtaining the recognition model comprises:
acquiring a plurality of data items labeled with specific personal privacy data types to obtain a sample set, and dividing the sample set to obtain a training set;
the training set refers to a data set divided by a sample set and used for training a model;
constructing a ResNet deep learning network;
loading the training set to a plurality of GPU nodes to perform gradient derivation to obtain derivation results of all the nodes;
carrying out weighted average on the derivation results of all the nodes, updating the network parameters of the ResNet deep learning network, and synchronously updating all GPU nodes;
judging whether the ResNet deep learning network is converged;
if the ResNet deep learning network is converged, determining the ResNet deep learning network as an identification model;
if the ResNet deep learning network does not converge, returning to the step of loading the training set onto a plurality of GPU nodes for gradient derivation to obtain the derivation results of all the nodes; and
if the data content contains personal privacy data, generating abnormal query risk information for account-free access to personal privacy data.
2. The data anomaly query analysis method according to claim 1, wherein after generating the abnormal query risk information for account-free access to personal privacy data, the method further comprises:
analyzing the personal privacy data in the data content, and performing risk early warning in combination with early warning rules.
3. The data anomaly query analysis method according to claim 2, wherein analyzing the personal privacy data in the data content and performing risk early warning in combination with early warning rules comprises:
tallying the number of items and the categories of personal privacy data in the data content, and determining the risk level in combination with the early warning rules; and
performing risk early warning according to the risk level.
4. The data anomaly query analysis method according to claim 1, wherein judging, according to the query request data, whether the interface of the application system contains user account information comprises:
judging whether the user ID and token fields in the request header of the interface of the application system are all empty;
if at least one of the user ID and token fields in the request header of the interface of the application system is not empty, determining that the interface of the application system contains user account information; and
if the user ID and token fields in the request header of the interface of the application system are all empty, determining that the interface of the application system does not contain user account information.
5. The data anomaly query analysis method according to claim 1, wherein after judging, according to the query request data, whether the interface of the application system contains user account information, the method further comprises:
if the interface of the application system contains user account information, ending the process.
6. A data anomaly query analysis device, characterized by comprising:
a data acquisition unit, configured to acquire a database query request and the interface access information of an application system to obtain query request data;
an information judgment unit, configured to judge, according to the query request data, whether the interface of the application system contains user account information;
a content acquisition unit, configured to acquire the data content returned by the database if the interface of the application system does not contain user account information;
a content judgment unit, configured to judge whether the data content contains personal privacy data; and
a risk generation unit, configured to generate abnormal query risk information for account-free access to personal privacy data if the data content contains personal privacy data;
wherein the content judgment unit includes a model identification unit, a matching unit, a weighted average unit, a judgment unit, and an output unit;
the model identification unit is configured to input the data content into a recognition model for personal privacy data identification to obtain a recognition result; the matching unit is configured to match the data content against preset data identification strategies to obtain a matching score; the weighted average unit is configured to determine the data category according to the matching score and the recognition result; the judgment unit is configured to judge whether the data category is personal privacy data; and the output unit is configured to output the data category if it is personal privacy data;
the content judgment unit further includes a model generation unit, configured to train a deep learning network on a multi-GPU parallel computing framework, using a plurality of data items labeled with specific personal privacy data types as the sample set, to obtain the recognition model;
the model generation unit includes a sample set acquisition subunit, a network construction subunit, a derivation subunit, a parameter updating subunit, and a judgment subunit;
the sample set acquisition subunit is configured to acquire a plurality of data items labeled with specific personal privacy data types to obtain a sample set, and to divide the sample set to obtain a training set; the network construction subunit is configured to construct a ResNet deep learning network; the derivation subunit is configured to load the training set onto multiple GPU nodes for gradient derivation to obtain the derivation results of all the nodes; the parameter updating subunit is configured to compute a weighted average of the derivation results of all the nodes, update the network parameters of the ResNet deep learning network, and synchronize the update to all the GPU nodes; and the judgment subunit is configured to judge whether the ResNet deep learning network has converged, to determine the ResNet deep learning network as the recognition model if it has converged, and, if it has not converged, to return to loading the training set onto the multiple GPU nodes for gradient derivation to obtain the derivation results of all the nodes.
7. The data anomaly query analysis device according to claim 6, further comprising:
an analysis unit, configured to analyze the personal privacy data in the data content and perform risk early warning in combination with early warning rules.
8. The data anomaly query analysis device according to claim 7, wherein the analysis unit includes:
a tallying subunit, configured to tally the number of items and the categories of personal privacy data in the data content and determine the risk level in combination with the early warning rules; and
an early warning subunit, configured to perform risk early warning according to the risk level.
9. A computer device, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the method according to any one of claims 1 to 5 when executing the computer program.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111631123.1A CN113986956B (en) | 2021-12-29 | 2021-12-29 | Data exception query analysis method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111631123.1A CN113986956B (en) | 2021-12-29 | 2021-12-29 | Data exception query analysis method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113986956A CN113986956A (en) | 2022-01-28 |
CN113986956B true CN113986956B (en) | 2022-03-25 |
Family
ID=79734820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111631123.1A Active CN113986956B (en) | 2021-12-29 | 2021-12-29 | Data exception query analysis method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113986956B (en) |
2021-12-29: application CN202111631123.1A filed (CN); patent CN113986956B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN113986956A (en) | 2022-01-28 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |