CN116150194B - Data acquisition method, device, electronic equipment and computer readable medium
- Publication number
- CN116150194B (CN202310432781.0A)
- Authority
- CN
- China
- Prior art keywords
- field
- target
- access request
- data directory
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present disclosure disclose a data acquisition method, apparatus, electronic device, and computer readable medium. One embodiment of the method comprises: in response to determining that the access mode is a mode of accessing a target database through a client, acquiring an access request; acquiring a historical abnormal access request set; determining whether the access request is an abnormal access request; in response to determining that the access request is not an abnormal access request, parsing the access request to obtain a field to be detected; splitting the field to be detected to obtain a split field sequence; performing field combination processing on the split field sequence to obtain a target data directory level field and a target database level field; and acquiring target data according to the target data directory level field and the target database level field. In this embodiment, multi-level fields are combined into a standard MySQL field through a preset format, so that data in a larger range can be obtained, the compatibility of the database is improved, the operation load of the system is reduced, and data query efficiency is improved.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a data acquisition method, apparatus, electronic device, and computer readable medium.
Background
When data is extracted from a database, abnormal access detection needs to be performed on the access request to avoid data leakage. Apache Doris is a database with a three-layer metadata hierarchy, whereas MySQL databases and MySQL-protocol-compatible database products use a two-layer metadata design. Data is currently usually acquired as follows: after abnormal access detection is performed on the access request, a switch command is used to switch between data directory levels so as to acquire the data under each data directory level.
However, the inventors have found that when data is acquired in the above manner, there are often the following technical problems:
First, when a user accesses target data using a switch command, the switch command is not a standard MySQL command request. This increases the number of access requests and the processing load of the database system and degrades the user's access experience, thereby reducing system performance.
Second, an abnormal access rule base is constructed through association rules and used to perform abnormal access detection on access requests. Because historical access requests must be scanned frequently, the data in the rule base is not timely and detection accuracy is low, which reduces the security of the database system and increases the system's computation and processing load.
Third, a multi-layer convolutional neural network is used to determine the abnormal access type of an access request. Although the abnormal access type can be determined, important feature information is insufficiently extracted, so classification accuracy is low.
Fourth, performance optimization of access requests in the prior art relies mainly on database personnel optimizing the access request statements from experience, so access requests execute inefficiently and the load and processing burden on the database system are high.
The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.
Disclosure of Invention
The disclosure is in part intended to introduce concepts in a simplified form that are further described below in the detailed description. The disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose data acquisition methods, apparatus, electronic devices, and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a data acquisition method, including: in response to determining that the access mode is a mode of accessing a target database through a client, acquiring an access request; acquiring a historical abnormal access request set for the target database; determining whether the access request is an abnormal access request according to the historical abnormal access request set; in response to determining that the access request is not an abnormal access request, parsing the access request to obtain a field to be detected; splitting the field to be detected in sequence according to a preset field to obtain a split field sequence; performing field combination processing on the split field sequence to obtain a target data directory level field and a target database level field; and acquiring target data according to the target data directory level field and the target database level field.
In a second aspect, some embodiments of the present disclosure provide a data acquisition apparatus comprising: a first acquisition unit configured to acquire an access request in response to determining that the access mode is a mode of accessing a target database through a client; a second acquisition unit configured to acquire a historical abnormal access request set for the target database; a determining unit configured to determine whether the access request is an abnormal access request according to the historical abnormal access request set; a parsing unit configured to parse the access request to obtain a field to be detected in response to determining that the access request is not an abnormal access request; a splitting unit configured to split the field to be detected in sequence according to a preset field to obtain a split field sequence; a field combination processing unit configured to perform field combination processing on the split field sequence to obtain a target data directory level field and a target database level field; and a third acquisition unit configured to acquire target data according to the target data directory level field and the target database level field.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following advantages: with the data acquisition method of some embodiments of the present disclosure, multi-level fields are combined into a standard MySQL field through a preset format, so that data in a larger range can be acquired, the compatibility of the database is improved, the operation load of the system is reduced, and data query efficiency is improved. Specifically, the related degradation in system performance arises because, when a user accesses target data using a switch command, the switch command is not a standard MySQL command request; this increases the number of access requests and the processing load of the database system and degrades the user's access experience. Based on this, the data acquisition method of some embodiments of the present disclosure first acquires an access request in response to determining that the access mode is a mode of accessing the target database through the client. Next, a historical abnormal access request set for the target database is acquired; this set is used subsequently to determine whether the current access request is an abnormal access request. Then, whether the access request is an abnormal access request is determined according to the historical abnormal access request set. Making this determination improves the security of the database and avoids data leakage. Then, in response to determining that the access request is not an abnormal access request, the access request is parsed to obtain a field to be detected. The field to be detected enables cross-level access to the database and enlarges the range of data that can be acquired.
Then, the field to be detected is split in sequence according to a preset field to obtain a split field sequence. Splitting according to the preset field improves the accuracy of field splitting. Next, field combination processing is performed on the split field sequence to obtain a more accurate target data directory level field and target database level field, which facilitates the subsequent acquisition of target data. Finally, target data is acquired according to the target data directory level field and the target database level field. The data acquisition method can therefore combine multi-level fields into one field through a preset format and acquire data in a larger range, which avoids modifying the access client and reduces modification cost.
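The splitting and combination steps described above can be sketched as follows. This is only an illustration under stated assumptions: the text at this point does not give the preset field or the combined format, so the `.` separator, the `catalog.database` layout, the fallback name `internal`, and the function names `split_field` and `combine_fields` are all hypothetical.

```python
from typing import List, Tuple

# Assumption: the "preset field" is a separator character. The source does not
# name it; "." is used here purely for illustration.
PRESET_SEPARATOR = "."
# Assumption: hypothetical fallback used when no data directory level is given.
DEFAULT_CATALOG = "internal"


def split_field(field_to_detect: str, separator: str = PRESET_SEPARATOR) -> List[str]:
    """Split the field to be detected, in order, into a split field sequence."""
    return [part for part in field_to_detect.split(separator) if part]


def combine_fields(split_sequence: List[str]) -> Tuple[str, str]:
    """Map a split field sequence to (data directory level, database level)."""
    if len(split_sequence) == 1:
        # Only a database name was supplied: fall back to the default catalog.
        return DEFAULT_CATALOG, split_sequence[0]
    if len(split_sequence) == 2:
        return split_sequence[0], split_sequence[1]
    raise ValueError(f"unexpected field sequence: {split_sequence!r}")


catalog, database = combine_fields(split_field("hive_catalog.sales_db"))
print(catalog, database)
```

Carrying both levels in a single field is what lets a client that only understands a two-layer hierarchy address a three-layer one without issuing a separate switch command.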
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of some embodiments of a data acquisition method according to the present disclosure;
FIG. 2 is a schematic diagram of the structure of some embodiments of a data acquisition device according to the present disclosure;
FIG. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates a flow 100 of some embodiments of a data acquisition method according to the present disclosure. The data acquisition method comprises the following steps:
In some embodiments, the execution body of the data acquisition method may acquire the access request in response to determining that the access mode is a mode of accessing the target database through the client. The target database may be a database with a three-layer metadata hierarchy. For example, the target database may be Apache Doris, which contains a multi-source data directory. The client may be a MySQL-protocol-compatible client with a two-layer metadata hierarchy. The access mode may be the manner in which the target database is accessed. The access request may be a structured query statement requesting access to the target database.
In some embodiments, the execution body may acquire a historical abnormal access request set for the target database. The historical abnormal access request set may be a set of requests consisting of abnormal SQL (Structured Query Language) statements that accessed data in the target database before the current time. An abnormal SQL statement may be an SQL statement that corrupted or leaked data in the target database.
In some embodiments, the execution body may determine whether the access request is an abnormal access request according to the historical abnormal access request set. An abnormal access request may be an access request that damages or leaks data in the database. For example, the abnormal access request may include at least one of: an access request with a time anomaly, an access request with an operation command anomaly, and an access request with an anomaly in the accessed data object.
As an example, the execution body may first perform feature extraction on the historical abnormal access request set using a grammar-based feature extraction method and an environment-based feature extraction method to obtain a behavior feature data set. Second, the behavior feature data set is converted into a numerical behavior feature data set. Third, the numerical behavior feature data set is input into an initial abnormal behavior classification model to obtain a trained abnormal behavior classification model. Then, feature extraction is performed on the access request to obtain behavior feature data. Finally, the behavior feature data is input into the abnormal behavior classification model to determine whether the access request is an abnormal access request.
Optionally, after determining whether the access request is an abnormal access request according to the historical abnormal access request set, the execution body may further perform the following step:
In response to determining that the access request is an abnormal access request, issuing an abnormal access early warning for the access request and sending the access request to a monitoring terminal. The abnormal access early warning may be a pop-up abnormal warning window.
In some optional implementations of some embodiments, the determining whether the access request is an abnormal access request according to the historical abnormal access request set may further include the following steps:
First, splitting the historical abnormal access request set to obtain a historical abnormal associated field set. Each historical abnormal associated field in the set may include historical user information and a historical abnormal field. The historical abnormal field may be the field of a historical abnormal access request in which the anomaly occurs. The historical user information may be information characterizing the identity of the user, for example the account information of the historical user. In practice, the execution body may first perform word segmentation on the historical abnormal access request set to obtain a historical field set. Then, a historical user information field set and a historical abnormal field set are extracted from the historical field set, and each item of historical user information is combined with its corresponding historical abnormal field to obtain the historical abnormal associated field set.
Second, splitting the access request to obtain a target associated field. The target associated field includes target user information and a target field. The target field may be a field that characterizes the access behavior of the access request.
Third, comparing, according to the target user information, the target associated field with each historical abnormal associated field in the historical abnormal associated field set to generate a comparison result, thereby obtaining a comparison result set. Each comparison result indicates whether the target associated field is the same as the corresponding historical abnormal associated field; that is, each result is either "same" or "different".
As an example, the execution body may first use the target user information to extract, from the historical abnormal associated field set, the historical abnormal associated fields corresponding to the target user information as a user historical abnormal associated field set. Then, the target field is compared with each user historical abnormal associated field in that set to generate a comparison result, thereby obtaining the comparison result set.
Fourth, determining whether the access request is an abnormal access request according to the comparison result set.
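The four comparison steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `AssociatedField` type, the exact-match comparison, and the rule that one identical historical entry marks the request as abnormal are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AssociatedField:
    user: str   # user information, e.g. an account name
    field: str  # field characterizing the access behavior


def compare_with_history(target: AssociatedField,
                         history: Iterable[AssociatedField]) -> List[bool]:
    """Compare the target only with this user's historical abnormal fields."""
    # Filtering by user first avoids comparing against the whole history.
    user_history = [h for h in history if h.user == target.user]
    return [h.field == target.field for h in user_history]


def is_abnormal(comparison_results: List[bool]) -> bool:
    # Assumption: a single identical historical entry marks the request abnormal.
    return any(comparison_results)


history = [
    AssociatedField("alice", "DROP TABLE orders"),
    AssociatedField("bob", "SELECT * FROM salaries"),
]
request = AssociatedField("bob", "SELECT * FROM salaries")
print(is_abnormal(compare_with_history(request, history)))  # True
```

Restricting the comparison to the requesting user's own history is what keeps the check from scanning every historical request.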
In some optional implementations of some embodiments, after determining whether the access request is an abnormal access request according to the comparison result set, the method may further include the following steps:
First, inputting the historical abnormal field set into an anomaly classification detection model to obtain an abnormal access type set. The abnormal access type set may be the set of anomaly types of the historical abnormal associated field set. The abnormal access types in the set may include at least one of: a time anomaly type, an operation command anomaly type, and a data object anomaly type. The anomaly classification detection model may be a model that determines the anomaly type of an access request. The anomaly classification detection model may include, but is not limited to, at least one of: an XGBoost (eXtreme Gradient Boosting) model and a GBDT (Gradient Boosting Decision Tree) model.
Second, determining, according to the abnormal access type set, a modification template for each abnormal access type in the set to obtain a modification template set. A modification template may be a template for modifying an abnormal access request of the corresponding type into a normal access request.
As an example, in response to determining that the abnormal access type is the data object anomaly type, the modification template for that type may be as follows. First, the historical user information, the historical access data table, and the historical operation type in an abnormal access request of the data object anomaly type are acquired. Then, the access rights of the historical user to each data table in the target database are obtained using the historical user information. An access right may specify the types of operation command that may be performed on each data table. Finally, the historical access data table is replaced with any data table on which the historical operation type is permitted, to obtain a normal access request.
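A minimal sketch of such a data object modification template follows. The permission map layout, the function name `data_object_template`, and the choice of the alphabetically first permitted table are hypothetical; the source only says that any suitable data table may be used.

```python
from typing import Dict, Optional, Set

# Hypothetical permission map: user -> table -> permitted operation types.
AccessRights = Dict[str, Dict[str, Set[str]]]


def data_object_template(user: str, table: str, operation: str,
                         rights: AccessRights) -> Optional[Dict[str, str]]:
    """Swap the accessed table for one on which the user may run the operation."""
    # Assumption: iterate in sorted order so the choice is deterministic.
    for candidate, operations in sorted(rights.get(user, {}).items()):
        if candidate != table and operation in operations:
            return {"user": user, "table": candidate, "operation": operation}
    return None  # no permitted table exists, so nothing can be rewritten


rights = {"bob": {"orders": {"SELECT"}, "audit_log": {"SELECT", "INSERT"}}}
print(data_object_template("bob", "salaries", "SELECT", rights))
```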
Third, in response to determining that the access request is an abnormal access request, performing feature extraction on the access request to obtain a target abnormal access feature item. The target abnormal access feature item may be a feature item that characterizes the access request.
Fourth, performing feature extraction on the historical abnormal access request set corresponding to each abnormal access type in the abnormal access type set to generate a historical abnormal access feature item, thereby obtaining a historical abnormal access feature item set. A historical abnormal access feature item may characterize the abnormal features of the corresponding abnormal access type. For example, the abnormal access feature item corresponding to the time anomaly type may be a feature of accessing the target database at an abnormal time. The historical abnormal access request set corresponding to each abnormal access type is a subset of the historical abnormal access request set.
Fifth, in response to determining that the historical abnormal access feature item set contains a historical abnormal access feature item identical to the target abnormal access feature item, determining the modification template of the abnormal access type corresponding to the access request, modifying the access request with it to obtain a normal access request, and sending the normal access request to the corresponding access terminal. The access terminal may be the terminal that sent the access request.
Sixth, in response to determining that the historical abnormal access feature item set contains no historical abnormal access feature item identical to the target abnormal access feature item, sending the access request to a monitoring terminal so that security monitoring personnel can process the access request and obtain a processing result. The processing result may be the result of the security monitoring personnel extracting abnormal features from the access request to determine a new abnormal access type corresponding to the access request.
Seventh, receiving the processing result sent by the monitoring terminal and modifying the abnormal access type set accordingly.
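The routing in the third to seventh steps can be sketched as follows. All names here are hypothetical (`handle_abnormal_request`, `register_new_type`, the toy feature extractor that takes the first keyword, and the toy template that rewrites `DROP TABLE` to `DESCRIBE`); the source does not specify how feature items are extracted or what the templates contain.

```python
from typing import Callable, Dict, List, Tuple

Template = Callable[[str], str]


def handle_abnormal_request(
    request: str,
    extract_feature: Callable[[str], str],
    history_features: Dict[str, str],  # feature item -> abnormal access type
    templates: Dict[str, Template],    # abnormal access type -> modification template
    monitoring_queue: List[str],
) -> Tuple[str, str]:
    """Return ("modified", new_request) or ("escalated", request)."""
    feature = extract_feature(request)
    access_type = history_features.get(feature)
    if access_type is not None and access_type in templates:
        # Known feature item: apply the template of its abnormal access type.
        return "modified", templates[access_type](request)
    # Unknown feature item: hand over to security monitoring personnel.
    monitoring_queue.append(request)
    return "escalated", request


def register_new_type(feature: str, access_type: str,
                      history_features: Dict[str, str]) -> None:
    """Apply a processing result from monitoring: record a new feature/type pair."""
    history_features[feature] = access_type


# Toy stand-ins, for illustration only.
first_keyword = lambda req: req.split()[0].upper()
history_features = {"DROP": "operation_command"}
templates = {"operation_command": lambda req: req.replace("DROP TABLE", "DESCRIBE")}
queue: List[str] = []

print(handle_abnormal_request("DROP TABLE orders", first_keyword,
                              history_features, templates, queue))
print(handle_abnormal_request("TRUNCATE TABLE orders", first_keyword,
                              history_features, templates, queue))
```

Feeding processing results back through `register_new_type` corresponds to the seventh step: the set of known abnormal access types grows as monitoring personnel classify previously unseen requests.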
The content of the optional implementations of step 103 described above constitutes an inventive point of some embodiments of the present disclosure and solves the second technical problem mentioned in the background: an abnormal access rule base is constructed through association rules and used to perform abnormal access detection on access requests; because historical access requests must be scanned frequently, the data in the rule base is not timely and detection accuracy is low, which reduces the security of the database system and increases the system's computation and processing load. If these factors are addressed, the security of the database system can be improved and the computation and processing load of the system reduced. To achieve this, first, the historical abnormal access request set is split to obtain a historical abnormal associated field set. Second, a modification template is determined for each abnormal access type in the abnormal access type set to obtain a modification template set, which facilitates bulk modification of abnormal access requests. The access request is then split to obtain a target associated field.
Next, according to the target user information, the target associated field is compared with each historical abnormal associated field in the historical abnormal associated field set to generate a comparison result, thereby obtaining a comparison result set, and whether the access request is an abnormal access request is determined according to the comparison result set. Comparing the target associated field with each historical abnormal associated field improves anomaly detection accuracy, avoids frequent scanning of the database, and reduces the computation and processing load of the database system. Then, the historical abnormal field set is input into the anomaly classification detection model to obtain an abnormal access type set. Obtaining the abnormal access type set makes it easier to determine the abnormal access feature items and modification templates of the same type, which reduces computation and system load. In response to determining that the access request is an abnormal access request, feature extraction is performed on the access request to obtain a target abnormal access feature item. Feature extraction is also performed on the historical abnormal access request set corresponding to each abnormal access type to generate a historical abnormal access feature item, thereby obtaining a historical abnormal access feature item set. Obtaining the target and historical abnormal access feature items makes it possible to determine the abnormal access type of the access request so that it can be modified.
Then, in response to determining that the historical abnormal access feature item set contains a historical abnormal access feature item identical to the target abnormal access feature item, the modification template of the abnormal access type corresponding to the access request is determined and used to modify the access request, and the resulting normal access request is sent to the corresponding access terminal. Modifying the abnormal access request improves query efficiency and user experience. Finally, in response to determining that the historical abnormal access feature item set contains no historical abnormal access feature item identical to the target abnormal access feature item, the access request is sent to a monitoring terminal so that security monitoring personnel can process it and obtain a processing result. The processing result sent by the monitoring terminal is received, and the abnormal access type set is modified accordingly, which improves the timeliness and detection accuracy of the abnormal access types. In this way, the timeliness and accuracy of anomaly detection are improved, frequent scanning of the database system is reduced, and the computation and processing load of the database system are reduced.
Optionally, the anomaly classification detection model may be trained through the following steps:
First, a sample set is obtained. The sample set includes: a sample history abnormal field set and a sample classification result set corresponding to the sample history abnormal field set.
Second, based on the sample set, the following training steps are performed:
Sub-step 1, preprocessing the sample history abnormal field set in the sample set to obtain a processed sample history abnormal field set. The preprocessing may include, but is not limited to: numerical encoding and normalization. In practice, the execution body may first numerically encode the sample history abnormal field set in the sample set by one-hot encoding to obtain a sample history abnormal value set, and then normalize that value set by Min-Max normalization to obtain a normalized sample history abnormal value set.
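The preprocessing of sub-step 1 can be sketched as follows. This is only a minimal illustration, assuming each sample carries one categorical field and one numeric field; the field names `operation` and `row_count` and all helper names are hypothetical and do not come from the disclosure.

```python
def one_hot(values):
    """Encode each categorical value as a one-hot vector over the sorted vocabulary."""
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(vocab))] for v in values]


def min_max(values):
    """Scale numeric values to [0, 1]; a constant column is mapped to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(samples, categorical_key, numeric_key):
    """Concatenate the one-hot part and the Min-Max scaled part of every sample."""
    encoded = one_hot([s[categorical_key] for s in samples])
    scaled = min_max([s[numeric_key] for s in samples])
    return [vec + [x] for vec, x in zip(encoded, scaled)]


# Hypothetical sample history abnormal fields.
samples = [
    {"operation": "select", "row_count": 10},
    {"operation": "drop", "row_count": 110},
    {"operation": "select", "row_count": 60},
]
features = preprocess(samples, "operation", "row_count")
```

One-hot vectors already lie in [0, 1], so in this sketch Min-Max scaling only changes the numeric column.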
Sub-step 2, inputting the processed sample history abnormal field set to a mixed sampling layer in an initial anomaly classification detection model to obtain a mixed abnormal field set. The mixed abnormal field set may be a sample set in which the amount of sample data of each abnormal type is balanced. The mixed sampling layer may increase the number of samples for abnormal types that have few samples in the sample set and decrease the number of samples for abnormal types that have many samples, so as to solve the problem of uneven sample distribution. The initial anomaly classification detection model may be a model that determines the abnormal type of an access request. The anomaly classification detection model further includes: a plurality of multi-head self-attention mechanism layers, a plurality of residual modules, a plurality of normalization processing layers, a plurality of feedforward network layers, a plurality of up-dimension convolution layers with different convolution kernels, a plurality of depth-separable convolution layers with different convolution kernels, a plurality of compression excitation modules, a plurality of dimension-reduction convolution layers with different convolution kernels, a fully connected layer, a plurality of bidirectional long short-term memory network layers, a time feature update gate, a feature reconstruction layer, a self-attention mechanism module, a gated recurrent unit, and a nonlinear classification layer.
Sub-step 3, inputting the mixed abnormal field set to a first multi-head self-attention mechanism layer to obtain a first feature vector set.
Sub-step 4, inputting the first feature vector set to a first residual module to obtain a second feature vector set.
Sub-step 5, adding the first feature vector set and the second feature vector set to obtain a third feature vector set.
Sub-step 6, inputting the third feature vector set to a first normalization processing layer to obtain a fourth feature vector set.
Sub-step 7, inputting the fourth feature vector set to a first feedforward network layer to obtain a fifth feature vector set.
Sub-step 8, inputting the fifth feature vector set to a second multi-head self-attention mechanism layer to obtain a sixth feature vector set.
Sub-step 9, inputting the sixth feature vector set to a second residual module to obtain a seventh feature vector set.
Sub-step 10, adding the sixth feature vector set and the seventh feature vector set to obtain an eighth feature vector set.
Sub-step 11, inputting the eighth feature vector set to a second normalization processing layer to obtain a ninth feature vector set.
Sub-step 12, inputting the ninth feature vector set to a second feedforward network layer to obtain a tenth feature vector set.
Sub-step 13, performing nonlinear mapping on the tenth feature vector set to obtain an eleventh feature vector set.
Sub-step 14, determining a covariance matrix of the eleventh feature vector set. The covariance matrix may represent the degree of similarity between the feature vectors in the eleventh feature vector set.
Sub-step 15, determining an eigenvalue set of the covariance matrix and the corresponding eigenvector set. The eigenvalues and the eigenvectors are in one-to-one correspondence.
In practice, the execution body may decompose the covariance matrix by eigenvalue decomposition to obtain the eigenvalue set and the corresponding eigenvector set.
Sub-step 16, sorting the eigenvalue set in descending order to obtain an eigenvalue sequence.
Sub-step 17, selecting from the eigenvalue sequence the eigenvalues greater than or equal to a preset eigenvalue to obtain a target eigenvalue sequence. The preset eigenvalue may be an index of how well the eigenvectors corresponding to the selected eigenvalues explain, on average, the original feature vectors. For example, the preset eigenvalue may be 1.
Sub-step 18, concatenating the eigenvectors corresponding to the target eigenvalue sequence to obtain a target feature vector set. The target feature vector set may be the set of mutually uncorrelated feature vectors obtained after dimension reduction.
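Sub-steps 14 to 18 amount to an eigenvalue-threshold dimension reduction, which can be sketched as below. This is an illustrative sketch only: the disclosure says "eigenvalue decomposition" without naming an algorithm, so the cyclic Jacobi rotation method is an assumption chosen to keep the example free of dependencies, and all function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix (assumed method)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q of a and of v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def select_components(eigenvalues, eigenvectors, threshold=1.0):
    """Sort eigenpairs in descending order and keep those with eigenvalue >= threshold."""
    n = len(eigenvalues)
    pairs = [(eigenvalues[j], [eigenvectors[i][j] for i in range(n)]) for j in range(n)]
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    kept = [pair for pair in pairs if pair[0] >= threshold]
    return [value for value, _ in kept], [vector for _, vector in kept]


cov = covariance_matrix([[1.0, 2.0], [3.0, 6.0]])
values, vectors = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
kept_values, kept_vectors = select_components(values, vectors, threshold=2.0)
```

The default threshold of 1 mirrors the example value given for the preset eigenvalue; the call above uses 2 only so that the small demonstration matrix loses one component.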
Sub-step 19, inputting the target feature vector set to an up-dimension convolution layer with a 1×1 convolution kernel to obtain a twelfth feature vector set.
Sub-step 20, inputting the twelfth feature vector set to a first depth-separable convolution layer with a 1×1 convolution kernel to obtain a thirteenth feature vector set.
Sub-step 21, inputting the thirteenth feature vector set to the compression excitation module to obtain a fourteenth feature vector set. The compression excitation module may include: an average pooling layer and two fully connected layers whose activation function is the hard-swish function. The average pooling layer may be a pooling layer that downsamples the feature vectors to obtain global compressed information. The two fully connected layers with the hard-swish activation function may be fully connected layers that assign different weights to the channels. The compression excitation module may be a feature map operation module based on a channel attention mechanism. The module may first perform a compression operation on the feature map, that is, a global average pooling operation along the channel dimension, to obtain the global features of the feature map in the channel dimension.
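A compression excitation module of the kind described in sub-step 21 can be sketched in plain Python as follows. The sketch assumes a feature map stored as one list of values per channel and takes the weights of the two fully connected layers as arguments; the function names and the weight layout are hypothetical. Following the text, both fully connected layers use hard-swish.

```python
def hard_swish(x):
    """hard-swish(x) = x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def dense(vector, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Reweight each channel of `feature_map` by a learned channel weight."""
    # Compression: global average pooling yields one value per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation: two fully connected layers, each followed by hard-swish.
    hidden = [hard_swish(h) for h in dense(squeezed, w1, b1)]
    scale = [hard_swish(s) for s in dense(hidden, w2, b2)]
    # Apply the per-channel weights to the original feature map.
    return [[x * s for x in channel] for channel, s in zip(feature_map, scale)]


# Hypothetical two-channel input with identity weights and zero biases.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
reweighted = squeeze_excite([[3.0, 3.0], [-4.0, -2.0]], identity, zeros, identity, zeros)
```

With these weights the first channel (mean 3) is amplified and the second channel (mean -3) is suppressed, which is the channel attention behaviour the module is meant to provide.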
Sub-step 22, inputting the fourteenth feature vector set to a dimension-reduction convolution layer with a 1×1 convolution kernel to obtain a fifteenth feature vector set.
Sub-step 23, inputting the fifteenth feature vector set to an up-dimension convolution layer with a 2×2 convolution kernel to obtain a sixteenth feature vector set.
Sub-step 24, inputting the sixteenth feature vector set to a second depth-separable convolution layer with a 2×2 convolution kernel to obtain a seventeenth feature vector set.
Sub-step 25, inputting the seventeenth feature vector set to the compression excitation module to obtain an eighteenth feature vector set.
Sub-step 26, inputting the eighteenth feature vector set to a dimension-reduction convolution layer with a 2×2 convolution kernel to obtain a nineteenth feature vector set.
Sub-step 27, inputting the nineteenth feature vector set to a dimension-reduction convolution layer with a 3×3 convolution kernel to obtain a twentieth feature vector set.
Sub-step 28, inputting the twentieth feature vector set to an up-dimension convolution layer with a 3×3 convolution kernel to obtain a twenty-first feature vector set.
Sub-step 29, inputting the twenty-first feature vector set to the compression excitation module to obtain a twenty-second feature vector set.
Sub-step 30, inputting the twenty-second feature vector set to a dimension-reduction convolution layer with a 3×3 convolution kernel to obtain a twenty-third feature vector set.
Sub-step 31, inputting the twenty-third feature vector set to the fully connected layer to obtain a twenty-fourth feature vector set.
Sub-step 32, inputting the first feature vector set to a first bidirectional long short-term memory network layer to obtain a twenty-fifth feature vector set. The twenty-fifth feature vector set may be a feature vector set that includes time-sequence information.
Sub-step 33, inputting the twenty-fifth feature vector set to the time feature update gate to obtain a twenty-sixth feature vector set. The twenty-sixth feature vector set may be a feature vector set that includes finer-grained temporal features.
Sub-step 34, inputting the twenty-sixth feature vector set to the feature reconstruction layer to obtain a twenty-seventh feature vector set.
Sub-step 35, inputting the twenty-seventh feature vector set to a second bidirectional long short-term memory network layer to obtain a twenty-eighth feature vector set. The second bidirectional long short-term memory network layer has a finer extraction granularity than the first, so as to better extract temporal feature information.
Sub-step 36, performing feature fusion on the twenty-seventh feature vector set and the twenty-eighth feature vector set to obtain a twenty-ninth feature vector set.
Sub-step 37, inputting the twenty-ninth feature vector set to the self-attention mechanism module to obtain a thirtieth feature vector set. The self-attention mechanism module may be a module that assigns different weights to the individual feature vectors in its input feature vector set.
Sub-step 38, inputting the thirtieth feature vector set to the gated recurrent unit to obtain a thirty-first feature vector set. The gated recurrent unit may include: an update gate and a reset gate.
Sub-step 39, inputting the thirty-first feature vector set to the nonlinear classification layer to obtain an abnormal access type set. The nonlinear classification layer may be a classification layer whose activation function is hard-swish.
In response to determining that the error between the set of anomaly access types and the set of sample classification results is less than or equal to a preset threshold, determining the initial anomaly classification detection model as an anomaly classification detection model. The preset threshold may represent the detection accuracy of the initial abnormal classification model. For example, the preset threshold may be 0.7.
And thirdly, in response to determining that the error between the abnormal access type set and the sample classification result set is greater than a preset threshold, adjusting related parameters of the initial abnormal classification detection model, re-acquiring the sample set, and executing the training step again.
The above content is an inventive point of the embodiments of the present disclosure and solves the third technical problem mentioned in the background: a multi-layer convolutional neural network is used to determine the abnormal access type of an access request, and although the abnormal access type can be determined, important feature information is insufficiently extracted, so the classification accuracy is low. The factors that lead to low classification accuracy and a high false alarm rate are often as follows: the abnormal access type is determined by a multi-layer convolutional neural network that extracts important feature information insufficiently. If these factors are addressed, the classification accuracy can be improved and the false alarm rate reduced. To achieve this, first, a sample set is acquired. Here, acquiring the sample set facilitates subsequent training of the anomaly classification detection model. Then, for each sample in the sample set, the following training steps are performed. The sample history abnormal field set in the sample set is preprocessed to obtain a processed sample history abnormal field set. Here, preprocessing makes the data conform to the input of the anomaly classification detection model, and can improve the classification accuracy of the model and reduce model training time and data redundancy. The processed sample history abnormal field set is input to the mixed sampling layer in the initial anomaly classification detection model to obtain a mixed abnormal field set. Here, mixed sampling makes the sample distribution more balanced, thereby improving classification accuracy.
The mixed abnormal field set is input to the first multi-head self-attention mechanism layer to obtain a first feature vector set. The first feature vector set is input to the first residual module to obtain a second feature vector set. The first feature vector set and the second feature vector set are added to obtain a third feature vector set. The third feature vector set is input to the first normalization processing layer to obtain a fourth feature vector set. The fourth feature vector set is input to the first feedforward network layer to obtain a fifth feature vector set. The fifth feature vector set is input to the second multi-head self-attention mechanism layer to obtain a sixth feature vector set. The sixth feature vector set is input to the second residual module to obtain a seventh feature vector set. The sixth feature vector set and the seventh feature vector set are added to obtain an eighth feature vector set. The eighth feature vector set is input to the second normalization processing layer to obtain a ninth feature vector set. The ninth feature vector set is input to the second feedforward network layer to obtain a tenth feature vector set. Here, extracting features from the mixed abnormal field set through a plurality of multi-head self-attention mechanism layers, residual modules, normalization processing layers, and feedforward network layers makes the extracted features more comprehensive and accurate. Nonlinear mapping is performed on the tenth feature vector set to obtain an eleventh feature vector set. A covariance matrix of the eleventh feature vector set is determined. The eigenvalue set of the covariance matrix and the corresponding eigenvector set are determined.
The eigenvalue set is sorted in descending order to obtain an eigenvalue sequence. The eigenvalues greater than or equal to the preset eigenvalue are selected from the eigenvalue sequence to obtain a target eigenvalue sequence. The eigenvectors corresponding to the target eigenvalue sequence are concatenated to obtain a target feature vector set. Here, reducing the dimension of the eleventh feature vector set through eigenvalues and eigenvectors reduces the amount of computation and the redundant information. The target feature vector set is input to the up-dimension convolution layer with a 1×1 convolution kernel to obtain a twelfth feature vector set. The twelfth feature vector set is input to the first depth-separable convolution layer with a 1×1 convolution kernel to obtain a thirteenth feature vector set. The thirteenth feature vector set is input to the compression excitation module to obtain a fourteenth feature vector set. The fourteenth feature vector set is input to the dimension-reduction convolution layer with a 1×1 convolution kernel to obtain a fifteenth feature vector set. The fifteenth feature vector set is input to the up-dimension convolution layer with a 2×2 convolution kernel to obtain a sixteenth feature vector set. The sixteenth feature vector set is input to the second depth-separable convolution layer with a 2×2 convolution kernel to obtain a seventeenth feature vector set. The seventeenth feature vector set is input to the compression excitation module to obtain an eighteenth feature vector set. The eighteenth feature vector set is input to the dimension-reduction convolution layer with a 2×2 convolution kernel to obtain a nineteenth feature vector set.
The nineteenth feature vector set is input to the dimension-reduction convolution layer with a 3×3 convolution kernel to obtain a twentieth feature vector set. The twentieth feature vector set is input to the up-dimension convolution layer with a 3×3 convolution kernel to obtain a twenty-first feature vector set. The twenty-first feature vector set is input to the compression excitation module to obtain a twenty-second feature vector set. The twenty-second feature vector set is input to the dimension-reduction convolution layer with a 3×3 convolution kernel to obtain a twenty-third feature vector set. Here, three groups of dimension-reduction convolution layers, up-dimension convolution layers, and depth-separable convolution layers with different convolution kernels, together with three compression excitation modules, assign different weights to the different channels of the feature vectors, thereby improving classification accuracy. The twenty-third feature vector set is input to the fully connected layer to obtain a twenty-fourth feature vector set. The first feature vector set is input to the first bidirectional long short-term memory network layer to obtain a twenty-fifth feature vector set. The twenty-fifth feature vector set is input to the time feature update gate to obtain a twenty-sixth feature vector set. The twenty-sixth feature vector set is input to the feature reconstruction layer to obtain a twenty-seventh feature vector set. The twenty-seventh feature vector set is input to the second bidirectional long short-term memory network layer to obtain a twenty-eighth feature vector set. Feature fusion is performed on the twenty-seventh feature vector set and the twenty-eighth feature vector set to obtain a twenty-ninth feature vector set.
Here, extracting temporal features through bidirectional long short-term memory network layers of different granularities better preserves long-interval dependency information, better extracts temporal features, and increases the model training speed. The twenty-ninth feature vector set is input to the self-attention mechanism module to obtain a thirtieth feature vector set. Here, the self-attention mechanism module dynamically assigns different weight information to each attribute of the input feature vectors and directly connects the dependencies between sequences, which shortens the distance between attributes with long-range dependencies. The thirtieth feature vector set is input to the gated recurrent unit to obtain a thirty-first feature vector set. Here, training through the gated recurrent unit improves the training efficiency of the model and shortens the training time. The thirty-first feature vector set is input to the nonlinear classification layer to obtain an abnormal access type set. In response to determining that the error between the abnormal access type set and the sample classification result set is less than or equal to a preset threshold, the initial anomaly classification detection model is determined as the anomaly classification detection model. Finally, in response to determining that the error between the abnormal access type set and the sample classification result set is greater than the preset threshold, the relevant parameters of the initial anomaly classification detection model are adjusted, the sample set is re-acquired, and the training steps are executed again.
Therefore, the temporal feature information and the spatial feature information of the sample history abnormal field set are extracted separately and then fused, different weights are dynamically assigned to the feature information, and hard-swish is used to reduce the computational difficulty, thereby improving the classification accuracy of the model.
Step 104, in response to determining that the access request is not an abnormal access request, parsing the access request to obtain a field to be detected.
In some embodiments, the execution body may parse the access request to obtain the field to be detected in response to determining that the access request is not an abnormal access request. The field to be detected may be a field obtained by combining a data directory level field and a database level field according to a preset format. The preset format may be the concatenation, in order, of the preset field, the data directory level field, the preset field, and the database level field (for example, "_catalog1_db1"). The preset field may be an underscore.
As an example, the execution body may parse and semantically analyze the access request to obtain a field to be detected.
Step 105, splitting the field to be detected at each preset field in sequence to obtain a split field sequence.
In some embodiments, the execution body may split the field to be detected at each preset field in sequence to obtain a split field sequence. The preset field may be an underscore. A split field in the split field sequence may be a field obtained by removing a preset field from the field to be detected.
As an example, the execution body may sequentially remove the preset fields from the field to be detected to obtain the split field sequence.
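Step 105 can be illustrated with a short sketch, assuming the preset field is an underscore. The function name is hypothetical.

```python
def split_field(field_to_detect, preset="_"):
    """Split at every preset field and drop the empty piece before the leading one."""
    return [part for part in field_to_detect.split(preset) if part]


simple = split_field("_catalog1_db1")
ambiguous = split_field("_my_catalog_sales_db")
```

The second call shows why a recombination step is needed afterwards: when a data directory name or a database name itself contains the preset field, splitting alone cannot tell where the directory name ends and the database name begins.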
Step 106, performing field combination processing on the split field sequence to obtain a target data directory level field and a target database level field.
In some embodiments, the execution body may perform a field combination process on the split field sequence to obtain a target data directory level field and a target database level field. The target data directory level field may represent a field corresponding to a data level higher than the database level where the database including the target data is located. The target database hierarchy field may be a field corresponding to a database including target data.
In some optional implementations of some embodiments, the performing a field combination process on the split field sequence to obtain a target data directory level field and a target database level field may include the following steps:
First, acquiring a data directory hierarchy field set in the target database. The data directory hierarchy field set may be the set of fields corresponding to the data directory hierarchy of the target database.
Second, combining the data directory hierarchy fields in the set by using the preset format to obtain a combined data directory hierarchy field set. The preset format may be the format obtained by concatenating the preset field, the data directory level field, the preset field, and the database level field. The combination here may be the concatenation of the preset field, a data directory hierarchy field, and the preset field.
Third, screening out, from the combined data directory hierarchy field set, the combined data directory hierarchy field that matches the field to be detected as a target combined data directory hierarchy field. The portion of the field to be detected used for matching may be the portion having the same number of characters as the combined data directory hierarchy field.
Fourth, determining the data directory hierarchy field corresponding to the target combined data directory hierarchy field as the target data directory hierarchy field.
Fifth, acquiring a database hierarchy field set corresponding to the target data directory hierarchy field.
Sixth, combining the target data directory level field with each database level field in the database level field set by using the preset format to generate a combined field, thereby obtaining a combined field set.
As an example, the execution body may combine the target data directory hierarchy field with each database hierarchy field in the database hierarchy field set by using the preset field of the preset format, so as to generate combined fields in the preset format and obtain a combined field set.
Seventh, screening out, from the combined field set, the combined field that matches the field to be detected as a target combined field.
As an example, the execution body may match each combined field in the combined field set against the field to be detected by using a naive matching algorithm, and take the combined field that matches the field to be detected as the target combined field.
Eighth, determining the data directory level field and the database level field corresponding to the target combined field as the target data directory level field and the target database level field, respectively.
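The eight steps above can be sketched as a lookup against known directory and database names. This is a minimal sketch, assuming the metadata is available as a mapping from data directory name to its database names; the mapping, the names in it, and the function name are hypothetical. Unlike the text, which selects one target data directory before examining its databases, the sketch tries every directory whose combined field matches the start of the field to be detected, so that a directory name that is a prefix of another does not block the match.

```python
def resolve_fields(field_to_detect, catalogs, preset="_"):
    """Return (data directory field, database field), or None when nothing matches.

    `catalogs` maps each data directory name to the list of its database names.
    """
    for catalog, databases in catalogs.items():
        # Steps 2-4: combine preset + directory + preset and compare it with the
        # leading portion of the field to be detected that has the same length.
        combined_catalog = f"{preset}{catalog}{preset}"
        if not field_to_detect.startswith(combined_catalog):
            continue
        # Steps 5-8: append each database name and compare the full combined field.
        for database in databases:
            if f"{combined_catalog}{database}" == field_to_detect:
                return catalog, database
    return None


# Hypothetical metadata in which both kinds of names contain underscores.
catalogs = {"my": ["x"], "my_catalog": ["sales_db", "hr"]}
resolved = resolve_fields("_my_catalog_sales_db", catalogs)
missing = resolve_fields("_unknown_db", catalogs)
```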
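Step 107, obtaining target data according to the target data directory level field and the target database level field.
In some embodiments, the execution body may obtain the target data according to the target data directory hierarchy field and the target database hierarchy field. The target data may be the data to be queried.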
As an example, the executing entity may query the target database with an access request including the target data directory level field and the target database level field to obtain target data.
Optionally, after 107, the above execution body may further execute the following steps:
First, a query request set is obtained in response to determining that the access manner is a manner of accessing a metadata database. The metadata database may be the information_schema database. A query request in the query request set may be a request for querying data in the information_schema database.
Second, the query requests that meet a preset condition are screened out from the query request set as target query requests. The preset condition is that the data table fields in the query request include a first preset field. The first preset field may be a field in a data table. For example, the first preset field may be the table_catalog field.
Third, the target query request is updated to obtain an updated target query request. The target query request may be a request that includes two filtering conditions. A filtering condition may be a condition that restricts the data to be queried. For example, the two filtering conditions may be conditions on table_schema and table_name. The updated target query request may be a request that includes three filtering conditions. For example, the three filtering conditions may be conditions on table_catalog, table_schema, and table_name, respectively. The update may be an update to the number of filtering conditions in the target query request. For example, the target query request may be: select * from information_schema.columns where table_schema = "_catalog1_db1" and table_name = "tb1". The updated target query request may then be: select * from information_schema.columns where table_catalog = "catalog1" and table_schema = "db1" and table_name = "tb1".
Fourth, the target data is obtained according to the updated target query request.
As an example, the execution body may perform syntactic and semantic parsing on the updated target query request to obtain the data directory level field, the database level field, and the data table field where the target data to be queried is located, and thereby obtain the target data.
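The update from two filtering conditions to three can be sketched as a textual rewrite of the table_schema condition. This is only an illustration under stated assumptions: the request is taken to be a query against information_schema.columns, the combined value is split at its first inner underscore by a simplified stand-in resolver, and the names `rewrite_metadata_query` and `simple_resolve` are hypothetical. A real implementation would resolve the names against the known data directories rather than split blindly.

```python
import re

# Matches a filter such as: table_schema = "_catalog1_db1"
SCHEMA_FILTER = re.compile(r'table_schema\s*=\s*"(?P<combined>[^"]*)"')


def simple_resolve(combined, preset="_"):
    """Hypothetical resolver: split "_catalog_db" at the first inner preset field."""
    parts = combined.strip(preset).split(preset, 1)
    return tuple(parts) if len(parts) == 2 else None


def rewrite_metadata_query(query, resolve=simple_resolve):
    """Replace the combined table_schema filter by table_catalog and table_schema filters."""
    def expand(match):
        resolved = resolve(match.group("combined"))
        if resolved is None:
            return match.group(0)  # leave the filter unchanged if it cannot be resolved
        catalog, database = resolved
        return f'table_catalog = "{catalog}" and table_schema = "{database}"'

    return SCHEMA_FILTER.sub(expand, query)


original = ('select * from information_schema.columns '
            'where table_schema = "_catalog1_db1" and table_name = "tb1"')
updated = rewrite_metadata_query(original)
```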
In some optional implementations of some embodiments, the obtaining the target data according to the updated target query request may include the following steps:
First, the updated target query request is parsed to obtain a keyword set. The keyword set may be the set of words with fixed usage in the updated target query request. For example, the keyword set may include: select, from, and where. In practice, the execution body may parse the updated target query request with a preset parsing tool to obtain the keyword set.
Second, each keyword in the keyword set is scored to generate a scoring value, thereby obtaining a scoring value set. The scoring value may represent how efficiently the keyword queries the database in the updated target query request. The higher the scoring value, the less efficient the keyword is in querying the database.
Third, the complexity of the updated target query request is determined according to the scoring value set. The complexity may characterize the query efficiency of the updated target query request when it queries the target data.
As an example, the execution body may determine the complexity of the updated target query request from the scoring value of each keyword and the number of keywords in the updated target query request. The scoring values are determined by the query definition, the number of occurrences, and the association relationships of the individual keywords. For example, if the updated target query request includes only select and from, the score of select is 0 and the score of from is 0. When the updated target query request includes two select keywords, the score of select is 1; correspondingly, the score of join may be set to 2, union to 2, and group by to 3. Likewise, when the updated target query request includes two union keywords, the score of union is 3, and so on. In particular, keywords can be scored and limited based on the query definition, the number of occurrences, the association relationships, and the like.
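One possible reading of this scoring scheme is sketched below. The text fixes only a few values (select 0 or 1, from 0, join 2, union 2 or 3, group by 3); the score of 0 for where, the rule that complexity is the sum of score times occurrence count, and every name in the code are assumptions made for illustration.

```python
import re
from collections import Counter

KEYWORD_PATTERN = re.compile(r"\b(group\s+by|select|from|where|join|union)\b", re.IGNORECASE)

# Scores for a keyword that occurs once; the value for "where" is an assumption.
BASE_SCORE = {"select": 0, "from": 0, "where": 0, "join": 2, "union": 2, "group by": 3}
# Scores for a keyword that occurs more than once (values taken from the examples).
REPEATED_SCORE = {"select": 1, "union": 3}


def keyword_counts(query):
    """Count keyword occurrences, normalising case and inner whitespace."""
    found = (re.sub(r"\s+", " ", m.group(1).lower()) for m in KEYWORD_PATTERN.finditer(query))
    return Counter(found)


def complexity(query):
    """Assumed rule: sum over keywords of (scoring value x number of occurrences)."""
    total = 0
    for keyword, count in keyword_counts(query).items():
        score = BASE_SCORE[keyword]
        if count > 1:
            score = REPEATED_SCORE.get(keyword, score)
        total += score * count
    return total


flat = complexity("select username from userinfo")
nested = complexity("select * from emp where sal < (select avg(sal) from emp)")
```

Under these assumptions both example queries fall well below the preset complexity threshold of 15 mentioned in the next step.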
In the fourth step, in response to determining that the complexity is greater than or equal to a preset complexity threshold, field data corresponding to each keyword in the keyword set is acquired to obtain a field data set. The preset complexity threshold may be 15. When the updated target query request does not include a sub-query statement, the field data may be the field data between adjacent keywords in the updated target query request. For example, the target query request may be: select username from userinfo, where username is the user name field and userinfo is the name of the user information table in the database. The keywords are select and from. The field data corresponding to select is username, and the field data corresponding to from is userinfo. When the updated target query request includes a sub-query statement, the field data may include both keywords and non-keywords. For example, when the SQL statement includes a sub-query, the target query request may be: select from emp where sal < (select avg(sal) from emp), where emp is the name of the employee data table in the database, sal is the name of the salary column in the employee data table, and avg() is a function that computes the average. The keywords are: 2 select, 2 from, and 1 where. For the keyword where, the corresponding field data is sal < (select avg(sal) from emp).
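For the case without sub-queries, taking the field data between adjacent keywords can be sketched as below. The keyword list `KEYWORDS` and the function name `field_data_by_keyword` are illustrative assumptions, and the sketch deliberately does not handle sub-queries, which the text treats separately.

```python
import re

KEYWORDS = ("select", "from", "where")  # illustrative subset, not an exhaustive list


def field_data_by_keyword(sql: str) -> dict:
    """Map each keyword to the text between it and the next keyword.

    Intended only for statements without sub-queries, where each keyword
    occurs at most once.
    """
    pattern = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
    matches = list(pattern.finditer(sql))
    field_data = {}
    for index, match in enumerate(matches):
        # The field data ends where the next keyword starts, or at the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(sql)
        field_data[match.group(1).lower()] = sql[match.end():end].strip()
    return field_data
```

Applied to the statement select username from userinfo, this returns username for select and userinfo for from, matching the example in the text.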
In the fifth step, fields are extracted from the field data set to obtain request parameters. The field data set and the request parameters may be those of the updated target query request. The request parameters may include at least one field and may be the field data to be queried in the updated target query request. For example, if the target query request is: select username from userinfo, the request parameter may be username. If the target query request is: select from emp where sal < (select avg(sal) from emp), the corresponding request parameters may be sal and avg(sal). It should be noted that different SQL statements have different request parameters. That is, the request parameters are set based on the query requirement and the query definition of the SQL statement, where the query requirement represents the requirement that exists before the SQL statement is written, and the query definition represents the definition fixed once the SQL statement has been written.
In the sixth step, a corresponding optimization set is acquired from a preset optimization database according to the request parameters. The preset optimization database may be a database storing optimized access requests. The preset optimization database includes various request parameters, an optimization example set corresponding to each request parameter, and an optimization strategy corresponding to each request parameter. The optimization set includes a plurality of optimization examples and an optimization strategy corresponding to each optimization example.
As an example, the execution body may compute the similarity between the request parameters and each optimization request parameter in the preset optimization database to obtain a similarity value set. The optimization sets whose similarity values are greater than or equal to a preset similarity threshold are then screened out. The preset similarity threshold may be 0.8.
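The threshold-based screening can be illustrated with a small sketch. The text does not say which similarity operation is applied to the request parameters, so Jaccard similarity over sets of fields is used here purely as a stand-in. The contents of `OPTIMIZATION_DB`, including the placeholder examples and strategies, are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two field sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def screen_optimization_sets(request_params: set, optimization_db: list,
                             threshold: float = 0.8) -> list:
    """Keep the entries whose stored parameters are similar enough to the request."""
    return [
        entry for entry in optimization_db
        if jaccard(request_params, entry["params"]) >= threshold
    ]


# Hypothetical stand-in for the preset optimization database.
OPTIMIZATION_DB = [
    {"params": {"sal", "avg(sal)"}, "examples": ["<example A>"], "strategy": "<strategy A>"},
    {"params": {"sal"}, "examples": ["<example B>"], "strategy": "<strategy B>"},
    {"params": {"username"}, "examples": ["<example C>"], "strategy": "<strategy C>"},
]

selected = screen_optimization_sets({"sal", "avg(sal)"}, OPTIMIZATION_DB)
```

Here only the first entry reaches the 0.8 threshold, because the second entry shares one of two fields (similarity 0.5) and the third shares none.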
In the seventh step, the similarity between the updated target query request and each of the plurality of optimization examples is determined to obtain a similarity set. The similarity characterizes how similar the updated target query request is to the optimization example.
As an example, the execution body may first convert the updated target query request and the plurality of optimization examples into a target vector and an optimization vector set, where the optimization examples correspond one-to-one to the optimization vectors in the optimization vector set. Then, the cosine value between the target vector and each optimization vector in the optimization vector set is determined to obtain a cosine value set. Finally, the cosine value set is determined as the similarity set.
In the eighth step, the optimization strategy corresponding to the optimization example with the greatest similarity in the similarity set is selected as a target optimization strategy, and the updated target query request is optimized with it to obtain an optimized target query request.
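The cosine-based similarity and the selection of the most similar optimization example can be sketched as follows. How a query is converted into a vector is not specified in the text, so the bag-of-tokens vectorization in `vectorize` is an assumption, as are the function names and the representation of the optimization set as a list of (example, strategy) pairs.

```python
import math
import re
from collections import Counter


def vectorize(sql: str) -> Counter:
    """Bag-of-tokens vector of a statement (an assumed vectorization)."""
    return Counter(re.findall(r"\w+|[^\w\s]", sql.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[token] * v[token] for token in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_strategy(query: str, optimization_set: list):
    """Return (target strategy, similarity set) for a list of (example, strategy) pairs."""
    if not optimization_set:
        raise ValueError("optimization set is empty")
    target = vectorize(query)
    similarities = [cosine(target, vectorize(example)) for example, _ in optimization_set]
    best = max(range(len(similarities)), key=similarities.__getitem__)
    return optimization_set[best][1], similarities
```

A learned embedding could replace `vectorize` without changing the rest of the sketch, since only the cosine values are used to rank the examples.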
In the ninth step, the optimized query result obtained through the optimized target query request is compared with the target query result of the updated target query request to obtain a comparison result. The comparison result characterizes whether the optimized query result and the target query result are the same. The comparison result is one of the following: the optimized query result and the target query result are the same, or the optimized query result and the target query result are different.
In the tenth step, in response to determining that the comparison result indicates that the optimized query result and the target query result are the same, the target data is acquired through the optimized target query request.
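The result comparison in the last two steps can be illustrated with Python's built-in `sqlite3` module. The table contents, the rewritten statement `optimized_sql`, and the fallback to the unoptimized result when the results differ are assumptions made for this sketch; the text only states what happens when the results are the same. The statement with the sub-query is written here with an explicit `*`-free column list so that it runs as valid SQL.

```python
import sqlite3
from collections import Counter


def acquire_target_data(conn, target_sql: str, optimized_sql: str):
    """Return (rows, used_optimized) after checking that both statements agree."""
    target_rows = conn.execute(target_sql).fetchall()
    optimized_rows = conn.execute(optimized_sql).fetchall()
    # Compare as multisets so that row order does not affect the outcome.
    if Counter(target_rows) == Counter(optimized_rows):
        return optimized_rows, True
    # Assumption: fall back to the unoptimized result when the results differ.
    return target_rows, False


conn = sqlite3.connect(":memory:")
conn.execute("create table emp (name text, sal real)")
conn.executemany("insert into emp values (?, ?)",
                 [("ann", 100.0), ("bob", 300.0), ("cy", 200.0)])

target_sql = "select name from emp where sal < (select avg(sal) from emp)"
# Hypothetical rewrite: compute the average once in a common table expression.
optimized_sql = ("with m as (select avg(sal) as v from emp) "
                 "select name from emp, m where sal < m.v")
rows, used_optimized = acquire_target_data(conn, target_sql, optimized_sql)
```

Running both statements on every request would cancel the benefit of the optimization, so in practice such a check would more plausibly be performed once per rewritten statement or on sample data.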
The above technical solution and its related content serve as an inventive point of the embodiments of the present disclosure and solve the fourth technical problem mentioned in the background section: "for performance optimization of access requests, the prior art mainly optimizes access request statements according to the experience of database personnel, so the execution efficiency of the access requests is low and the system load and processing load of the database system are high." The factor that causes the high system load and processing load of the database system is therefore the reliance on the experience of database personnel when optimizing access request statements, which leads to low execution efficiency of the access requests. If this factor is addressed, the system load and processing load of the database system can be reduced. To achieve this effect, first, the updated target query request is parsed to obtain a keyword set. Obtaining the keyword set allows the updated target query request to be processed at a finer granularity, so that its execution logic and the parts with low query efficiency can be identified, which facilitates subsequent optimization. Second, each keyword in the keyword set is scored to generate a scoring value, yielding a scoring value set, and the complexity of the updated target query request is determined according to the scoring value set. Determining the complexity from the scoring values helps decide whether the updated target query request is subsequently optimized and which specific optimization mode is used.
Third, in response to determining that the complexity is greater than or equal to the preset complexity threshold, field data corresponding to each keyword in the keyword set is acquired to obtain a field data set, and fields are extracted from the field data set to obtain the request parameters. The request parameters make it easier to determine the target optimization strategy and thus improve query efficiency. Then, a corresponding optimization set is acquired from the preset optimization database according to the request parameters, and the similarity between the updated target query request and each of the plurality of optimization examples is determined to obtain a similarity set. The optimization strategy corresponding to the optimization example with the greatest similarity in the similarity set is selected as the target optimization strategy, and the updated target query request is optimized with it to obtain the optimized target query request. Because the target optimization strategy is determined from the request parameters of the updated target query request, the updated target query request can be optimized, which improves query efficiency and reduces the system load and processing load of the database system. Finally, the optimized query result obtained through the optimized target query request is compared with the target query result of the updated target query request to obtain a comparison result. In response to determining that the comparison result indicates that the two results are the same, the target data is acquired through the optimized target query request. Comparing the optimized query result with the target query result verifies that the optimized target query request is correct and thereby ensures the accuracy of the query.
Thus, the complexity of the updated access request is determined by extracting the keyword from the updated access request. And optimizing the updated access request according to the complexity and the preset optimization database, so that the query efficiency of the updated access request is improved, and the load and the processing load of the database system are reduced.
In some optional implementations of some embodiments, the updating the target query request to obtain an updated target query request may include the following steps:
In the first step, word segmentation is performed on the target query request to obtain a field set. The field set may be the set of fields in the target query request. For example, the field set may include: select, from, information_schema, columns, where, table_schema, _catalog1_db1, and, table_name, and tb1.
In the second step, in response to determining that the field set includes a second preset field, the attribute field corresponding to the second preset field is parsed. The second preset field may be a field in the filtering condition, for example table_schema. The attribute field corresponding to the second preset field may be the attribute value field of the second preset field in the corresponding database. The attribute value field may be a text field or a numeric field. The corresponding database may be the database to be queried by the target query request. For example, the attribute field corresponding to the second preset field may be _catalog1_db1.
In the third step, in response to parsing out a target data directory level field and a target database level field, a first preset field is added to the target query request to obtain a first post-addition target query request. The attribute field of the first preset field in the first post-addition target query request is the target data directory level field.
As an example, when the execution body parses out the target data directory level field and the target database level field, a filtering condition corresponding to the first preset field may be added to the filtering condition in the target query request. For example, the filtering condition in the target query request may be table_schema="_catalog1_db1" and table_name="tb1"; after the filtering condition corresponding to the first preset field is added, the filtering condition may be table_catalog="catalog1" and table_schema="db1" and table_name="tb1".
In the fourth step, the first post-addition target query request is updated to obtain the updated target query request. The attribute field of the second preset field in the updated target query request is the target database level field. The updating may consist of updating the attribute field of the second preset field in the first post-addition target query request, that is, determining the target database level field as the new attribute field of the second preset field.
Optionally, after the first preset field is added to the target query request in response to parsing out the target data directory level field and the target database level field to obtain the first post-addition target query request, the method further includes the following steps:
In the first step, in response to the target data directory level field not being parsed out, the first preset field is added to the target query request to obtain a second post-addition target query request. The second post-addition target query request may be a request in which a filtering condition has been added to the target query request.
In the second step, the attribute field of the first preset field in the second post-addition target query request is determined as a default data directory level field. The default data directory level field may be the field corresponding to the original data directory level in the target database.
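The rewriting of the filtering condition, including the fallback to a default data directory level, can be sketched as follows. Several points are assumptions made for illustration: the combined value is spelled `_catalog1_db1`, the first preset field is taken to be `table_catalog`, the combined value is split with a simple regular expression that assumes the data directory name contains no underscore, and `DEFAULT_CATALOG` is a hypothetical name.

```python
import re

DEFAULT_CATALOG = "default_catalog"  # hypothetical default data directory level


def split_schema_value(value: str):
    """Return (catalog, database); catalog is None when the value is not combined.

    Assumes the combined form "_<catalog>_<database>" and a catalog name
    without underscores.
    """
    match = re.fullmatch(r"_([^_]+)_(.+)", value)
    if match:
        return match.group(1), match.group(2)
    return None, value


def rewrite_filter(sql: str) -> str:
    """Add a table_catalog condition and reduce table_schema to the database name."""
    pattern = re.compile(r'table_schema\s*=\s*"([^"]*)"', re.IGNORECASE)
    match = pattern.search(sql)
    if match is None:
        return sql  # second preset field absent: nothing to rewrite
    catalog, database = split_schema_value(match.group(1))
    catalog = catalog or DEFAULT_CATALOG
    replacement = f'table_catalog="{catalog}" and table_schema="{database}"'
    return sql[:match.start()] + replacement + sql[match.end():]
```

For a condition table_schema="_catalog1_db1" the sketch produces table_catalog="catalog1" and table_schema="db1", and for a plain table_schema="db1" it adds the default data directory instead.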
The above embodiments of the present disclosure have the following advantages: according to the data acquisition method of some embodiments of the present disclosure, the multi-level fields are combined into the standard MySQL field through the predetermined format, so that data in a larger range can be acquired, compatibility of a database is improved, operation load of a system is reduced, and data query efficiency is improved. Specifically, the associated degradation in system performance is caused by: when a user accesses target data by using a switch command, the switch command is not a standard MySQL command request, so that the access request of a database system is increased, the processing load of the system is increased, the access experience of the user is reduced, and the performance of the system is further reduced. Based on this, the data acquisition method of some embodiments of the present disclosure may first acquire an access request in response to determining that the access manner is a manner of accessing the target database through the client. Next, a set of historical abnormal access requests for the target database is obtained. Here, the resulting set of historical abnormal access requests is used to subsequently determine whether the current access request is an abnormal access request. And determining whether the access request is an abnormal access request according to the historical abnormal access request set. Here, determining whether to be an abnormal access request may improve security of the database, avoiding causing data leakage. And then, in response to determining that the access request is not an abnormal access request, analyzing the access request to obtain a field to be detected. The obtained field to be detected is favorable for cross-level access to the database, and the range of data acquisition is enlarged. 
And then, dividing the field to be detected according to the preset field in sequence to obtain a divided field sequence. Here, splitting the field to be detected may improve accuracy of splitting the field. And then, carrying out field combination processing on the split field sequence to obtain a target data directory level field and a target database level field. The field combination processing is performed on the split field sequence so as to obtain more accurate target data directory level fields and target database level fields, thereby facilitating the subsequent acquisition of target data. And finally, acquiring target data according to the target data directory level field and the target database level field. Therefore, the data acquisition method can combine the multi-level fields into one field through the preset format, and can acquire data in a larger range, so that modification of an access client is avoided, and modification cost is reduced.
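The idea of resolving a single combined field into a data directory level field and a database level field can be sketched as follows. The metadata in `CATALOGS`, the preset field `_`, and the combined format `_<data directory>_<database>` are assumptions inferred from the example value used elsewhere in the description, not a confirmed format.

```python
# Hypothetical metadata: data directory (catalog) name -> database names.
CATALOGS = {
    "catalog1": ["db1", "db2"],
    "hive_prod": ["sales_db"],
}
PRESET = "_"  # assumed preset field used as separator and prefix


def resolve(field: str, catalogs: dict):
    """Return the (data directory, database) pair whose combined form equals field."""
    for catalog, databases in catalogs.items():
        # Stage 1: keep only data directories whose combined prefix matches.
        prefix = f"{PRESET}{catalog}{PRESET}"
        if not field.startswith(prefix):
            continue
        # Stage 2: combine the data directory with each database and compare.
        for database in databases:
            if f"{prefix}{database}" == field:
                return catalog, database
    return None
```

Matching against combinations built from known metadata, rather than splitting the field blindly, keeps the result unambiguous even when names themselves contain the separator, as with `hive_prod` and `sales_db` above.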
With further reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a data acquisition apparatus, which correspond to those method embodiments shown in fig. 1, and which are particularly applicable in various electronic devices.
As shown in fig. 2, a data acquisition apparatus 200 includes: a first acquisition unit 201, a second acquisition unit 202, a determination unit 203, a parsing unit 204, a segmentation unit 205, a field combination processing unit 206, and a third acquisition unit 207. The first acquisition unit 201 is configured to acquire an access request in response to determining that the access mode is a mode of accessing the target database through a client. The second acquisition unit 202 is configured to acquire a set of historical abnormal access requests for the target database. The determination unit 203 is configured to determine, according to the set of historical abnormal access requests, whether the access request is an abnormal access request. The parsing unit 204 is configured to parse the access request to obtain a field to be detected in response to determining that the access request is not an abnormal access request. The segmentation unit 205 is configured to segment the field to be detected in sequence according to a preset field to obtain a segmented field sequence. The field combination processing unit 206 is configured to perform field combination processing on the segmented field sequence to obtain a target data directory level field and a target database level field. The third acquisition unit 207 is configured to acquire target data according to the target data directory level field and the target database level field.
It will be appreciated that the elements described in the data acquisition device 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and advantages described above with respect to the method are equally applicable to the data acquisition device 200 and the units contained therein, and are not described here again.
Referring now to fig. 3, a schematic diagram of an electronic device 300 suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: responding to the determination that the access mode is a mode of accessing the target database through the client, and acquiring an access request; acquiring a historical abnormal access request set aiming at the target database; determining whether the access request is an abnormal access request according to the historical abnormal access request set; responding to the determination that the access request is not an abnormal access request, and analyzing the access request to obtain a field to be detected; dividing the field to be detected in sequence according to a preset field to obtain a divided field sequence; performing field combination processing on the split field sequence to obtain a target data directory level field and a target database level field; and acquiring target data according to the target data directory level field and the target database level field.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first acquisition unit, a second acquisition unit, a determination unit, a parsing unit, a segmentation unit, a field combination processing unit, and a third acquisition unit. In some cases the names of these units do not constitute a limitation on the units themselves. For example, the first acquisition unit may also be described as "a unit that acquires an access request in response to determining that the access mode is a mode of accessing the target database through the client".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.
Claims (8)
1. A data acquisition method comprising:
responding to the determination that the access mode is a mode of accessing the target database through the client, and acquiring an access request;
acquiring a historical abnormal access request set aiming at the target database;
determining whether the access request is an abnormal access request according to the historical abnormal access request set;
responding to the determination that the access request is not an abnormal access request, and analyzing the access request to obtain a field to be detected;
dividing the field to be detected according to a preset field in sequence to obtain a divided field sequence;
acquiring a data directory hierarchy field set in the target database;
combining the data directory hierarchy field sets by using a preset format to obtain combined data directory hierarchy field sets;
screening the combined data directory hierarchy field matched with the field to be detected from the combined data directory hierarchy field set to serve as a target combined data directory hierarchy field;
determining the data directory hierarchy field corresponding to the target combined data directory hierarchy field as a target data directory hierarchy field;
acquiring a database hierarchy field set corresponding to the target data directory hierarchy field;
Combining each database hierarchy field in the target data directory hierarchy field and the database hierarchy field set by using a preset format to generate a combined field, so as to obtain a combined field set;
screening out the combined fields matched with the fields to be detected from the combined field set to be used as target combined fields;
respectively determining the data directory level field and the database level field corresponding to the target combined field as a target data directory level field and a target database level field;
and acquiring target data according to the target data directory level field and the target database level field.
2. The method of claim 1, wherein after the determining whether the access request is an abnormal access request from the set of historical abnormal access requests, further comprising:
and responding to the determination that the access request is an abnormal access request, carrying out abnormal access early warning on the access request, and sending the access request to a monitoring terminal.
3. The method of claim 1, wherein the method further comprises:
acquiring a query request set in response to determining that the access mode is a mode of accessing the metadata database;
Screening out a query request meeting preset conditions from the query request set as a target query request, wherein the preset conditions are conditions that a data table field in the query request comprises a first preset field;
updating the target query request to obtain an updated target query request;
and acquiring the target data according to the updated target query request.
4. The method of claim 3, wherein the updating the target query request to obtain an updated target query request comprises:
word segmentation processing is carried out on the target query request to obtain a field set;
analyzing attribute fields corresponding to a second preset field in response to determining that the field set comprises the second preset field;
in response to the analysis of the target data directory hierarchy field and the target database hierarchy field, adding a first preset field into the target query request to obtain a first added target query request, wherein an attribute field of the first preset field in the first added target query request is the target data directory hierarchy field;
and updating the first added target query request to obtain an updated target query request, wherein an attribute field of a second preset field in the updated target query request is the target database level field.
5. The method of claim 4, further comprising, after the adding the first preset field to the target query request in response to parsing out the target data directory level field and the target database level field to obtain the first added target query request:
in response to the target data directory level field not being parsed out, adding the first preset field to the target query request to obtain a second added target query request;
and determining the attribute field of the first preset field in the second added target query request as a default data directory level field.
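The update logic of claims 4 and 5 can be illustrated with a toy query syntax. Everything concrete here is an assumption made for the sketch: queries are whitespace-separated tokens with `key=value` conditions, word segmentation is plain whitespace splitting, the two levels are joined by `.`, and the names `ctlg_name`, `name` and `default` are hypothetical placeholders for the first preset field, the second preset field and the default data directory level field.

```python
FIRST_PRESET = "ctlg_name"     # hypothetical first preset field
SECOND_PRESET = "name"         # hypothetical second preset field
DEFAULT_DIRECTORY = "default"  # hypothetical default data directory level field
SEPARATOR = "."                # assumed separator between the two levels


def update_target_query(query: str) -> str:
    """Rewrite a toy query so that the directory level and database level are separate conditions."""
    # Word segmentation: assumed to be whitespace tokenization.
    tokens = query.split()
    for i, token in enumerate(tokens):
        key, eq, value = token.partition("=")
        if eq and key == SECOND_PRESET:
            # Parse the attribute field of the second preset field.
            directory, sep, database = value.partition(SEPARATOR)
            if sep and directory and database:
                # Claim 4: both levels were parsed out.
                added = f"{FIRST_PRESET}={directory}"
                tokens[i] = f"{SECOND_PRESET}={database}"
            else:
                # Claim 5: no directory level parsed, fall back to the default.
                added = f"{FIRST_PRESET}={DEFAULT_DIRECTORY}"
            return " ".join(tokens + ["and", added])
    # The claims do not cover queries without the second preset field;
    # this sketch leaves them unchanged.
    return query
```

For example, `name=catalog_a.sales` becomes `name=sales and ctlg_name=catalog_a`, while `name=sales` becomes `name=sales and ctlg_name=default`.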
6. A data acquisition device, comprising:
a first acquisition unit configured to acquire an access request in response to determining that the access mode is a mode of accessing a target database through a client;
a second acquisition unit configured to acquire a set of historical abnormal access requests for the target database;
a first determining unit configured to determine whether the access request is an abnormal access request according to the set of historical abnormal access requests;
an analysis unit configured to, in response to determining that the access request is not an abnormal access request, parse the access request to obtain a field to be detected;
a segmentation unit configured to segment the field to be detected sequentially according to preset fields to obtain a segmented field sequence;
a third acquisition unit configured to acquire a set of data directory hierarchy fields in the target database;
a combining unit configured to combine the data directory hierarchy fields in the set of data directory hierarchy fields using a preset format to obtain a set of combined data directory hierarchy fields;
a first filtering unit configured to filter out, from the set of combined data directory hierarchy fields, a combined data directory hierarchy field that matches the field to be detected as a target combined data directory hierarchy field;
a second determining unit configured to determine a data directory hierarchy field corresponding to the target combined data directory hierarchy field as a target data directory hierarchy field;
a fourth acquisition unit configured to acquire a database hierarchy field set corresponding to the target data directory hierarchy field;
a combination processing unit configured to combine the target data directory hierarchy field with each database hierarchy field in the database hierarchy field set using a preset format to generate combined fields, thereby obtaining a combined field set;
a second screening unit configured to screen out, from the combined field set, a combined field that matches the field to be detected as a target combined field;
a third determining unit configured to determine a data directory hierarchy field and a database hierarchy field corresponding to the target combined field as a target data directory hierarchy field and a target database hierarchy field, respectively;
and a fifth acquisition unit configured to acquire target data based on the target data directory hierarchy field and the target database hierarchy field.
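The two-stage matching performed by the units above can be sketched as a single function. The sketch makes several assumptions that the claims leave open: the preset field used for segmentation is `.`, the preset formats are `<directory>.` and `<directory>.<database>`, "matches" means equality with the leading segments of the field to be detected, and names contain no separator. The `catalog` mapping and the function name are hypothetical.

```python
SEPARATOR = "."  # assumed preset field used for segmentation and combination


def resolve_target_fields(field_to_detect: str, catalog: dict):
    """Return (data directory level field, database level field), or None if nothing matches.

    catalog is a hypothetical mapping from each data directory level field
    to the list of database level fields stored under it.
    """
    # Segment the field to be detected into a field sequence.
    segments = field_to_detect.split(SEPARATOR)
    if len(segments) < 2:
        return None

    # Stage 1: combine every directory field as "<directory>." and match the first segment.
    combined_dirs = {f"{d}{SEPARATOR}": d for d in catalog}
    target_dir = combined_dirs.get(f"{segments[0]}{SEPARATOR}")
    if target_dir is None:
        return None

    # Stage 2: combine the target directory with each of its databases
    # as "<directory>.<database>" and match the first two segments.
    combined = {f"{target_dir}{SEPARATOR}{db}": db for db in catalog[target_dir]}
    target_db = combined.get(SEPARATOR.join(segments[:2]))
    if target_db is None:
        return None
    return target_dir, target_db
```

With `catalog = {"hive": ["sales", "hr"]}`, the field `hive.sales.orders` resolves to the pair `("hive", "sales")`, which would then be used to acquire the target data. Matching the directory level first keeps the second stage limited to the databases of one directory instead of every directory and database combination.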
7. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
8. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310432781.0A CN116150194B (en) | 2023-04-21 | 2023-04-21 | Data acquisition method, device, electronic equipment and computer readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310432781.0A CN116150194B (en) | 2023-04-21 | 2023-04-21 | Data acquisition method, device, electronic equipment and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116150194A (en) | 2023-05-23 |
CN116150194B (en) | 2023-07-14 |
Family
ID=86339231
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310432781.0A Active CN116150194B (en) | 2023-04-21 | 2023-04-21 | Data acquisition method, device, electronic equipment and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116150194B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116881974B (en) * | 2023-09-06 | 2023-11-24 | 中关村科学城城市大脑股份有限公司 | Data processing method and device based on data acquisition request and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190259491A1 (en) * | 2018-02-22 | 2019-08-22 | International Business Machines Corporation | Instance level metadata population of a pacs database |
CN113946839A (en) * | 2020-07-16 | 2022-01-18 | 南京中兴软件有限责任公司 | Data access method, data access device, storage medium and electronic device |
CN115622803B (en) * | 2022-12-02 | 2023-04-14 | 北京景安云信科技有限公司 | Authority control system and method based on protocol analysis |
2023-04-21: CN application CN202310432781.0A (patent CN116150194B), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN116150194A (en) | 2023-05-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
KR102485179B1 (en) | Method, device, electronic device and computer storage medium for determining description information | |
US20170330084A1 (en) | Clarification of Submitted Questions in a Question and Answer System | |
CN110334209B (en) | Text classification method, device, medium and electronic equipment | |
JP2020027649A (en) | Method, apparatus, device and storage medium for generating entity relationship data | |
CN110597844B (en) | Unified access method for heterogeneous database data and related equipment | |
US20200175390A1 (en) | Word embedding model parameter advisor | |
CN114281968B (en) | Model training and corpus generation method, device, equipment and storage medium | |
CN108694221A (en) | Data real-time analysis method, module, equipment and device | |
US20240220772A1 (en) | Method of evaluating data, training method, electronic device, and storage medium | |
CN114091426A (en) | Method and device for processing field data in data warehouse | |
CN116150194B (en) | Data acquisition method, device, electronic equipment and computer readable medium | |
CN114861889A (en) | Deep learning model training method, target object detection method and device | |
US10229194B2 (en) | Providing known distribution patterns associated with specific measures and metrics | |
CN117407414A (en) | Method, device, equipment and medium for processing structured query statement | |
US20230274161A1 (en) | Entity linking method, electronic device, and storage medium | |
US20180336242A1 (en) | Apparatus and method for generating a multiple-event pattern query | |
CN111368036B (en) | Method and device for searching information | |
CN115292506A (en) | Knowledge graph ontology construction method and device applied to office field | |
CN114201607A (en) | Information processing method and device | |
CN114860872A (en) | Data processing method, device, equipment and storage medium | |
CN114357180A (en) | Knowledge graph updating method and electronic equipment | |
CN113609309A (en) | Knowledge graph construction method and device, storage medium and electronic equipment | |
CN118069122B (en) | Structured query statement multiplexing method, device, electronic equipment and medium | |
CN117891979B (en) | Method and device for constructing data lineage graph, electronic equipment and readable medium | |
CN116737762B (en) | Structured query statement generation method, device and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |