WO2021135919A1

WO2021135919A1 - Machine learning-based sql statement security testing method and apparatus, device, and medium

Info

Publication number: WO2021135919A1
Application number: PCT/CN2020/136341
Authority: WO
Inventors: 吴添立
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-05-27
Filing date: 2020-12-15
Publication date: 2021-07-08
Also published as: CN111783132A

Abstract

A machine learning-based SQL statement security testing method and apparatus, a device, and a medium. The method comprises: upon receiving an SQL access request, analyzing the access request, and obtaining test data (S201); using a TF-IDF algorithm to perform feature extraction on the test data, and obtaining feature word segments (S202); using a hidden Markov model to perform text abnormality testing on the feature word segments, and obtaining a test result (S203); if the test result is abnormal, confirming that the test data contains a threat, and performing interception processing on the SQL access request (S204). The present invention implements rapid testing of the reasonability of SQL access requests, and helps to improve security testing efficiency. The present invention also relates to blockchain technology, in that obtained feature word segments and corresponding test results are stored in a blockchain, so as to implement rapid interception of abnormal access requests, and ensure database security.

Description

SQL statement security detection method, device, equipment and medium based on machine learning

This application claims priority to the Chinese patent application No. 2020104640093, filed with the Chinese Patent Office on May 27, 2020 and titled "Machine Learning-based SQL Statement Security Detection Method, Apparatus, Equipment and Medium", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of network security, and in particular to a method, device, equipment and medium for SQL statement security detection based on machine learning.

Background

With the continuous development of web applications, web security issues have become increasingly prominent. SQL injection vulnerabilities are the most common vulnerabilities in web security. Successful exploitation of SQL injection can lead to database information leakage, malicious operations on the database, and even remote control of the server and execution of arbitrary operations, so the harm is extremely serious. For the automatic detection of malicious SQL injection requests, the current mainstream approach is detection based on blacklist rules. In the process of implementing this application, the inventor realized that the prior art has at least the following problems: because SQL access requests are frequent, this rule-based database security defense is passive and lagging, cannot detect unknown attacks, and takes a long time, resulting in low database security detection efficiency.

Summary of the invention

The embodiments of the present application provide a SQL statement security detection method, device, equipment, and medium based on machine learning, so as to improve the efficiency of database security detection.

In order to solve the above technical problems, an embodiment of the present application provides a SQL statement security detection method based on machine learning, including:

When the SQL access request is received, the access request is parsed to obtain the data to be detected;

Using the TF-IDF algorithm, perform feature extraction on the to-be-detected data to obtain feature word segmentation;

Use a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain a detection result;

If the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted.

In order to solve the above technical problems, an embodiment of the present application also provides a SQL statement security detection device based on machine learning, including:

The request parsing module is used to parse the access request to obtain the data to be detected when the SQL access request is received;

The feature word segmentation module is used to use the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain the feature word segmentation;

An anomaly detection module, configured to use a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain a detection result;

The request interception module is configured to, if the detection result is abnormal, confirm that the data to be detected is threatened, and intercept the SQL access request.

In order to solve the above technical problems, an embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:

In order to solve the above technical problems, embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions implement the following steps when executed by a processor:

With the SQL statement security detection method, device, equipment, and medium based on machine learning provided by the embodiments of the present application, when a SQL access request is received, the access request is parsed to obtain the data to be detected; the TF-IDF algorithm is used to perform feature extraction on the data to be detected to obtain feature word segmentation; and a hidden Markov model is used to perform text anomaly detection on the feature word segmentation to obtain a detection result. This realizes rapid detection of the rationality of SQL access requests, which is conducive to improving the efficiency of security detection. When the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted, so that abnormal access requests are intercepted quickly and the security of the database is ensured.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments of the present application. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative labor.

Figure 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of an embodiment of the SQL statement security detection method based on machine learning of the present application;

Fig. 3 is a schematic structural diagram of an embodiment of a SQL statement security detection device based on machine learning according to the present application;

Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed description

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the application. The terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the application. The terms "including" and "having" in the specification and claims of the application and in the above description of the drawings, and any variations thereof, are intended to cover non-exclusive inclusions. The terms "first", "second", etc. in the specification and claims of the present application or the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence.

The reference to "embodiments" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

Please refer to FIG. 1. As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104 and a server 105. The network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, fiber optic cables, and so on.

The user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.

The server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.

It should be noted that the SQL statement security detection method based on machine learning provided by the embodiment of the present application is executed by the server, and accordingly, the SQL statement security detection device based on machine learning is provided in the server.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there may be any number of terminal devices, networks, and servers. The terminal devices 101, 102, and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.

Please refer to FIG. 2. FIG. 2 shows a SQL statement security detection method based on machine learning provided by an embodiment of the present application. The method is applied to the server in FIG. 1 as an example for description, and the details are as follows:

S201: When the SQL access request is received, the access request is parsed to obtain the data to be detected.

Specifically, when the client accesses the database, it first connects to the server and then sends a database access request containing the request parameters to the server. The server obtains the data that the client needs from the database according to the request and returns it to the client. In this process, if the request parameters contain illegal characters, they may cause harm to the database. These illegal characters include but are not limited to: malicious SQL injection instructions, unauthorized instructions, and SQL attack instructions. Therefore, before the access request is parsed and executed, the security of the request parameters needs to be checked. In this embodiment, when the server receives the SQL access request, it obtains the request parameters contained in the access request and parses them to obtain the data to be detected.

Among them, parsing the access request refers to obtaining request parameters included in the access request, and then determining the SQL statement to be detected according to the request parameters.

Wherein, the request parameter refers to an attribute field included in the access request, which can specifically be a uniform resource locator (URL), an operation instruction, or a matching rule. In this embodiment, detection is mainly performed on attribute fields such as the POST parameters, GET parameters, and COOKIE parameters in the access request.
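As an illustration of the parsing in step S201, the following is a minimal sketch, not the implementation of this application. It assumes an HTTP request whose POST body is form-encoded; the function name `extract_request_parameters` and its arguments are hypothetical and are introduced only for this example.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit


def extract_request_parameters(url, body="", cookie_header=""):
    """Collect GET, POST and COOKIE parameters from one HTTP request.

    Returns a list of (source, name, value) tuples; the values are the
    strings that would be handed on as data to be detected.
    """
    params = []
    # GET parameters come from the query string of the URL.
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        params.append(("GET", name, value))
    # POST parameters: assumed here to be form-encoded key=value pairs.
    for name, value in parse_qsl(body, keep_blank_values=True):
        params.append(("POST", name, value))
    # COOKIE parameters come from the Cookie request header.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    for name, morsel in cookie.items():
        params.append(("COOKIE", name, morsel.value))
    return params
```

Note that `parse_qsl` percent-decodes the values, so an encoded quote such as `%27` reaches the later steps as a literal `'`.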

S202: Using the TF-IDF algorithm, perform feature extraction on the data to be detected to obtain feature word segmentation.

Specifically, through the TF-IDF algorithm, feature extraction is performed on the data to be detected, and a word segmentation that can reflect the core characteristics of the data to be detected is obtained as a feature word segmentation.

Among them, TF-IDF (term frequency-inverse document frequency) is a commonly used weighting technique for information retrieval and data mining. TF means term frequency, and IDF means inverse document frequency. TF-IDF is a statistical method used to evaluate how important a word is to a document set or to one of the documents in a corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency of its appearance in the corpus.

In this embodiment, word segmentation is first performed on the data to be detected. The proportion of each obtained word segmentation in the data to be detected is used as the term frequency TF, and the inverse document frequency IDF is obtained from how frequently the word segmentation appears in the preset corpus. The TF-IDF corresponding to the word segmentation is then calculated, and whether the word segmentation is accurate as an independent word segmentation is judged according to whether its TF-IDF falls within a preset range. When the TF-IDF of the word segmentation is within the preset range, the word segmentation is determined to have a high probability of being a readable string. The request can then be segmented, and its word composition can be obtained and vectorized.

Among them, feature extraction includes word segmentation, generalization, and feature vector transformation.

For example, in a specific embodiment, the data to be detected is a uniform resource locator (URL), specifically "publico/anadir.jsp?id=2&nombre=Jam%F3n+Ib%E9rico&precio=85&cantidad=%27%3B+DROP+TABLE+usuarios%3B+SELECT+*+FROM+datos+WHERE+nombre+LIKE+%27%25&B1=A%F1adir+al+carrito". After word segmentation, the result is "publico anadir jsp id 2 nombre Jam F3n Ib E9rico precio 85 cantidad 27 3B DROP TABLE usuarios 3B SELECT*FROM datos WHERE nombre LIKE 27 25 B1 A F1adir al carrito".
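The segmentation in the example above can be approximated as follows. This is a minimal sketch under the assumption that the raw request string is split on every run of characters other than letters and digits, without percent-decoding; the function name `segment` is hypothetical. This simple rule drops the asterisk, so it does not reproduce the grouped string "SELECT*FROM" shown in the example.

```python
import re


def segment(raw):
    """Split a raw request string into initial word segmentations.

    Every run of characters that is not an ASCII letter or digit is
    treated as a separator, so '%27' yields '27' and '+' yields a break.
    """
    return [token for token in re.split(r"[^0-9A-Za-z]+", raw) if token]
```

For instance, `segment("cantidad=%27%3B+DROP+TABLE+usuarios")` yields the segmentations `cantidad`, `27`, `3B`, `DROP`, `TABLE` and `usuarios`.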

S203: Use the hidden Markov model to perform text anomaly detection on the feature segmentation, and obtain the detection result.

Specifically, after the feature segmentation of the sentence to be detected is obtained, the feature segmentation is input into a pre-trained hidden Markov model, and the presence of abnormal text in the feature segmentation is detected through the hidden Markov model.

Among them, the hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. In a hidden Markov model, the states form a Markov chain that cannot be observed directly; they can only be inferred through a sequence of observation vectors. Each observation vector is generated from a state according to a probability density distribution associated with that state, so a sequence of observation vectors is generated by a sequence of states with corresponding probability density distributions.

Among them, text anomaly detection refers to detecting whether the data to be detected contains illegal text characters, i.e., text characters that may become a hidden danger to database access security.

The hidden Markov model is used to perform text abnormality detection on the feature segmentation, and the specific process of obtaining the detection result can be referred to the description of the subsequent embodiments. In order to avoid repetition, it will not be repeated here.

S204: If the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted.

Specifically, when the detection result is abnormal, that is, the sentence to be detected contains illegal characters that may threaten the database data, at this time, the SQL access request is intercepted to ensure the security of the database.

In this embodiment, when the SQL access request is received, the access request is parsed to obtain the data to be detected, the TF-IDF algorithm is used to perform feature extraction on the data to be detected to obtain the feature word segmentation, and the hidden Markov model is used to perform text anomaly detection on the feature word segmentation to obtain the detection result. This realizes rapid detection of the rationality of SQL access requests, which is conducive to improving the efficiency of security detection. When the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted, so that abnormal access requests are intercepted quickly and the security of the database is ensured.
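The control flow of steps S201 to S204 can be summarized by the following minimal sketch. The function `handle_sql_request` and the three callables passed to it are hypothetical placeholders for the parsing, feature extraction, and anomaly detection steps; the return values "intercept" and "allow" are likewise illustrative only.

```python
def handle_sql_request(request, parse, extract_features, detect):
    """Run steps S201 to S204 on one request and return the decision.

    parse, extract_features and detect are supplied by the caller;
    detect is expected to return either "normal" or "abnormal".
    """
    data = parse(request)              # S201: obtain the data to be detected
    features = extract_features(data)  # S202: obtain feature word segmentation
    result = detect(features)          # S203: text anomaly detection
    if result == "abnormal":           # S204: intercept threatened requests
        return "intercept"
    return "allow"
```

Passing the steps in as callables keeps the sketch independent of how each step is actually realized.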

In an embodiment, the obtained feature word segmentation and the detection result corresponding to the feature word segmentation can be saved on a blockchain network. Storing the data on the blockchain allows the data to be shared between different platforms and also prevents the data from being tampered with.

Among them, the blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
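The tamper-evidence property mentioned above can be illustrated with a minimal hash-linked record list. This sketch is not a blockchain network (it has no consensus mechanism or point-to-point transmission); it only shows how linking each record to the hash of the previous one makes later modification detectable. The functions `make_block` and `verify_chain` and the all-zero genesis hash are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def _digest(previous_hash, features, result):
    payload = {"previous_hash": previous_hash, "features": features, "result": result}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(previous_hash, features, result):
    """Create a record holding feature word segmentation and its detection result."""
    return {
        "previous_hash": previous_hash,
        "features": features,
        "result": result,
        "hash": _digest(previous_hash, features, result),
    }


def verify_chain(chain):
    """Return True only if every block links to its predecessor and is unmodified."""
    previous_hash = GENESIS_HASH
    for block in chain:
        expected = _digest(block["previous_hash"], block["features"], block["result"])
        if block["previous_hash"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True
```

Changing the stored detection result of any block changes its recomputed hash, so `verify_chain` returns False for the altered list.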

In some optional implementation manners of this embodiment, in step S202, using the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature segmentation includes:

Through word combination, the data to be tested is divided into words to obtain the initial word segmentation;

Count the proportion of the initial word segmentation in the sentence to be tested, and use the proportion as the word frequency of the initial word segmentation;

Compute the inverse document frequency IDF of the initial word segmentation in the preset corpus;

Calculate the product of the term frequency TF and the inverse document frequency IDF of the initial word segmentation, determine whether the initial word segmentation is an important feature according to the product, and take the initial word segmentations that are important features as the feature word segmentation.

Specifically, through the TF-IDF algorithm, feature extraction is performed on the data to be detected to obtain feature word segmentation.

It should be noted that the word segmentation in this embodiment should not be understood simply as division into "words"; a segmentation may also be a character string, such as "SELECT*FROM".

The following formula is used to calculate the inverse document frequency IDF of the initial word segmentation in the preset corpus:

$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

Among them, $|D|$ is the total number of files in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of files containing the i-th initial word segmentation $t_i$ (i.e., the number of files for which $n_{i,j} \neq 0$). If the initial word segmentation does not appear in the corpus, the denominator would be zero. Therefore, in general, $|\{j : t_i \in d_j\}| + 1$ is used as the denominator to avoid this case.

Whether the initial word segmentation is an important feature is determined according to the product. Specifically, a threshold can be set using the word segmentation results of historical SQL attacks as a reference; when the product reaches the threshold, the word segmentation is determined to be an important feature.

In this embodiment, the TF-IDF algorithm is used to identify important features related to SQL security access from the features of the data to be detected as feature segmentation, and subsequent use of the feature segmentation for security detection is beneficial to improve the accuracy of security detection.
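The TF, IDF, and threshold steps above can be sketched as follows. This is a minimal illustration, assuming a natural logarithm, the "+1" denominator described above, and a non-empty corpus given as a list of token sets; the function names and the threshold value used in the usage note are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(tokens, corpus):
    """Return {token: TF * IDF} for the initial word segmentations of one request.

    tokens: initial word segmentations of the data to be detected.
    corpus: non-empty list of documents, each given as a set of tokens.
    """
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for token, count in counts.items():
        tf = count / total  # proportion of the token in the request
        containing = sum(1 for document in corpus if token in document)
        # The +1 in the denominator avoids division by zero for unseen tokens.
        idf = math.log(len(corpus) / (containing + 1))
        scores[token] = tf * idf
    return scores


def select_features(tokens, corpus, threshold):
    """Keep, in first-seen order, the tokens whose TF-IDF product reaches the threshold."""
    scores = tf_idf_scores(tokens, corpus)
    return [token for token in dict.fromkeys(tokens) if scores[token] >= threshold]
```

With a corpus of four documents in which `id` appears three times and `DROP` never appears, `id` scores zero while `DROP` scores a quarter of log 4 in the request `id DROP TABLE id`, so only the rarer tokens pass a threshold such as 0.3.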

In some optional implementation manners of this embodiment, in step S203, using a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain the detection result includes:

Convert characteristic word segmentation into state representation;

For the i-th state, predict the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and use the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;

If the i+1th state matches the predicted state corresponding to the i+1th state, the detection result is confirmed to be normal;

If the i+1th state does not match the predicted state corresponding to the i+1th state, the detection result is confirmed to be abnormal.

Specifically, the feature segmentation is converted into states of the hidden Markov model, the next state of each state is predicted through the observation sequence of the hidden Markov model, and the predicted next state is matched against the actual next state. If the match fails, it is confirmed that there is a feature word segmentation that may cause a security risk, and the SQL access request is marked as abnormal.
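The predict-and-match logic above can be illustrated with a simplified sketch. It is not a full hidden Markov model: it assumes the state of each feature word segmentation is given directly by a hypothetical mapping `to_state`, and it uses only a state-transition table estimated from known-normal sequences (a first-order Markov chain) to predict the most probable next state. All names and the keyword list are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical keyword list used only to assign states in this sketch.
SQL_KEYWORDS = {"SELECT", "DROP", "TABLE", "FROM", "WHERE", "LIKE", "UNION"}


def to_state(token):
    """Map a feature word segmentation to a coarse state label."""
    if token.isdigit():
        return "NUM"
    if token.upper() in SQL_KEYWORDS:
        return "SQL"
    return "WORD"


def fit_transitions(normal_sequences):
    """Estimate P(next state | current state) from known-normal state sequences."""
    counts = defaultdict(Counter)
    for sequence in normal_sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        current: {state: n / sum(c.values()) for state, n in c.items()}
        for current, c in counts.items()
    }


def detect(states, transitions):
    """Return "abnormal" at the first mismatch between predicted and actual next state."""
    for current, actual in zip(states, states[1:]):
        distribution = transitions.get(current)
        if not distribution:
            return "abnormal"  # no prediction is available for an unseen state
        predicted = max(distribution, key=distribution.get)
        if predicted != actual:
            return "abnormal"
    return "normal"
```

Trained on sequences that alternate `WORD` and `NUM` (such as `id 2 precio 85`), the sketch flags `cantidad 27 DROP TABLE` because `SQL` is not the predicted state after `NUM`.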

It should be noted that in the implementation of this embodiment, the hidden Markov model can be tuned by the simple average method or the weighted average method. Non-negative weights must be used to ensure that the integrated performance is better than that of the single best individual learner. Because of insufficient samples or noise in real tasks, the weights are sometimes not completely reliable. Therefore, whether to use the simple average method or the weighted average method needs to be decided according to the actual situation.

In this embodiment, the Hidden Markov Model is used to quickly determine whether the feature segmentation is reasonable, and then to determine whether the access request is abnormal, which is beneficial to improve the efficiency of security detection.

In some optional implementation manners of this embodiment, the SQL statement security detection method based on machine learning further includes:

From the detection log, obtain the feature word segmentations whose detection result was an abnormal state, and use them as reference word segmentations;

Calculate the text similarity between the feature segmentation and the reference segmentation to obtain the text similarity value;

If the text similarity value is less than the preset similarity threshold, the feature segmentation is regarded as an abnormal segmentation, and the detection result is determined to be abnormal.

Specifically, the server saves the feature segmentation corresponding to each abnormal state in the detection log. Once the feature segmentations stored in the detection log reach a certain number, these feature segmentations corresponding to abnormal states can be used for similarity calculation against the feature segmentation obtained in step S202. If the similarity is greater than the preset threshold, it is determined that the feature segmentation obtained in step S202 is in an abnormal state; otherwise, the method of step S203 continues to be used for judgment. This method helps to quickly screen out access requests that may be in an abnormal state.

Among them, the detection log is a log file storing the characteristic word segmentation corresponding to the abnormal state.

Among them, the calculation of the text similarity between the feature segmentation and the reference segmentation can be specifically implemented through Euclidean distance, similarity algorithm, etc.

Among them, the preset similarity threshold can be set according to actual needs, which is not limited here.

In this embodiment, the feature segmentation corresponding to an abnormal detection result is obtained from the detection log as a reference segmentation, and the similarity between the reference segmentation and the newly obtained feature segmentation is then calculated, so as to quickly determine whether the feature segmentation will lead to an abnormal state and thereby judge the risk of the feature segmentation.
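The comparison against the detection log can be sketched as follows. The sketch assumes that the text similarity value is a Euclidean distance between token-count vectors, so that a smaller value means a closer match to a logged abnormal segmentation and a value below the threshold is treated as abnormal. The function names and the threshold used in the usage note are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(tokens_a, tokens_b):
    """Euclidean distance between the token-count vectors of two segmentations."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Counter returns 0 for tokens that are missing from one side.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


def matches_known_abnormal(feature_tokens, reference_log, threshold):
    """True if the request lies within `threshold` of any logged abnormal segmentation."""
    return any(
        euclidean_distance(feature_tokens, reference) < threshold
        for reference in reference_log
    )
```

With the reference `DROP TABLE usuarios` in the log and a threshold of 2, the segmentation `DROP TABLE datos` (distance √2) is flagged, while `id 2 precio 85` (distance √7) is passed on to the check of step S203.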

In some optional implementations of this embodiment, after step S203 and before step S204, the SQL statement security detection method based on machine learning further includes performing a secondary check on the data to be detected whose detection result is abnormal, specifically including:

If the detection result is abnormal, perform lexical segmentation of the data to be tested based on the SQL statement to obtain the segmentation to be verified;

Use the preset character verification function to verify the sensitive vocabulary of the word to be verified by scanning the string, and obtain the scan result;

If the scan result is that there are sensitive words in the word segmentation to be verified, the detection result is confirmed to be abnormal.

Specifically, when the database is open for normal access, access requests are received frequently. Using a keyword-based method to perform a security check on every access request consumes a lot of time and is inefficient, and it can also easily cause access failures because requests cannot be processed in time. Using the hidden Markov model effectively improves detection efficiency, and access requests that may be threats can be identified quickly. To further verify the accuracy of the threatening access requests detected by the hidden Markov model, and to ensure that normal access requests are not intercepted through misjudgment, in this embodiment the abnormal access requests detected in step S203 are subjected to a secondary check by means of string scanning to ensure the accuracy of interception.

Among them, lexical segmentation refers to segmenting the sentence to be tested into each segmentation to be verified according to the grammatical rules of the SQL sentence.

Among them, the preset character verification function refers to a function for verifying characters, which specifically includes but is not limited to logical, isalpha, etc., and a custom function can also be used, which is not limited here.

Among them, the sensitive vocabulary is a pre-defined vocabulary that has a certain risk to database security, such as user, system, etc. The key characters in the detected abnormal access request can be added to the original sensitive vocabulary.

In this embodiment, by performing secondary detection on the abnormal access request detected by the hidden Markov model, the accuracy of abnormal judgment is ensured, which is beneficial to improve the rationality and accuracy of access interception.
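The secondary check can be illustrated with the following minimal sketch. It assumes a simple regular-expression tokenizer in place of a full SQL lexer, uses Python's `str.isalpha` as the character verification function, and takes the sensitive vocabulary from the two examples given above; the function names are hypothetical.

```python
import re

# Example sensitive vocabulary, taken from the examples in the text.
SENSITIVE_WORDS = {"user", "system"}


def lexical_segments(sql_text):
    """Split SQL text into identifiers or keywords, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S", sql_text)


def contains_sensitive_word(sql_text, sensitive_words=SENSITIVE_WORDS):
    """Scan the segments and report whether any purely alphabetic one is sensitive."""
    for segment in lexical_segments(sql_text):
        # str.isalpha serves as the character verification function here.
        if segment.isalpha() and segment.lower() in sensitive_words:
            return True
    return False
```

A request flagged in step S203 would be confirmed as abnormal only if `contains_sensitive_word` returns True for its data to be detected.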

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiment of the present application.

FIG. 3 shows a principle block diagram of a SQL statement security detection device based on machine learning that corresponds one-to-one to the SQL statement security detection method based on machine learning in the above embodiment. As shown in FIG. 3, the SQL statement security detection device based on machine learning includes a request parsing module 31, a feature word segmentation module 32, an anomaly detection module 33 and a request interception module 34. The detailed description of each functional module is as follows:

The request parsing module 31 is used to parse the access request to obtain the data to be detected when the SQL access request is received;

The feature word segmentation module 32 is used to use the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segmentation;

The anomaly detection module 33 is used to perform text anomaly detection on the feature segmentation using the hidden Markov model, and obtain the detection result;

The request interception module 34 is configured to, if the detection result is abnormal, confirm that the data to be detected is threatened, and intercept the SQL access request.

Optionally, the feature word segmentation module 32 includes:

The data segmentation unit is used to segment the data to be tested by way of word combination to obtain the initial segmentation;

The word frequency statistics unit is used to count the proportion of the initial word segmentation in the sentence to be tested, and the proportion is used as the word frequency of the initial word segmentation;

The frequency statistics unit is used to compute the inverse document frequency IDF of the initial word segmentation in the preset corpus;

The word segmentation determining unit is used to calculate the product of the term frequency TF and the inverse document frequency IDF of the initial word segmentation, determine whether the initial word segmentation is an important feature according to the product, and take the initial word segmentations that are important features as the feature word segmentation.

Optionally, the abnormality detection module 33 includes:

The state conversion unit is used to convert the characteristic word segmentation into a state representation;

The state prediction unit is used to predict, for the i-th state, the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and to use the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;

The first matching unit is configured to confirm that the detection result is normal if the i+1th state matches the predicted state corresponding to the i+1th state;

The second matching unit is used to confirm that the detection result is abnormal if the i+1th state does not match the corresponding predicted state of the i+1th state.

Optionally, the SQL statement security detection device based on machine learning further includes:

The reference word segmentation determination module is used to obtain, from the detection log, the feature word segmentations whose detection result was an abnormal state, and to use them as reference word segmentations;

The similarity calculation module is used to calculate the text similarity between the feature segmentation and the reference segmentation to obtain the text similarity value;

The abnormality determination module is used to, if the text similarity value is less than the preset similarity threshold, use the characteristic word segmentation as an abnormal word segmentation, and determine that the detection result is abnormal.
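A minimal sketch of the similarity calculation and the threshold rule is given below. The passage above does not name a similarity measure, so Jaccard similarity over token sets is used purely as a stand-in. The function names, the sample tokens, and the threshold values are hypothetical, and the comparison direction follows the wording of the abnormality determination module.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two token collections (an assumed measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty collections are treated as identical
    return len(set_a & set_b) / len(union)


def is_abnormal(feature_tokens, reference_tokens, threshold):
    """Rule as worded in the text: similarity below the threshold is abnormal."""
    return jaccard_similarity(feature_tokens, reference_tokens) < threshold


# Hypothetical feature tokens and reference tokens taken from a detection log.
feature = ["union", "select", "password"]
reference = ["union", "select", "admin"]
similarity = jaccard_similarity(feature, reference)
```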

The lexical segmentation module is used to, if the detection result is abnormal, perform SQL-statement-based lexical segmentation on the data to be detected to obtain the word segmentation to be verified;

The sensitive vocabulary verification module is used to check the word segmentation to be verified for sensitive vocabulary by string scanning with a preset character verification function, to obtain a scanning result;

The result confirmation module is used for confirming that the detection result is abnormal if the scanning result is that there are sensitive words in the word segmentation to be verified.
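The lexical segmentation and the string scan might look roughly like the sketch below. The token pattern, the sensitive word list, and the function names are hypothetical, since this application does not specify the preset character verification function.

```python
import re

# Hypothetical list of sensitive vocabulary; a real list would be configured.
SENSITIVE_WORDS = {"union", "drop", "sleep", "benchmark", "--"}

# Comment markers, identifiers/keywords, numbers, quoted strings, other symbols.
TOKEN_PATTERN = re.compile(r"--|[A-Za-z_][A-Za-z0-9_]*|\d+|'[^']*'|\S")


def lexical_segments(sql):
    """Split a SQL string into lower-cased lexical tokens."""
    return [tok.lower() for tok in TOKEN_PATTERN.findall(sql)]


def scan_sensitive(tokens, sensitive=SENSITIVE_WORDS):
    """Return the tokens that appear in the sensitive vocabulary, in order."""
    return [tok for tok in tokens if tok in sensitive]


suspicious = "SELECT name FROM users WHERE id = 1 UNION SELECT password FROM admin -- x"
hits = scan_sensitive(lexical_segments(suspicious))
clean_hits = scan_sensitive(lexical_segments("SELECT name FROM users"))
```

A non-empty list of hits corresponds to the case in which the result confirmation module confirms that the detection result is abnormal.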

The storage module is used to store the characteristic word segmentation and the detection result corresponding to the characteristic word segmentation in the blockchain.
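This application does not state which blockchain platform or client interface the storage module uses. Purely as an illustration of tamper-evident storage, the sketch below keeps each characteristic word segmentation and its detection result in an in-memory hash chain. The record layout and the function names are hypothetical, and a real deployment would call the API of the chosen blockchain platform instead.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def _digest(payload):
    """SHA-256 over a canonical JSON encoding of the block payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(previous_hash, feature_tokens, result):
    """Build a block linking the features and the result to the previous block."""
    payload = {"prev": previous_hash, "features": feature_tokens, "result": result}
    return {**payload, "hash": _digest(payload)}


def verify_chain(chain):
    """Check every block's hash and its link to the preceding block."""
    previous_hash = GENESIS_HASH
    for block in chain:
        payload = {key: block[key] for key in ("prev", "features", "result")}
        if block["prev"] != previous_hash or block["hash"] != _digest(payload):
            return False
        previous_hash = block["hash"]
    return True


chain = [make_block(GENESIS_HASH, ["union", "password"], "abnormal")]
chain.append(make_block(chain[-1]["hash"], ["name", "users"], "normal"))
chain_is_valid = verify_chain(chain)
```

Because each block's hash covers the previous hash, altering a stored detection result afterwards causes verification to fail.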

For the specific limitations of the SQL statement security detection device based on machine learning, please refer to the above limitations on the SQL statement security detection method based on machine learning, which will not be repeated here. Each module in the above-mentioned machine learning-based SQL statement security detection device can be implemented in whole or in part by software, hardware, or a combination thereof. The above-mentioned modules may be embedded in, or independent of, the processor of the computer equipment in the form of hardware, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.

In order to solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 4 for details. FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.

The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It should be pointed out that the figure only shows the computer device 4 with the memory 41, the processor 42, and the network interface 43. However, it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded equipment, etc.

The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.

The memory 41 includes at least one type of readable storage medium. The readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as program codes for controlling electronic files. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or will be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run program codes or process data stored in the memory 41, for example, run program codes for controlling electronic files.

The network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.

This application also provides another implementation manner, namely a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile, and stores an interface display program that can be executed by at least one processor, so that the at least one processor executes the steps of the SQL statement security detection method based on machine learning as described above.

Through the description of the above implementation manners, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform. They can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

Obviously, the embodiments described above are only a part of the embodiments of the present application, rather than all of the embodiments. The drawings show preferred embodiments of the present application, but do not limit the patent scope of the present application. The present application can be implemented in many different forms; the purpose of providing these embodiments is to make the understanding of the disclosure of the present application more thorough and comprehensive. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in each of the foregoing specific embodiments, or equivalently replace some of the technical features. All equivalent structures made by using the contents of the description and drawings of this application, and directly or indirectly used in other related technical fields, are likewise within the scope of patent protection of this application.

Claims

A SQL statement security detection method based on machine learning, wherein the SQL statement security detection method based on machine learning includes:

When the SQL access request is received, the access request is parsed to obtain the data to be detected;

Using the TF-IDF algorithm, perform feature extraction on the to-be-detected data to obtain feature word segmentation;

Use a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain a detection result;

If the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted.
The method for security detection of SQL statements based on machine learning according to claim 1, wherein said using the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segmentation comprises:

Perform word segmentation on the to-be-detected data by way of word combination to obtain initial word segmentation;

Count the proportion of the initial word segmentation in the sentence to be detected, and use the proportion as the word frequency of the initial word segmentation;

Count the inverse document frequency (IDF) of the initial word segmentation in the preset corpus;

Calculate the product of the word frequency (TF) and the inverse document frequency (IDF) of the initial word segmentation, determine from the product whether the initial word segmentation is an important feature, and take each initial word segmentation that is an important feature as a characteristic word segmentation.
The method for security detection of SQL statements based on machine learning according to claim 1, wherein said using a hidden Markov model to perform text anomaly detection on said feature segmentation to obtain a detection result comprises:

Convert the characteristic word segmentation into a state representation;

For the i-th state, the probability distribution of the (i+1)-th state is predicted through the observation sequence of the hidden Markov model, and the state corresponding to the maximum probability value in the probability distribution is regarded as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;

If the i+1th state matches the predicted state corresponding to the i+1th state, the detection result is confirmed to be normal;

If the i+1th state does not match the predicted state corresponding to the i+1th state, the detection result is confirmed to be abnormal.
The method for security detection of SQL statements based on machine learning according to claim 1, wherein the method for security detection of SQL statements based on machine learning further comprises:

From the detection log, obtain the feature word segmentation whose detection result is abnormal, and use it as the reference word segmentation;

Calculate the text similarity between the feature segmentation and the reference segmentation to obtain a text similarity value;

If the text similarity value is less than the preset similarity threshold, the characteristic word segmentation is used as an abnormal word segmentation, and the detection result is determined to be abnormal.
The method for security detection of SQL statements based on machine learning according to claim 1, wherein after said using a hidden Markov model to perform text anomaly detection on said feature word segmentation to obtain a detection result, and before said confirming, if the detection result is abnormal, that the data to be detected is threatened, the SQL statement security detection method based on machine learning further includes:

If the detection result is abnormal, perform lexical segmentation on the data to be detected based on the SQL sentence to obtain the word segmentation to be verified;

Use a preset character verification function to perform sensitive vocabulary verification on the word to be verified by means of string scanning, to obtain a scan result;

If the scanning result is that there are sensitive words in the word segmentation to be verified, it is confirmed that the detection result is abnormal.
The method for security detection of SQL statements based on machine learning according to claim 1, wherein after said using a hidden Markov model to perform text anomaly detection on said feature segmentation to obtain a detection result, the method further comprises:

The characteristic word segmentation and the detection result corresponding to the characteristic word segmentation are stored in the blockchain.
A SQL statement security detection device based on machine learning, wherein the SQL statement security detection device based on machine learning includes:

The request parsing module is used to parse the access request to obtain the data to be detected when the SQL access request is received;

The feature word segmentation module is used to use the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segmentation;

An anomaly detection module, configured to use a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain a detection result;

The request interception module is configured to, if the detection result is abnormal, confirm that the data to be detected is threatened, and intercept the SQL access request.
The SQL statement security detection device based on machine learning according to claim 7, wherein the characteristic word segmentation module comprises:

The data word segmentation unit is used to perform word segmentation on the to-be-detected data by way of word combination to obtain initial word segmentation;

A word frequency counting unit, configured to count the proportion of the initial word segmentation in the sentence to be detected, and use the proportion as the word frequency of the initial word segmentation;

An inverse document frequency statistics unit, used to count the inverse document frequency (IDF) of the initial word segmentation in the preset corpus;

The word segmentation determining unit is used to calculate the product of the word frequency (TF) and the inverse document frequency (IDF) of the initial word segmentation, determine from the product whether the initial word segmentation is an important feature, and take each initial word segmentation that is an important feature as a characteristic word segmentation.
A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when the processor executes the computer-readable instructions:

When the SQL access request is received, the access request is parsed to obtain the data to be detected;

Using the TF-IDF algorithm, perform feature extraction on the to-be-detected data to obtain feature word segmentation;

Use a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain a detection result;

If the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted.
The computer device according to claim 9, wherein said using the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segmentation comprises:

Perform word segmentation on the to-be-detected data by way of word combination to obtain initial word segmentation;

Count the proportion of the initial word segmentation in the sentence to be detected, and use the proportion as the word frequency of the initial word segmentation;

Count the inverse document frequency (IDF) of the initial word segmentation in the preset corpus;

Calculate the product of the word frequency (TF) and the inverse document frequency (IDF) of the initial word segmentation, determine from the product whether the initial word segmentation is an important feature, and take each initial word segmentation that is an important feature as a characteristic word segmentation.
The computer device according to claim 9, wherein said using a hidden Markov model to perform text anomaly detection on said feature segmentation to obtain a detection result comprises:

Convert the characteristic word segmentation into a state representation;

For the i-th state, the probability distribution of the (i+1)-th state is predicted through the observation sequence of the hidden Markov model, and the state corresponding to the maximum probability value in the probability distribution is regarded as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;

If the i+1th state matches the predicted state corresponding to the i+1th state, the detection result is confirmed to be normal;

If the i+1th state does not match the predicted state corresponding to the i+1th state, the detection result is confirmed to be abnormal.
The computer device of claim 9, wherein the processor further implements the following steps when executing the computer-readable instructions:

From the detection log, obtain the feature word segmentation whose detection result is abnormal, and use it as the reference word segmentation;

Calculate the text similarity between the feature segmentation and the reference segmentation to obtain a text similarity value;

If the text similarity value is less than the preset similarity threshold, the characteristic word segmentation is used as an abnormal word segmentation, and the detection result is determined to be abnormal.
The computer device according to claim 9, wherein, after said using a hidden Markov model to perform text anomaly detection on said feature segmentation to obtain a detection result, and before said confirming, if the detection result is abnormal, that the data to be detected is threatened, the processor further implements the following steps when executing the computer-readable instructions:

If the detection result is abnormal, perform lexical segmentation on the data to be detected based on the SQL sentence to obtain the segmentation to be verified;

Use a preset character verification function to perform sensitive vocabulary verification on the word to be verified by means of string scanning, to obtain a scan result;

If the scanning result is that there are sensitive words in the word segmentation to be verified, it is confirmed that the detection result is abnormal.
The computer device according to claim 9, wherein, after said using the hidden Markov model to perform text anomaly detection on the feature segmentation to obtain the detection result, the processor further implements the following steps when executing the computer-readable instructions:

The characteristic word segmentation and the detection result corresponding to the characteristic word segmentation are stored in the blockchain.
A computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, wherein, when the computer-readable instructions are executed by a processor, the following steps are implemented:

When the SQL access request is received, the access request is parsed to obtain the data to be detected;

Using the TF-IDF algorithm, perform feature extraction on the to-be-detected data to obtain feature word segmentation;

Use a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain a detection result;

If the detection result is abnormal, it is confirmed that the data to be detected is threatened, and the SQL access request is intercepted.
The computer-readable storage medium according to claim 15, wherein said using the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segmentation comprises:

Perform word segmentation on the to-be-detected data by way of word combination to obtain initial word segmentation;

Count the proportion of the initial word segmentation in the sentence to be detected, and use the proportion as the word frequency of the initial word segmentation;

Count the inverse document frequency (IDF) of the initial word segmentation in the preset corpus;

Calculate the product of the word frequency (TF) and the inverse document frequency (IDF) of the initial word segmentation, determine from the product whether the initial word segmentation is an important feature, and take each initial word segmentation that is an important feature as a characteristic word segmentation.
The computer-readable storage medium according to claim 15, wherein said using a hidden Markov model to perform text anomaly detection on said feature segmentation to obtain a detection result comprises:

Convert the characteristic word segmentation into a state representation;

For the i-th state, the probability distribution of the (i+1)-th state is predicted through the observation sequence of the hidden Markov model, and the state corresponding to the maximum probability value in the probability distribution is regarded as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;

If the i+1th state matches the predicted state corresponding to the i+1th state, the detection result is confirmed to be normal;

If the i+1th state does not match the predicted state corresponding to the i+1th state, the detection result is confirmed to be abnormal.
The computer-readable storage medium of claim 15, wherein the processor further implements the following steps when executing the computer-readable instructions:

From the detection log, obtain the feature word segmentation whose detection result is abnormal, and use it as the reference word segmentation;

Calculate the text similarity between the feature segmentation and the reference segmentation to obtain a text similarity value;

If the text similarity value is less than the preset similarity threshold, the characteristic word segmentation is used as an abnormal word segmentation, and the detection result is determined to be abnormal.
The computer-readable storage medium according to claim 15, wherein, after said using the hidden Markov model to perform text anomaly detection on the feature segmentation to obtain the detection result, and before said confirming, if the detection result is abnormal, that the data to be detected is threatened, the processor further implements the following steps when executing the computer-readable instructions:

If the detection result is abnormal, perform lexical segmentation on the data to be detected based on the SQL sentence to obtain the segmentation to be verified;

Use a preset character verification function to perform sensitive vocabulary verification on the word to be verified by means of string scanning, to obtain a scan result;

If the scanning result is that there are sensitive words in the word segmentation to be verified, it is confirmed that the detection result is abnormal.
The computer-readable storage medium of claim 15, wherein, after said using the hidden Markov model to perform text anomaly detection on the feature segmentation to obtain the detection result, the processor further implements the following steps when executing the computer-readable instructions:

The characteristic word segmentation and the detection result corresponding to the characteristic word segmentation are stored in the blockchain.