WO2021135919A1 - Machine learning-based sql statement security testing method and apparatus, device, and medium - Google Patents

Machine learning-based SQL statement security testing method and apparatus, device, and medium

Info

Publication number
WO2021135919A1
WO2021135919A1 PCT/CN2020/136341 CN2020136341W WO2021135919A1 WO 2021135919 A1 WO2021135919 A1 WO 2021135919A1 CN 2020136341 W CN2020136341 W CN 2020136341W WO 2021135919 A1 WO2021135919 A1 WO 2021135919A1
Authority
WO
WIPO (PCT)
Prior art keywords
word segmentation
segmentation
detection result
feature
state
Application number
PCT/CN2020/136341
Other languages
French (fr)
Chinese (zh)
Inventor
吴添立
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021135919A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • This application relates to the field of network security, and in particular to a method, device, equipment and medium for SQL statement security detection based on machine learning.
  • The embodiments of the present application provide a SQL statement security detection method, device, equipment, and medium based on machine learning, so as to improve the efficiency of database security detection.
  • An embodiment of the present application provides a SQL statement security detection method based on machine learning, including:
  • when a SQL access request is received, parsing the access request to obtain the data to be detected;
  • if the detection result is abnormal, confirming that the data to be detected contains a threat, and intercepting the SQL access request.
  • An embodiment of the present application also provides a SQL statement security detection device based on machine learning, including:
  • a request parsing module, used to parse the access request to obtain the data to be detected when a SQL access request is received;
  • a feature word segmentation module, used to perform feature extraction on the data to be detected with the TF-IDF algorithm to obtain the feature word segmentation;
  • an anomaly detection module, used to perform text anomaly detection on the feature word segmentation with a hidden Markov model to obtain a detection result;
  • a request interception module, used to confirm, if the detection result is abnormal, that the data to be detected contains a threat, and to intercept the SQL access request.
  • An embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and runnable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • when a SQL access request is received, parsing the access request to obtain the data to be detected;
  • if the detection result is abnormal, confirming that the data to be detected contains a threat, and intercepting the SQL access request.
  • Embodiments of the present application also provide a computer-readable storage medium that stores computer-readable instructions, and the computer-readable instructions implement the following steps when executed by a processor:
  • when a SQL access request is received, parsing the access request to obtain the data to be detected;
  • if the detection result is abnormal, confirming that the data to be detected contains a threat, and intercepting the SQL access request.
  • With the SQL statement security detection method, device, equipment, and medium based on machine learning provided by the embodiments of the present application, when a SQL access request is received, the access request is parsed to obtain the data to be detected; the TF-IDF algorithm is used to perform feature extraction on the data to be detected to obtain the feature word segmentation; and a hidden Markov model is used to perform text anomaly detection on the feature word segmentation to obtain the detection result. This allows the reasonableness of SQL access requests to be checked quickly, which helps improve the efficiency of security detection.
  • When the detection result is abnormal, it is confirmed that the data to be detected contains a threat, and the SQL access request is intercepted, so that abnormal access requests are intercepted quickly and the security of the database is ensured.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of the SQL statement security detection method based on machine learning of the present application;
  • FIG. 3 is a schematic structural diagram of an embodiment of the SQL statement security detection device based on machine learning according to the present application;
  • FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104 and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, etc.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • the SQL statement security detection method based on machine learning provided by the embodiment of the present application is executed by the server, and accordingly, the SQL statement security detection device based on machine learning is provided in the server.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there may be any number of terminal devices, networks, and servers.
  • the terminal devices 101, 102, and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.
  • FIG. 2 shows a SQL statement security detection method based on machine learning provided by an embodiment of the present application. The method is applied to the server in FIG. 1 as an example for description, and the details are as follows:
  • When the client accesses the database, it first connects to the server and then sends a database access request containing the request parameters to the server; the server obtains the data that the client needs from the database according to the request and returns it to the client.
  • If the request parameters contain illegal characters, they may harm the database.
  • These illegal characters include but are not limited to: malicious SQL injection instructions, unauthorized instructions, and SQL attack instructions.
  • When the server receives the SQL access request, it obtains the request parameters contained in the access request and parses them to get the data to be detected.
  • Parsing the access request refers to obtaining the request parameters included in the access request, and then determining the SQL statement to be detected according to the request parameters.
  • A request parameter refers to an attribute field included in the access request, which can specifically be a uniform resource locator (URL), an operation instruction, or a matching rule. In this embodiment, it mainly refers to attribute fields such as the POST parameters, GET parameters, and COOKIE parameters in the access request, which are the fields that are detected.
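  • As a minimal sketch of the parsing step, the following Python collects the GET, POST, and COOKIE parameter values of one request using only the standard library. The function name `extract_request_parameters`, its arguments, and the assumption that the POST body is form-encoded are illustrative choices, not details taken from the application.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit


def extract_request_parameters(url, body="", cookie_header=""):
    """Collect (source, name, value) triples for GET, POST and COOKIE fields.

    Hypothetical helper: the application only says these attribute fields
    are obtained from the access request, not how.
    """
    params = []
    # GET parameters come from the query string of the URL (percent-decoded).
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        params.append(("GET", name, value))
    # POST parameters: assumes an application/x-www-form-urlencoded body.
    for name, value in parse_qsl(body, keep_blank_values=True):
        params.append(("POST", name, value))
    # COOKIE parameters come from the raw Cookie header value.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    for name, morsel in cookie.items():
        params.append(("COOKIE", name, morsel.value))
    return params


if __name__ == "__main__":
    for triple in extract_request_parameters(
        "http://example.com/item?id=1%27%20OR%20%271%27=%271",
        body="name=bob",
        cookie_header="session=abc123",
    ):
        print(triple)
```

  • Each extracted value would then be treated as data to be detected; decoding the percent-escapes first matters because injected fragments are often URL-encoded.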
  • TF-IDF feature extraction is performed on the data to be detected, and a word segmentation that reflects the core characteristics of the data to be detected is obtained as the feature word segmentation.
  • TF-IDF stands for term frequency-inverse document frequency, and IDF is the inverse document frequency index.
  • TF-IDF is a statistical method used to evaluate how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus.
  • Word segmentation is performed on the data to be detected; the proportion of each obtained word segmentation in the data to be detected is used as its term frequency TF, and its inverse document frequency index IDF is derived from how frequently it appears in the preset corpus. The TF-IDF corresponding to the word segmentation is then calculated, and whether the word segmentation is accurate as an independent word segmentation is judged by whether its TF-IDF falls within the preset range.
  • When the TF-IDF of a word segmentation is within the preset range, the word segmentation has a high probability of being a readable string; the request can then be segmented, and its word composition obtained and vectorized.
  • Feature extraction includes word segmentation, generalization, and feature vector transformation.
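  • The word segmentation and generalization steps could look roughly like the sketch below. The regular expression, the placeholder tokens `<STR>` and `<NUM>`, and the function name `segment_and_generalize` are assumptions made for illustration; the application does not specify its generalization rules.

```python
import re

# Quoted string literals, digit runs, identifiers/keywords, then any other
# single non-space character (operators, punctuation, stray quotes).
_TOKEN = re.compile(r"'[^']*'|\d+|[A-Za-z_][A-Za-z_0-9]*|[^\sA-Za-z_0-9]")


def segment_and_generalize(payload):
    """Split a payload into tokens and replace literals with placeholders."""
    tokens = []
    for tok in _TOKEN.findall(payload):
        if len(tok) >= 2 and tok.startswith("'") and tok.endswith("'"):
            tokens.append("<STR>")      # generalize string literals
        elif tok.isdigit():
            tokens.append("<NUM>")      # generalize numeric literals
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok.upper())  # normalize keyword/identifier case
        else:
            tokens.append(tok)          # keep operators and punctuation
    return tokens


if __name__ == "__main__":
    print(segment_and_generalize("select * from users where id = 42 or name = 'bob'"))
```

  • Generalizing literals keeps the token vocabulary small, so that two requests differing only in a numeric or string value map to the same token sequence before vectorization.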
  • S203: Use the hidden Markov model to perform text anomaly detection on the feature segmentation, and obtain the detection result.
  • The feature segmentation is input into a pre-trained hidden Markov model, which detects whether abnormal text is present in the feature segmentation.
  • A hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters.
  • A hidden Markov model is a kind of Markov chain whose states cannot be observed directly but can be inferred from a sequence of observation vectors. Each state generates observation vectors according to a probability density distribution, so a sequence of observation vectors is generated by a sequence of states, each with its corresponding probability density distribution.
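  • To make the relationship between hidden states and observations concrete, the sketch below implements the standard forward algorithm, which computes the probability that a discrete HMM generates a given observation sequence. The two states and all probabilities in the usage example are made-up toy values, and the function name `forward_probability` is hypothetical; the application does not say which HMM computation it uses at this point.

```python
def forward_probability(observations, start_p, trans_p, emit_p):
    """Probability of an observation sequence under a discrete HMM.

    start_p[s]      : probability of starting in hidden state s
    trans_p[s][t]   : probability of moving from state s to state t
    emit_p[s][obs]  : probability that state s emits observation obs
    """
    if not observations:
        return 1.0  # the empty sequence is generated with certainty
    states = list(start_p)
    # Initialisation: start in each state and emit the first observation.
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
            * emit_p[s].get(obs, 0.0)
            for s in states
        }
    # Termination: total probability over all possible final states.
    return sum(alpha.values())


if __name__ == "__main__":
    start = {"benign": 0.9, "attack": 0.1}
    trans = {
        "benign": {"benign": 0.9, "attack": 0.1},
        "attack": {"benign": 0.2, "attack": 0.8},
    }
    emit = {
        "benign": {"SELECT": 0.5, "FROM": 0.5},
        "attack": {"UNION": 0.6, "SELECT": 0.4},
    }
    print(forward_probability(["SELECT", "FROM"], start, trans, emit))
```

  • A sequence with a very low probability under a model trained on normal traffic is one plausible signal of abnormal text, although the threshold for "very low" would have to be chosen empirically.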
  • Text anomaly detection refers to detecting whether the data to be detected contains illegal text characters.
  • Such illegal text characters may become a hidden danger to database access security.
  • For the specific process of using the hidden Markov model to perform text anomaly detection on the feature segmentation and obtain the detection result, refer to the description of the subsequent embodiments; to avoid repetition, it is not described again here.
  • If the detection result is abnormal, the SQL access request is intercepted to ensure the security of the database.
  • In this embodiment, when the SQL access request is received, the access request is parsed to obtain the data to be detected, the TF-IDF algorithm is used to extract the features of the data to be detected to obtain the feature word segmentation, and the hidden Markov model is used to perform text anomaly detection on the feature word segmentation and obtain the detection result. This allows the reasonableness of SQL access requests to be checked quickly, which helps improve the efficiency of security detection.
  • When the detection result is abnormal, it is confirmed that the data to be detected contains a threat, and the SQL access request is intercepted, so that abnormal access requests are intercepted quickly and the security of the database is ensured.
  • The obtained feature word segmentation and the detection result corresponding to it can be saved on a blockchain network; storing the data on the blockchain allows it to be shared between different platforms and also prevents it from being tampered with.
  • A blockchain, essentially a decentralized database, is a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
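  • The tamper-evidence property can be illustrated with a toy hash-chained log, shown below. This is only a single-process sketch of blocks linked by SHA-256 hashes, not a decentralized blockchain network; the names `make_block`, `chain_is_valid`, and `GENESIS_HASH`, and the block layout, are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block
_FIELDS = ("previous_hash", "feature_tokens", "detection_result")


def _digest(payload):
    """SHA-256 over a canonical JSON encoding of the block payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(previous_hash, feature_tokens, detection_result):
    """Create a block that records one detection and links to its predecessor."""
    payload = {
        "previous_hash": previous_hash,
        "feature_tokens": list(feature_tokens),
        "detection_result": detection_result,
    }
    return {**payload, "hash": _digest(payload)}


def chain_is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    previous_hash = GENESIS_HASH
    for block in chain:
        payload = {field: block[field] for field in _FIELDS}
        if block["previous_hash"] != previous_hash or block["hash"] != _digest(payload):
            return False
        previous_hash = block["hash"]
    return True


if __name__ == "__main__":
    first = make_block(GENESIS_HASH, ["UNION", "SELECT"], "abnormal")
    second = make_block(first["hash"], ["SELECT"], "normal")
    print(chain_is_valid([first, second]))
```

  • Because each block's hash covers the previous block's hash, changing a stored detection result invalidates that block and every block after it.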
  • In step S202, using the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature segmentation includes:
  • the data to be detected is segmented into words to obtain the initial word segmentation;
  • feature extraction is performed on the data to be detected to obtain the feature word segmentation.
  • Word segmentation in this embodiment should not be understood simply as dividing text into "words"; a segment can specifically be a character string, such as "SELECT*FROM".
  • $|D|$ is the total number of files in the corpus.
  • $|\{j : t_i \in d_j\}|$ represents the number of files containing the i-th initial word segmentation $t_i$ (i.e., the number of files with $n_{i,j} \neq 0$). If the initial word segmentation is not in the corpus, the denominator would be zero; therefore, in general, $1 + |\{j : t_i \in d_j\}|$ is used.
  • Whether the initial word segmentation is an important feature is determined according to the product of TF and IDF: the word segmentation results of historical SQL attacks can serve as a reference for setting a threshold, and when the product reaches the threshold, the word segmentation is determined to be an important feature.
  • The TF-IDF algorithm is used to identify, among the features of the data to be detected, the important features related to SQL security access as the feature segmentation; using this feature segmentation in subsequent security detection helps improve the accuracy of security detection.
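  • The TF, IDF, product, and threshold steps can be sketched as follows. The corpus is modeled as a list of token sets, the IDF uses the smoothed denominator (one plus the number of files containing the token) with a natural logarithm, and the names `tf_idf_scores` and `select_feature_tokens` as well as the threshold value are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(tokens, corpus):
    """Return {token: TF * IDF} for the tokens of one request.

    tokens : initial word segmentation of the data to be detected
    corpus : list of token sets, one per file in the preset corpus
    """
    if not corpus:
        raise ValueError("corpus must contain at least one file")
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for token, count in counts.items():
        tf = count / total  # proportion of the token in the request
        containing = sum(1 for doc in corpus if token in doc)
        # Smoothed denominator avoids division by zero for unseen tokens;
        # note that IDF turns negative for a token present in every file.
        idf = math.log(len(corpus) / (1 + containing))
        scores[token] = tf * idf
    return scores


def select_feature_tokens(tokens, corpus, threshold):
    """Keep, in order, the tokens whose TF-IDF product reaches the threshold."""
    scores = tf_idf_scores(tokens, corpus)
    return [token for token in tokens if scores[token] >= threshold]


if __name__ == "__main__":
    corpus = [
        {"SELECT", "FROM"},
        {"SELECT", "WHERE"},
        {"SELECT", "FROM", "WHERE"},
        {"INSERT"},
    ]
    request = ["SELECT", "FROM", "SLEEP", "SLEEP"]
    print(tf_idf_scores(request, corpus))
    print(select_feature_tokens(request, corpus, threshold=0.5))
```

  • In this toy corpus the rare token `SLEEP` scores far higher than the ubiquitous `SELECT`, which is the behavior the threshold relies on; a real threshold would be calibrated against historical attack data as the text describes.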
  • In step S203, using a hidden Markov model to perform text anomaly detection on the feature segmentation to obtain the detection result includes:
  • for the i-th state, predict the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and use the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;
  • if the (i+1)-th state matches its predicted state, the detection result is confirmed to be normal;
  • if the (i+1)-th state does not match its predicted state, the detection result is confirmed to be abnormal.
  • The feature segmentation is converted into states of the hidden Markov model, the next state of each state is predicted through the observation sequence of the hidden Markov model, and the predicted next state is matched against the real state. If the match fails, it is confirmed that there is a feature word segmentation that may cause a security risk, and the SQL access request is flagged as abnormal.
  • The hidden Markov model can be tuned by the simple average method or the weighted average method.
  • With the weighted average method, non-negative weights must be used to ensure that the integrated performance is better than that of the single best individual learner. Because of insufficient samples or noise in real tasks, the weights are sometimes not completely reliable. Therefore, whether to use the simple average method or the weighted average method needs to be decided according to the actual situation.
  • The hidden Markov model is used to quickly determine whether the feature segmentation is reasonable, and then whether the access request is abnormal, which helps improve the efficiency of security detection.
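  • The predict-and-match loop can be sketched as below. As a loudly stated simplification, this example uses a first-order Markov chain over the visible token states, estimated by counting transitions in normal sequences, rather than a full hidden Markov model; `train_transitions` and `detect` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_transitions(normal_sequences):
    """Count state-to-next-state transitions seen in normal requests."""
    counts = defaultdict(Counter)
    for sequence in normal_sequences:
        for current, nxt in zip(sequence, sequence[1:]):
            counts[current][nxt] += 1
    return counts


def detect(sequence, transitions):
    """Return "abnormal" as soon as an actual next state differs from the
    most probable predicted next state, otherwise "normal"."""
    for current, actual_next in zip(sequence, sequence[1:]):
        followers = transitions.get(current)
        if not followers:
            return "abnormal"  # state never seen in training: cannot predict
        predicted_next = followers.most_common(1)[0][0]  # arg-max of distribution
        if predicted_next != actual_next:
            return "abnormal"
    return "normal"


if __name__ == "__main__":
    model = train_transitions([
        ["SELECT", "*", "FROM", "<ID>"],
        ["SELECT", "*", "FROM", "<ID>"],
    ])
    print(detect(["SELECT", "*", "FROM", "<ID>"], model))
    print(detect(["SELECT", "*", "UNION"], model))
```

  • Strict arg-max matching flags any deviation from the single most likely continuation, so in practice it would probably need relaxing (for example, accepting any of the top few predicted states) to avoid false positives on legitimate but less common statements.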
  • the SQL statement security detection method based on machine learning further includes:
  • the feature segmentation is regarded as an abnormal segmentation, and the detection result is determined to be abnormal.
  • the server saves the feature segmentation corresponding to each abnormal state in the detection log.
  • When the feature segmentations stored in the detection log reach a certain number, the feature segmentations corresponding to abnormal states can be compared for similarity with the feature segmentation obtained in step S202.
  • If the similarity is greater than the preset threshold, it is determined that the feature segmentation obtained in step S202 is in an abnormal state; otherwise, the method of step S203 continues to be used for judgment. This helps to quickly screen out access requests that may be in an abnormal state.
  • The detection log is a log file that stores the feature word segmentations corresponding to abnormal states.
  • The text similarity between the feature segmentation and the reference segmentation can be calculated through Euclidean distance, a similarity algorithm, etc.
  • The preset similarity threshold can be set according to actual needs, which is not limited here.
  • The feature segmentation corresponding to detection results in the abnormal state is obtained from the detection log as the reference segmentation, and the similarity between the reference segmentation and the newly obtained feature segmentation is then calculated, so as to quickly determine whether the feature segmentation will cause an abnormal state and thereby judge its risk.
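  • One way to realize the log comparison with Euclidean distance is sketched below. Token sequences are turned into count vectors, the distance d is mapped to a similarity 1 / (1 + d) so that larger means more alike, and a request matches when the similarity exceeds the threshold. The mapping, the direction of the comparison, and the names `euclidean_similarity` and `matches_logged_anomaly` are assumptions; the application leaves these details open.

```python
import math
from collections import Counter


def euclidean_similarity(tokens_a, tokens_b):
    """Similarity in (0, 1] derived from the Euclidean distance between
    the token-count vectors of two segmentations (1.0 means identical counts)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    distance = math.sqrt(sum((a[t] - b[t]) ** 2 for t in set(a) | set(b)))
    return 1.0 / (1.0 + distance)


def matches_logged_anomaly(feature_tokens, detection_log, threshold):
    """True if the feature segmentation is similar enough to any reference
    segmentation previously logged as abnormal."""
    return any(
        euclidean_similarity(feature_tokens, reference) > threshold
        for reference in detection_log
    )


if __name__ == "__main__":
    log = [["UNION", "SELECT", "SLEEP"]]
    print(euclidean_similarity(["UNION", "SELECT"], log[0]))
    print(matches_logged_anomaly(["UNION", "SELECT"], log, threshold=0.4))
```

  • Because this check is a handful of vector comparisons, it can run before the model-based detection and short-circuit requests that closely resemble already-known attacks.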
  • The SQL statement security detection method based on machine learning further includes performing a secondary check on the data to be detected whose detection result is abnormal, specifically including:
  • if the scan result is that there are sensitive words in the word segmentation to be verified, the detection result is confirmed to be abnormal.
  • The abnormal access request detected in step S203 is subjected to a secondary check by means of string scanning to ensure the accuracy of interception.
  • Lexical segmentation refers to segmenting the statement to be detected into individual segmentations to be verified according to the grammatical rules of SQL statements.
  • The preset character verification function refers to a function for verifying characters, which specifically includes but is not limited to logical, isalpha, etc.; a custom function can also be used, which is not limited here.
  • The sensitive vocabulary is a predefined vocabulary of words that pose a certain risk to database security, such as user, system, etc.
  • The key characters in a detected abnormal access request can be added to the original sensitive vocabulary.
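  • A minimal sketch of the secondary check follows, using Python's built-in `str.isalpha` as the character verification function and the two example words from the text as the sensitive vocabulary. The tokenizing regular expression and the name `secondary_check` are assumptions, and a real lexical segmentation would follow SQL grammar more closely.

```python
import re

# Example vocabulary taken from the text; a deployment would maintain its own.
SENSITIVE_WORDS = {"user", "system"}


def secondary_check(statement, sensitive_words=SENSITIVE_WORDS):
    """Return the sensitive words found in the statement, in order.

    A non-empty result confirms the abnormal detection result.
    """
    # Crude lexical segmentation: word-character runs or single symbols.
    tokens = re.findall(r"\w+|\S", statement)
    # Character verification: only purely alphabetic tokens are compared
    # against the sensitive vocabulary (case-insensitively).
    return [t for t in tokens if t.isalpha() and t.lower() in sensitive_words]


if __name__ == "__main__":
    print(secondary_check("SELECT password FROM user WHERE id = 1"))
    print(secondary_check("SELECT name FROM products"))
```

  • Adding the key tokens of newly confirmed abnormal requests to `sensitive_words` would implement the vocabulary growth described above.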
  • FIG. 3 shows a principle block diagram of a SQL statement security detection device based on machine learning that corresponds to the SQL statement security detection method based on machine learning in the above embodiment one-to-one.
  • The SQL statement security detection device based on machine learning includes a request parsing module 31, a feature word segmentation module 32, an anomaly detection module 33, and a request interception module 34.
  • the detailed description of each functional module is as follows:
  • the request parsing module 31 is used to parse the access request to obtain the data to be detected when the SQL access request is received;
  • the feature word segmentation module 32 is used to use the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segmentation;
  • the anomaly detection module 33 is used to perform text anomaly detection on the feature segmentation using the hidden Markov model, and obtain the detection result;
  • the request interception module 34 is configured to, if the detection result is abnormal, confirm that the data to be detected is threatened, and intercept the SQL access request.
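  • How the four modules fit together can be sketched as a small class whose collaborators are passed in as plain callables. The class name `SqlSecurityDetector`, the callable signatures, and the string results `"intercepted"` and `"forwarded"` are illustrative assumptions rather than the structure of the claimed device.

```python
class SqlSecurityDetector:
    """Wires request parsing, feature extraction, anomaly detection and
    interception into one pipeline (modules 31 to 34 in the description)."""

    def __init__(self, parse, extract_features, detect):
        self.parse = parse                        # request -> data to be detected
        self.extract_features = extract_features  # data -> feature word segmentation
        self.detect = detect                      # features -> "normal" / "abnormal"

    def handle(self, request):
        data = self.parse(request)                # request parsing module 31
        features = self.extract_features(data)    # feature word segmentation module 32
        result = self.detect(features)            # anomaly detection module 33
        # Request interception module 34: block abnormal requests only.
        return "intercepted" if result == "abnormal" else "forwarded"


if __name__ == "__main__":
    detector = SqlSecurityDetector(
        parse=lambda request: request["sql"],
        extract_features=str.split,
        detect=lambda features: "abnormal" if "UNION" in features else "normal",
    )
    print(detector.handle({"sql": "SELECT 1 UNION SELECT 2"}))
    print(detector.handle({"sql": "SELECT 1"}))
```

  • Injecting the three stages keeps the interception logic independent of any particular tokenizer or model, so each stage can be replaced or tested on its own.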
  • the feature word segmentation module 32 includes:
  • the data segmentation unit is used to segment the data to be detected by way of word combination to obtain the initial word segmentation;
  • the word frequency statistics unit is used to count the proportion of the initial word segmentation in the statement to be detected, and the proportion is used as the term frequency TF of the initial word segmentation;
  • the frequency statistics unit is used to count the inverse document frequency IDF of the initial word segmentation in the preset corpus;
  • the word segmentation determining unit is used to calculate the product of the term frequency TF and the inverse document frequency IDF of the initial word segmentation, determine whether the initial word segmentation is an important feature according to the product, and determine the initial word segmentation belonging to the important features as the feature word segmentation.
  • the abnormality detection module 33 includes:
  • the state conversion unit is used to convert the feature word segmentation into a state representation;
  • the state prediction unit is used to predict, for the i-th state, the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and to use the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state;
  • the first matching unit is configured to confirm that the detection result is normal if the i+1th state matches the predicted state corresponding to the i+1th state;
  • the second matching unit is used to confirm that the detection result is abnormal if the i+1th state does not match the corresponding predicted state of the i+1th state.
  • the SQL statement security detection device based on machine learning further includes:
  • the reference word segmentation determination module is used to obtain, from the detection log, the feature word segmentation corresponding to detection results in the abnormal state as the reference word segmentation;
  • the similarity calculation module is used to calculate the text similarity between the feature segmentation and the reference segmentation to obtain the text similarity value;
  • the abnormality determination module is used to, if the text similarity value is less than the preset similarity threshold, use the characteristic word segmentation as an abnormal word segmentation, and determine that the detection result is abnormal.
  • the SQL statement security detection device based on machine learning further includes:
  • the lexical segmentation module is used to perform lexical segmentation on the data to be tested based on the SQL statement if the detection result is abnormal to obtain the segmentation to be verified;
  • the sensitive vocabulary verification module is used to use the preset character verification function to verify the sensitive vocabulary of the word to be verified by scanning the string to obtain the scanning result;
  • the result confirmation module is used for confirming that the detection result is abnormal if the scanning result is that there are sensitive words in the word segmentation to be verified.
  • the SQL statement security detection device based on machine learning further includes:
  • the storage module is used to store the characteristic word segmentation and the detection result corresponding to the characteristic word segmentation in the blockchain.
  • Each module in the above-mentioned machine learning-based SQL statement security detection device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It should be pointed out that the figure only shows the computer device 4 with the memory 41, the processor 42, and the network interface 43; however, it should be understood that implementing all of the shown components is not required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded equipment, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • the memory 41 may also be an external storage device of the computer device 4, for example, a plug-in hard disk equipped on the computer device 4, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, Flash Card, etc.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as program codes for controlling electronic files.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run program codes or process data stored in the memory 41, for example, run program codes for controlling electronic files.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • The computer-readable storage medium may be non-volatile or volatile, and it stores an interface display program that can be executed by at least one processor, so that the at least one processor executes the steps of the SQL statement security detection method based on machine learning as described above.
  • The technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A machine learning-based SQL statement security testing method and apparatus, a device, and a medium. The method comprises: upon receiving an SQL access request, analyzing the access request, and obtaining test data (S201); using a TF-IDF algorithm to perform feature extraction on the test data, and obtaining feature word segments (S202); using a hidden Markov model to perform text abnormality testing on the feature word segments, and obtaining a test result (S203); if the test result is abnormal, confirming that the test data contains a threat, and performing interception processing on the SQL access request (S204). The present invention implements rapid testing of the reasonability of SQL access requests, and helps to improve security testing efficiency. The present invention also relates to blockchain technology, in that obtained feature word segments and corresponding test results are stored in a blockchain, so as to implement rapid interception of abnormal access requests, and ensure database security.

Description

基于机器学习的SQL语句安全检测方法、装置、设备及介质SQL statement security detection method, device, equipment and medium based on machine learning
本申请要求于2020年5月27日,提交中国专利局、申请号为2020104640093发明名称为“基于机器学习的SQL语句安全检测方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed with the Chinese Patent Office with the application number 2020104640093 and the invention titled "Machine Learning-based SQL Statement Security Detection Method, Apparatus, Equipment and Medium" on May 27, 2020, all of which The content is incorporated in this application by reference.
Technical field
This application relates to the field of network security, and in particular to a machine learning-based SQL statement security detection method, apparatus, device, and medium.
Background
As web applications continue to develop, web security problems have become increasingly prominent. SQL injection is the most common vulnerability among web security problems. Successful exploitation of SQL injection can lead to leakage of database information, malicious manipulation of the database, and even remote control of the server and execution of arbitrary operations, so the harm is extremely serious. For the automatic detection of malicious SQL injection requests, the main current approach is detection based on blacklist rules. In the process of implementing this application, the inventor realized that the prior art has at least the following problem: because SQL access requests are frequent, this rule-base security defense is passive and lagging, cannot detect unknown attacks, and is time-consuming, resulting in low database security detection efficiency.
Summary of the invention
The embodiments of the present application provide a machine learning-based SQL statement security detection method, apparatus, device, and medium, so as to improve the efficiency of database security detection.
In order to solve the above technical problem, an embodiment of the present application provides a machine learning-based SQL statement security detection method, including:
upon receiving an SQL access request, parsing the access request to obtain data to be detected;
using a TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segments;
using a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result;
if the detection result indicates an anomaly, confirming that the data to be detected contains a threat, and intercepting the SQL access request.
In order to solve the above technical problem, an embodiment of the present application further provides a machine learning-based SQL statement security detection apparatus, including:
a request parsing module, configured to parse the access request upon receiving an SQL access request, to obtain data to be detected;
a feature segmentation module, configured to use a TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segments;
an anomaly detection module, configured to use a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result;
a request interception module, configured to, if the detection result indicates an anomaly, confirm that the data to be detected contains a threat and intercept the SQL access request.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
upon receiving an SQL access request, parsing the access request to obtain data to be detected;
using a TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segments;
using a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result;
if the detection result indicates an anomaly, confirming that the data to be detected contains a threat, and intercepting the SQL access request.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions implement the following steps when executed by a processor:
upon receiving an SQL access request, parsing the access request to obtain data to be detected;
using a TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segments;
using a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result;
if the detection result indicates an anomaly, confirming that the data to be detected contains a threat, and intercepting the SQL access request.
With the machine learning-based SQL statement security detection method, apparatus, device, and medium provided by the embodiments of the present application, upon receiving an SQL access request, the access request is parsed to obtain data to be detected; a TF-IDF algorithm is used to perform feature extraction on the data to be detected to obtain feature word segments; and a hidden Markov model is used to perform text anomaly detection on the feature word segments to obtain a detection result. This enables rapid detection of whether an SQL access request is reasonable and helps improve the efficiency of security detection. When the detection result indicates an anomaly, the data to be detected is confirmed to contain a threat and the SQL access request is intercepted, so that abnormal access requests are intercepted quickly and the security of the database is ensured.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the machine learning-based SQL statement security detection method of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the machine learning-based SQL statement security detection apparatus according to the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed description
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the application. The terms "including" and "having", and any variations thereof, in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and so on in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the machine learning-based SQL statement security detection method provided by the embodiments of the present application is executed by the server; accordingly, the machine learning-based SQL statement security detection apparatus is provided in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. The terminal devices 101, 102, and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.
Referring to FIG. 2, FIG. 2 shows a machine learning-based SQL statement security detection method provided by an embodiment of the present application. The method is described by taking its application to the server in FIG. 1 as an example, as detailed below:
S201: Upon receiving an SQL access request, parse the access request to obtain data to be detected.
Specifically, when a client accesses the database, it first connects to the server and then sends the server a database access request containing request parameters. The server obtains the data the client needs from the database according to the request and returns it to the client. In this process, if the request parameters contain illegal characters, the database may be harmed. Such illegal characters include but are not limited to malicious SQL injection instructions, privilege-escalation instructions, and SQL attack instructions. It is therefore necessary to check the security of the request parameters before the server parses and executes the access request. In this embodiment, upon receiving an SQL access request, the server obtains the request parameters contained in the access request and parses them to obtain the data to be detected.
Here, parsing the access request means obtaining the request parameters contained in the access request and then determining, from the request parameters, the SQL statement to be detected.
A request parameter is an attribute field contained in the access request; it may be a uniform resource locator (URL), an operation instruction, or a matching rule. In this embodiment, detection is mainly performed on attribute fields such as the POST parameters, GET parameters, and COOKIE parameters in the access request.
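As a minimal sketch of step S201, the following Python function collects GET, POST, and COOKIE parameter values from a single request using only the standard library. The function name `extract_request_parameters` and the assumption that the POST body is form-encoded are illustrative and are not taken from the application.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit


def extract_request_parameters(url, body="", cookie_header=""):
    """Return (source, name, value) triples for one request (hypothetical helper)."""
    params = []
    # GET parameters come from the query string of the URL.
    query = urlsplit(url).query
    params += [("GET", k, v) for k, v in parse_qsl(query, keep_blank_values=True)]
    # POST parameters: this sketch assumes a form-encoded request body.
    params += [("POST", k, v) for k, v in parse_qsl(body, keep_blank_values=True)]
    # COOKIE parameters come from the Cookie request header.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    params += [("COOKIE", name, morsel.value) for name, morsel in cookie.items()]
    return params
```

Each returned value is a candidate piece of data to be detected; how the values are combined into one statement is left open here.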
S202: Use a TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segments.
Specifically, feature extraction is performed on the data to be detected by means of the TF-IDF algorithm, and the word segments that reflect the core characteristics of the data to be detected are taken as the feature word segments.
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF stands for term frequency and IDF stands for inverse document frequency. TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, and decreases in inverse proportion to how frequently it appears in the corpus.
In this embodiment, the data to be detected is segmented and generalized. The proportion of each resulting word segment in the data to be detected is taken as its term frequency (TF), and its frequency of occurrence in a preset corpus is used to obtain its inverse document frequency (IDF); the TF-IDF of the word segment is then calculated. Whether this TF-IDF falls within a preset range is used to judge how accurate it is to treat the word segment as an independent segment. When the TF-IDF of the word segment is within the preset range, the segment is very likely a readable string, so the request can be split, its word composition obtained, and the result vectorized.
Here, feature extraction includes word segmentation, generalization, and conversion to feature vectors.
For example, in a specific implementation, the data to be detected is a URL, specifically "publico/anadir.jsp?id=2&nombre=Jam%F3n+Ib%E9rico&precio=85&cantidad=%27%3B+DROP+TABLE+usuarios%3B+SELECT+*+FROM+datos+WHERE+nombre+LIKE+%27%25&B1=A%F1adir+al+carrito". After word segmentation and generalization, the result is "publico anadir jsp id 2 nombre Jam F3n Ib E9rico precio 85 cantidad 27 3B DROP TABLE usuarios 3B SELECT*FROM datos WHERE nombre LIKE 27 25 B1 A F1adir al carrito".
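The segmentation in this example can be approximated by splitting the raw request on every character that is not a letter or digit, which is why the percent-encoded bytes show up as tokens such as "27" and "3B". The sketch below is one assumed way to do this; the regular expression and the name `tokenize` are illustrative, and the sketch keeps `*` as a token of its own rather than merging it into a longer string.

```python
import re

# Runs of letters/digits become tokens; a lone "*" is kept because it is
# meaningful in SQL. Every other character acts as a separator.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|\*")


def tokenize(raw_request):
    """Split a raw request string into word segments (illustrative only)."""
    return TOKEN_PATTERN.findall(raw_request)
```

For instance, `tokenize("cantidad=%27%3B+DROP+TABLE+usuarios%3B")` yields `['cantidad', '27', '3B', 'DROP', 'TABLE', 'usuarios', '3B']`.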
S203: Use a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result.
Specifically, after the feature word segments of the statement to be detected are obtained, they are input into a pre-trained hidden Markov model, which detects whether the feature word segments contain abnormal text.
A hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. A hidden Markov model is a kind of Markov chain whose states cannot be observed directly but can be observed through a sequence of observation vectors. Each observation vector is related to the states through a probability density distribution, and each observation vector is generated by a state sequence with a corresponding probability density distribution.
Text anomaly detection means detecting whether the data to be detected contains illegal text characters; such illegal text characters may pose a security risk to database access.
For the specific process of using the hidden Markov model to perform text anomaly detection on the feature word segments and obtain the detection result, refer to the description of the subsequent embodiments; it is not repeated here.
S204: If the detection result indicates an anomaly, confirm that the data to be detected contains a threat, and intercept the SQL access request.
Specifically, when the detection result indicates an anomaly, that is, when the statement to be detected contains illegal characters that may threaten the database data, the SQL access request is intercepted to ensure the security of the database.
In this embodiment, upon receiving an SQL access request, the access request is parsed to obtain data to be detected; a TF-IDF algorithm is used to perform feature extraction on the data to be detected to obtain feature word segments; and a hidden Markov model is used to perform text anomaly detection on the feature word segments to obtain a detection result. This enables rapid detection of whether an SQL access request is reasonable and helps improve the efficiency of security detection. When the detection result indicates an anomaly, the data to be detected is confirmed to contain a threat and the SQL access request is intercepted, so that abnormal access requests are intercepted quickly and the security of the database is ensured.
In an embodiment, the obtained feature word segments and their corresponding detection results may be stored on a blockchain network. Blockchain storage allows the data to be shared between different platforms and also prevents the data from being tampered with.
A blockchain is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
In some optional implementations of this embodiment, in step S202, using the TF-IDF algorithm to perform feature extraction on the data to be detected to obtain feature word segments includes:
segmenting the data to be detected by means of word combination to obtain initial word segments;
counting the proportion of each initial word segment in the statement to be detected, and taking the proportion as the term frequency (TF) of the initial word segment;
computing the inverse document frequency (IDF) of the initial word segment in a preset corpus;
calculating the product of the term frequency TF and the inverse document frequency IDF of the initial word segment, determining from the product whether the initial word segment is an important feature, and taking the initial word segments that are important features as the feature word segments.
Specifically, feature extraction is performed on the data to be detected by means of the TF-IDF algorithm to obtain the feature word segments.
It should be noted that a word segment in this embodiment should not be understood simply as a "word"; it may also be a character string, such as "SELECT*FROM".
The inverse document frequency IDF of an initial word segment in the preset corpus is calculated with the following formula:
$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $|D|$ is the total number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents containing the i-th initial word segment $t_i$ (that is, the number of documents for which $n_{i,j} \neq 0$). If the initial word segment does not appear in the corpus, the denominator would be zero; therefore $|\{j : t_i \in d_j\}| + 1$ is generally used as the denominator to avoid division by zero.
Whether an initial word segment is an important feature may be determined from the product as follows: the segmentation results of historical SQL attacks are used as a reference to set a threshold, and when the product reaches the threshold, the word segment is determined to be an important feature.
In this embodiment, the TF-IDF algorithm is used to extract, from the data to be detected, the important features related to SQL access security as the feature word segments. Using these feature word segments in the subsequent security detection helps improve the accuracy of the detection.
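The TF, IDF, and threshold steps described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `tf_idf_features`, the representation of the corpus as a list of token sets, and the threshold value used in the usage note are all assumptions. The denominator uses the "+1" smoothing mentioned in the text.

```python
import math
from collections import Counter


def tf_idf_features(tokens, corpus, threshold):
    """Return {token: tf-idf} for tokens whose score reaches the threshold.

    tokens: word segments of the data to be detected.
    corpus: list of token sets, one per document in the preset corpus.
    """
    counts = Counter(tokens)
    total = len(tokens)
    features = {}
    for token, count in counts.items():
        tf = count / total  # share of this segment in the data to be detected
        docs_with_token = sum(1 for doc in corpus if token in doc)
        # "+1" in the denominator avoids division by zero for unseen tokens.
        idf = math.log(len(corpus) / (1 + docs_with_token))
        score = tf * idf
        if score >= threshold:
            features[token] = score
    return features
```

With a corpus of four benign requests that all contain `id`, the call `tf_idf_features(["id", "DROP", "TABLE", "id"], corpus, 0.1)` keeps `DROP` and `TABLE` and drops `id`, because a token present in every document gets a negative smoothed IDF.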
In some optional implementations of this embodiment, in step S203, using the hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result includes:
converting the feature word segments into state representations;
for the i-th state, predicting the probability distribution of the (i+1)-th state from the observation sequence of the hidden Markov model, and taking the state with the maximum probability in the distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;
if the (i+1)-th state matches the predicted state corresponding to the (i+1)-th state, confirming that the detection result is normal;
if the (i+1)-th state does not match the predicted state corresponding to the (i+1)-th state, confirming that the detection result is abnormal.
Specifically, the feature word segments are converted into states of the hidden Markov model. For each state, the next state is predicted from the observation sequence of the hidden Markov model, and the prediction is matched against the actual next state. If the match fails, it is confirmed that there is a feature word segment that may cause a security risk, and the SQL access request is marked as abnormal.
It should be noted that, when implementing this embodiment, the hidden Markov model may be tuned by simple averaging or weighted averaging. Non-negative weights must be used to ensure that the ensemble performs better than the single best individual learner. Because samples in real tasks may be insufficient or noisy, the weights are sometimes not fully reliable, so whether to use simple averaging or weighted averaging should be decided according to the actual situation.
In this embodiment, the hidden Markov model is used to judge quickly whether the feature word segments are reasonable and thus whether the access request is abnormal, which helps improve the efficiency of security detection.
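The predict-and-compare procedure described above can be illustrated with a deliberately simplified stand-in. The sketch below does not train a full hidden Markov model; it uses a first-order transition table counted from normal sequences, which is enough to show the "predict the most probable next state, then compare" logic. The state mapping (`SQL`, `NUM`, `WORD`), the keyword list, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical keyword list used only to map tokens to coarse states.
SQL_KEYWORDS = {"SELECT", "DROP", "TABLE", "FROM", "WHERE", "UNION", "LIKE"}


def to_state(token):
    """Map a feature word segment to a coarse state label."""
    if token.upper() in SQL_KEYWORDS:
        return "SQL"
    if token.isdigit():
        return "NUM"
    return "WORD"


def train_transitions(normal_token_lists):
    """Count state-to-state transitions observed in normal requests."""
    counts = defaultdict(Counter)
    for tokens in normal_token_lists:
        states = [to_state(t) for t in tokens]
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    return counts


def is_abnormal(tokens, transitions):
    """Flag the request if any actual next state differs from the prediction."""
    states = [to_state(t) for t in tokens]
    for current, actual in zip(states, states[1:]):
        distribution = transitions.get(current)
        if not distribution:
            return True  # state never seen in training
        predicted = max(distribution, key=distribution.get)
        if predicted != actual:
            return True
    return False
```

Comparing against only the single most probable next state is strict and will flag some legitimate but less common sequences; a practical system would more likely compare the sequence likelihood under a trained HMM against a threshold.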
In some optional implementations of this embodiment, the machine learning-based SQL statement security detection method further includes:
obtaining, from a detection log, the feature word segments corresponding to states whose detection result is abnormal, as reference word segments;
calculating the text similarity between the feature word segments and the reference word segments to obtain a text similarity value;
if the text similarity value is less than a preset similarity threshold, taking the feature word segments as abnormal word segments and determining that the detection result is abnormal.
Specifically, the server saves the feature word segments corresponding to each abnormal state in the detection log. Once the number of feature word segments saved in the detection log reaches a certain amount, the feature word segments corresponding to abnormal states can be compared for similarity with the feature word segments obtained after step S202. If the similarity is greater than the preset threshold, the feature word segments obtained in step S202 are determined to be in an abnormal state; otherwise, the judgment continues with step S203. This approach helps to screen out quickly the access requests that may produce an abnormal state.
Here, the detection log is a log file storing the feature word segments corresponding to abnormal states.
The text similarity between the feature word segments and the reference word segments may be calculated using Euclidean distance, a similarity algorithm, or the like.
The preset similarity threshold can be set according to actual needs and is not limited here.
In this embodiment, the feature word segments corresponding to states whose detection result is abnormal are obtained from the detection log as reference word segments, and the similarity between the reference word segments and the obtained feature word segments is calculated. This makes it possible to determine quickly whether the feature word segments would lead to an abnormal state and thus to assess their risk.
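One way to realize this pre-screening is cosine similarity over token-count vectors, sketched below. Cosine similarity is an assumed choice here (the text names Euclidean distance and unspecified similarity algorithms), and the function names and the default threshold of 0.8 are illustrative. The sketch follows the explanatory paragraph, in which a similarity above the threshold to a known abnormal sample marks the request as abnormal.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists treated as count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_known_anomaly(tokens, reference_token_lists, threshold=0.8):
    """True if the tokens closely resemble any logged abnormal sample."""
    return any(
        cosine_similarity(tokens, reference) > threshold
        for reference in reference_token_lists
    )
```

Requests that do not match any logged sample would then go through the hidden Markov model check of step S203.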
In some optional implementations of this embodiment, after step S203 and before step S204, the machine learning-based SQL statement security detection method further includes performing a secondary check on the data to be detected whose detection result is abnormal, specifically including:
If the detection result is abnormal, performing lexical segmentation on the data to be detected based on the SQL statement to obtain word segments to be verified;
Using a preset character verification function to verify, by string scanning, whether the word segments to be verified contain sensitive words, to obtain a scan result;
If the scan result is that sensitive words exist in the word segments to be verified, confirming that the detection result is abnormal.
Specifically, when the database is open to normal access, access requests arrive frequently. Running a keyword-based security check on every access request would take a great deal of time and be inefficient, and it could also cause access failures because requests cannot be processed in time. Using the hidden Markov model effectively improves detection efficiency and quickly identifies access requests that may pose a threat. To further verify the accuracy of the threatening access requests detected by the hidden Markov model, and to ensure that normal access requests are not misjudged and intercepted, this embodiment performs a secondary check, by string scanning, on the abnormal access requests detected in step S203, to ensure the accuracy of interception.
Lexical segmentation refers to splitting the statement to be detected into individual word segments to be verified according to the grammatical rules of SQL statements.
The preset character verification function is a function used to verify characters, including but not limited to islogical and isalpha; a custom function can also be used, which is not limited here.
Sensitive words are predefined words that pose a certain risk to database security, for example user and system. Key characters found in detected abnormal access requests can be added to the existing sensitive words.
In this embodiment, performing a secondary check on the abnormal access requests detected by the hidden Markov model ensures the accuracy of the abnormality judgment, which helps to improve the reasonableness and accuracy of access interception.
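The secondary check can be illustrated with the following Python sketch. It is an assumption-laden simplification: the regular expression in `lexical_tokens` is a hypothetical stand-in for a real SQL lexer, `SENSITIVE_WORDS` contains only the two example words given above, and Python's built-in `str.isalpha` plays the role of the character verification function.

```python
import re

# Example sensitive words from the description; a real deployment would
# maintain and extend this set.
SENSITIVE_WORDS = {"user", "system"}


def lexical_tokens(sql):
    """Split SQL text into identifiers/keywords, numbers, quoted strings
    and single symbols (a rough approximation of lexical segmentation)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+|'[^']*'|\S", sql)


def confirm_anomaly(sql):
    """Return True if any purely alphabetic token is a sensitive word."""
    for token in lexical_tokens(sql):
        # str.isalpha filters out numbers, symbols and quoted strings
        if token.isalpha() and token.lower() in SENSITIVE_WORDS:
            return True
    return False
```

In this sketch a request is intercepted only when the hidden Markov model has already flagged it and `confirm_anomaly` also returns True; how a flagged request without sensitive words is handled is left open here.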
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
FIG. 3 shows a schematic block diagram of a machine learning-based SQL statement security detection apparatus that corresponds one-to-one to the machine learning-based SQL statement security detection method of the above embodiments. As shown in FIG. 3, the apparatus includes a request parsing module 31, a feature word segmentation module 32, an anomaly detection module 33, and a request interception module 34. The functional modules are described in detail as follows:
The request parsing module 31 is configured to parse the access request when an SQL access request is received, to obtain data to be detected;
The feature word segmentation module 32 is configured to perform feature extraction on the data to be detected by using the TF-IDF algorithm, to obtain feature word segments;
The anomaly detection module 33 is configured to perform text anomaly detection on the feature word segments by using the hidden Markov model, to obtain a detection result;
The request interception module 34 is configured to, if the detection result indicates an anomaly, confirm that the data to be detected poses a threat and intercept the SQL access request.
Optionally, the feature word segmentation module 32 includes:
A data segmentation unit, configured to segment the data to be detected by way of word combination, to obtain initial word segments;
A term frequency statistics unit, configured to count the proportion of each initial word segment in the statement to be detected, and to use the proportion as the term frequency (TF) of the initial word segment;
A frequency statistics unit, configured to count the inverse document frequency (IDF) of each initial word segment in a preset corpus;
A word segment determining unit, configured to calculate the product of the term frequency TF and the inverse document frequency IDF of each initial word segment, determine according to the product whether the initial word segment is an important feature, and determine the initial word segments that are important features as the feature word segments.
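As an illustration of the units above, the TF-IDF scoring can be sketched in Python as follows. The function name `tf_idf_features`, the smoothed IDF formula log(N / (1 + df)), and the rule of keeping the `top_k` highest-scoring segments are assumptions made for this sketch; the text only states that importance is determined from the TF × IDF product. The corpus is assumed to be a non-empty list of token lists.

```python
import math
from collections import Counter


def tf_idf_features(tokens, corpus, top_k=3):
    """Rank the tokens of one statement by TF * IDF and keep the top_k.

    tokens: initial word segments of the statement to be detected
    corpus: preset corpus, a non-empty list of token lists
    """
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for token, n in counts.items():
        tf = n / total  # proportion of the token in the statement
        docs_with = sum(1 for doc in corpus if token in doc)
        # Smoothed IDF (assumed variant); rare tokens score higher
        idf = math.log(len(corpus) / (1 + docs_with))
        scores[token] = tf * idf
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

With a corpus of ordinary queries, tokens such as `select` and `from` that occur in every document receive a low score, while tokens that are rare in the corpus rise to the top.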
Optionally, the anomaly detection module 33 includes:
A state conversion unit, configured to convert the feature word segments into a state representation;
A state prediction unit, configured to predict, for the i-th state, the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and to take the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;
A first matching unit, configured to confirm that the detection result is normal if the (i+1)-th state matches the predicted state corresponding to the (i+1)-th state;
A second matching unit, configured to confirm that the detection result is abnormal if the (i+1)-th state does not match the predicted state corresponding to the (i+1)-th state.
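The predict-and-compare logic of these units can be sketched as follows. This is a deliberate simplification and not the model of this application: it uses only a table of state transition probabilities, whereas a full hidden Markov model would also involve emission probabilities and a trained parameter set. The state names and the probabilities in `TRANSITIONS` are hypothetical.

```python
# Hypothetical transition probabilities P(next state | current state)
TRANSITIONS = {
    "SELECT": {"COLUMN": 0.9, "UNION": 0.1},
    "COLUMN": {"FROM": 0.8, "COLUMN": 0.2},
    "FROM": {"TABLE": 1.0},
}


def detect_anomaly(states, transitions):
    """Return True (abnormal) if any observed next state differs from
    the most probable next state predicted for the current state."""
    for current, actual_next in zip(states, states[1:]):
        distribution = transitions.get(current, {})
        if not distribution:
            # Unknown current state: treated as abnormal in this sketch
            return True
        predicted = max(distribution, key=distribution.get)
        if predicted != actual_next:
            return True
    return False
```

For example, the sequence SELECT, COLUMN, FROM, TABLE matches the most probable transition at every step and is reported as normal, while SELECT followed by UNION is reported as abnormal.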
Optionally, the machine learning-based SQL statement security detection apparatus further includes:
A reference word segment determining module, configured to obtain, from the detection log, the feature word segments corresponding to states whose detection result is abnormal, as reference word segments;
A similarity calculation module, configured to calculate the text similarity between the feature word segments and the reference word segments, to obtain a text similarity value;
An abnormality determination module, configured to, if the text similarity value is less than the preset similarity threshold, take the feature word segments as abnormal word segments and determine that the detection result is abnormal.
Optionally, the machine learning-based SQL statement security detection apparatus further includes:
A lexical segmentation module, configured to, if the detection result is abnormal, perform lexical segmentation on the data to be detected based on the SQL statement, to obtain word segments to be verified;
A sensitive word verification module, configured to use a preset character verification function to verify, by string scanning, whether the word segments to be verified contain sensitive words, to obtain a scan result;
A result confirmation module, configured to confirm that the detection result is abnormal if the scan result is that sensitive words exist in the word segments to be verified.
Optionally, the machine learning-based SQL statement security detection apparatus further includes:
A storage module, configured to store the feature word segments and the detection results corresponding to the feature word segments in a blockchain.
For the specific limitations of the machine learning-based SQL statement security detection apparatus, refer to the limitations of the machine learning-based SQL statement security detection method above, which are not repeated here. Each module of the above apparatus can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.
To solve the above technical problems, the embodiments of this application further provide a computer device. For details, refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It should be noted that the figure shows only the computer device 4 with the components memory 41, processor 42, and network interface 43, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or D interface display memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as program code for controlling electronic files. In addition, the memory 41 can be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the program code stored in the memory 41 or to process data, for example to run program code for controlling electronic files.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
This application further provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores an interface display program that can be executed by at least one processor, so that the at least one processor performs the steps of the machine learning-based SQL statement security detection method described above.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
Obviously, the embodiments described above are only some, rather than all, of the embodiments of this application. The drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; rather, these embodiments are provided so that the disclosure of this application will be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements for some of the technical features. Any equivalent structure made using the contents of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

  1. A machine learning-based SQL statement security detection method, wherein the machine learning-based SQL statement security detection method comprises:
    when an SQL access request is received, parsing the access request to obtain data to be detected;
    performing feature extraction on the data to be detected by using a TF-IDF algorithm, to obtain feature word segments;
    performing text anomaly detection on the feature word segments by using a hidden Markov model, to obtain a detection result;
    if the detection result indicates an anomaly, confirming that the data to be detected poses a threat, and intercepting the SQL access request.
  2. The machine learning-based SQL statement security detection method according to claim 1, wherein the performing feature extraction on the data to be detected by using a TF-IDF algorithm, to obtain feature word segments, comprises:
    segmenting the data to be detected by way of word combination, to obtain initial word segments;
    counting the proportion of each initial word segment in the statement to be detected, and using the proportion as the term frequency of the initial word segment;
    counting the inverse document frequency IDF of each initial word segment in a preset corpus;
    calculating the product of the term frequency TF of the initial word segment and the inverse document frequency IDF, determining according to the product whether the initial word segment is an important feature, and determining the initial word segments that are important features as the feature word segments.
  3. The machine learning-based SQL statement security detection method according to claim 1, wherein the performing text anomaly detection on the feature word segments by using a hidden Markov model, to obtain a detection result, comprises:
    converting the feature word segments into a state representation;
    for the i-th state, predicting the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and taking the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;
    if the (i+1)-th state matches the predicted state corresponding to the (i+1)-th state, confirming that the detection result is normal;
    if the (i+1)-th state does not match the predicted state corresponding to the (i+1)-th state, confirming that the detection result is abnormal.
  4. The machine learning-based SQL statement security detection method according to claim 1, wherein the machine learning-based SQL statement security detection method further comprises:
    obtaining, from a detection log, the feature word segments corresponding to states whose detection result is abnormal, as reference word segments;
    calculating the text similarity between the feature word segments and the reference word segments, to obtain a text similarity value;
    if the text similarity value is less than a preset similarity threshold, taking the feature word segments as abnormal word segments, and determining that the detection result is abnormal.
  5. The machine learning-based SQL statement security detection method according to claim 1, wherein, after the performing text anomaly detection on the feature word segments by using a hidden Markov model to obtain a detection result, and before the confirming that the data to be detected poses a threat if the detection result indicates an anomaly, the machine learning-based SQL statement security detection method further comprises:
    if the detection result is abnormal, performing lexical segmentation on the data to be detected based on the SQL statement, to obtain word segments to be verified;
    using a preset character verification function to verify, by string scanning, whether the word segments to be verified contain sensitive words, to obtain a scan result;
    if the scan result is that sensitive words exist in the word segments to be verified, confirming that the detection result indicates an anomaly.
  6. The machine learning-based SQL statement security detection method according to claim 1, wherein, after the performing text anomaly detection on the feature word segments by using a hidden Markov model to obtain a detection result, the method further comprises:
    storing the feature word segments and the detection results corresponding to the feature word segments in a blockchain.
  7. A machine learning-based SQL statement security detection apparatus, wherein the machine learning-based SQL statement security detection apparatus comprises:
    a request parsing module, configured to parse the access request when an SQL access request is received, to obtain data to be detected;
    a feature word segmentation module, configured to perform feature extraction on the data to be detected by using a TF-IDF algorithm, to obtain feature word segments;
    an anomaly detection module, configured to perform text anomaly detection on the feature word segments by using a hidden Markov model, to obtain a detection result;
    a request interception module, configured to, if the detection result indicates an anomaly, confirm that the data to be detected poses a threat and intercept the SQL access request.
  8. The machine learning-based SQL statement security detection apparatus according to claim 7, wherein the feature word segmentation module comprises:
    a data segmentation unit, configured to segment the data to be detected by way of word combination, to obtain initial word segments;
    a term frequency statistics unit, configured to count the proportion of each initial word segment in the statement to be detected, and to use the proportion as the term frequency of the initial word segment;
    a frequency statistics unit, configured to count the inverse document frequency IDF of each initial word segment in a preset corpus;
    a word segment determining unit, configured to calculate the product of the term frequency TF of the initial word segment and the inverse document frequency IDF, determine according to the product whether the initial word segment is an important feature, and determine the initial word segments that are important features as the feature word segments.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    when an SQL access request is received, parsing the access request to obtain data to be detected;
    performing feature extraction on the data to be detected by using a TF-IDF algorithm, to obtain feature word segments;
    performing text anomaly detection on the feature word segments by using a hidden Markov model, to obtain a detection result;
    if the detection result indicates an anomaly, confirming that the data to be detected poses a threat, and intercepting the SQL access request.
  10. The computer device according to claim 9, wherein the performing feature extraction on the data to be detected by using a TF-IDF algorithm, to obtain feature word segments, comprises:
    segmenting the data to be detected by way of word combination, to obtain initial word segments;
    counting the proportion of each initial word segment in the statement to be detected, and using the proportion as the term frequency of the initial word segment;
    counting the inverse document frequency IDF of each initial word segment in a preset corpus;
    calculating the product of the term frequency TF of the initial word segment and the inverse document frequency IDF, determining according to the product whether the initial word segment is an important feature, and determining the initial word segments that are important features as the feature word segments.
  11. The computer device according to claim 9, wherein the performing text anomaly detection on the feature word segments by using a hidden Markov model, to obtain a detection result, comprises:
    converting the feature word segments into a state representation;
    for the i-th state, predicting the probability distribution of the (i+1)-th state through the observation sequence of the hidden Markov model, and taking the state corresponding to the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;
    if the (i+1)-th state matches the predicted state corresponding to the (i+1)-th state, confirming that the detection result is normal;
    if the (i+1)-th state does not match the predicted state corresponding to the (i+1)-th state, confirming that the detection result is abnormal.
  12. The computer device according to claim 9, wherein the processor further implements the following steps when executing the computer-readable instructions:
    obtaining, from a detection log, the feature word segments corresponding to states whose detection result is abnormal, as reference word segments;
    calculating the text similarity between the feature word segments and the reference word segments, to obtain a text similarity value;
    if the text similarity value is less than a preset similarity threshold, taking the feature word segments as abnormal word segments, and determining that the detection result is abnormal.
  13. The computer device according to claim 9, wherein, after the performing text anomaly detection on the feature word segments by using a hidden Markov model to obtain a detection result, and before the confirming that the data to be detected poses a threat if the detection result indicates an anomaly, the processor further implements the following steps when executing the computer-readable instructions:
    if the detection result is abnormal, performing lexical segmentation on the data to be detected based on the SQL statement, to obtain word segments to be verified;
    using a preset character verification function to verify, by string scanning, whether the word segments to be verified contain sensitive words, to obtain a scan result;
    if the scan result is that sensitive words exist in the word segments to be verified, confirming that the detection result indicates an anomaly.
  14. The computer device according to claim 9, wherein, after the performing text anomaly detection on the feature word segments by using a hidden Markov model to obtain a detection result, the processor further implements the following step when executing the computer-readable instructions:
    storing the feature word segments and the detection results corresponding to the feature word segments in a blockchain.
  15. A computer-readable storage medium storing computer-readable instructions, wherein the following steps are implemented when the computer-readable instructions are executed by a processor:
    when an SQL access request is received, parsing the access request to obtain data to be detected;
    performing feature extraction on the data to be detected by using a TF-IDF algorithm, to obtain feature word segments;
    performing text anomaly detection on the feature word segments by using a hidden Markov model, to obtain a detection result;
    if the detection result indicates an anomaly, confirming that the data to be detected poses a threat, and intercepting the SQL access request.
  16. The computer-readable storage medium according to claim 15, wherein performing feature extraction on the data to be detected using the TF-IDF algorithm to obtain feature word segments comprises:
    segmenting the data to be detected by means of word combination to obtain initial word segments;
    counting the proportion of each initial word segment in the statement to be detected, and using the proportion as the term frequency (TF) of the initial word segment;
    counting the inverse document frequency (IDF) of each initial word segment in a preset corpus;
    calculating the product of the term frequency TF of each initial word segment and the inverse document frequency IDF, determining from the product whether the initial word segment is an important feature, and determining the initial word segments that are important features as the feature word segments.
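The TF-IDF steps of this claim can be sketched as follows. The sketch rests on several assumptions that the claim does not specify: "word combination" is read as single words plus adjacent word pairs, the IDF uses the common smoothed form log(N / (1 + df)), and an initial segment counts as an important feature when its TF-IDF product exceeds a hypothetical `threshold`.

```python
import math
from collections import Counter


def word_combinations(text, n=2):
    """Initial segments: single words plus adjacent n-word combinations."""
    words = text.lower().split()
    combos = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return words + combos


def term_frequencies(segments):
    """TF: the share of each segment among all segments of the statement."""
    counts = Counter(segments)
    total = len(segments)
    return {seg: count / total for seg, count in counts.items()}


def inverse_document_frequency(segment, corpus_docs):
    """Smoothed IDF over a corpus given as a list of segment sets."""
    containing = sum(1 for doc in corpus_docs if segment in doc)
    return math.log(len(corpus_docs) / (1 + containing))


def feature_segments(statement, corpus, threshold=0.05):
    """Keep the segments whose TF * IDF product exceeds the threshold."""
    corpus_docs = [set(word_combinations(doc)) for doc in corpus]
    features = {}
    for seg, tf in term_frequencies(word_combinations(statement)).items():
        score = tf * inverse_document_frequency(seg, corpus_docs)
        if score > threshold:
            features[seg] = score
    return features
```

With a corpus of ordinary `select ... from ...` statements, a segment such as `union` that never appears in the corpus receives a high IDF and is kept, while `select`, which appears in every corpus statement, is dropped.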
  17. The computer-readable storage medium according to claim 15, wherein performing text anomaly detection on the feature word segments using a hidden Markov model to obtain a detection result comprises:
    converting the feature word segments into a state representation;
    for the i-th state, predicting the probability distribution of the (i+1)-th state from the observation sequence of the hidden Markov model, and taking the state with the maximum probability value in the probability distribution as the predicted state corresponding to the (i+1)-th state, where i is a positive integer;
    if the (i+1)-th state matches the predicted state corresponding to the (i+1)-th state, confirming that the detection result is normal;
    if the (i+1)-th state does not match the predicted state corresponding to the (i+1)-th state, confirming that the detection result is abnormal.
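The predict-and-compare rule in this claim can be illustrated with a simplified sketch. It is not a full hidden Markov model: it swaps in a first-order transition table estimated from normal statements (a plain Markov chain over the state representation), which is enough to show how the most probable next state is compared with the actual one. The state mapping in `to_states`, the `KEYWORDS` set, and the choice to treat a state with no learned successors as abnormal are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical keyword set used to turn segments into states.
KEYWORDS = {"select", "from", "where", "union", "and", "or"}


def to_states(segments):
    """Map segments to states: keywords keep their name, digits become NUM,
    everything else becomes ID."""
    states = []
    for seg in segments:
        low = seg.lower()
        if low in KEYWORDS:
            states.append(low.upper())
        elif low.isdigit():
            states.append("NUM")
        else:
            states.append("ID")
    return states


def fit_transitions(normal_sequences):
    """Estimate P(next state | current state) from normal state sequences."""
    counts = defaultdict(Counter)
    for seq in normal_sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for cur, ctr in counts.items()
    }


def detect(states, transitions):
    """Return "abnormal" as soon as an actual next state differs from the
    most probable predicted state; otherwise return "normal"."""
    for cur, actual in zip(states, states[1:]):
        dist = transitions.get(cur)
        if not dist:
            return "abnormal"  # assumption: unseen state counts as abnormal
        predicted = max(dist, key=dist.get)
        if predicted != actual:
            return "abnormal"
    return "normal"
```

Trained on statements of the form `select <id> from <id>`, the sketch accepts another statement of that shape and flags one in which `UNION` follows a table name, because the predicted state at that position differs from the observed one.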
  18. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    obtaining, from a detection log, the feature word segments corresponding to states whose detection result is abnormal, as reference word segments;
    calculating the text similarity between the feature word segments and the reference word segments to obtain a text similarity value;
    if the text similarity value is less than a preset similarity threshold, treating the feature word segment as an abnormal word segment and determining that the detection result is abnormal.
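A minimal sketch of the comparison against logged reference segments is given below. The claim does not name a similarity measure, so the use of `difflib.SequenceMatcher` from the Python standard library is an assumption, as are the function names and the default threshold. The comparison direction follows the claim wording literally (a value below the threshold marks the segment abnormal); whether the computed value behaves as a similarity or as a distance depends on the measure chosen, so the direction should be checked against the description before reuse.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Similarity in [0, 1] between two strings (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_value(segment, reference_segments):
    """Highest similarity between a segment and any reference segment."""
    return max(
        (text_similarity(segment, ref) for ref in reference_segments),
        default=0.0,
    )


def is_abnormal(segment, reference_segments, threshold=0.5):
    """Apply the claim's rule as written: value below threshold -> abnormal."""
    return similarity_value(segment, reference_segments) < threshold
```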
  19. The computer-readable storage medium according to claim 15, wherein, after the step of using a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result, and before the step of confirming that the data to be detected poses a threat if the detection result indicates an anomaly, the computer-readable instructions, when executed by the processor, further implement the following steps:
    if the detection result is abnormal, performing lexical segmentation on the data to be detected based on SQL statements to obtain word segments to be verified;
    using a preset character verification function to perform sensitive-word verification on the word segments to be verified by means of string scanning, to obtain a scan result;
    if the scan result indicates that a sensitive word exists among the word segments to be verified, confirming that the detection result indicates an anomaly.
  20. The computer-readable storage medium according to claim 15, wherein, after the step of using a hidden Markov model to perform text anomaly detection on the feature word segments to obtain a detection result, the computer-readable instructions, when executed by the processor, further implement the following step:
    storing the feature word segments and the detection result corresponding to the feature word segments in a blockchain.
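The storage step can be illustrated with a local hash-chained log. This is only a stand-in under stated assumptions: the claims do not name a blockchain platform or client API, so the sketch uses an in-memory list in which each block carries the SHA-256 hash of its predecessor, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical placeholder for the first block


def block_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, feature_segments, detection_result):
    """Append the feature segments and their detection result as a block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    payload = {
        "feature_segments": feature_segments,
        "detection_result": detection_result,
    }
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def verify(chain):
    """Check that every block links to its predecessor and is unmodified."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        expected = block_hash(index, prev_hash, block["payload"])
        if (block["index"] != index or block["prev_hash"] != prev_hash
                or block["hash"] != expected):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, editing a stored detection result after the fact makes `verify` fail, which is the tamper-evidence property that motivates storing results this way.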
PCT/CN2020/136341 2020-05-27 2020-12-15 Machine learning-based sql statement security testing method and apparatus, device, and medium WO2021135919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010464009.3A CN111783132A (en) 2020-05-27 2020-05-27 SQL sentence security detection method, device, equipment and medium based on machine learning
CN202010464009.3 2020-05-27

Publications (1)

Publication Number Publication Date
WO2021135919A1 true WO2021135919A1 (en) 2021-07-08

Family

ID=72753399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136341 WO2021135919A1 (en) 2020-05-27 2020-12-15 Machine learning-based sql statement security testing method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN111783132A (en)
WO (1) WO2021135919A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783132A (en) * 2020-05-27 2020-10-16 平安科技(深圳)有限公司 SQL sentence security detection method, device, equipment and medium based on machine learning
CN112395304B (en) * 2020-10-30 2024-01-02 迅鳐成都科技有限公司 Data security calculation method, system and storage medium based on data behavior simulation
CN112560021A (en) * 2020-11-26 2021-03-26 新华三技术有限公司合肥分公司 Attack detection method and attack detection model
CN112766236B (en) * 2021-03-10 2023-04-07 拉扎斯网络科技(上海)有限公司 Text generation method and device, computer equipment and computer readable storage medium
CN114095241A (en) * 2021-11-18 2022-02-25 中国电信股份有限公司 Detection method, detection device and computer-readable storage medium
CN116248412B (en) * 2023-04-27 2023-08-22 中国人民解放军总医院 Shared data resource abnormality detection method, system, equipment, memory and product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101059805A (en) * 2007-03-29 2007-10-24 复旦大学 Network flow and delaminated knowledge library based dynamic file clustering method
CN107273465A (en) * 2017-06-05 2017-10-20 环球智达科技(北京)有限公司 SQL injection detection method
CN107392016A (en) * 2017-07-07 2017-11-24 四川大学 A kind of web data storehouse attack detecting system based on agency
CN108549814A (en) * 2018-03-24 2018-09-18 西安电子科技大学 A kind of SQL injection detection method based on machine learning, database security system
CN109194677A (en) * 2018-09-21 2019-01-11 郑州云海信息技术有限公司 A kind of SQL injection attack detection, device and equipment
CN109547423A (en) * 2018-11-09 2019-03-29 上海交通大学 A kind of WEB malicious requests depth detection system and method based on machine learning
CN111783132A (en) * 2020-05-27 2020-10-16 平安科技(深圳)有限公司 SQL sentence security detection method, device, equipment and medium based on machine learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108712453A (en) * 2018-08-30 2018-10-26 杭州安恒信息技术股份有限公司 Detection method for injection attack, device and the server of logic-based regression algorithm
CN109525567A (en) * 2018-11-01 2019-03-26 郑州云海信息技术有限公司 A kind of detection method and system for implementing parameter injection attacks for website

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨连群 等 (YANG, LIANQUN ET AL.): "基于隐马尔可夫模型的新型SQL注入攻击检测方法 (A New Detection Technique of SQL Injection Based on Hidden Markov Mode)", 信息网络安全 (NETINFO SECURITY), no. 9, 30 September 2017 (2017-09-30) *

Also Published As

Publication number Publication date
CN111783132A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2021135919A1 (en) Machine learning-based sql statement security testing method and apparatus, device, and medium
CN111897970B (en) Text comparison method, device, equipment and storage medium based on knowledge graph
CN110992169B (en) Risk assessment method, risk assessment device, server and storage medium
US20200389495A1 (en) Secure policy-controlled processing and auditing on regulated data sets
US10503908B1 (en) Vulnerability assessment based on machine inference
US20220309053A1 (en) Method and apparatus of auditing log, electronic device, and medium
CN108932426B (en) Unauthorized vulnerability detection method and device
CN105431859A (en) Signal tokens indicative of malware
US20210336987A1 (en) Method for Detecting Structured Query Language (SQL) Injection Based on Big Data Algorithm
WO2022227535A1 (en) Method and system for recognizing mining malicious software, and storage medium
WO2021196935A1 (en) Data checking method and apparatus, electronic device, and storage medium
CN111586695B (en) Short message identification method and related equipment
WO2020232902A1 (en) Abnormal object identification method and apparatus, computing device, and storage medium
CN110321707A (en) A kind of SQL injection detection method based on big data algorithm
CN115061874A (en) Log information verification method, device, equipment and medium
WO2022126962A1 (en) Knowledge graph-based method for detecting guiding and abetting corpus and related device
Hao et al. SCScan: A SVM-based scanning system for vulnerabilities in blockchain smart contracts
US20220321598A1 (en) Method of processing security information, device and storage medium
CN116561737A (en) Password validity detection method based on user behavior base line and related equipment thereof
CN113037555B (en) Risk event marking method, risk event marking device and electronic equipment
CN114301713A (en) Risk access detection model training method, risk access detection method and risk access detection device
CN111782967A (en) Information processing method, information processing device, electronic equipment and computer readable storage medium
CN113449350A (en) Management method, device, equipment and medium for USB outgoing sensitive information
Sun et al. Padetective: A systematic approach to automate detection of promotional attackers in mobile app store
Zhang et al. An automatic approach for scoring vulnerabilities in risk assessment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20909870

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20909870

Country of ref document: EP

Kind code of ref document: A1