WO2023093100A1 - Method, apparatus, device and product for identifying abnormal API gateway calls - Google Patents

Method, apparatus, device and product for identifying abnormal API gateway calls

Info

Publication number
WO2023093100A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
sample
abnormal
abnormal call
layer
Application number
PCT/CN2022/107910
Other languages
English (en)
French (fr)
Inventor
李尚锴
王凯
袁明明
Original Assignee
浪潮通信信息系统有限公司
Application filed by 浪潮通信信息系统有限公司
Publication of WO2023093100A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Definitions

  • The present application relates to the technical field of network security, and in particular to a method, apparatus, device and product for identifying abnormal API gateway calls.
  • Existing methods for identifying abnormal API gateway calls fall into two types. The first identifies abnormal calls on traditional performance indicators through rules: discovery rules for abnormal calls are defined from business logic, and an alarm is raised when the configured logic detects an anomaly. The second processes historical operation logs according to business logic, clusters and partitions the log content, encodes the logs, and models them with a deep neural network.
  • Identifying abnormal API calls is usually a multi-class problem, because an abnormal call may have several different causes, and neither of the above methods can guarantee its detection efficiency or the quality of its results. Making API gateway abnormal call identification more efficient and its results more accurate is therefore an important problem that the industry urgently needs to solve.
  • This application provides a method, apparatus, device and product for identifying abnormal API gateway calls, which overcome the low accuracy and inaccurate classification of abnormal call identification in the prior art and achieve real-time, fast and high-precision anomaly detection on the API gateway side.
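  • This application provides a method for identifying abnormal API gateway calls, including the following steps: obtaining the log information, resource performance data and Internet Protocol (IP) address generated when the API gateway is called; and inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain the identification result output by the model.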
  • The identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call. The abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • The abnormal call identification model includes a feature extraction layer, a feature screening layer, a first classification layer, a branch pruning fitting layer, a second classification layer and an identification layer.
  • The feature extraction layer extracts features from the sample log information and the sample resource performance data, keyed on the sample IP address, to obtain a first sequence and a second sequence. It then combines the first and second sequences by access time and number and applies multi-class labels for abnormal access, producing a first data set and a first feature field sequence. The first feature field sequence is obtained from the first and second sequences, where the first sequence is derived from the sample log information and the sample IP address, and the second sequence from the sample resource performance data and the sample IP address.
  • The feature screening layer performs feature screening on the first sequence to obtain a third sequence, and generates a second data set and a second feature field sequence from the third sequence.
  • The first classification layer splits the second data set and the second feature field sequence to obtain multiple tree classifiers and the first prediction results they output.
  • The branch pruning fitting layer evaluates the accuracy of each tree classifier and fits the tree classifiers whose accuracy exceeds a preset threshold, yielding the fitted first-layer classifier.
  • The second classification layer performs feature matching between the first-layer classifier and the sample labels to obtain a second-layer classifier and the sample identification results it outputs.
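The text does not spell out how the branch pruning fitting layer and the second classification layer are implemented. The sketch below is one possible reading, under stated assumptions: each fitted tree is any callable that maps a feature vector to a label, pruning keeps the trees whose validation accuracy exceeds a threshold `min_acc`, and the hypothetical `SecondLayer` stacks the surviving trees' predictions and learns the most frequent sample label for each prediction pattern. All names are illustrative and do not come from the application.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Sequence[float]
Tree = Callable[[Sample], str]  # a fitted tree classifier: features -> label


def accuracy(tree: Tree, xs: List[Sample], ys: List[str]) -> float:
    """Fraction of samples that the tree classifies correctly."""
    return sum(tree(x) == y for x, y in zip(xs, ys)) / len(ys)


def prune_forest(trees: List[Tree], xs: List[Sample], ys: List[str],
                 min_acc: float) -> List[Tree]:
    """Keep only the trees whose accuracy exceeds the preset threshold."""
    return [t for t in trees if accuracy(t, xs, ys) > min_acc]


class SecondLayer:
    """Hypothetical second layer stacked on the pruned sub-forest.

    It maps each pattern of first-layer predictions to the most frequent
    sample label seen with that pattern, and falls back to a majority vote
    of the first-layer trees for patterns it has never seen.
    """

    def __init__(self, first_layer: List[Tree]) -> None:
        self.first_layer = first_layer
        self.table: Dict[Tuple[str, ...], str] = {}

    def _pattern(self, x: Sample) -> Tuple[str, ...]:
        return tuple(tree(x) for tree in self.first_layer)

    def fit(self, xs: List[Sample], ys: List[str]) -> "SecondLayer":
        counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
        for x, y in zip(xs, ys):
            counts[self._pattern(x)][y] += 1
        self.table = {p: c.most_common(1)[0][0] for p, c in counts.items()}
        return self

    def predict(self, x: Sample) -> str:
        pattern = self._pattern(x)
        if pattern in self.table:
            return self.table[pattern]
        return Counter(pattern).most_common(1)[0][0]
```

A lookup table is used only to keep the sketch dependency-free; any classifier trained on the vector of first-layer predictions would fill the same slot.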
  • Specifically, the feature screening layer works as follows:
  • The first sequence is screened with an XGBoost algorithm extended with a tree structural risk term, and the features in the first sequence are extracted along preset sub-dimensions to obtain the third sequence.
  • Specifically, the first classification layer assigns the second data set and the second feature field sequence to split nodes until the number of samples assigned to each split node is within a preset value, yielding multiple tree classifiers.
  • Inputting the log information, the resource performance data and the IP address into the abnormal call identification model and obtaining the identification result output by the model includes the following steps:
  • Input the log information, the resource performance data and the IP address into the feature extraction layer to obtain a third data set, a third feature field sequence and labels output by the feature extraction layer. The third feature field sequence is obtained from a fourth sequence and a fifth sequence; the fourth sequence is derived from the log information and the IP address, and the fifth sequence from the resource performance data and the IP address.
  • Input the fourth sequence into the feature screening layer to obtain a fourth data set and a fourth feature field sequence output by the feature screening layer; both are generated from a sixth sequence obtained by screening the fourth sequence.
  • The abnormal call identification model is trained through the following steps:
  • The first feature field sequence and the corresponding sample labels are used as training input, and a machine learning training method is applied to obtain the abnormal call identification model that generates the identification result.
  • The present application also provides an apparatus for identifying abnormal API gateway calls, including:
  • a collection module configured to obtain the log information, resource performance data and IP address generated when the API gateway is called; and
  • an identification module configured to input the log information, the resource performance data and the IP address into an abnormal call identification model and obtain the identification result output by the model;
  • where the identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call, and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • The present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the steps of any of the above methods for identifying abnormal API gateway calls are implemented.
  • The present application also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for identifying abnormal API gateway calls.
  • The present application also provides a computer program product including a computer program which, when executed by a processor, implements the steps of any of the above methods for identifying abnormal API gateway calls.
  • Under low-latency conditions and targeting the high-precision requirements of API gateway anomaly identification, the method, apparatus, device and product provided by this application obtain server performance fields and log information and use them as input data for a trained abnormal call identification model, which outputs the anomaly identification results. This yields more accurate identification results and enables real-time, fast and high-precision anomaly detection on the API gateway side.
  • FIG. 1 is a schematic flowchart of the method for identifying abnormal API gateway calls provided by the present application;
  • FIG. 2 is a schematic diagram of an application of the method for identifying abnormal API gateway calls provided by the present application;
  • FIG. 3 is a schematic flowchart of training the abnormal call identification model in the method for identifying abnormal API gateway calls provided by the present application;
  • FIG. 4 is a logic diagram of training the abnormal call identification model in the method for identifying abnormal API gateway calls provided by the present application;
  • FIG. 5 is a logic diagram of building the high-fitting two-layer random forest model for the abnormal call identification model in the method for identifying abnormal API gateway calls provided by the present application;
  • FIG. 6 is a schematic structural diagram of the apparatus for identifying abnormal API gateway calls provided by the present application;
  • FIG. 7 is a schematic structural diagram of training the abnormal call identification model in the apparatus for identifying abnormal API gateway calls provided by the present application;
  • FIG. 8 is a schematic structural diagram of the electronic device provided by the present application.
  • The method for identifying abnormal API gateway calls of the present application is described below with reference to FIG. 1. The method includes the following steps:
  • The identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call.
  • The abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • The abnormal call identification model used in this method is a two-layer model built by constructing a random forest and selecting a sub-forest with a high degree of fit, which satisfies the low-latency requirement of API abnormal call identification. Subtrees with a high degree of fit are screened out with the previously built XGBoost algorithm extended with the tree structural risk term, which satisfies the high-precision identification requirement.
  • The log information, resource performance data and IP address collected in step S100 serve as the input data of the abnormal call identification model. The model's final output is the abnormal call identification field: whether the call is abnormal and, if it is, its classification, i.e., the exception type.
  • The output of the abnormal call identification model also includes the sample variable weight sequence produced during modeling.
  • To address the low accuracy and inaccurate classification of existing abnormal call identification schemes, the method of this application constructs a two-layer classifier model for API gateway call events to improve classification accuracy, and overcomes the low accuracy of existing schemes by building a high-fitting sub-forest.
  • The abnormal call identification model can be stored on a cloud platform and deployed locally by linking it to the API gateway.
  • The abnormal call identification model first extracts the server performance fields and log information, and the online inference service of the algorithm model is deployed on the gateway side. This addresses the low resource utilization and low throughput caused by thread pool exhaustion, shortens thread release time on the gateway side, and resolves resource contention under highly concurrent calls.
  • By constructing a high-fitting two-layer random forest model, the method of the present application identifies abnormal API gateway calls quickly and accurately. First, the feature fields of the log information are screened with the improved XGBoost algorithm and combined with the relevant resource performance indicators to generate a new data set and feature set; a high-fitting two-layer random forest model is then used to build the anomaly identification model and provide an accurate inference service.
  • Under low-latency conditions and targeting the high-precision requirements of API gateway anomaly identification, the method of this application obtains server performance fields and log information and uses them as input data for a trained abnormal call identification model, which outputs the anomaly identification results. This yields more accurate identification results and enables real-time, fast and high-precision anomaly detection on the API gateway side.
  • A100: Obtain sample log information, sample resource performance data and sample IP addresses.
  • A200: Perform multi-class labeling of abnormal calls on the sample log information, sample resource performance data and sample IP addresses to obtain a first data set D, a first feature field sequence T and sample labels R.
  • The first feature field sequence T is obtained from the first sequence I and the second sequence J,
  • where the first sequence is derived from the sample log information and the sample IP address.
  • The sample label R is the classification of an abnormal call, i.e., its exception type.
  • HTTP: Hypertext Transfer Protocol
  • Both the sample log information and the log information include request path attributes, parameter attributes, character distribution characteristics, access time and similar fields.
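The application names only the categories of log features (request path, parameters, character distribution, access time). As an illustration only, the hypothetical `log_features` helper below derives one or two concrete numbers per category from a request URI using Python's standard `urllib.parse`; the specific features (path depth, parameter count, Shannon entropy of the characters, share of unusual symbols) are assumptions, not the patented feature set.

```python
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def log_features(request_uri: str, access_time: float) -> dict:
    """Hypothetical per-request features for the four categories named above."""
    parts = urlsplit(request_uri)
    path = parts.path
    params = parse_qsl(parts.query, keep_blank_values=True)
    text = (path + "?" + parts.query) if parts.query else path
    n = len(text)
    counts = Counter(text)
    # Shannon entropy (bits per character) of the raw path + query string.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    # Characters other than letters, digits and ordinary URL separators.
    unusual = sum(1 for ch in text if not ch.isalnum() and ch not in "/?=&")
    return {
        "path_depth": len([seg for seg in path.split("/") if seg]),  # request path
        "path_length": len(path),
        "param_count": len(params),                                  # parameters
        "max_param_len": max((len(v) for _, v in params), default=0),
        "char_entropy": entropy,                                     # char distribution
        "unusual_ratio": unusual / n if n else 0.0,
        "access_time": access_time,                                  # access time
    }
```

For `"/a?q=<x>"`, the two angle brackets make up a quarter of the characters, so `unusual_ratio` is 0.25, whereas an ordinary URI such as `"/api/v1/users?id=1&name=a"` scores 0.0.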
  • The abnormal call identification model includes a feature extraction layer, a feature screening layer, a first classification layer, a branch pruning fitting layer, a second classification layer and an identification layer.
  • The feature extraction layer extracts features from the sample log information and the sample resource performance data, keyed on the sample IP address, to obtain the first sequence and the second sequence; it then combines the first and second sequences by access time and number and applies multi-class labels for abnormal access to obtain the first data set and the first feature field sequence.
  • The first feature field sequence is obtained from the first and second sequences;
  • the first sequence is derived from the sample log information and the sample IP address;
  • the second sequence is derived from the sample resource performance data and the sample IP address.
  • The feature screening layer performs feature screening on the first sequence to obtain a third sequence, and generates a second data set and a second feature field sequence from the third sequence.
  • The first classification layer splits the second data set and the second feature field sequence to obtain multiple tree classifiers and the first prediction results they output.
  • The branch pruning fitting layer evaluates the accuracy of each tree classifier and fits the tree classifiers whose accuracy exceeds a preset threshold, yielding the fitted first-layer classifier.
  • The second classification layer performs feature matching between the first-layer classifier and the sample labels to obtain a second-layer classifier and the sample identification results it outputs.
  • Specifically, the feature screening layer works as follows:
  • The first sequence is screened with the XGBoost algorithm extended with a tree structural risk term, and the features in the first sequence are extracted along preset sub-dimensions to obtain the third sequence.
  • Specifically, the first classification layer works as follows:
  • The second data set and the second feature field sequence are assigned to split nodes until the number of samples assigned to each split node is within a preset value, yielding multiple tree classifiers.
  • Step S200 specifically includes the following steps:
  • The third feature field sequence is obtained from the fourth sequence and the fifth sequence;
  • the fourth sequence is derived from the log information and the IP address;
  • the fifth sequence is derived from the resource performance data and the IP address.
  • The fourth sequence is input into the feature screening layer to obtain the fourth data set and the fourth feature field sequence output by the feature screening layer; both are generated from a sixth sequence obtained by screening the fourth sequence.
  • The second prediction results and the labels are input into the second classification layer to obtain the identification result output by the second classification layer.
  • In step A100, the resource performance data generated by the server when the API gateway is called is also used as the sample resource performance data, and the corresponding key performance indicator (KPI) values at the moment each single request in the sample resource performance data occurred are pulled.
  • KPI: Key Performance Indicator
  • The KPI values and historical request information include the access time (time), the historical request count (reqCou), the current per-second concurrent request count (reqEru), the request count from a single IP (IPreq) and the memory usage rate (rateC).
  • Sample labeling: the current first sequence I and second sequence J are recombined by access time and number into a new field, req_id, and multi-class labeling of abnormal access is performed to form the first data set D and the first feature field sequence T.
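A minimal sketch of this join, assuming the log-derived sequence I and the KPI sequence J arrive as lists of dicts that both carry `ip` and `time` keys, that `req_id` is formed from the access time plus a running number, and that labels are supplied as a mapping from `(ip, time)` to an exception type, with unlabeled rows treated as `"normal"`. The KPI field names come from the text; everything else, including `build_dataset` and the example label names, is hypothetical.

```python
from typing import Dict, List, Tuple

KPI_FIELDS = ("reqCou", "reqEru", "IPreq", "rateC")  # KPI names from the text


def build_dataset(log_rows: List[dict], kpi_rows: List[dict],
                  labels: Dict[Tuple[str, int], str]) -> Tuple[List[dict], List[str]]:
    """Join sequence I (logs) with sequence J (KPIs) into data set D and fields T."""
    kpi_index = {(r["ip"], r["time"]): r for r in kpi_rows}
    dataset: List[dict] = []
    ordered = sorted(log_rows, key=lambda r: (r["time"], r["ip"]))
    for number, log in enumerate(ordered):
        key = (log["ip"], log["time"])
        kpi = kpi_index.get(key)
        if kpi is None:
            continue  # no KPI sample for this request; this sketch drops it
        row = {"req_id": f"{log['time']}_{number}"}  # access time + running number
        row.update({k: v for k, v in log.items() if k not in ("ip", "time")})
        row.update({k: kpi[k] for k in KPI_FIELDS})
        row["label"] = labels.get(key, "normal")  # multi-class label, default normal
        dataset.append(row)
    fields = [k for k in dataset[0] if k not in ("req_id", "label")] if dataset else []
    return dataset, fields
```

Dropping log rows that have no matching KPI sample is a simplification; a production pipeline might instead fill them from the nearest KPI reading.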
  • Feature screening is performed with the XGBoost algorithm extended with a tree structural risk term. In addition to setting the Gini index, this method solves the problem of screening multiple variables under the same data structure by improving the loss function of the XGBoost algorithm.
  • A tree structural risk term is added to the XGBoost algorithm so that the growth structure of each tree is constrained while the tree is built, which reduces overfitting. Written in the standard XGBoost form, the objective function becomes:
    $$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \qquad \Omega\left(f_k\right) = aT + \frac{1}{2} b \lVert w \rVert^2$$
  • where $l$ is the loss function, $y_i$ is the actual (observed) value and $\hat{y}_i$ is the predicted value of the $i$-th sample;
  • $\Omega(f_k)$ is the structural risk term of the $k$-th tree $f_k$;
  • the complexity of a tree is represented by its number of leaf nodes $T$ and its node weights $w$;
  • $a$ and $b$ are hyperparameters: $a$ is the first hyperparameter and $b$ is the second hyperparameter.
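To make the role of the two hyperparameters concrete, the sketch below assumes the standard XGBoost regularizer $\Omega(f) = aT + \frac{1}{2} b \lVert w \rVert^2$ (so $a$ plays the role usually called gamma and $b$ the role usually called lambda) together with XGBoost's usual second-order split gain, in which $G$ and $H$ are the sums of loss gradients and Hessians in a node. A candidate split is worth keeping only when its gain is positive, so raising $a$ directly suppresses tree growth. The function names are illustrative.

```python
from typing import Sequence


def structural_risk(leaf_weights: Sequence[float], a: float, b: float) -> float:
    """Omega(f) = a*T + 0.5*b*||w||^2, with T = number of leaves."""
    return a * len(leaf_weights) + 0.5 * b * sum(w * w for w in leaf_weights)


def objective(losses: Sequence[float], trees: Sequence[Sequence[float]],
              a: float, b: float) -> float:
    """Training loss plus the structural risk of every tree."""
    return sum(losses) + sum(structural_risk(w, a, b) for w in trees)


def split_gain(g_left: float, h_left: float, g_right: float, h_right: float,
               a: float, b: float) -> float:
    """Second-order gain of splitting a node into left/right children."""
    def score(g: float, h: float) -> float:
        return g * g / (h + b)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - a


def optimal_leaf_weight(g: float, h: float, b: float) -> float:
    """Leaf weight that minimizes the regularized objective: -G / (H + b)."""
    return -g / (h + b)
```

For example, `split_gain(-4.0, 2.0, 4.0, 2.0, a=1.0, b=0.0)` returns 7.0, while the same split with `a=10.0` has negative gain and would be rejected.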
  • The inherent interpretability of the decision trees used in XGBoost reduces computational complexity and improves the interpretability of the whole abnormal call identification model.
  • Interpretability is also an important consideration in credit evaluation, so the approach is well suited to estimating the importance of each feature indicator: the higher the importance score, the more important the feature indicator and the greater its contribution to the data set.
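One common way to turn such importance scores into a screened feature list is to rank features and keep the top `k`. The sketch assumes importance is measured as each feature's share of the total split gain; the real `xgboost` package reports comparable per-feature values through `Booster.get_score(importance_type="total_gain")`. The `splits` input format and the cut-off parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def gain_importance(splits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Normalized total gain per feature from (feature, gain) split records."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    if grand_total <= 0:
        return {}
    return {f: g / grand_total for f, g in totals.items()}


def screen_features(importance: Dict[str, float], top_k: int,
                    min_score: float = 0.0) -> List[str]:
    """Keep the top_k most important features whose score exceeds min_score."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, score in ranked[:top_k] if score > min_score]
```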
  • In the method of this application, a two-layer classifier is constructed, the random forest is pruned, and variables are screened with the XGBoost algorithm extended with the tree structural risk term, so that predictions can be built on the leaf structure or on automatically selected subsets of variables.
  • The input data of the method are the log information, resource performance data and IP address.
  • During training, the input data of the abnormal call identification model are the second data set D1, the sample labels R, a third sequence I' representing the features of the sample log information, and the second sequence J representing the features of the sample resource performance data.
  • Bootstrap sampling is used to draw K training data sets with replacement from the original first data set D; each training data set contains N samples, the same number as D. A decision tree is trained on each bootstrap sample.
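Bootstrap sampling as described here can be sketched with the standard `random` module: each of the K training sets draws N records with replacement from D, so some records repeat and others are left out of a given set. The function name and the `seed` parameter (added for reproducibility) are illustrative.

```python
import random
from typing import List, Sequence, TypeVar

Record = TypeVar("Record")


def bootstrap_sets(data: Sequence[Record], k: int, seed: int = 0) -> List[List[Record]]:
    """Draw k training sets, each of size N = len(data), with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(k)]
```

For large N, each bootstrap set contains on average about 63.2% (1 - 1/e) of the distinct records of D, which is what gives the individual trees different views of the data.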
  • The first-layer classifier uses classification and regression trees (CART). At each node, m features (m < M) are randomly selected from the M input features as the candidate split feature set of the current node; the optimal split feature and split point are chosen from this set, and the node's training data are divided between two child nodes.
  • Split features and split points are chosen by minimizing the Gini index. This division is repeated until the stop condition is met, i.e., until the number of samples in a split node falls below the preset value.
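A compact sketch of growing one first-layer tree under the rules above: at every node it samples `m` of the `M` feature indices, picks the threshold that minimizes the weighted Gini index of the two children, and stops when a node holds fewer than `min_samples` samples (the preset value). The nested-dict tree layout, the extra stop on already-pure nodes, and all function names are illustrative assumptions rather than the application's implementation.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Rows = List[Sequence[float]]


def gini(labels: Sequence[str]) -> float:
    """Gini index 1 - sum(p_c^2) of a label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs: Rows, ys: List[str],
               feature_ids: Sequence[int]) -> Optional[Tuple[int, float, float]]:
    """(feature, threshold, weighted Gini) of the best split, or None."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= threshold]
            right = [y for x, y in zip(xs, ys) if x[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


def majority(ys: List[str]) -> str:
    return Counter(ys).most_common(1)[0][0]


def grow(xs: Rows, ys: List[str], m: int, min_samples: int,
         rng: random.Random) -> dict:
    """Grow a CART-style tree as nested dicts; leaves hold the majority label."""
    if len(ys) < min_samples or len(set(ys)) == 1:
        return {"leaf": majority(ys)}
    features = rng.sample(range(len(xs[0])), m)  # random m of the M features
    split = best_split(xs, ys, features)
    if split is None:
        return {"leaf": majority(ys)}
    f, t, _ = split
    left = [i for i, x in enumerate(xs) if x[f] <= t]
    right = [i for i, x in enumerate(xs) if x[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([xs[i] for i in left], [ys[i] for i in left], m, min_samples, rng),
        "right": grow([xs[i] for i in right], [ys[i] for i in right], m, min_samples, rng),
    }


def predict(tree: dict, x: Sequence[float]) -> str:
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```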
  • The classification result given by the model is denoted T(x).
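The formula for T(x) is not reproduced in this text. Assuming the usual random forest rule, in which the forest outputs the class predicted by the most trees, a sketch looks like this; the function name and the alphabetical tie-break (an arbitrary choice that makes the result deterministic) are assumptions.

```python
from collections import Counter
from typing import Callable, Sequence


def forest_predict(trees: Sequence[Callable[[Sequence[float]], str]],
                   x: Sequence[float]) -> str:
    """T(x): the class predicted by the most trees (majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    top = max(votes.values())
    # Ties are broken alphabetically so the result is deterministic.
    return min(label for label, count in votes.items() if count == top)
```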
  • Precision (P) and recall (R) are used for the comparison.
  • The improved abnormal call identification model significantly improves on the existing single-layer random forest model in both precision and recall.
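Because the task is multi-class, precision and recall are naturally computed per exception type and then averaged. The sketch below uses macro averaging (an unweighted mean over classes); the application does not say which averaging it uses, and the function names are illustrative.

```python
from typing import Dict, Sequence, Tuple


def per_class_precision_recall(y_true: Sequence[str],
                               y_pred: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Precision and recall for every class that occurs in y_true or y_pred."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean of per-class precision and recall."""
    n = len(scores)
    return (sum(p for p, _ in scores.values()) / n,
            sum(r for _, r in scores.values()) / n)
```

Macro averaging weights rare exception types as heavily as common ones, which matters when abnormal calls are a small minority of traffic; a support-weighted average would instead track overall accuracy more closely.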
  • The improved model therefore makes the identification results more accurate: for each call in the system, the abnormal call identification model with online inference produces results with higher accuracy and more precise classification (whether the call is abnormal and the correct type of the abnormal call).
  • The apparatus for identifying abnormal API gateway calls provided by this application is described below.
  • The apparatus described below and the method described above may be cross-referenced with each other.
  • The apparatus for identifying abnormal API gateway calls of the present application is described below with reference to FIG. 6. The apparatus includes:
  • a collection module 100, configured to obtain the log information, resource performance data and IP address generated when the API gateway is called; and
  • an identification module 200, configured to input the log information, resource performance data and IP address into the trained abnormal call identification model to obtain the identification result output by the model.
  • The identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call.
  • The abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • The abnormal call identification model used in the apparatus is a two-layer model built by constructing a random forest and selecting a sub-forest with a high degree of fit, which satisfies the low-latency requirement of API abnormal call identification. Subtrees with a high degree of fit are screened out with the previously built XGBoost algorithm extended with the tree structural risk term, which satisfies the high-precision identification requirement.
  • The log information, resource performance data and IP address collected in step S100 serve as the input data of the abnormal call identification model. The model's final output is the abnormal call identification field: whether the call is abnormal and, if it is, its classification, i.e., the exception type.
  • The output of the abnormal call identification model also includes the sample variable weight sequence produced during modeling.
  • The abnormal call identification model can be stored on a cloud platform.
  • By constructing a two-layer random forest model with a high degree of fit, the apparatus of the present application identifies abnormal API gateway calls quickly and accurately.
  • The feature fields of the log information are screened with the improved XGBoost algorithm and combined with the relevant resource performance indicators to generate a new data set and feature set; a high-fitting two-layer random forest model is then used to build the anomaly identification model.
  • Under low-latency conditions and targeting the high-precision requirements of API gateway anomaly identification, the apparatus of this application obtains server performance fields and log information and uses them as input data for a trained abnormal call identification model, which outputs the anomaly identification results. This yields more accurate identification results and enables real-time, fast and high-precision anomaly detection on the API gateway side.
  • The first training module 300 is configured to obtain sample log information, sample resource performance data and sample IP addresses.
  • The second training module 400 is configured to perform multi-class labeling of abnormal calls on the sample log information, sample resource performance data and sample IP addresses to obtain the first data set D, the first feature field sequence T and the sample labels R.
  • The first feature field sequence T is obtained from the first sequence I and the second sequence J,
  • where the first sequence is derived from the sample log information and the sample IP address.
  • The sample label R is the classification of an abnormal call, i.e., its exception type.
  • The third training module 500 is configured to use the first feature field sequence T and the corresponding sample labels R as training input and apply a machine learning training method to obtain the abnormal call identification model that generates the identification result.
  • FIG. 8 illustrates the physical structure of an electronic device.
  • The electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, where the processor 810, the communications interface 820 and the memory 830 communicate with one another through the communication bus 840.
  • The processor 810 can call the logic instructions in the memory 830 to execute the method for identifying abnormal API gateway calls, which includes the following steps:
  • inputting the log information, the resource performance data and the IP address into the abnormal call identification model to obtain the identification result output by the model;
  • where the identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call, and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • When implemented as software functional units and sold or used as an independent product, the above logic instructions in the memory 830 may be stored in a computer-readable storage medium.
  • The technical solution of the present application in essence, or the part of it that contributes to the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, server, network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disc.
  • The present application also provides a computer program product including a computer program, which may be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the method for identifying abnormal API gateway calls provided by the above methods, which includes the following steps:
  • The identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call, and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • The present application also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the method for identifying abnormal API gateway calls provided by the above methods.
  • The method includes the following steps:
  • inputting the log information, the resource performance data and the IP address into the abnormal call identification model to obtain the identification result output by the model;
  • where the identification result indicates whether the call is abnormal and, if it is, the type of the abnormal call, and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  • the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment. A person skilled in the art can understand and implement it without creative effort.
  • each implementation can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware.
  • the essence of the above technical solution or the part that contributes to the prior art can be embodied in the form of software products, and the computer software products can be stored in computer-readable storage media, such as ROM/RAM, magnetic discs, optical discs, etc., including several instructions to make a computer device (which may be a personal computer, server, or network device, etc.) execute the methods described in various embodiments or some parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

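The present application provides a method, apparatus, device and product for identifying abnormal API gateway calls, relating to the field of network security. The method includes the following steps: acquiring log information, resource performance data and an Internet Protocol (IP) address generated during a call; and inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model. The identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type. The abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.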

Description

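Method, apparatus, device and product for identifying abnormal API gateway calls
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202111423922X, filed on November 26, 2021 and entitled "Method, apparatus, device and product for identifying abnormal API gateway calls", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of network security, and in particular to a method, apparatus, device and product for identifying abnormal API gateway calls.
Background
Abnormal behavior analysis, also known as intrusion detection, is a relatively new network security mechanism for detecting and preventing unauthorized access to a network by illegitimate users. Existing methods for identifying abnormal calls made through an application programming interface (API) gateway fall into two categories. The first is rule-based: discovery rules for abnormal calls are defined on performance indicators according to business logic, and alarms are raised when those rules are triggered. The second processes historical run logs according to business logic: the log contents are clustered, and the encoded logs are modeled with a deep neural network.
However, identifying abnormal API calls is usually a multi-class problem, because a call can be abnormal for many different reasons, and neither approach can guarantee detection efficiency or detection accuracy. Making abnormal API gateway call detection more efficient and its results more accurate is therefore an important problem that the industry urgently needs to solve.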
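Summary
The present application provides a method, apparatus, device and product for identifying abnormal API gateway calls. It overcomes the low identification accuracy and imprecise classification of abnormal calls in the prior art and achieves real-time, fast and high-precision anomaly detection at the API gateway.
The present application provides a method for identifying abnormal API gateway calls, including the following steps:
acquiring log information, resource performance data and an IP address generated during a call;
inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
wherein the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.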
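According to the method provided by the present application, the abnormal call identification model includes a feature extraction layer, a feature selection layer, a first classification layer, a pruning-and-fitting layer, a second classification layer and an identification layer;
the feature extraction layer is configured to extract, based on the sample IP addresses, features of the sample log information and the sample resource performance data to obtain a first sequence and a second sequence. It is further configured to combine the first sequence and the second sequence by access time and number and to perform multi-class labeling of abnormal access, to obtain a first dataset and a first feature field sequence. The first feature field sequence is obtained from the first sequence and the second sequence; the first sequence is obtained from the sample log information and the sample IP addresses; and the second sequence is obtained from the sample resource performance data and the sample IP addresses;
the feature selection layer is configured to perform feature selection on the first sequence to obtain a third sequence, and to generate a second dataset and a second feature field sequence based on the third sequence;
the first classification layer is configured to split the second dataset and the second feature field sequence, to obtain multiple tree classifiers and first prediction results output by the tree classifiers;
the pruning-and-fitting layer is configured to fit, according to the accuracy of the tree classifiers, the tree classifiers whose accuracy exceeds a preset accuracy, to obtain a fitted first-layer classifier;
the second classification layer is configured to perform feature matching between the first-layer classifier and the sample labels, to obtain a second-layer classifier and sample identification results output by the second-layer classifier.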
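According to the method provided by the present application, the feature selection layer specifically performs:
screening the first sequence with an XGBoost algorithm augmented with a tree structural risk term, and extracting features from the first sequence according to a preset quantile, to obtain the third sequence.
According to the method provided by the present application, the first classification layer specifically performs:
determining split points, and determining split nodes based on the split points;
assigning the second dataset and the second feature field sequence to the split nodes until the number of samples assigned to each split node is within a preset value, to obtain multiple tree classifiers.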
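According to the method provided by the present application, inputting the log information, the resource performance data and the IP address into the abnormal call identification model to obtain the identification result output by the model specifically includes the following steps:
inputting the log information, the features of the resource performance data and the IP address into the feature extraction layer, to obtain a third dataset, a third feature field sequence and labels output by the feature extraction layer. The third feature field sequence is obtained from a fourth sequence and a fifth sequence; the fourth sequence is obtained from the log information and the IP address; and the fifth sequence is obtained from the resource performance data and the IP address;
inputting the fourth sequence into the feature selection layer, to obtain a fourth dataset and a fourth feature field sequence output by the feature selection layer. Both the fourth dataset and the fourth feature field sequence are obtained from a sixth sequence generated by feature selection on the fourth sequence;
inputting the fourth dataset and the fourth feature field sequence into the first classification layer, to obtain a second prediction result output by the first classification layer;
inputting the second prediction result and the labels into the second classification layer, to obtain the identification result output by the second classification layer.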
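According to the method provided by the present application, the abnormal call identification model is trained through the following steps:
acquiring the sample log information, the sample resource performance data and the sample IP addresses;
performing multi-class labeling of abnormal calls on the sample log information, the sample resource performance data and the sample IP addresses, to obtain the first dataset, the first feature field sequence and the sample labels;
using the first feature field sequence and the corresponding sample labels as training input data and training with a machine learning approach, to obtain the abnormal call identification model used to generate the identification result.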
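The present application further provides an apparatus for identifying abnormal API gateway calls, including:
a collection module, configured to acquire log information, resource performance data and an IP address generated during a call;
an identification module, configured to input the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
wherein the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
The present application further provides an electronic device, including a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of any of the methods for identifying abnormal API gateway calls described above.
The present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods for identifying abnormal API gateway calls described above.
The present application further provides a computer program product including a computer program which, when executed by a processor, implements the steps of any of the methods for identifying abnormal API gateway calls described above.
The method, apparatus, device and product provided by the present application meet the requirement for high-precision identification of abnormal API gateway calls under low-latency constraints. Server performance fields and log information are collected and fed as input data to the trained abnormal call identification model, which outputs the identification result. This yields more accurate identification results and enables real-time, fast and high-precision anomaly detection at the API gateway.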
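Brief description of the drawings
The drawings used in the embodiments or in the description of the prior art are briefly introduced below, to describe the technical solutions of the present application or of the prior art more clearly. Evidently, the drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the method for identifying abnormal API gateway calls provided by the present application;
FIG. 2 is a schematic diagram of the method for identifying abnormal API gateway calls provided by the present application, in application;
FIG. 3 is a schematic flowchart of training the abnormal call identification model in the method for identifying abnormal API gateway calls provided by the present application;
FIG. 4 is a logic diagram of training the abnormal call identification model in the method for identifying abnormal API gateway calls provided by the present application;
FIG. 5 is a logic diagram of the abnormal call identification model building a high-fit two-layer random forest model, in the method for identifying abnormal API gateway calls provided by the present application;
FIG. 6 is a schematic structural diagram of the apparatus for identifying abnormal API gateway calls provided by the present application;
FIG. 7 is a schematic structural diagram of training the abnormal call identification model in the apparatus for identifying abnormal API gateway calls provided by the present application;
FIG. 8 is a schematic structural diagram of the electronic device provided by the present application.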
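Detailed description
The technical solutions of the present application are described clearly and completely below with reference to the accompanying drawings, to make the objectives, technical solutions and advantages of the present application clearer. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The method for identifying abnormal API gateway calls of the present application is described below with reference to FIG. 1. The method includes the following steps:
S100: Acquire the log information, resource performance data and Internet Protocol (IP) address generated when the API gateway is called.
S200: Input the log information, resource performance data and IP address into the trained abnormal call identification model, to obtain the identification result output by the abnormal call identification model.
In this embodiment, the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type.
In this embodiment, the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.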
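Data service calls to tenants' private business systems involve many types of abnormal access and abnormal calls. These can be broadly divided into user-side access anomalies and intra-system call anomalies. Anomaly detection can be performed efficiently and accurately by clarifying and monitoring gateway-side KPIs, supplemented by machine learning modeling.
The abnormal call identification model used in the method of the present application is a two-layer model. A random forest is constructed and a high-fit sub-forest is selected from it, so that the low-latency requirement for identifying abnormal API calls is met. High-fit subtrees are selected with the previously constructed XGBoost algorithm augmented with a tree structural risk term, so that the high-precision identification requirement is met.
Once the abnormal call identification model is trained, the log information, resource performance data and IP address collected in step S100 are used as its input data. The model outputs an abnormal call identification field, i.e., whether the call is abnormal. For an abnormal call, it also outputs the call's category, i.e., the abnormal type. In this method, the output of the model also includes the sequence of sample variable weights from the modeling process.
The method addresses the low accuracy and imprecise classification of existing abnormal call identification schemes. When an API gateway call event occurs, the method improves classification accuracy by constructing a two-layer classifier model, and it overcomes the low accuracy of existing schemes by constructing a high-fit sub-forest.
It should be noted that after the abnormal call identification model has been trained, it can be stored on a cloud platform.
Referring to FIG. 2, the model is deployed locally by connecting it to the API gateway. When a call occurs, the abnormal call identification model first extracts the server performance fields and log information, and an online inference service for the algorithm model is deployed on the gateway side. This addresses the low resource utilization and low throughput caused by thread pool exhaustion. It also shortens thread release time on the gateway side and resolves resource contention under highly concurrent calls.
In summary, the method of the present application constructs a high-fit two-layer random forest model, which enables fast and accurate identification of abnormal API gateway calls. First, feature selection is performed on the feature fields of the log information using the improved XGBoost algorithm. The selected features are combined with the relevant resource performance indicators to produce a new dataset and feature set. The high-fit two-layer random forest model is then used to build the anomaly identification model. Finally, the improved machine learning model provides fast and highly accurate online inference for abnormal calls.
The method of the present application meets the requirement for high-precision identification of abnormal API gateway calls under low-latency conditions. Server performance fields and log information are collected and used as input data to the trained abnormal call identification model, which outputs the identification result. This yields more accurate identification results and enables real-time, fast and high-precision anomaly detection at the API gateway.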
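The method of the present application is further described below with reference to FIG. 3. In this method, the abnormal call identification model is trained through the following steps:
A100: Acquire the sample log information, sample resource performance data and sample IP addresses.
A200: Perform multi-class labeling of abnormal calls on the sample log information, sample resource performance data and sample IP addresses, to obtain a first dataset D, a first feature field sequence T and sample labels R. In this method, the first feature field sequence T is obtained from a first sequence I and a second sequence J. The first sequence I is obtained from the sample log information and the sample IP addresses, and the second sequence J is obtained from the sample resource performance data and the sample IP addresses. Specifically, T = {I, J}.
It should be noted that the sample label R denotes the category of an abnormal call, i.e., its abnormal type.
A300: Use the first feature field sequence T and the corresponding sample labels R as training input data and train with a machine learning approach, to obtain the abnormal call identification model used to generate the identification result.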
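In step A100, the raw log information generated when the API gateway is called is acquired as the sample log information. It is then parsed to obtain the fields of each individual Hypertext Transfer Protocol (HTTP) request. All features of each sample log entry are pulled, and a specific field userIp is generated from the user's IP address. The result is recorded as the first sequence $I = \{x_1, x_2, \ldots, x_n\}$.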
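The parsing in step A100 can be illustrated with the sketch below, which turns one gateway access-log line into a flat feature record keyed by userIp. Several details are assumptions for illustration only: the log layout (a Common Log Format style line), the regular expression, the chosen character-distribution feature and the helper name `parse_log_line`. The patent does not specify the gateway's log format.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Hypothetical access-log layout (Common Log Format style); the real gateway
# log schema is not given in the description, so adapt this pattern.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> dict:
    """Turn one HTTP request log line into a flat feature record (one x_i)."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    raw_target = m["target"]
    target = urlsplit(raw_target)
    params = parse_qs(target.query)
    # Share of non-alphanumeric characters: one simple character-distribution feature.
    special = sum(not ch.isalnum() for ch in raw_target)
    return {
        "userIp": m["ip"],  # field name taken from the description
        "time": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "path": target.path,                               # request path attribute
        "param_count": len(params),                        # parameter attribute
        "special_char_ratio": special / len(raw_target),   # character distribution
        "status": int(m["status"]),
    }


rec = parse_log_line(
    '203.0.113.7 - - [26/Nov/2021:10:15:32 +0800] '
    '"GET /api/v1/orders?id=42 HTTP/1.1" 200 512'
)
```

Each returned record stands for one element of the first sequence I. A real deployment would swap the regular expression for the gateway's actual log schema.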
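In this embodiment, both the sample log information and the log information include the request path attribute, parameter attributes, character distribution features, access time, and so on.
The abnormal call identification model includes a feature extraction layer, a feature selection layer, a first classification layer, a pruning-and-fitting layer, a second classification layer and an identification layer;
the feature extraction layer is configured to extract, based on the sample IP addresses, features of the sample log information and the sample resource performance data to obtain the first sequence and the second sequence. It is further configured to combine the first sequence and the second sequence by access time and number and to perform multi-class labeling of abnormal access, to obtain the first dataset and the first feature field sequence. In this method, the first feature field sequence is obtained from the first sequence and the second sequence; the first sequence is obtained from the sample log information and the sample IP addresses; and the second sequence is obtained from the sample resource performance data and the sample IP addresses;
the feature selection layer is configured to perform feature selection on the first sequence to obtain a third sequence, and to generate a second dataset and a second feature field sequence based on the third sequence;
the first classification layer is configured to split the second dataset and the second feature field sequence, to obtain multiple tree classifiers and first prediction results output by the tree classifiers;
the pruning-and-fitting layer is configured to fit, according to the accuracy of the tree classifiers, the tree classifiers whose accuracy exceeds a preset accuracy, to obtain a fitted first-layer classifier;
the second classification layer is configured to perform feature matching between the first-layer classifier and the sample labels, to obtain a second-layer classifier and sample identification results output by the second-layer classifier.
The feature selection layer specifically performs:
screening the first sequence with an XGBoost algorithm augmented with a tree structural risk term, and extracting features from the first sequence according to a preset quantile, to obtain the third sequence.
The first classification layer specifically performs:
determining split points, and determining split nodes based on the split points;
assigning the second dataset and the second feature field sequence to the split nodes until the number of samples assigned to each split node is within a preset value, to obtain multiple tree classifiers.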
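Accordingly, step S200 specifically includes the following steps:
inputting the log information, the features of the resource performance data and the IP address into the feature extraction layer, to obtain a third dataset, a third feature field sequence and labels output by the feature extraction layer. The third feature field sequence is obtained from a fourth sequence and a fifth sequence; the fourth sequence is obtained from the log information and the IP address; and the fifth sequence is obtained from the resource performance data and the IP address;
inputting the fourth sequence into the feature selection layer, to obtain a fourth dataset and a fourth feature field sequence output by the feature selection layer. Both the fourth dataset and the fourth feature field sequence are obtained from a sixth sequence generated by feature selection on the fourth sequence;
inputting the fourth dataset and the fourth feature field sequence into the first classification layer, to obtain a second prediction result output by the first classification layer;
inputting the second prediction result and the labels into the second classification layer, to obtain the identification result output by the second classification layer.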
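The inference steps above can be sketched end to end for one call, with heavy simplification. A list of selected feature names stands in for the feature selection output. Plain callables stand in for the trees of the pruned sub-forest, and a majority vote stands in for the first classification layer. Class 0 is assumed to mean "normal". The second-layer refinement is omitted, and `identify_call` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple


def identify_call(features: dict,
                  selected: Sequence[str],
                  sub_forest: Sequence[Callable[[list], int]],
                  normal_label: int = 0) -> Tuple[bool, Optional[int]]:
    """Hypothetical online inference for one gateway call.

    Returns (is_abnormal, abnormal_type); abnormal_type is None for normal calls.
    """
    # Keep only the features retained by the (assumed) feature-selection step.
    x = [features[name] for name in selected]
    # First layer: each tree of the pruned sub-forest casts one vote.
    votes = Counter(tree(x) for tree in sub_forest)
    # Majority vote; ties go to the smaller label (an arbitrary choice).
    label, _ = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))
    is_abnormal = label != normal_label
    return is_abnormal, (label if is_abnormal else None)


# Toy "trees": simple threshold rules standing in for trained decision trees.
trees = [
    lambda x: 2 if x[0] > 500 else 0,   # flags a high historical request count
    lambda x: 2 if x[1] > 100 else 0,   # flags a high per-IP request count
    lambda x: 0,
]
```

For example, `identify_call({"reqCou": 900, "IPreq": 300, "path_len": 12}, ["reqCou", "IPreq"], trees)` returns `(True, 2)`.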
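Step A100 also acquires the resource performance data generated by the server when the API gateway is called, as the sample resource performance data. Two kinds of values are then pulled for each individual request: the key performance indicator (KPI) values at the time of the request, and the historical request information of the user at the corresponding IP address (the features of the sample resource performance data). They are recorded as the second sequence $J = \{y_1, y_2, \ldots, y_n\}$.
In this embodiment, the KPI values and historical request information include the access time (time), the historical request count (reqCou), the current per-second request concurrency (reqEru), the number of requests from a single IP (IPreq), the memory usage (rateC), and so on.
The current first sequence I and second sequence J are regrouped by access time and number into a new field req_id, based on business-logic judgment of the sample labels. Multi-class labeling of abnormal access is then performed, forming the first dataset D and the first feature field sequence T. The sample labels are denoted $R = \{R_1, R_2, \ldots, R_n\}$, where $R_i$ is the label of the i-th sample.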
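A minimal sketch of the regrouping step follows. It assumes log-side records (sequence I) and KPI-side records (sequence J) that both carry `userIp` and an integer access `time`, plus a hypothetical `labels` mapping in which 0 marks a normal call. The name `build_dataset`, the exact-match join on (userIp, time) and the `time_number` format of req_id are illustrative choices, not the patent's specification.

```python
def build_dataset(log_rows, kpi_rows, labels):
    """Join log-side features (sequence I) with KPI-side features (sequence J).

    log_rows, kpi_rows: lists of dicts that both contain 'userIp' and 'time'.
    labels: dict mapping (userIp, time) -> class label (0 = normal, assumed).
    Returns (D, T, R): merged rows, sorted feature field names, label list.
    """
    kpi_index = {(r["userIp"], r["time"]): r for r in kpi_rows}
    D, R = [], []
    for log in sorted(log_rows, key=lambda r: r["time"]):
        key = (log["userIp"], log["time"])
        kpi = kpi_index.get(key)
        if kpi is None:
            continue  # no KPI snapshot for this request; skipped (assumption)
        row = dict(log)
        row.update({k: v for k, v in kpi.items() if k not in ("userIp", "time")})
        # req_id regroups the record by access time and a running number.
        row["req_id"] = f"{log['time']}_{len(D)}"
        D.append(row)
        R.append(labels[key])
    T = sorted({k for row in D for k in row} - {"req_id", "userIp", "time"})
    return D, T, R


logs = [{"userIp": "10.0.0.1", "time": 100, "path_len": 12},
        {"userIp": "10.0.0.2", "time": 101, "path_len": 40}]
kpis = [{"userIp": "10.0.0.1", "time": 100, "reqCou": 5, "IPreq": 2},
        {"userIp": "10.0.0.2", "time": 101, "reqCou": 900, "IPreq": 300}]
labels = {("10.0.0.1", 100): 0, ("10.0.0.2", 101): 2}
D, T, R = build_dataset(logs, kpis, labels)
```

In practice the KPI snapshot would more likely be matched to the nearest sampling instant than to an exact timestamp. The exact join only keeps the sketch short.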
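In the method of the present application, feature selection is based on an XGBoost algorithm augmented with a tree structural risk term. In addition to setting the Gini index, the method modifies the XGBoost loss function to handle multi-variable selection under the same data structure. In this embodiment, a tree structural risk term is added to XGBoost, which constrains the growth structure of each tree during construction and reduces overfitting. The XGBoost objective function thus becomes:

$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right)$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $l$ is the loss function, and $\Omega(f_k)$ is the structural risk term of the k-th tree $f_k$.

The method represents the complexity of a tree by its number of leaf nodes $T$ and its leaf node weights $w$:

$$\Omega(f) = aT + \frac{1}{2} b \sum_{j=1}^{T} w_j^2$$

where a and b are hyperparameters (a is the first hyperparameter and b the second), T is the number of leaf nodes, and $w_j$ is the weight of the j-th leaf node. With the structural risk term added, the objective function of XGBoost becomes:

$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \left( aT_k + \frac{1}{2} b \sum_{j=1}^{T_k} w_{kj}^2 \right)$$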
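The sketch below makes the regularized objective concrete. For each tree it evaluates the complexity term Ω(f) = aT + ½·b·Σ w_j² and adds it to a data loss. Squared error is used for the loss l purely as an assumption, since the description does not name the loss; for multi-class labels a softmax loss would be the more natural choice. Each tree is represented only by its list of leaf weights, and the function names are hypothetical.

```python
def tree_complexity(leaf_weights, a, b):
    """Structural risk of one tree: a*T + 0.5*b*sum(w_j**2), T = number of leaves."""
    T = len(leaf_weights)
    return a * T + 0.5 * b * sum(w * w for w in leaf_weights)


def regularized_objective(y_true, y_pred, forest_leaf_weights, a, b):
    """Data loss plus the structural risk of every tree in the ensemble.

    Squared error stands in for the unspecified loss l(y_i, y_hat_i).
    forest_leaf_weights: one list of leaf weights per tree f_k.
    """
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return loss + sum(tree_complexity(w, a, b) for w in forest_leaf_weights)
```

A larger a penalizes every extra leaf and a larger b shrinks the leaf weights. Both push training toward smaller, less overfitted trees.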
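For log information with the same structure, the samples are sorted by feature value for each feature. Split points are then tried from left to right and the loss reduction at each split point is computed. The feature and split point with the largest loss reduction are used to split the current node. Repeating this process yields an importance ranking of all variables.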
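The left-to-right split search can be sketched as XGBoost's exact greedy algorithm. This assumes per-sample first- and second-order gradients g and h of the loss. The loss reduction of splitting a node into left and right children is then ½[G_L²/(H_L+b) + G_R²/(H_R+b) − G²/(H+b)] − a, where a and b are the two hyperparameters of the structural risk term. The function name `best_split` and the plain-list data layout are illustrative assumptions. The XGBoost library implements this search (plus approximate and histogram variants) internally.

```python
def best_split(X, g, h, a, b):
    """Exact greedy split search over all features.

    X: list of rows (lists of floats); g, h: first/second-order gradients.
    Returns (feature_index, threshold, gain) with the largest loss reduction,
    or None when no split has positive gain.
    """
    G, H = sum(g), sum(h)
    parent = G * G / (H + b)
    best = None
    for f in range(len(X[0])):
        # Sort sample indices by this feature's value, then scan left to right.
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            GL += g[i]
            HL += h[i]
            if X[i][f] == X[nxt][f]:
                continue  # cannot place a threshold between equal values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL * GL / (HL + b) + GR * GR / (HR + b) - parent) - a
            if gain > 0 and (best is None or gain > best[2]):
                best = (f, (X[i][f] + X[nxt][f]) / 2, gain)
    return best


# Squared loss l = 0.5*(y - y_hat)**2 at y_hat = 0 gives g = -y and h = 1.
X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
g = [0.0, 0.0, -1.0, -1.0]
h = [1.0, 1.0, 1.0, 1.0]
```

Here `best_split(X, g, h, a=0.0, b=1.0)` picks feature 0 with threshold 2.5. Counting how often each feature is chosen across all trees gives a split-count importance that can be used to rank features.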
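In this embodiment, features are extracted according to a preset quantile, for example the top 80% by importance. The first sequence $I = \{x_1, x_2, \ldots, x_n\}$ thus becomes the third sequence $I' = \{x_1, x_2, \ldots, x_{0.8n}\}$, with features indexed in descending order of importance.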
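Selecting the top 80% of features by importance might look like the sketch below. The importance scores are assumed to be split counts aggregated over the trees, and `select_top_features` is a hypothetical helper. Rounding down when 80% of n is not an integer is also an assumption.

```python
def select_top_features(importance, keep_percent=80):
    """Return feature names ranked by importance, keeping the top keep_percent %.

    importance: dict mapping feature name -> score (e.g. total split count).
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    # Integer arithmetic avoids float rounding; rounds down, keeps at least one.
    k = max(1, len(ranked) * keep_percent // 100)
    return ranked[:k]


importance = {"path_len": 30, "param_count": 12,
              "special_char_ratio": 25, "method": 2, "status": 8}
selected = select_top_features(importance)
```

With five features, four are kept and the least-used one (`method`) is dropped.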
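The parameter weights of a neural network are difficult to interpret. The decision trees used by XGBoost, by contrast, are inherently interpretable. This reduces the computational complexity of the algorithm and improves the interpretability of the entire abnormal call identification model. Interpretability is also an important part of such evaluation, so the algorithm is well suited to estimating the importance of each feature indicator. The higher a feature's importance score, the more important the feature and the greater its contribution to the dataset.
Therefore, the method constructs the objective function of the XGBoost algorithm with the tree structural risk term, and a feature is considered better the greater its total number of splits across all trees.
From the resulting third sequence $I' = \{x_1, x_2, \ldots, x_{0.8n}\}$, the first feature field sequence T is rebuilt to obtain the second feature field sequence $T_1 = \{I', J\}$ and the corresponding second dataset $D_1$.
Referring to FIG. 4, the method specifically constructs a two-layer classifier and prunes the random forest. It uses the XGBoost algorithm augmented with a tree structural risk term to select variables, thereby constructing leaf structures or making predictions with automatically selected variable subsets. In abnormal call scenarios, the main concerns are usually the logs and resource usage. The input data of the method are therefore the log information, resource performance data and IP address. The training input of the abnormal call identification model consists of the second dataset $D_1$, the sample labels R, the third sequence $I'$ representing the sample log features, and the second sequence J representing the sample resource performance features.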
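Referring to FIG. 5, before the decision trees are built, bootstrap sampling is used to draw K training datasets with replacement from the original first dataset D. Each training dataset likewise contains N samples. These bootstrap samples are used to train the decision trees.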
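The bootstrap sampling described here can be sketched with the standard library. Each of the K datasets draws N = len(D) samples with replacement. The function name `bootstrap_samples` and the optional `seed` parameter are illustrative.

```python
import random


def bootstrap_samples(dataset, k, seed=None):
    """Draw k bootstrap training sets from dataset, each of size N = len(dataset).

    Sampling is with replacement, so some rows repeat and others are left out.
    """
    rng = random.Random(seed)
    n = len(dataset)
    return [rng.choices(dataset, k=n) for _ in range(k)]


sets = bootstrap_samples(list(range(10)), k=3, seed=0)
```

Passing a fixed seed makes the K training sets reproducible between runs.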
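Next, the first-layer classifier is constructed with classification and regression trees (CART). At each tree node, m features (m < M) are randomly selected from the M input features as the candidate split feature set for the current node. The best split feature and split point are chosen from these, and the training data are assigned to the two child nodes. The split feature and split point are selected by minimizing the Gini index. This partitioning is repeated until the stopping condition is met, i.e., until the number of samples in a node falls below a preset value.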
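The node-splitting rule can be sketched as follows. A random subset of m out of M features is drawn, and the split that minimizes the weighted Gini index of the two children is chosen. This is a plain-Python illustration of one CART split, not a full tree builder; the function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c**2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(X, y, m, rng):
    """Choose m of the M features at random, then return the split
    (weighted_gini, feature, threshold) with the lowest weighted child Gini."""
    M = len(X[0])
    candidates = rng.sample(range(M), m)
    best = None
    for f in candidates:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


X = [[1, 9], [2, 9], [3, 9], [4, 9]]
y = [0, 0, 1, 1]
```

A full tree builder would apply this split recursively to each child until the child holds fewer samples than the preset minimum.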
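Decision tree models are trained in this way on the K bootstrap sample sets, and all the resulting trees are combined into a random forest model, i.e., the tree classifiers. The test dataset X is input into the model to obtain the corresponding sequence of classification results $T = \{T(x)_i\}$, where $i = 1, 2, \ldots, n$.
In this embodiment, recall $R$ = number of correctly classified samples / number of samples that should be correctly classified, and precision $P$ = number of correctly classified samples / total number of classified samples.
The F1 score is then $F_1 = \frac{2 \times P \times R}{P + R}$.
The F1 score of each individual subtree is computed as its evaluation accuracy, and the decision trees are ranked by this value. Trees with lower F1 scores are discarded according to the preset accuracy (a preset F1 value). The more accurate trees are retained to form a sub-forest, which is the first-layer classifier.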
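The per-tree evaluation and pruning can be sketched as below. The sketch assumes that each tree is a callable returning a class label and that per-class F1 scores are macro-averaged; the description does not say how multi-class F1 is aggregated. `macro_f1`, `prune_forest` and `min_f1` (standing in for the preset F1 value) are hypothetical names.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over classes seen in y_true or y_pred.
    Per class c: P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)."""
    classes = set(y_true) | set(y_pred)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count, true_count = Counter(y_pred), Counter(y_true)
    scores = []
    for c in classes:
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(scores) / len(scores)


def prune_forest(trees, X_val, y_val, min_f1):
    """Rank trees by validation F1 and keep those reaching min_f1 (the sub-forest)."""
    scored = [(macro_f1(y_val, [tree(x) for x in X_val]), tree) for tree in trees]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tree for score, tree in scored if score >= min_f1]


good = lambda x: 1 if x > 0 else 0   # toy tree: always right on the data below
bad = lambda x: 0                    # toy tree: always predicts class 0
X_val, y_val = [-1, 2, 3, -4], [0, 1, 1, 0]
```

On this toy data the constant tree scores a macro F1 of 1/3, so only `good` survives a threshold of 0.5.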
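Next, the second-layer classifier is constructed by selecting the top-N abnormal classes by probability from the first-layer classifier. When the true label equals the predicted label, the second-layer classifier returns the model termination flag i = 1. When the true label does not equal the predicted label, the value of the structural risk term is increased, the loss function is refitted, the leaf node weights are determined, and a new classification result is obtained. This repeats until the true label equals the predicted label.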
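The second layer's candidate selection can be illustrated as below, under two stated assumptions. First, the first-layer probability of a class is taken to be its vote share across the sub-forest. Second, the termination flag i = 1 is returned when the top candidate matches the true label. The refitting loop that raises the structural risk term on a mismatch is not shown, and `top_n_classes` / `second_layer_check` are hypothetical names.

```python
from collections import Counter


def top_n_classes(tree_votes, n):
    """Vote share of each class across the sub-forest; return the top n.
    Ties are broken by the smaller label so the order is deterministic."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(label, cnt / total) for label, cnt in ranked[:n]]


def second_layer_check(tree_votes, true_label, n=3):
    """Return (candidates, flag), where flag is 1 when the highest-probability
    candidate equals the true label and 0 otherwise (refit needed)."""
    candidates = top_n_classes(tree_votes, n)
    flag = 1 if candidates and candidates[0][0] == true_label else 0
    return candidates, flag


votes = [2, 2, 2, 1, 0, 2, 1, 3]  # one predicted class per tree in the sub-forest
```

With these votes, class 2 leads with a share of 0.5, so the check returns flag 1 for true label 2 and flag 0 for any other label.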
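At this point, the two-layer classifier of the high-fit random forest model is complete.
For a data sample x, the established and trained abnormal call identification model $f_{DEF}(x)$ gives the classification $T(x)$. For comparison, the existing single-layer random forest model $f_{SRF}(x)$ is used, and its classification result for the same data sample x is denoted $T_2(x)$. The classification results are compared using precision (P) and recall (R).
Table 1 shows the performance of the existing single-layer random forest model and of the two-layer abnormal call identification model used in the method of the present application:
Table 1: Comparison of identification accuracy between the single-layer random forest model and the abnormal call identification model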
[Table 1 image not reproduced]
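Table 1 shows that the improved abnormal call identification model is significantly better than the existing single-layer random forest model in both precision and recall. The improved model still finds enough abnormal calls while making its identification results more precise. For each call in the system, online inference with the model yields an identification result (whether the call is abnormal and, if so, its correct category) with higher accuracy and more precise classification.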
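The apparatus for identifying abnormal API gateway calls provided by the present application is described below. The apparatus described below and the method described above may be cross-referenced with each other.
The apparatus of the present application is described below with reference to FIG. 6. The apparatus includes:
a collection module 100, configured to acquire the log information, resource performance data and IP address generated when the API gateway is called;
an identification module 200, configured to input the log information, resource performance data and IP address into the trained abnormal call identification model, to obtain the identification result output by the abnormal call identification model.
In this embodiment, the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type.
In this embodiment, the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
Data service calls to tenants' private business systems involve many types of abnormal access and abnormal calls. These can be broadly divided into user-side access anomalies and intra-system call anomalies. Anomaly detection can be performed efficiently and accurately by clarifying and monitoring gateway-side KPIs, supplemented by machine learning modeling.
The abnormal call identification model used in the apparatus of the present application is a two-layer model. A random forest is constructed and a high-fit sub-forest is selected from it, so that the low-latency requirement for identifying abnormal API calls is met. High-fit subtrees are selected with the previously constructed XGBoost algorithm augmented with a tree structural risk term, so that the high-precision identification requirement is met.
Once the abnormal call identification model is trained, the log information, resource performance data and IP address collected in step S100 are used as its input data. The model outputs an abnormal call identification field, i.e., whether the call is abnormal. For an abnormal call, it also outputs the call's category, i.e., the abnormal type. In this apparatus, the output of the model also includes the sequence of sample variable weights from the modeling process.
The apparatus addresses the low accuracy and imprecise classification of existing abnormal call identification schemes. When an API gateway call event occurs, the apparatus improves classification accuracy by constructing a two-layer classifier model, and it overcomes the low accuracy of existing schemes by constructing a high-fit sub-forest.
It should be noted that after the abnormal call identification model has been trained, it can be stored on a cloud platform.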
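In summary, the apparatus of the present application constructs a high-fit two-layer random forest model, which enables fast and accurate identification of abnormal API gateway calls. First, feature selection is performed on the feature fields of the log information using the improved XGBoost algorithm. The selected features are combined with the relevant resource performance indicators to produce a new dataset and feature set. The high-fit two-layer random forest model is then used to build the anomaly identification model. Finally, the improved machine learning model provides fast and highly accurate online inference for abnormal calls.
The apparatus of the present application meets the requirement for high-precision identification of abnormal API gateway calls under low-latency conditions. Server performance fields and log information are collected and used as input data to the trained abnormal call identification model, which outputs the identification result. This yields more accurate identification results and enables real-time, fast and high-precision anomaly detection at the API gateway.
The apparatus of the present application is further described below with reference to FIG. 7. In this apparatus, the abnormal call identification model is trained by the following modules:
a first training module 300, configured to acquire the sample log information, sample resource performance data and sample IP addresses;
a second training module 400, configured to perform multi-class labeling of abnormal calls on the sample log information, sample resource performance data and sample IP addresses, to obtain a first dataset D, a first feature field sequence T and sample labels R. In this apparatus, the first feature field sequence T is obtained from a first sequence I and a second sequence J. The first sequence I is obtained from the sample log information and the sample IP addresses, and the second sequence J is obtained from the sample resource performance data and the sample IP addresses. Specifically, T = {I, J}.
It should be noted that the sample label R denotes the category of an abnormal call, i.e., its abnormal type.
a third training module 500, configured to use the first feature field sequence T and the corresponding sample labels R as training input data and to train with a machine learning approach, to obtain the abnormal call identification model used to generate the identification result.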
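FIG. 8 illustrates the physical structure of an electronic device. As shown in FIG. 8, the electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840. The processor 810, the communications interface 820 and the memory 830 communicate with one another via the communication bus 840. The processor 810 can call logic instructions in the memory 830 to execute the method for identifying abnormal API gateway calls, which includes the following steps:
acquiring log information, resource performance data and an IP address generated during a call;
inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
wherein the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.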
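In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. On this understanding, the technical solution of the present application may be embodied as a software product. This applies to the solution in essence, to the part contributing to the prior art, or to part of the technical solution. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, the present application further provides a computer program product. The product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the method for identifying abnormal API gateway calls provided by the above methods. The method includes the following steps:
acquiring log information, resource performance data and an IP address generated during a call;
inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
wherein the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.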
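In yet another aspect, the present application further provides a non-transitory computer-readable storage medium storing a computer program. When executed by a processor, the computer program performs the method for identifying abnormal API gateway calls provided by the above methods. The method includes the following steps:
acquiring log information, resource performance data and an IP address generated during a call;
inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
wherein the identification result includes whether the call is abnormal and, for an abnormal call, its abnormal type; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above implementations, a person skilled in the art can clearly understand that the implementations can be realized by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the above technical solutions may be embodied as a software product, either in essence or in the part contributing to the prior art. The software product may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand the following. The technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.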

Claims (10)

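  1. A method for identifying abnormal API gateway calls, including the following steps:
    acquiring log information, resource performance data and an Internet Protocol (IP) address generated during a call;
    inputting the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
    wherein the identification result includes whether the call is abnormal and the abnormal type of an abnormal call; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.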
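  2. The method for identifying abnormal API gateway calls according to claim 1, wherein the abnormal call identification model includes a feature extraction layer, a feature selection layer, a first classification layer, a pruning-and-fitting layer, a second classification layer and an identification layer;
    the feature extraction layer is configured to extract, based on the sample IP addresses, features of the sample log information and the sample resource performance data to obtain a first sequence and a second sequence; and to combine the first sequence and the second sequence by access time and number and perform multi-class labeling of abnormal access, to obtain a first dataset and a first feature field sequence; wherein the first feature field sequence is obtained from the first sequence and the second sequence, the first sequence is obtained from the sample log information and the sample IP addresses, and the second sequence is obtained from the sample resource performance data and the sample IP addresses;
    the feature selection layer is configured to perform feature selection on the first sequence to obtain a third sequence, and to generate a second dataset and a second feature field sequence based on the third sequence;
    the first classification layer is configured to split the second dataset and the second feature field sequence, to obtain multiple tree classifiers and first prediction results output by the tree classifiers;
    the pruning-and-fitting layer is configured to fit, according to the accuracy of the tree classifiers, the tree classifiers whose accuracy exceeds a preset accuracy, to obtain a fitted first-layer classifier;
    the second classification layer is configured to perform feature matching between the first-layer classifier and the sample labels, to obtain a second-layer classifier and sample identification results output by the second-layer classifier.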
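  3. The method for identifying abnormal API gateway calls according to claim 2, wherein the feature selection layer specifically performs:
    screening the first sequence with an XGBoost algorithm augmented with a tree structural risk term, and extracting features from the first sequence according to a preset quantile, to obtain the third sequence.
  4. The method for identifying abnormal API gateway calls according to claim 2, wherein the first classification layer specifically performs:
    determining split points, and determining split nodes based on the split points;
    assigning the second dataset and the second feature field sequence to the split nodes until the number of samples assigned to each split node is within a preset value, to obtain multiple tree classifiers.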
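  5. The method for identifying abnormal API gateway calls according to claim 2, wherein inputting the log information, the resource performance data and the IP address into the abnormal call identification model to obtain the identification result output by the abnormal call identification model specifically includes the following steps:
    inputting the log information, the features of the resource performance data and the IP address into the feature extraction layer, to obtain a third dataset, a third feature field sequence and labels output by the feature extraction layer; wherein the third feature field sequence is obtained from a fourth sequence and a fifth sequence, the fourth sequence is obtained from the log information and the IP address, and the fifth sequence is obtained from the resource performance data and the IP address;
    inputting the fourth sequence into the feature selection layer, to obtain a fourth dataset and a fourth feature field sequence output by the feature selection layer; wherein both the fourth dataset and the fourth feature field sequence are obtained from a sixth sequence generated by feature selection on the fourth sequence;
    inputting the fourth dataset and the fourth feature field sequence into the first classification layer, to obtain a second prediction result output by the first classification layer;
    inputting the second prediction result and the labels into the second classification layer, to obtain the identification result output by the second classification layer.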
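  6. The method for identifying abnormal API gateway calls according to claim 2, wherein the abnormal call identification model is trained through the following steps:
    acquiring the sample log information, the sample resource performance data and the sample IP addresses;
    performing multi-class labeling of abnormal calls on the sample log information, the sample resource performance data and the sample IP addresses, to obtain the first dataset, the first feature field sequence and the sample labels;
    using the first feature field sequence and the corresponding sample labels as training input data and training with a machine learning approach, to obtain the abnormal call identification model used to generate the identification result.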
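  7. An apparatus for identifying abnormal API gateway calls, including:
    a collection module, configured to acquire log information, resource performance data and an IP address generated during a call;
    an identification module, configured to input the log information, the resource performance data and the IP address into an abnormal call identification model to obtain an identification result output by the abnormal call identification model;
    wherein the identification result includes whether the call is abnormal and the abnormal type of an abnormal call; and the abnormal call identification model is trained on sample log information, sample resource performance data and sample IP addresses.
  8. An electronic device, including a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for identifying abnormal API gateway calls according to any one of claims 1 to 6.
  9. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for identifying abnormal API gateway calls according to any one of claims 1 to 6.
  10. A computer program product including a computer program which, when executed by a processor, implements the steps of the method for identifying abnormal API gateway calls according to any one of claims 1 to 6.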
PCT/CN2022/107910 2021-11-26 2022-07-26 Method, apparatus, device and product for identifying abnormal API gateway calls WO2023093100A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111423922.X 2021-11-26
CN202111423922.XA CN114389834B (zh) 2021-11-26 2021-11-26 Method, apparatus, device and product for identifying abnormal API gateway calls

Publications (1)

Publication Number Publication Date
WO2023093100A1 true WO2023093100A1 (zh) 2023-06-01

Family

ID=81195468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107910 WO2023093100A1 (zh) 2021-11-26 2022-07-26 Method, apparatus, device and product for identifying abnormal API gateway calls

Country Status (2)

Country Link
CN (1) CN114389834B (zh)
WO (1) WO2023093100A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
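CN117033052A (zh) * 2023-08-14 2023-11-10 贵州慧码科技有限公司 Object anomaly diagnosis method and system based on model recognition
CN117033052B (zh) * 2023-08-14 2024-05-24 企口袋(重庆)数字科技有限公司 Object anomaly diagnosis method and system based on model recognition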

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
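CN114389834B (zh) * 2021-11-26 2024-04-30 浪潮通信信息系统有限公司 Method, apparatus, device and product for identifying abnormal API gateway calls
CN116016120A (zh) * 2023-01-05 2023-04-25 中国联合网络通信集团有限公司 Fault handling method, terminal device and readable storage medium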

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114417A1 (en) * 2017-10-13 2019-04-18 Ping Identity Corporation Methods and apparatus for analyzing sequences of application programming interface traffic to identify potential malicious actions
CN111212038A (zh) * 2019-12-23 2020-05-29 江苏国泰新点软件有限公司 Open data API gateway system based on big data and artificial intelligence
CN111309539A (zh) * 2020-03-26 2020-06-19 北京奇艺世纪科技有限公司 Anomaly monitoring method and apparatus, and electronic device
US20210073618A1 (en) * 2019-09-11 2021-03-11 Intuit Inc. System and method for detecting anomalies utilizing a plurality of neural network models
CN114389834A (zh) * 2021-11-26 2022-04-22 浪潮通信信息系统有限公司 Method, apparatus, device and product for identifying abnormal API gateway calls

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
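CN107465643A (zh) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 Deep-learning-based network traffic classification method
CN107045607A (zh) * 2016-12-13 2017-08-15 全球能源互联网研究院 Method and apparatus for building an application abnormal behavior identification model, and identification method and apparatus
CN110263265B (zh) * 2019-04-10 2024-05-07 腾讯科技(深圳)有限公司 User tag generation method and apparatus, storage medium and computer device
CN111177095B (zh) * 2019-12-10 2023-10-27 中移(杭州)信息技术有限公司 Log analysis method and apparatus, computer device and storage medium
EP3910571A1 (en) * 2020-05-13 2021-11-17 MasterCard International Incorporated Methods and systems for server failure prediction using server logs
CN112052891A (zh) * 2020-08-28 2020-12-08 平安科技(深圳)有限公司 Machine behavior identification method, apparatus, device and computer-readable storage medium
CN112543176A (zh) * 2020-10-22 2021-03-23 新华三信息安全技术有限公司 Abnormal network access detection method and apparatus, storage medium and terminal
CN113657461A (zh) * 2021-07-28 2021-11-16 北京宝兰德软件股份有限公司 Log anomaly detection method, system, device and medium based on text classification
CN113626241B (zh) * 2021-08-10 2023-07-14 中国平安财产保险股份有限公司 Application exception handling method, apparatus, device and storage medium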


Also Published As

Publication number Publication date
CN114389834A (zh) 2022-04-22
CN114389834B (zh) 2024-04-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897196

Country of ref document: EP

Kind code of ref document: A1