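WO2020062803A1 - Method, apparatus, electronic device, and non-volatile readable storage medium for analyzing abnormal traffic based on a model tree algorithm - Google Patents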

Info

Publication number
WO2020062803A1
Authority
WO
WIPO (PCT)
Prior art keywords
traffic data
value
target
list
initial
Prior art date
Application number
PCT/CN2019/079034
Inventor
孙家棣
马宁
于洋
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020062803A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Definitions

  • the present application relates to the field of data processing technology, and in particular, to a method, an apparatus, an electronic device, and a non-volatile readable storage medium for analyzing abnormal traffic based on a model tree algorithm.
  • the traffic data of a blacklisted user is traffic data sent by a user who is already known to be engaged in the black industry or who has exhibited behavior that caused abnormal traffic.
  • the traffic data of whitelisted users is traffic data sent by users such as life insurance back-office staff and formal business employees, policy users, purchasing assistants, and fund users.
  • the traffic data of uncertain users refers to traffic data sent by users other than blacklisted and whitelisted users.
  • the inventor of the present application realizes that a defect of the prior art is that black industry users disguised as whitelisted users exist among the whitelisted users, so the traffic data attributed to whitelisted users includes abnormal traffic data.
  • the application provides a method, an apparatus, an electronic device, and a non-volatile readable storage medium for analyzing abnormal traffic based on a model tree algorithm.
  • the first aspect of the embodiments of the present application discloses a method for analyzing abnormal traffic based on a model tree algorithm.
  • the method includes: obtaining at least one feature value of to-be-processed traffic data in a black and white list, where the black and white list includes at least one piece of to-be-processed traffic data; performing normalization processing on the feature values to obtain normalized feature values; using an iterative algorithm to traverse all the normalized feature values according to the initial weight values to obtain a weight value corresponding to the to-be-processed traffic data, where the weight value is used to indicate the degree of abnormality of the to-be-processed traffic data; and, when the weight value is greater than a reference weight threshold, determining that the to-be-processed traffic data is abnormal traffic data.
  • the second aspect of the embodiments of the present application discloses an abnormal traffic analysis device based on a model tree algorithm, including: an acquisition module configured to acquire at least one feature value of to-be-processed traffic data in a black and white list, where the black and white list includes at least one piece of to-be-processed traffic data; a calculation module configured to normalize the feature values to obtain normalized feature values, and further configured to traverse all the normalized feature values using an iterative algorithm according to the initial weight values to obtain a weight value corresponding to the to-be-processed traffic data, where the weight value is used to indicate the degree of abnormality of the to-be-processed traffic data; and a first determination module configured to determine, when the weight value is greater than a reference weight threshold, that the to-be-processed traffic data is abnormal traffic data.
  • a third aspect of the embodiments of the present application discloses an electronic device including: a memory for storing computer-readable instructions; and a processor configured to execute the computer-readable instructions stored in the memory. When the computer-readable instructions are executed by the processor, the method for analyzing abnormal traffic based on a model tree algorithm disclosed in the first aspect of the embodiments of the present application is implemented.
  • the fourth aspect of the embodiments of the present application discloses a non-volatile readable storage medium that stores a computer program that causes a computer to execute the abnormal traffic analysis method based on a model tree algorithm disclosed in the first aspect of the embodiments of the present application.
  • the abnormal traffic data in the detected traffic data can be distinguished to identify black industry users disguised as whitelisted users, thereby improving the purity of the traffic data sent by whitelisted users.
  • it distinguishes abnormal traffic data contained in traffic data, and improves the purity of traffic data sent by whitelisted users.
  • FIG. 1 is a schematic structural diagram of a device disclosed in an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for analyzing abnormal traffic based on a model tree algorithm disclosed in an embodiment of the present application.
  • FIG. 3 is a flowchart of another method for analyzing abnormal traffic based on a model tree algorithm disclosed in an embodiment of the present application.
  • FIG. 4 is a flowchart of another method for analyzing abnormal traffic based on a model tree algorithm disclosed in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an abnormal traffic analysis device based on a model tree algorithm disclosed in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another abnormal traffic analysis device based on a model tree algorithm disclosed in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another abnormal traffic analysis device based on a model tree algorithm disclosed in an embodiment of the present application.
  • the implementation environment of the present application may be a portable mobile device, such as a smart phone, a tablet computer, or a desktop computer.
  • the images stored in the portable mobile device may be: images downloaded from the Internet; images received through a wireless or wired connection; images obtained through a camera built into the portable device.
  • FIG. 1 is a schematic structural diagram of a device disclosed in an embodiment of the present application.
  • the device 100 may be the aforementioned portable mobile device.
  • the device 100 may include one or more of the following components: a processing component 102, a memory 104, a power component 106, a multimedia component 108, an audio component 110, a sensor component 114, and a communication component 116.
  • the processing component 102 generally controls overall operations of the device 100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 102 may include one or more processors 118 to execute instructions to complete all or part of the steps of the method described below.
  • the processing component 102 may include one or more modules for facilitating interaction between the processing component 102 and other components.
  • the processing component 102 may include a multimedia module to facilitate the interaction between the multimedia component 108 and the processing component 102.
  • the memory 104 is configured to store various types of data to support operation at the device 100. Examples of such data include instructions for any application program or method for operating on the device 100.
  • the memory 104 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • One or more modules are also stored in the memory 104, and the one
  • the power component 106 provides power to various components of the device 100.
  • the power component 106 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 100.
  • the multimedia component 108 includes a screen that provides an output interface between the device 100 and a user.
  • the screen may include a liquid crystal display (LCD) and a touch panel. If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor can not only sense the boundary of a touch or slide action, but also detect duration and pressure related to the touch or slide operation.
  • the screen may also include an Organic Light Emitting Display (OLED).
  • the audio component 110 is configured to output and / or input audio signals.
  • the audio component 110 includes a microphone (MIC).
  • when the device 100 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signals may be further stored in the memory 104 or transmitted via the communication component 116.
  • the audio component 110 further includes a speaker for outputting audio signals.
  • the sensor component 114 includes one or more sensors for providing status assessment of various aspects of the device 100.
  • the sensor component 114 can detect the open / closed state of the device 100 and the relative positioning of its components; it can also detect a change in the position of the device 100 or of a component of the device 100, and a change in the temperature of the device 100.
  • the sensor component 114 may further include a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 116 is configured to facilitate wired or wireless communication between the device 100 and other devices.
  • the device 100 can access a wireless network based on a communication standard, such as WiFi (Wireless-Fidelity).
  • the communication component 116 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
  • the communication component 116 further includes a Near Field Communication (NFC) module for promoting short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth technology, and other technologies.
  • the device 100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital signal processors, digital signal processing equipment, programmable logic devices, field programmable gate arrays, controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described below.
  • FIG. 2 is a schematic flowchart of an abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application.
  • the method for analyzing abnormal traffic based on a model tree algorithm may include the following steps:
  • the black and white list includes at least one pending traffic data.
  • the manner of obtaining at least one characteristic value of the pending traffic data in the black and white list may be specifically:
  • At least one feature value corresponding to the at least one feature is obtained from the to-be-processed traffic data of the black and white list.
  • the feature is a parameter preset by the operator to identify the abnormality of a piece of traffic data; that is, the larger the feature value corresponding to a certain feature of a piece of traffic data, the higher the degree of abnormality of that piece of traffic data.
  • the features can be at least one of: path repeatability ranking, abnormal rate of user risk control parameters, proportion of back-end buried points, risk control IP divergence rate, number of risk control IP access accounts, number of risk control IP accesses, number of risk control ip_wifi names, cumulative risk score of the risk control IP, mean number of users in the IPC period, variance of the number of users in the IPC period, mean number of visits in the IPC period, variance of the number of visits in the IPC period, mean number of user logins in the mobile phone number segment during the period, and variance of user logins in the mobile phone number segment during the period.
  • the black and white list includes a black list and a white list, and the black and white lists each include multiple pending traffic data.
  • step 202 is triggered to be performed.
  • one way to normalize the feature values and obtain the normalized feature values is to add all the feature values to obtain a total value, and then divide each feature value by the total value to obtain the corresponding normalized feature value.
  • another optional method of normalizing the feature values to obtain the normalized feature values is to apply the following formula: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
  • where $x$ is a feature value to be normalized,
  • $x_{\min}$ and $x_{\max}$ are, respectively, the minimum value and the 99% median value of the feature corresponding to that feature value over all the traffic data to be processed, and $x'$ is the normalized feature value obtained after normalization.
  • the 99% median value (that is, the 99th percentile) means the following: among all the pending traffic data, 99% of the pending traffic data has a value of the feature that is less than the 99% median value, and only 1% of the pending traffic data has a value of the feature that is greater than the 99% median value.
  • the 99% median value is used as the upper bound to avoid the influence of occasional samples with very large feature values, and thereby to improve the accuracy of distinguishing abnormal traffic data.
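  • The two normalization options above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, the nearest-rank definition of the 99% median value is an assumption, and clipping values above the 99% median value to 1 is also an assumption, since the text does not say how such values are treated.

```python
def normalize_by_sum(values):
    """Option 1: divide each feature value by the total of all feature values."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]


def percentile_99(values):
    """Nearest-rank 99th percentile (assumed definition of the '99% median value')."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer form of ceil(0.99 * n)
    return ordered[rank - 1]


def normalize_min_max(values, clip=True):
    """Option 2: (x - x_min) / (x_max - x_min), with x_max the 99% median value."""
    x_min = min(values)
    x_max = percentile_99(values)
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in values]
    scaled = [(v - x_min) / span for v in values]
    if clip:
        # Assumption: rare values above the 99% median value are capped at 1.
        scaled = [min(1.0, s) for s in scaled]
    return scaled
```

  • With clipping enabled, a single very large outlier no longer compresses all the other normalized feature values toward 0, which is the effect the 99% median value is meant to achieve.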
  • the execution of step 203 is triggered.
  • the weight value is used to indicate the abnormality of the traffic data to be processed.
  • step 204 is triggered.
  • an iterative algorithm is used to traverse all the normalized feature values to obtain the weight value corresponding to the traffic data to be processed, which may include the following steps:
  • a weight value corresponding to the traffic data to be processed is determined according to the current initial weight value, the current target threshold value, and the current target feature value.
  • each feature value in each to-be-processed traffic data is determined separately according to each threshold, and multiple determination results are obtained, which may include the following steps:
  • if any threshold is greater than any feature value and the initial attribution list of the corresponding pending traffic data is the white list, the determination result is a correct determination; if the threshold is less than the feature value and the initial attribution list is the white list, the determination result is an error determination; if the threshold is greater than the feature value and the initial attribution list is the black list, the determination result is an error determination; if the threshold is less than the feature value and the initial attribution list is the black list, the determination result is a correct determination.
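  • The four cases above can be written as one small function. This is a sketch under stated assumptions: the function name and the `"white"` / `"black"` labels are hypothetical, and the text does not say what happens when the threshold equals the feature value, so this sketch treats that case as an error determination.

```python
def is_correct_determination(threshold, feature_value, initial_list):
    """Return True for a correct determination, False for an error determination.

    initial_list is the known initial attribution list: "white" or "black".
    """
    if initial_list == "white":
        # A white-list sample is judged correctly when the threshold is above it.
        return threshold > feature_value
    if initial_list == "black":
        # A black-list sample is judged correctly when the threshold is below it.
        return threshold < feature_value
    raise ValueError("initial_list must be 'white' or 'black'")
```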
  • determining multiple thresholds may include determining an initial value within a preset value range, and calculating a sum of a positive integer multiple of a preset step size and the initial value to obtain a plurality of target values.
  • the initial value and all target values are determined as multiple threshold values, where any target value is within a preset value range.
  • this optional embodiment can distinguish the abnormal traffic data contained in the traffic data, and improve the purity of the traffic data sent by the whitelisted users.
  • for example, multiple thresholds can be determined in the interval [0,1] by an equal step size method (that is, each threshold is one equal step larger than the previous one): the first threshold is determined to be 0.1, the second threshold is the first threshold increased by 0.1, that is, 0.2, the third threshold is the second threshold increased by 0.1, that is, 0.3, and so on, until multiple thresholds have been determined in the interval [0,1].
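  • The equal step size method can be sketched as below. The function name and default arguments are hypothetical and chosen to match the 0.1, 0.2, 0.3 example; rounding to 10 decimal places is an implementation detail added here to suppress floating-point drift.

```python
def make_thresholds(initial=0.1, step=0.1, low=0.0, high=1.0):
    """Return the initial value plus every initial + k*step that stays within [low, high]."""
    if step <= 0:
        raise ValueError("step must be positive")
    thresholds = []
    k = 0
    while True:
        value = round(initial + k * step, 10)
        if value > high:
            break
        if value >= low:
            thresholds.append(value)
        k += 1
    return thresholds
```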
  • the initial attribution list (e.g., black list or white list) of the pending traffic data is known.
  • this judgment is an error judgment; if the feature value is judged as black traffic data and the initial attribution list of the to-be-processed traffic data corresponding to the feature value is the black list, the determination is a correct determination.
  • the minimum weight error can be obtained from the above correct and error determinations according to the definition of the weight error, where the weight error is defined as $\varepsilon = \sum_{i=1}^{n} \omega_i \cdot \mathrm{error}(X_i)$. Here $n$ is the number of pieces of traffic data to be processed, $\omega_i$ is the weight value (or initial weight value) of the $i$-th sample, and $\mathrm{error}(X_i)$ is the determination error of sample $X_i$: if $X_i$ is incorrectly determined, $\mathrm{error}(X_i)$ is 1, otherwise it is 0. In addition, the initial weight value is $1/n$.
  • the error obtained by the above method is a weight error.
  • a threshold value can be arbitrarily selected from the determined multiple threshold values, and a weight error can be obtained according to the same method as described above.
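  • A minimal sketch of the search for the feature and threshold with the smallest weight error is given below. It assumes the weight error is the sum of the weights of the incorrectly determined samples, and it treats a threshold equal to the feature value as an error determination; all function names and the `"white"` / `"black"` labels are hypothetical.

```python
def is_correct(threshold, feature_value, initial_list):
    """Correct if a white sample lies below the threshold or a black sample above it."""
    if initial_list == "white":
        return threshold > feature_value
    return threshold < feature_value


def weight_error(weights, mistakes):
    """Sum of the weights of the samples that were incorrectly determined."""
    return sum(w for w, wrong in zip(weights, mistakes) if wrong)


def best_feature_and_threshold(samples, labels, weights, thresholds):
    """Try every (feature, threshold) pair and keep the one with the smallest weight error.

    samples: list of normalized feature vectors, one per piece of traffic data.
    labels:  initial attribution list of each sample, "white" or "black".
    Returns (error, feature_index, threshold, mistakes).
    """
    best = None
    for f in range(len(samples[0])):
        for t in thresholds:
            mistakes = [not is_correct(t, x[f], y) for x, y in zip(samples, labels)]
            err = weight_error(weights, mistakes)
            if best is None or err < best[0]:
                best = (err, f, t, mistakes)
    return best
```

  • With the initial weight value 1 / n for every sample, the weight error of a pair is simply the fraction of samples it determines incorrectly.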
  • if $X_i$ is correctly determined under the target threshold and target feature, $\omega_i$ is reduced, and if $X_i$ is incorrectly determined under the target threshold and target feature, $\omega_i$ is increased. The increased or decreased $\omega_i$ is then used to traverse all the combinations of the above features and thresholds again, until the feature and threshold corresponding to the smallest weight error under the updated $\omega_i$ are obtained, and $\omega_i$ is again reduced or increased according to the correct or error determination.
  • $\omega_i$ is the weight value of the corresponding to-be-processed traffic data.
  • the cost parameter λ may be determined by training multiple classifiers with a sample set composed of several samples of white user traffic data and black user traffic data.
  • the preset set of values of λ can be {0.1, 0.2, 0.3, 0.4, ..., 1}.
  • any value in the set can be taken as the value of λ; the classifier then determines the weight value of each piece of pending traffic data according to that value of λ, and, based on a comparison of the weight values with the reference weight threshold, the pending traffic data whose weight value exceeds the reference weight threshold is deleted from all the pending traffic data, and the rest is the purified whitelisted traffic data.
  • the whitelisted traffic data is identified as blacklisted traffic data, or the blacklisted traffic data is identified as whitelisted traffic data.
  • identifying the blacklisted traffic data as the whitelisted traffic data will cause serious consequences. Therefore, the following operations need to be performed to reduce the possibility of the above serious consequences: First, the accurate number of distinguishing abnormal traffic data needs to be divided by the traffic data.
  • the method of selecting the optimal λ value according to the average recall rate and the average accuracy rate may specifically be: determining a weighted average of the average accuracy rate and the average recall rate as the value of λ, where the weight of the average recall rate is greater than the weight of the average accuracy rate.
  • the method of selecting the optimal λ value according to the average recall rate and the average accuracy rate can also be: if the average recall rate and the average accuracy rate are in the target interval, computing a weighted sum of the average recall rate and the average accuracy rate and taking the weighted sum as the value of λ.
  • if the target attribution list is consistent with the initial attribution list of the pending traffic data, reducing the initial weight value of the pending traffic data, and if the target attribution list is inconsistent with the initial attribution list, increasing the initial weight value of the pending traffic data, can include the following steps:
  • the first preset rule is $\omega_i^{(t+1)} = \omega_i^{(t)} e^{-\alpha} / \mathrm{sum}(\omega_i^{(t)})$ and the second preset rule is $\omega_i^{(t+1)} = \omega_i^{(t)} e^{\alpha} / \mathrm{sum}(\omega_i^{(t)})$, where $\omega_i^{(t+1)}$ represents the initial weight value after this update,
  • $\mathrm{sum}(\omega_i^{(t)})$ is the sum of the initial weight values before this update, and
  • $\alpha$ is the median number calculated based on the misjudgment rate of this update: if $\varepsilon$ is used to represent the misjudgment rate of this update, the formula for calculating the median number $\alpha$ can be expressed as $\alpha = \frac{1}{2}\ln\left(\frac{1-\varepsilon}{\varepsilon}\right)$. It can be seen that updating the initial weight value according to the first preset rule makes the initial weight value smaller and smaller, and updating the initial weight value according to the second preset rule instead makes the initial weight value larger and larger.
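  • The weight update can be sketched as follows, assuming an AdaBoost-style rule in which a weight is multiplied by e^(-alpha) after a correct determination and by e^(alpha) after an error determination, divided by the sum of the weights before the update, with alpha = 0.5 * ln((1 - eps) / eps). The function names and the small floor that guards against a misjudgment rate of zero are assumptions for illustration.

```python
import math


def alpha_from_error(eps, floor=1e-16):
    """Median number alpha computed from the misjudgment rate eps."""
    return 0.5 * math.log((1.0 - eps) / max(eps, floor))


def update_weights(weights, mistakes, eps):
    """Shrink the weights of correctly determined samples and grow the others.

    weights:  weight values before this update.
    mistakes: True where the sample was incorrectly determined.
    eps:      misjudgment rate of this update (assumed to be below 0.5).
    """
    alpha = alpha_from_error(eps)
    total = sum(weights)  # sum of the weights before this update
    return [
        w * math.exp(alpha if wrong else -alpha) / total
        for w, wrong in zip(weights, mistakes)
    ]
```

  • For four samples with weight 0.25 each and one error determination (eps = 0.25), the incorrectly determined sample's weight grows by a factor of sqrt(3) and the other three shrink by the same factor.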
  • FIG. 3 is a schematic flowchart of another abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application.
  • the black and white list consists of a black list and a white list.
  • the black list includes at least one pending traffic data
  • the white list includes at least one pending traffic data.
  • the method for analyzing abnormal traffic based on a model tree algorithm may include the following steps:
  • for the description of step 301 and step 302, please refer to the detailed description of step 201 and step 202 in the second embodiment, which is not repeated in this embodiment of the present application.
  • step 305: determine whether the pending traffic data belongs to the white list or the black list. If the pending traffic data belongs to the white list, step 306 is performed; if it belongs to the black list, the process is ended.
  • if the to-be-processed traffic data belongs to the white list, it means that the to-be-processed traffic data is traffic generated by a black industry user disguised as a whitelisted user.
  • step 308: determine whether the pending traffic data belongs to the white list or the black list. If the pending traffic data belongs to the black list, step 309 is performed; if it belongs to the white list, the process is ended.
  • if the traffic data to be processed belongs to the black list, it indicates that the traffic data to be processed is traffic generated by a whitelisted user who was classified as a blacklisted user because of a misoperation.
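  • The list correction described for this embodiment can be sketched as below: abnormal traffic data found in the white list is moved to the black list, and normal traffic data found in the black list is moved to the white list. Representing the two lists as Python sets and the function name `reassign` are assumptions for illustration.

```python
def reassign(item, weight, reference_threshold, white_list, black_list):
    """Move one piece of traffic data between the lists based on its weight value.

    Returns True if the traffic data is determined to be abnormal.
    """
    is_abnormal = weight > reference_threshold
    if is_abnormal and item in white_list:
        # Abnormal traffic hiding in the white list: move it to the black list.
        white_list.remove(item)
        black_list.add(item)
    elif not is_abnormal and item in black_list:
        # Normal traffic wrongly placed in the black list: move it to the white list.
        black_list.remove(item)
        white_list.add(item)
    return is_abnormal
```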
  • FIG. 4 is a schematic flowchart of another abnormal flow analysis method based on a model tree algorithm disclosed in an embodiment of the present application.
  • the black and white list consists of a black list and a white list.
  • the black list includes at least one pending traffic data
  • the white list includes at least one pending traffic data.
  • the method for analyzing abnormal traffic based on a model tree algorithm may include the following steps:
  • for the description of steps 401 to 403, please refer to the detailed descriptions of steps 301 to 303 in the third embodiment, which will not be repeated in this embodiment of the present application.
  • the method for analyzing abnormal traffic based on a model tree algorithm may include the following steps: step 407 to step 412.
  • for the description of steps 407 to 412, please refer to the detailed description of steps 304 to 309 in the third embodiment, which is not repeated in this embodiment of the present application.
  • determining the cost value of each original traffic data point in the traffic data distribution map can include the following steps:
  • take each original flow data point in the flow data distribution graph as an inflection point in turn, and fit all the points to the left of the inflection point and all the points to the right of the inflection point into one straight line each; for each remaining original flow data point other than the inflection point, calculate the difference between its ordinate and the ordinate of the corresponding target flow data point on the fitted line; then calculate the sum of the squares of the ordinate differences of all the remaining original flow data points to obtain the cost value of the original flow data point taken as the inflection point.
  • FIG. 5 is a schematic structural diagram of an abnormal traffic analysis device based on a model tree algorithm disclosed in an embodiment of the present application.
  • the apparatus for analyzing abnormal traffic based on a model tree algorithm may include: an acquisition module 501, a calculation module 502, and a first determination module 503.
  • the acquisition module 501 is configured to acquire at least one feature value of the pending traffic data in a black and white list; the black and white list includes at least one piece of pending traffic data.
  • the calculation module 502 is triggered to start.
  • the calculation module 502 is configured to perform normalization processing on the feature values to obtain the normalized feature values.
  • the calculation module 502 is further configured to: according to the initial weight value, use an iterative algorithm to traverse all the normalized feature values to obtain the weight value corresponding to the traffic data to be processed; wherein the weight value is used to indicate the abnormality of the traffic data to be processed.
  • the first determining module 503 is configured to: when the weight value is greater than the reference weight threshold, determine the to-be-processed traffic data as abnormal traffic data.
  • the abnormal traffic analysis device based on the model tree algorithm described in FIG. 5 can identify black industry users disguised as whitelisted users by distinguishing the abnormal traffic data in the detected traffic data, thereby improving the purity of the traffic data sent by whitelisted users.
  • FIG. 6 is a schematic structural diagram of another abnormal traffic analysis device based on a model tree algorithm disclosed in an embodiment of the present application.
  • the black and white list consists of a black list and a white list.
  • the black list includes at least one pending traffic data
  • the white list includes at least one pending traffic data.
  • the abnormal traffic analysis device based on the model tree algorithm shown in FIG. 6 is obtained by optimizing the abnormal traffic analysis device based on the model tree algorithm shown in FIG. 5.
  • the abnormal traffic analysis device based on the model tree algorithm shown in FIG. 6 may further include a determination module 504, a deletion module 505, and a second determination module 506.
  • the determining module 504 is configured to determine whether the to-be-processed traffic data belongs to the white list or the black list after the first determining module 503 determines that the to-be-processed traffic data is abnormal traffic data.
  • if the to-be-processed traffic data belongs to the white list, it means that the to-be-processed traffic data is traffic generated by a black industry user disguised as a whitelisted user.
  • the deleting module 505 is configured to delete the to-be-processed traffic data from the white list and add the to-be-processed traffic data to the black list after determining that the to-be-processed traffic data belongs to the white list.
  • the second determining module 506 is configured to determine that the traffic data to be processed is normal traffic data when the weight value is not greater than the reference weight threshold.
  • the determining module 504 is further configured to determine whether the to-be-processed traffic data belongs to the white list or the black-list after the second determining module 506 determines that the to-be-processed traffic data is normal traffic data.
  • if the traffic data to be processed belongs to the black list, it indicates that the traffic data to be processed is traffic generated by a whitelisted user who was classified as a blacklisted user because of a misoperation.
  • the deleting module 505 is further configured to delete the to-be-processed traffic data from the black list and add the to-be-processed traffic data to the white list after determining that the to-be-processed traffic data belongs to the black list.
  • the abnormal traffic analysis device based on the model tree algorithm described in FIG. 6 can identify black industry users disguised as whitelisted users by distinguishing the abnormal traffic data in the detected traffic data, thereby improving the purity of the traffic data sent by whitelisted users.
  • FIG. 7 is a schematic structural diagram of another abnormal traffic analysis device based on a model tree algorithm disclosed in an embodiment of the present application.
  • The black-and-white list consists of a black list and a white list.
  • The black list includes at least one piece of to-be-processed traffic data.
  • The white list includes at least one piece of to-be-processed traffic data.
  • The abnormal traffic analysis device based on the model tree algorithm shown in FIG. 7 is obtained by optimizing the abnormal traffic analysis device based on the model tree algorithm shown in FIG. 6.
  • The abnormal traffic analysis device based on the model tree algorithm shown in FIG. 7 may further include a third determining module 508 and a fourth determining module 509.
  • The obtaining module 501 is further configured to obtain, before the first determining module 503 determines that the to-be-processed traffic data is abnormal traffic data, a traffic data distribution map with all the to-be-processed traffic data on the vertical axis and the weight values corresponding to the to-be-processed traffic data on the horizontal axis.
  • The third determining module 508 is configured to determine the cost value of each original traffic data point in the traffic data distribution map; the cost value of each original traffic data point represents the degree of similarity between each original traffic data point in the traffic data distribution map and each traffic data point in the fitted traffic data distribution map.
  • The fourth determining module 509 is configured to determine the original traffic data point with the smallest cost value among all original traffic data points as the target inflection point, and to determine the ordinate of the target inflection point as the reference weight threshold.
  • The abnormal traffic analysis device based on the model tree algorithm described in FIG. 7 can distinguish the abnormal traffic data in the detected traffic data, so as to identify black-industry users disguised as white-list users and thereby improve the purity of the traffic data sent by white-list users.
  • The present application further provides an electronic device including a processor and a memory for storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, is configured to perform the abnormal traffic analysis method based on the model tree algorithm described above.
  • The electronic device may be the apparatus 100 shown in FIG. 1.
  • The present application further provides a non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the abnormal traffic analysis method based on the model tree algorithm described above is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Traffic Control Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

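The present application discloses an abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on a model tree algorithm, and relates to the field of data processing technology. The method includes: acquiring at least one feature value of to-be-processed traffic data in a black-and-white list, where the black-and-white list includes at least one piece of to-be-processed traffic data; normalizing the feature value to obtain a normalized feature value; traversing all normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data; and, when the weight value is greater than a reference weight threshold, determining that the to-be-processed traffic data is abnormal traffic data. By analyzing large volumes of data, the method can use the model tree algorithm to identify black-industry users disguised as white-list users. In summary, the abnormal traffic data contained in the traffic data is distinguished, and the purity of the traffic data sent by white-list users is improved.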

Description

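Abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on model tree algorithm
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application CN201811120226.X, filed on September 25, 2018 and entitled "Abnormal traffic analysis method and apparatus based on model tree algorithm, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technology, and in particular to an abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on a model tree algorithm.
Background
In the Internet field, behaviors that cause abnormal traffic occur frequently. For example, an application issues a coupon and stipulates that each account may claim only one; some users then use improper means to register multiple accounts repeatedly on a mobile phone and claim multiple coupons. Moreover, the black industry exemplified above has already formed a complete industrial chain, which includes a large number of behaviors that cause abnormal traffic, such as trojan seeding, traffic trading, and cashing out of virtual property. In prior-art implementations, to distinguish abnormal behavior in user traffic, users are divided into black-list users, white-list users, and uncertain users, and the traffic data sent by white-list users is used as the basis for detecting abnormal traffic data in the traffic data. The traffic data of black-list users is traffic data sent by users known in advance to engage in the black industry or to have previously caused abnormal traffic; the traffic data of white-list users is traffic data sent by users such as life-insurance back-office staff, formal salespersons, policyholders, and users who have purchased the life assistant or funds; and the traffic data of uncertain users is traffic data sent by users outside the black list and the white list.
The inventors of the present application realized that the defect of the prior art is that black-industry users disguised as white-list users exist among the white-list users, so that the detected traffic data sent by white-list users contains abnormal traffic data.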
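Technical Solution
The present application provides an abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on a model tree algorithm.
A first aspect of the embodiments of the present application discloses an abnormal traffic analysis method based on a model tree algorithm. The method includes: acquiring at least one feature value of to-be-processed traffic data in a black-and-white list, where the black-and-white list includes at least one piece of to-be-processed traffic data; normalizing the feature value to obtain a normalized feature value; traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, where the weight value indicates the degree of abnormality of the to-be-processed traffic data; and, when the weight value is greater than a reference weight threshold, determining that the to-be-processed traffic data is abnormal traffic data.
A second aspect of the embodiments of the present application discloses an abnormal traffic analysis apparatus based on a model tree algorithm, including: an acquiring module configured to acquire at least one feature value of to-be-processed traffic data in a black-and-white list, where the black-and-white list includes at least one piece of to-be-processed traffic data; a computing module configured to normalize the feature value to obtain a normalized feature value, the computing module being further configured to traverse all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, where the weight value indicates the degree of abnormality of the to-be-processed traffic data; and a first determining module configured to determine that the to-be-processed traffic data is abnormal traffic data when the weight value is greater than a reference weight threshold.
A third aspect of the embodiments of the present application discloses an electronic device, including: a memory for storing computer-readable instructions; and a processor configured to execute the computer-readable instructions stored in the memory. When executed by the processor, the computer-readable instructions implement the abnormal traffic analysis method based on the model tree algorithm disclosed in the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application discloses a non-volatile readable storage medium storing a computer program, where the computer program causes a computer to perform the abnormal traffic analysis method based on the model tree algorithm disclosed in the first aspect of the embodiments of the present application.
Beneficial Effects
With the embodiments of the abnormal traffic analysis method based on the model tree algorithm provided by the present application, the abnormal traffic data in the detected traffic data can be distinguished so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.
It should be understood that the foregoing general description and the following detailed description are exemplary only and do not limit the present application.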
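Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the present application.
FIG. 1 is a schematic structural diagram of an apparatus disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of an abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application;
FIG. 3 is a flowchart of another abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application;
FIG. 4 is a flowchart of yet another abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an abnormal traffic analysis apparatus based on a model tree algorithm disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another abnormal traffic analysis apparatus based on a model tree algorithm disclosed in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of yet another abnormal traffic analysis apparatus based on a model tree algorithm disclosed in an embodiment of the present application.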
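Embodiments of the Invention
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The implementation environment of the present application may be a portable mobile device, such as a smartphone, a tablet computer, or a desktop computer. The images stored in the portable mobile device may be: images downloaded from the Internet; images received over a wireless or wired connection; or images captured by its own built-in camera.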
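FIG. 1 is a schematic structural diagram of an apparatus disclosed in an embodiment of the present application. The apparatus 100 may be the portable mobile device described above. As shown in FIG. 1, the apparatus 100 may include one or more of the following components: a processing component 102, a memory 104, a power component 106, a multimedia component 108, an audio component 110, a sensor component 114, and a communication component 116.
The processing component 102 generally controls the overall operation of the apparatus 100, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 102 may include one or more processors 118 to execute instructions so as to complete all or some of the steps of the method described below. In addition, the processing component 102 may include one or more modules that facilitate interaction between the processing component 102 and other components. For example, the processing component 102 may include a multimedia module to facilitate interaction between the multimedia component 108 and the processing component 102.
The memory 104 is configured to store various types of data to support operation of the apparatus 100. Examples of such data include instructions for any application or method operated on the apparatus 100. The memory 104 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The memory 104 also stores one or more modules configured to be executed by the one or more processors 118 to complete all or some of the steps of the method shown below.
The power component 106 supplies power to the various components of the apparatus 100. The power component 106 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 100.
The multimedia component 108 includes a screen that provides an output interface between the apparatus 100 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel. If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. A touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. The screen may also include an organic light-emitting display (OLED). The audio component 110 is configured to output and/or input audio signals. For example, the audio component 110 includes a microphone (MIC) configured to receive external audio signals when the apparatus 100 is in an operating mode such as call mode, recording mode, or voice recognition mode. The received audio signals may be further stored in the memory 104 or sent via the communication component 116. In some embodiments, the audio component 110 further includes a speaker for outputting audio signals.
The sensor component 114 includes one or more sensors for providing status assessments of various aspects of the apparatus 100. For example, the sensor component 114 may detect the on/off state of the apparatus 100 and the relative positioning of components; the sensor component 114 may also detect a change in position of the apparatus 100 or of one of its components, and a change in temperature of the apparatus 100. In some embodiments, the sensor component 114 may further include a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 116 is configured to facilitate wired or wireless communication between the apparatus 100 and other devices. The apparatus 100 may access a wireless network based on a communication standard, such as WiFi (Wireless Fidelity). In the embodiments of the present application, the communication component 116 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In the embodiments of the present application, the communication component 116 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth technology, and other technologies.
In an exemplary embodiment, the apparatus 100 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors, digital signal processing devices, programmable logic devices, field-programmable gate arrays, controllers, microcontrollers, microprocessors, or other electronic components, for performing the method described below.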
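Referring to FIG. 2, FIG. 2 is a schematic flowchart of an abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application. As shown in FIG. 2, the method may include the following steps:
201. Acquire at least one feature value of to-be-processed traffic data in a black-and-white list; the black-and-white list includes at least one piece of to-be-processed traffic data.
Optionally, the at least one feature value of the to-be-processed traffic data in the black-and-white list may be acquired as follows:
According to at least one feature in a preset feature library, acquire from the to-be-processed traffic data of the black-and-white list at least one feature value corresponding to the at least one feature.
A feature is a parameter preset by an operator for identifying the degree of abnormality of a piece of traffic data; that is, the larger the feature value of a given feature for a piece of traffic data, the higher the degree of abnormality of that piece of traffic data. The feature may be at least one of: path repetition ranking, abnormal rate of user risk-control parameters, proportion of back-end tracking points, risk-control IP divergence rate, number of accounts accessed by a risk-control IP, number of visits by a risk-control IP, number of Wi-Fi names per risk-control IP, cumulative risk score of a risk-control IP, mean number of users per risk-control IP within a period, variance of the number of users per risk-control IP within a period, mean number of visits per risk-control IP within a period, variance of the number of visits per risk-control IP within a period, mean number of user logins per mobile number segment within a period, and variance of the number of user logins per mobile number segment within a period.
As an example, the black-and-white list includes a black list and a white list, each of which includes multiple pieces of to-be-processed traffic data.
In this embodiment, step 202 is triggered after step 201 is completed.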
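202. Normalize the feature values to obtain normalized feature values.
Optionally, the feature values may be normalized as follows: sum all feature values to obtain a total, then divide each feature value by the total to obtain the normalized feature value.
In another optional implementation of this embodiment, the normalized feature value is determined by the following formula (image PCTCN2019079034-appb-000001 in the published application, rendered here from the symbol definitions that follow it):
\hat{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
where x is the feature value to be normalized, x_{\min} and x_{\max} are, respectively, the minimum and the 99% quantile of that feature over all to-be-processed traffic data, and \hat{x} is the normalized feature value obtained after normalization.
The 99% quantile is defined as follows: among all the to-be-processed traffic data, 99% of the pieces have a value of this feature smaller than the 99% quantile, and only 1% have a value larger than it. The 99% quantile is used instead of the maximum to avoid the influence of occasional samples with very large feature values, which improves the accuracy of distinguishing abnormal traffic data. In this embodiment, step 203 is triggered after step 202 is completed.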
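The quantile-capped min-max normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the nearest-rank way of computing the 99% quantile, and the clipping of values above the quantile to 1.0 are all choices the source does not specify.

```python
import math


def quantile_99(values):
    """Nearest-rank 99% quantile (assumed method; the source does not fix one)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return ordered[rank - 1]


def normalize_feature(values):
    """Min-max normalize one feature, using the 99% quantile as x_max."""
    x_min = min(values)
    x_max = quantile_99(values)
    if x_max == x_min:
        # Degenerate feature: every sample has the same value.
        return [0.0 for _ in values]
    # Clipping outliers above the quantile to 1.0 is an assumption.
    return [min(1.0, (v - x_min) / (x_max - x_min)) for v in values]
```

With this cap, a single extreme sample no longer compresses every other normalized value toward zero, which is the motivation the text gives for preferring the 99% quantile over the maximum.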
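203. According to the initial weight value, traverse all normalized feature values with an iterative algorithm to obtain the weight value corresponding to the to-be-processed traffic data; the weight value indicates the degree of abnormality of the to-be-processed traffic data.
In this embodiment, step 204 is triggered after step 203 is completed.
204. When the weight value is greater than the reference weight threshold, determine that the to-be-processed traffic data is abnormal traffic data.
As an optional implementation, traversing all normalized feature values with an iterative algorithm according to the initial weight value to obtain the weight value corresponding to the to-be-processed traffic data may include the following steps:
Determine multiple thresholds, and judge each feature value of each piece of to-be-processed traffic data against each threshold to obtain multiple judgment results;
According to the initial weight value and each judgment result, determine the weight error corresponding to each judgment result, and determine the smallest weight error among all weight errors (the target weight error) together with the target threshold and target feature corresponding to it; according to the target threshold and the target feature, determine the target membership list of the to-be-processed traffic data; if the target membership list is consistent with the initial membership list of the to-be-processed traffic data, decrease the initial weight value of the to-be-processed traffic data; if the target membership list is inconsistent with the initial membership list, increase the initial weight value of the to-be-processed traffic data;
Repeat the preceding step (determining the weight errors, the target weight error with its target threshold and target feature, and the target membership list, and then decreasing or increasing the initial weight value) until the number of changes of the initial weight value reaches a preset count threshold;
When the number of changes of the initial weight value reaches the preset count threshold, determine the weight value corresponding to the to-be-processed traffic data according to the current initial weight value, the current target threshold, and the current target feature value.
Further, judging each feature value of each piece of to-be-processed traffic data against each threshold to obtain multiple judgment results may include the following steps:
Judge any feature value of the to-be-processed traffic data against any one of the multiple thresholds to obtain a judgment result;
If the threshold is greater than the feature value and the initial membership list of the piece of to-be-processed traffic data is the white list, the judgment result is a correct judgment; if the threshold is smaller than the feature value and the initial membership list is the white list, the judgment result is an erroneous judgment; if the threshold is greater than the feature value and the initial membership list is the black list, the judgment result is an erroneous judgment; if the threshold is smaller than the feature value and the initial membership list is the black list, the judgment result is a correct judgment.
Further, determining multiple thresholds may include: determining an initial value within a preset numerical range, and computing the sum of the initial value and each positive integer multiple of a preset step size to obtain multiple target values. The initial value and all target values are taken as the multiple thresholds, where every target value lies within the preset numerical range.
It can be seen that this optional implementation distinguishes the abnormal traffic data contained in the traffic data and improves the purity of the traffic data sent by white-list users.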
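The optional implementation above is explained in detail as follows. First, multiple thresholds can be determined within the interval [0, 1] by the equal-step method (that is, adding the same step each time). For example, the first threshold is 0.1, the second threshold is the first plus 0.1, giving 0.2, the third threshold is the second plus 0.1, giving 0.3, and so on, which yields multiple thresholds within the interval [0, 1].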
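The equal-step threshold generation can be sketched in a few lines. The function name and its default arguments are illustrative assumptions; the values 0.1 and [0, 1] are the ones used in the example above. Rounding is added only to keep floating-point sums such as 0.1 + 2 * 0.1 from drifting.

```python
def equal_step_thresholds(start=0.1, step=0.1, upper=1.0):
    """Return start, start + step, start + 2*step, ... not exceeding upper."""
    thresholds = []
    k = 0
    while True:
        # Round to suppress binary floating-point noise (e.g. 0.30000000000000004).
        value = round(start + k * step, 10)
        if value > upper:
            break
        thresholds.append(value)
        k += 1
    return thresholds
```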
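Next, take any one of the determined thresholds and choose any one target feature. Every piece of to-be-processed traffic data in the black-and-white list has a feature value for the target feature. Because a larger feature value indicates a higher degree of abnormality, a feature value smaller than the threshold is judged as white traffic data and a feature value larger than the threshold is judged as black traffic data. The initial membership list (for example, black list or white list) of each piece of to-be-processed traffic data is known beforehand. Therefore, if a feature value is judged as white traffic data but the initial membership list of the corresponding to-be-processed traffic data is the black list, the judgment is an erroneous judgment; if a feature value is judged as black traffic data and the initial membership list of the corresponding to-be-processed traffic data is the black list, the judgment is a correct judgment.
Further, based on the correct and erroneous judgments above, the minimum weight error can be obtained from the definition of the weight error, which is (image PCTCN2019079034-appb-000003 in the published application, rendered here from the symbol definitions that follow it):
error = \sum_{i=1}^{n} \omega_i \cdot error(X_i)
where n is the number of pieces of to-be-processed traffic data, \omega_i is the weight value (or initial weight value) of the i-th sample, and error(X_i) is the misjudgment error of sample X_i: error(X_i) is 1 if X_i is misjudged and 0 otherwise. The initial weight value is 1/n. The value of error obtained in this way is the weight error. Another threshold can then be taken from the determined thresholds and its weight error obtained in the same way, and so on until all determined thresholds have been traversed; then another target feature is chosen and all thresholds are traversed again, until all features have been traversed. Each combination of feature and threshold thus corresponds to one weight error, and the feature and threshold corresponding to the smallest weight error are the target feature and the target threshold.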
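The search for the target feature and target threshold can be sketched as an exhaustive scan over all (feature, threshold) pairs. The sample layout (a dict with "list" and "features" keys) and the function names are hypothetical, and treating a feature value exactly equal to the threshold as black is an assumption, since the source only covers the strictly greater and strictly smaller cases.

```python
def is_misjudged(feature_value, threshold, initial_list):
    """A sample is misjudged when the threshold rule disagrees with its list."""
    judged = "white" if feature_value < threshold else "black"
    return judged != initial_list


def weight_error(samples, weights, feature, threshold):
    """Sum of the weights of all samples misjudged by (feature, threshold)."""
    return sum(
        w
        for s, w in zip(samples, weights)
        if is_misjudged(s["features"][feature], threshold, s["list"])
    )


def find_target(samples, weights, thresholds):
    """Return (error, feature, threshold) with the smallest weight error."""
    best = None
    for feature in samples[0]["features"]:
        for threshold in thresholds:
            err = weight_error(samples, weights, feature, threshold)
            if best is None or err < best[0]:
                best = (err, feature, threshold)
    return best
```

With n samples and initial weights of 1/n each, the first call to `find_target` simply returns the pair that misjudges the fewest samples; later rounds differ only in the weights passed in.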
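Further, if X_i is correctly judged under the target threshold and target feature, \omega_i is decreased; if X_i is misjudged under the target threshold and target feature, \omega_i is increased. The updated \omega_i is then used to traverse all combinations of feature and threshold again, until the feature and threshold corresponding to the smallest weight error under the updated \omega_i are obtained, and \omega_i is again decreased or increased according to whether the judgment is correct or erroneous. These operations are repeated until the number of changes (increases or decreases) of \omega_i reaches the preset count threshold (for example, 40); the \omega_i obtained at that point is the weight value of the corresponding to-be-processed traffic data.
Further, to reduce the probability that black traffic data is classified as white traffic data when abnormal traffic data is distinguished, a cost supplement may be applied to the \omega_i in the expression shown in image PCTCN2019079034-appb-000004 of the published application, namely \omega_i = \omega_i * (1 + \lambda), where \lambda is the cost parameter. Applying the cost supplement to \omega_i improves the accuracy of distinguishing abnormal traffic data.
Further, the cost parameter \lambda may be determined as follows: train multiple classifiers on a sample set composed of a number of white-user traffic data samples and black-user traffic data samples. The preset set of candidate values of \lambda may be {0.1, 0.2, 0.3, 0.4, ..., 1}. First, take any value in the set as the value of \lambda, have each classifier determine the weight value of every piece of to-be-processed traffic data under that value of \lambda, and, by comparing each weight value with the reference weight threshold, delete from all the to-be-processed traffic data those pieces whose weight value exceeds the reference weight threshold; what remains is the purified white-list traffic data. However, distinguishing abnormal traffic data is subject to errors: white-list traffic data may be identified as black-list traffic data, or black-list traffic data may be identified as white-list traffic data. Identifying black-list traffic data as white-list traffic data has serious consequences, so the following operations are performed to reduce the likelihood of such consequences. First, the accuracy is obtained by dividing the number of pieces of traffic data that were distinguished correctly by the total number of pieces of traffic data, and the recall is obtained by dividing the number of incorrectly identified pieces of black-list traffic data by the total number of pieces of black-list traffic data; this gives the accuracy and recall of each classifier. The accuracies and the recalls of the classifiers are then averaged separately, giving the average recall and average accuracy for that value of \lambda. All values in the set are traversed as the value of \lambda and the average recall and average accuracy are computed for each, after which the optimal value of \lambda (for example, 0.3 or 0.4) is selected according to the average recall and average accuracy. Specifically, the selection may be made as follows: determine the weighted average of the average accuracy and the average recall, as the value of \lambda, where the weight of the average recall is greater than that of the average accuracy. Alternatively, the selection may be made as follows: if the average recall and the average accuracy both lie within a target interval, compute their weighted sum, as the value of \lambda.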
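One possible reading of the selection procedure for the cost parameter is a grid search: each candidate value is scored by a weighted average of the mean accuracy and mean recall over the classifiers, with recall weighted more heavily, and the best-scoring candidate is kept. The sketch below follows that reading. It is an interpretation, not the method confirmed by the source; the function names, the `results` layout, and the default `recall_weight` of 0.7 are assumptions, and the per-classifier accuracy and recall figures are expected to be supplied by the caller.

```python
def apply_cost(weight, lam):
    """Cost supplement from the text: w_i = w_i * (1 + lambda)."""
    return weight * (1 + lam)


def select_lambda(results, recall_weight=0.7):
    """Pick the candidate lambda with the best recall-biased score.

    results maps each candidate lambda to a list of (accuracy, recall)
    pairs, one pair per trained classifier.
    """
    if not 0.5 < recall_weight <= 1.0:
        raise ValueError("recall must be weighted more heavily than accuracy")

    def score(pairs):
        mean_accuracy = sum(a for a, _ in pairs) / len(pairs)
        mean_recall = sum(r for _, r in pairs) / len(pairs)
        return recall_weight * mean_recall + (1 - recall_weight) * mean_accuracy

    return max(results, key=lambda lam: score(results[lam]))
```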
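Further, decreasing the initial weight value of the to-be-processed traffic data if the target membership list is consistent with its initial membership list, and increasing the initial weight value if the target membership list is inconsistent with the initial membership list, may include the following steps:
If the target membership list is consistent with the initial membership list of the to-be-processed traffic data, update the initial weight value of the to-be-processed traffic data by
Figure PCTCN2019079034-appb-000005
If the target membership list is inconsistent with the initial membership list, update the initial weight value of the to-be-processed traffic data by
Figure PCTCN2019079034-appb-000006
where \omega_i^{(t+1)} denotes the initial weight value after this update, sum(\omega_i^{(t)}) is the sum of the initial weight values after each update preceding this one, and \alpha is an intermediate number computed from the misjudgment rate of this update. If \varepsilon denotes the misjudgment rate of this update, the formula for the intermediate number \alpha can be expressed as:
Figure PCTCN2019079034-appb-000007
It can be seen that updating the initial weight value according to the first preset rule makes it smaller and smaller, whereas updating it according to the second preset rule makes it larger and larger. Note that when \alpha \le 0 (that is, \varepsilon \ge 0.5), the misjudgment rate is too high and the result is discarded, which guarantees \alpha > 0 (that is, \varepsilon < 0.5). Because an exponential function rises or falls slowly over the first few updates and quickly in later updates, only traffic data whose weight rises or falls across many updates can reach a very high weight value. This reduces the influence of rises or falls in weight caused by, for example, a poor initial choice of feature and threshold, so the above formulas help improve the correctness of the judgment results.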
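The iterative update can be sketched as a boosting-style loop. The three formula images are not reproduced in this text, so the sketch assumes the standard AdaBoost forms, which match the stated properties (exponential updates, and \alpha > 0 exactly when \varepsilon < 0.5): \alpha = 0.5 * ln((1 - \varepsilon) / \varepsilon), a factor of exp(-\alpha) for correctly judged samples, and a factor of exp(\alpha) for misjudged ones. Normalizing by the sum of the updated weights, stopping when \varepsilon \ge 0.5, the sample layout, and all names are likewise assumptions.

```python
import math


def judge(value, threshold):
    """Assumed rule: below the threshold is white, otherwise black."""
    return "white" if value < threshold else "black"


def iterate_weights(samples, thresholds, max_rounds=40):
    """Return one weight per sample; a larger weight means more abnormal."""
    n = len(samples)
    weights = [1.0 / n] * n  # initial weight value 1/n, as in the text
    features = list(samples[0]["features"])
    for _ in range(max_rounds):
        # Target feature and threshold: the pair with the smallest weight error.
        eps, feature, threshold = min(
            (
                (
                    sum(
                        w
                        for s, w in zip(samples, weights)
                        if judge(s["features"][f], t) != s["list"]
                    ),
                    f,
                    t,
                )
                for f in features
                for t in thresholds
            ),
            key=lambda item: item[0],
        )
        if eps >= 0.5:
            # Misjudgment rate too high (alpha <= 0): discard and stop (assumed).
            break
        eps = max(eps, 1e-10)  # avoid log(…/0) when nothing is misjudged
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        updated = [
            w * math.exp(
                -alpha if judge(s["features"][feature], threshold) == s["list"] else alpha
            )
            for s, w in zip(samples, weights)
        ]
        total = sum(updated)
        weights = [w / total for w in updated]
    return weights
```

A white-listed sample whose feature values look like black-list traffic is misjudged by every well-performing threshold, so its weight keeps growing relative to the others; that is what lets the final weight act as an abnormality score.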
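Implementing the method described in FIG. 2 makes it possible to distinguish the abnormal traffic data in the detected traffic data so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of another abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application. The black-and-white list consists of a black list and a white list; the black list includes at least one piece of to-be-processed traffic data, and the white list includes at least one piece of to-be-processed traffic data. As shown in FIG. 3, the method may include the following steps:
Steps 301 and 302. For a description of steps 301 and 302, refer to the detailed description of steps 201 and 202 in Embodiment 2; it is not repeated here.
303. According to the initial weight value, traverse all normalized feature values with an iterative algorithm to obtain the weight value corresponding to the to-be-processed traffic data; the weight value indicates the degree of abnormality of the to-be-processed traffic data.
304. When the weight value is greater than the reference weight threshold, determine that the to-be-processed traffic data is abnormal traffic data.
305. Determine whether the to-be-processed traffic data belongs to the white list or the black list; if it belongs to the white list, perform step 306; if it belongs to the black list, end this procedure.
In this embodiment, if the to-be-processed traffic data belongs to the white list, the to-be-processed traffic data is traffic generated by a black-industry user disguised as a white-list user.
306. Delete the to-be-processed traffic data from the white list and add it to the black list.
307. When the weight value is not greater than the reference weight threshold, determine that the to-be-processed traffic data is normal traffic data.
308. Determine whether the to-be-processed traffic data belongs to the white list or the black list; if it belongs to the black list, perform step 309; if it belongs to the white list, end this procedure.
In this embodiment, if the to-be-processed traffic data belongs to the black list, the to-be-processed traffic data is traffic generated by a white-list user who was classified as a black-list user through a misoperation.
309. Delete the to-be-processed traffic data from the black list and add it to the white list.
Implementing the method described in FIG. 3 makes it possible to distinguish the abnormal traffic data in the detected traffic data so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.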
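Steps 304 to 309 amount to a small list-maintenance routine, sketched below. Representing the two lists as Python sets of identifiers, and the function name and return values, are illustrative assumptions.

```python
def reassign(item, weight, reference_threshold, white_list, black_list):
    """Classify one piece of traffic data and correct its list membership."""
    if weight > reference_threshold:
        # Abnormal traffic found on the white list: a disguised user.
        if item in white_list:
            white_list.remove(item)
            black_list.add(item)
        return "abnormal"
    # Normal traffic found on the black list: an earlier misclassification.
    if item in black_list:
        black_list.remove(item)
        white_list.add(item)
    return "normal"
```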
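Referring to FIG. 4, FIG. 4 is a schematic flowchart of yet another abnormal traffic analysis method based on a model tree algorithm disclosed in an embodiment of the present application. The black-and-white list consists of a black list and a white list; the black list includes at least one piece of to-be-processed traffic data, and the white list includes at least one piece of to-be-processed traffic data. As shown in FIG. 4, the method may include the following steps:
Steps 401 to 403. For a description of steps 401 to 403, refer to the detailed description of steps 301 to 303 in Embodiment 3; it is not repeated here.
404. Obtain a traffic data distribution map with all the to-be-processed traffic data on the vertical axis and the weight values corresponding to the to-be-processed traffic data on the horizontal axis.
405. Determine the cost value of each original traffic data point in the traffic data distribution map; the cost value of each original traffic data point represents the degree of similarity between each original traffic data point in the traffic data distribution map and each traffic data point in the fitted traffic data distribution map.
406. Determine the original traffic data point with the smallest cost value among all original traffic data points as the target inflection point, and determine the ordinate of the target inflection point as the reference weight threshold.
As shown in FIG. 4, the method may further include steps 407 to 412. For a description of steps 407 to 412, refer to the detailed description of steps 304 to 309 in Embodiment 3; it is not repeated here.
As an example, determining the cost value of each original traffic data point in the traffic data distribution map may include the following steps:
Take each original traffic data point in the traffic data distribution map in turn as the inflection point, and fit all points to the left of the inflection point and all points to its right to straight lines. For each remaining original traffic data point other than the inflection point, compute the difference between its ordinate and the ordinate of the corresponding target traffic data point on the fitted line. The sum of the squares of these ordinate differences is the cost value of that original traffic data point.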
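The inflection-point search can be sketched as a two-segment least-squares fit. The sketch assumes that the left and right sides are fitted with separate straight lines and that each point is given as an (x, y) pair whose y value is the quantity later used as the reference weight threshold; the source does not spell out either detail, and the function names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares line through points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y  # single point or vertical stack: use a flat line
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def squared_residuals(points):
    """Sum of squared ordinate differences between points and their fitted line."""
    if not points:
        return 0.0
    slope, intercept = fit_line(points)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def find_inflection(points):
    """Return (target inflection point, list of cost values in x order)."""
    ordered = sorted(points)
    costs = [
        squared_residuals(ordered[:i]) + squared_residuals(ordered[i + 1:])
        for i in range(len(ordered))
    ]
    best = min(range(len(ordered)), key=costs.__getitem__)
    return ordered[best], costs
```

Under these assumptions, the reference weight threshold would be the second element of the returned point. A side with only one or two points always fits perfectly, so very short inputs give little discrimination between candidates.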
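Implementing the method described in FIG. 4 makes it possible to distinguish the abnormal traffic data in the detected traffic data so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an abnormal traffic analysis apparatus based on a model tree algorithm disclosed in an embodiment of the present application. As shown in FIG. 5, the apparatus may include an acquiring module 501, a computing module 502, and a first determining module 503, where the acquiring module 501 is configured to acquire at least one feature value of to-be-processed traffic data in a black-and-white list; the black-and-white list includes at least one piece of to-be-processed traffic data.
As an example, after the acquiring module 501 acquires the at least one feature value of the to-be-processed traffic data in the black-and-white list, the computing module 502 is triggered to start. The computing module 502 is configured to normalize the feature values to obtain normalized feature values.
The computing module 502 is further configured to traverse all normalized feature values with an iterative algorithm according to the initial weight value to obtain the weight value corresponding to the to-be-processed traffic data; the weight value indicates the degree of abnormality of the to-be-processed traffic data.
The first determining module 503 is configured to determine that the to-be-processed traffic data is abnormal traffic data when the weight value is greater than the reference weight threshold.
It can be seen that the abnormal traffic analysis apparatus based on the model tree algorithm described in FIG. 5 can distinguish the abnormal traffic data in the detected traffic data so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.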
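Referring to FIG. 6, FIG. 6 is a schematic structural diagram of another abnormal traffic analysis apparatus based on a model tree algorithm disclosed in an embodiment of the present application. The black-and-white list consists of a black list and a white list; the black list includes at least one piece of to-be-processed traffic data, and the white list includes at least one piece of to-be-processed traffic data. The apparatus shown in FIG. 6 is obtained by optimizing the apparatus shown in FIG. 5. Compared with the apparatus shown in FIG. 5, the apparatus shown in FIG. 6 may further include a judging module 504, a deleting module 505, and a second determining module 506, where the judging module 504 is configured to determine, after the first determining module 503 determines that the to-be-processed traffic data is abnormal traffic data, whether the to-be-processed traffic data belongs to the white list or the black list. In this embodiment, if the to-be-processed traffic data belongs to the white list, the to-be-processed traffic data is traffic generated by a black-industry user disguised as a white-list user.
The deleting module 505 is configured to delete the to-be-processed traffic data from the white list and add it to the black list after the judging module 504 determines that the to-be-processed traffic data belongs to the white list. The second determining module 506 is configured to determine that the to-be-processed traffic data is normal traffic data when the weight value is not greater than the reference weight threshold. The judging module 504 is further configured to determine, after the second determining module 506 determines that the to-be-processed traffic data is normal traffic data, whether the to-be-processed traffic data belongs to the white list or the black list. In this embodiment, if the to-be-processed traffic data belongs to the black list, the to-be-processed traffic data is traffic generated by a white-list user who was classified as a black-list user through a misoperation. The deleting module 505 is further configured to delete the to-be-processed traffic data from the black list and add it to the white list after the judging module 504 determines that the to-be-processed traffic data belongs to the black list.
It can be seen that the abnormal traffic analysis apparatus based on the model tree algorithm described in FIG. 6 can distinguish the abnormal traffic data in the detected traffic data so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.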
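Referring to FIG. 7, FIG. 7 is a schematic structural diagram of yet another abnormal traffic analysis apparatus based on a model tree algorithm disclosed in an embodiment of the present application. The black-and-white list consists of a black list and a white list; the black list includes at least one piece of to-be-processed traffic data, and the white list includes at least one piece of to-be-processed traffic data. The apparatus shown in FIG. 7 is obtained by optimizing the apparatus shown in FIG. 6. Compared with the apparatus shown in FIG. 6, the apparatus shown in FIG. 7 may further include a third determining module 508 and a fourth determining module 509, where the acquiring module 501 is further configured to obtain, before the first determining module 503 determines that the to-be-processed traffic data is abnormal traffic data, a traffic data distribution map with all the to-be-processed traffic data on the vertical axis and the weight values corresponding to the to-be-processed traffic data on the horizontal axis. The third determining module 508 is configured to determine the cost value of each original traffic data point in the traffic data distribution map; the cost value of each original traffic data point represents the degree of similarity between each original traffic data point in the traffic data distribution map and each traffic data point in the fitted traffic data distribution map. The fourth determining module 509 is configured to determine the original traffic data point with the smallest cost value among all original traffic data points as the target inflection point, and to determine the ordinate of the target inflection point as the reference weight threshold.
It can be seen that the abnormal traffic analysis apparatus based on the model tree algorithm described in FIG. 7 can distinguish the abnormal traffic data in the detected traffic data so as to identify black-industry users disguised as white-list users, thereby improving the purity of the traffic data sent by white-list users.
The present application further provides an electronic device including a processor and a memory for storing computer-readable instructions, where the processor, when executing the computer-readable instructions, is configured to perform the abnormal traffic analysis method based on the model tree algorithm described above. The electronic device may be the apparatus 100 shown in FIG. 1.
In an exemplary embodiment, the present application further provides a non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the abnormal traffic analysis method based on the model tree algorithm described above is implemented.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.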

Claims (20)

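  1. An abnormal traffic analysis method based on a model tree algorithm, comprising:
    acquiring at least one feature value of to-be-processed traffic data in a black-and-white list, the black-and-white list comprising at least one piece of to-be-processed traffic data;
    normalizing the feature value to obtain a normalized feature value;
    traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, wherein the weight value indicates a degree of abnormality of the to-be-processed traffic data; and
    when the weight value is greater than a reference weight threshold, determining that the to-be-processed traffic data is abnormal traffic data.
  2. The method according to claim 1, wherein the black-and-white list comprises a black list and a white list, the black list comprising at least one piece of the to-be-processed traffic data and the white list comprising at least one piece of the to-be-processed traffic data;
    after the determining that the to-be-processed traffic data is abnormal traffic data, the method further comprises:
    determining whether the to-be-processed traffic data belongs to the white list or to the black list;
    if the to-be-processed traffic data belongs to the white list, deleting the to-be-processed traffic data from the white list and adding the to-be-processed traffic data to the black list; and the method further comprises:
    when the weight value is not greater than the reference weight threshold, determining that the to-be-processed traffic data is normal traffic data; determining whether the to-be-processed traffic data belongs to the white list or to the black list;
    if the to-be-processed traffic data belongs to the black list, deleting the to-be-processed traffic data from the black list and adding the to-be-processed traffic data to the white list.
  3. The method according to claim 1 or 2, wherein before the determining that the to-be-processed traffic data is abnormal traffic data when the weight value is greater than the reference weight threshold, the method further comprises:
    obtaining a traffic data distribution map with all the to-be-processed traffic data on the vertical axis and the weight values corresponding to the to-be-processed traffic data on the horizontal axis;
    determining a cost value of each original traffic data point in the traffic data distribution map, the cost value of each original traffic data point representing a degree of similarity between each original traffic data point in the traffic data distribution map and each traffic data point in the fitted traffic data distribution map;
    determining the original traffic data point corresponding to the smallest cost value among the cost values of all the original traffic data as a target inflection point, and determining the ordinate of the target inflection point as the reference weight threshold.
  4. The method according to claim 3, wherein the determining a cost value of each original traffic data point in the traffic data distribution map comprises:
    taking each original traffic data point in the traffic data distribution map in turn as an inflection point, fitting all points to the left of the inflection point and all points to its right to straight lines, computing, for each remaining original traffic data point other than the inflection point, the difference between its ordinate and the ordinate of the corresponding target traffic data point on the straight line to obtain an ordinate difference for each remaining original traffic data point, and computing the sum of the squares of the ordinate differences of the remaining original traffic data points to obtain the cost value of each original traffic data point.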
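  5. The method according to claim 1, wherein the traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data comprises:
    determining a plurality of thresholds, and judging each feature value in each piece of the to-be-processed traffic data against each of the thresholds to obtain a plurality of judgment results;
    determining, according to the initial weight value and each of the judgment results, a weight error corresponding to each judgment result, and determining a smallest target weight error among all the weight errors together with a target threshold and a target feature value corresponding to the target weight error; determining a target membership list of the to-be-processed traffic data according to the target threshold and the target feature; if the target membership list is consistent with an initial membership list of the to-be-processed traffic data, decreasing the initial weight value of the to-be-processed traffic data; if the target membership list is inconsistent with the initial membership list, increasing the initial weight value of the to-be-processed traffic data;
    performing the foregoing determining of the weight errors, of the target weight error with its target threshold and target feature value, and of the target membership list, and the foregoing decreasing or increasing of the initial weight value, until the number of changes of the initial weight value reaches a preset count threshold; and when the number of changes of the initial weight value reaches the preset count threshold, determining the weight value corresponding to the to-be-processed traffic data according to the current initial weight value, the current target threshold, and the current target feature value.
  6. The method according to claim 5, wherein the judging each feature value in each piece of the to-be-processed traffic data against each of the thresholds to obtain a plurality of judgment results comprises:
    judging any feature value in the to-be-processed traffic data against any one of the plurality of thresholds to obtain a judgment result;
    if the threshold is greater than the feature value and the initial membership list of the piece of to-be-processed traffic data is the white list, determining the judgment result to be a correct judgment; if the threshold is smaller than the feature value and the initial membership list of the piece of to-be-processed traffic data is the white list, determining the judgment result to be an erroneous judgment; if the threshold is greater than the feature value and the initial attribute of the piece of to-be-processed traffic data is the black list, determining the judgment result to be an erroneous judgment; if the threshold is smaller than the feature value and the initial attribute of the piece of to-be-processed traffic data is the black list, determining the judgment result to be a correct judgment.
  7. The method according to claim 5, wherein the determining a plurality of thresholds comprises:
    determining an initial value within a preset numerical range, and computing the sum of the initial value and each positive integer multiple of a preset step size to obtain a plurality of target values;
    determining the initial value and all the target values as the plurality of thresholds, wherein every target value lies within the preset numerical range.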
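  8. An abnormal traffic analysis apparatus based on a model tree algorithm, comprising:
    an acquiring module configured to acquire at least one feature value of to-be-processed traffic data in a black-and-white list, the black-and-white list comprising at least one piece of to-be-processed traffic data;
    a computing module configured to normalize the feature value to obtain a normalized feature value;
    the computing module being further configured to traverse all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, the weight value indicating a degree of abnormality of the to-be-processed traffic data; and a first determining module configured to determine that the to-be-processed traffic data is abnormal traffic data when the weight value is greater than a reference weight threshold.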
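  9. An electronic device, comprising a processor and a memory for storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, is configured to perform an abnormal traffic analysis method based on a model tree algorithm, the method comprising:
    acquiring at least one feature value of to-be-processed traffic data in a black-and-white list, the black-and-white list comprising at least one piece of to-be-processed traffic data;
    normalizing the feature value to obtain a normalized feature value;
    traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, wherein the weight value indicates a degree of abnormality of the to-be-processed traffic data; and
    when the weight value is greater than a reference weight threshold, determining that the to-be-processed traffic data is abnormal traffic data.
  10. The electronic device according to claim 9, wherein the black-and-white list comprises a black list and a white list, the black list comprising at least one piece of the to-be-processed traffic data and the white list comprising at least one piece of the to-be-processed traffic data; after the determining that the to-be-processed traffic data is abnormal traffic data, the method further comprises:
    determining whether the to-be-processed traffic data belongs to the white list or to the black list;
    if the to-be-processed traffic data belongs to the white list, deleting the to-be-processed traffic data from the white list and adding the to-be-processed traffic data to the black list;
    and the method further comprises:
    when the weight value is not greater than the reference weight threshold, determining that the to-be-processed traffic data is normal traffic data; determining whether the to-be-processed traffic data belongs to the white list or to the black list;
    if the to-be-processed traffic data belongs to the black list, deleting the to-be-processed traffic data from the black list and adding the to-be-processed traffic data to the white list.
  11. The electronic device according to claim 9 or 10, wherein before the determining that the to-be-processed traffic data is abnormal traffic data when the weight value is greater than the reference weight threshold, the method further comprises:
    obtaining a traffic data distribution map with all the to-be-processed traffic data on the vertical axis and the weight values corresponding to the to-be-processed traffic data on the horizontal axis;
    determining a cost value of each original traffic data point in the traffic data distribution map, the cost value of each original traffic data point representing a degree of similarity between each original traffic data point in the traffic data distribution map and each traffic data point in the fitted traffic data distribution map;
    determining the original traffic data point corresponding to the smallest cost value among the cost values of all the original traffic data as a target inflection point, and determining the ordinate of the target inflection point as the reference weight threshold.
  12. The electronic device according to claim 11, wherein the determining a cost value of each original traffic data point in the traffic data distribution map comprises:
    taking each original traffic data point in the traffic data distribution map in turn as an inflection point, fitting all points to the left of the inflection point and all points to its right to straight lines, computing, for each remaining original traffic data point other than the inflection point, the difference between its ordinate and the ordinate of the corresponding target traffic data point on the straight line to obtain an ordinate difference for each remaining original traffic data point, and computing the sum of the squares of the ordinate differences of the remaining original traffic data points to obtain the cost value of each original traffic data point.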
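  13. The electronic device according to claim 9, wherein the traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data comprises: determining a plurality of thresholds, and judging each feature value in each piece of the to-be-processed traffic data against each of the thresholds to obtain a plurality of judgment results;
    determining, according to the initial weight value and each of the judgment results, a weight error corresponding to each judgment result, and determining a smallest target weight error among all the weight errors together with a target threshold and a target feature value corresponding to the target weight error; determining a target membership list of the to-be-processed traffic data according to the target threshold and the target feature; if the target membership list is consistent with an initial membership list of the to-be-processed traffic data, decreasing the initial weight value of the to-be-processed traffic data; if the target membership list is inconsistent with the initial membership list, increasing the initial weight value of the to-be-processed traffic data;
    performing the foregoing determining of the weight errors, of the target weight error with its target threshold and target feature value, and of the target membership list, and the foregoing decreasing or increasing of the initial weight value, until the number of changes of the initial weight value reaches a preset count threshold;
    when the number of changes of the initial weight value reaches the preset count threshold, determining the weight value corresponding to the to-be-processed traffic data according to the current initial weight value, the current target threshold, and the current target feature value.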
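  14. A non-volatile readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, causes the processor to perform an abnormal traffic analysis method based on a model tree algorithm, the method comprising:
    acquiring at least one feature value of to-be-processed traffic data in a black-and-white list, the black-and-white list comprising at least one piece of to-be-processed traffic data;
    normalizing the feature value to obtain a normalized feature value;
    traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, wherein the weight value indicates a degree of abnormality of the to-be-processed traffic data; and
    when the weight value is greater than a reference weight threshold, determining that the to-be-processed traffic data is abnormal traffic data.
  15. The non-volatile readable storage medium according to claim 14, wherein the black-and-white list consists of a black list and a white list, the black list comprising at least one piece of the to-be-processed traffic data and the white list comprising at least one piece of the to-be-processed traffic data;
    after the determining that the to-be-processed traffic data is abnormal traffic data, the computer program, when executed by the processor, further causes the processor to:
    determine whether the to-be-processed traffic data belongs to the white list or to the black list;
    if the to-be-processed traffic data belongs to the white list, delete the to-be-processed traffic data from the white list and add the to-be-processed traffic data to the black list;
    and the computer program, when executed by the processor, further causes the processor to:
    when the weight value is not greater than the reference weight threshold, determine that the to-be-processed traffic data is normal traffic data; determine whether the to-be-processed traffic data belongs to the white list or to the black list;
    if the to-be-processed traffic data belongs to the black list, delete the to-be-processed traffic data from the black list and add the to-be-processed traffic data to the white list.
  16. The non-volatile readable storage medium according to claim 14 or 15, wherein before the determining that the to-be-processed traffic data is abnormal traffic data when the weight value is greater than the reference weight threshold, the computer program, when executed by the processor, causes the processor to:
    obtain a traffic data distribution map with all the to-be-processed traffic data on the vertical axis and the weight values corresponding to the to-be-processed traffic data on the horizontal axis;
    determine a cost value of each original traffic data point in the traffic data distribution map, the cost value of each original traffic data point representing a degree of similarity between each original traffic data point in the traffic data distribution map and each traffic data point in the fitted traffic data distribution map;
    determine the original traffic data point corresponding to the smallest cost value among the cost values of all the original traffic data as a target inflection point, and determine the ordinate of the target inflection point as the reference weight threshold.
  17. The non-volatile readable storage medium according to claim 16, wherein, for the determining a cost value of each original traffic data point in the traffic data distribution map, the computer program, when executed by the processor, causes the processor to:
    take each original traffic data point in the traffic data distribution map in turn as an inflection point, fit all points to the left of the inflection point and all points to its right to straight lines, compute, for each remaining original traffic data point other than the inflection point, the difference between its ordinate and the ordinate of the corresponding target traffic data point on the straight line to obtain an ordinate difference for each remaining original traffic data point, and compute the sum of the squares of the ordinate differences of the remaining original traffic data points to obtain the cost value of each original traffic data point.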
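  18. The non-volatile readable storage medium according to claim 14, wherein, for the traversing all the normalized feature values with an iterative algorithm according to an initial weight value to obtain a weight value corresponding to the to-be-processed traffic data, the computer program, when executed by the processor, causes the processor to:
    determine a plurality of thresholds, and judge each feature value in each piece of the to-be-processed traffic data against each of the thresholds to obtain a plurality of judgment results;
    determine, according to the initial weight value and each of the judgment results, a weight error corresponding to each judgment result, and determine a smallest target weight error among all the weight errors together with a target threshold and a target feature value corresponding to the target weight error; determine a target membership list of the to-be-processed traffic data according to the target threshold and the target feature; if the target membership list is consistent with an initial membership list of the to-be-processed traffic data, decrease the initial weight value of the to-be-processed traffic data; if the target membership list is inconsistent with the initial membership list, increase the initial weight value of the to-be-processed traffic data;
    perform the foregoing determining of the weight errors, of the target weight error with its target threshold and target feature value, and of the target membership list, and the foregoing decreasing or increasing of the initial weight value, until the number of changes of the initial weight value reaches a preset count threshold;
    when the number of changes of the initial weight value reaches the preset count threshold, determine the weight value corresponding to the to-be-processed traffic data according to the current initial weight value, the current target threshold, and the current target feature value.
  19. The non-volatile readable storage medium according to claim 18, wherein, for the judging each feature value in each piece of the to-be-processed traffic data against each of the thresholds to obtain a plurality of judgment results, the computer program, when executed by the processor, causes the processor to:
    judge any feature value in the to-be-processed traffic data against any one of the plurality of thresholds to obtain a judgment result;
    if the threshold is greater than the feature value and the initial membership list of the piece of to-be-processed traffic data is the white list, determine the judgment result to be a correct judgment; if the threshold is smaller than the feature value and the initial membership list of the piece of to-be-processed traffic data is the white list, determine the judgment result to be an erroneous judgment; if the threshold is greater than the feature value and the initial attribute of the piece of to-be-processed traffic data is the black list, determine the judgment result to be an erroneous judgment; if the threshold is smaller than the feature value and the initial attribute of the piece of to-be-processed traffic data is the black list, determine the judgment result to be a correct judgment.
  20. The non-volatile readable storage medium according to claim 18, wherein, for the determining a plurality of thresholds, the computer program, when executed by the processor, causes the processor to:
    determine an initial value within a preset numerical range, and compute the sum of the initial value and each positive integer multiple of a preset step size to obtain a plurality of target values;
    determine the initial value and all the target values as the plurality of thresholds, wherein every target value lies within the preset numerical range.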
PCT/CN2019/079034 2018-09-25 2019-03-21 Abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on model tree algorithm WO2020062803A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811120226.X 2018-09-25
CN201811120226.XA CN109257354B (zh) 2018-09-25 2018-09-25 Abnormal traffic analysis method and apparatus based on model tree algorithm, and electronic device

Publications (1)

Publication Number Publication Date
WO2020062803A1 true WO2020062803A1 (zh) 2020-04-02

Family

ID=65048085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/079034 WO2020062803A1 (zh) 2018-09-25 2019-03-21 Abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on model tree algorithm

Country Status (2)

Country Link
CN (1) CN109257354B (zh)
WO (1) WO2020062803A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
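CN113220741A (zh) * 2021-04-29 2021-08-06 北京华艺世嘉网络有限公司 Internet advertisement false traffic identification method, system, device, and storage medium
CN113837318A (zh) * 2021-10-20 2021-12-24 北京明略软件系统有限公司 Method and apparatus for determining a traffic determination scheme, electronic device, and storage medium
CN115795482A (zh) * 2023-01-06 2023-03-14 杭州中电安科现代科技有限公司 Management method, apparatus, device, and medium for industrial control equipment security
CN117927459A (zh) * 2024-03-25 2024-04-26 陕西中环机械有限责任公司 Grouting flow optimization control method for a grouting pump
CN117991172A (zh) * 2024-04-03 2024-05-07 山东德源电力科技股份有限公司 Voltage transformer with fault identification function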

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257354B (zh) * 2018-09-25 2021-11-12 平安科技(深圳)有限公司 Abnormal traffic analysis method and apparatus based on model tree algorithm, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120090027A1 (en) * 2010-10-12 2012-04-12 Electronics And Telecommunications Research Institute Apparatus and method for detecting abnormal host based on session monitoring
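CN103117903A (zh) * 2013-02-07 2013-05-22 中国联合网络通信集团有限公司 Internet traffic anomaly detection method and apparatus
CN106713324A (zh) * 2016-12-28 2017-05-24 北京奇艺世纪科技有限公司 Traffic detection method and apparatus
CN108269012A (zh) * 2018-01-12 2018-07-10 中国平安人寿保险股份有限公司 Method, apparatus, storage medium, and terminal for constructing a risk scoring model
CN109257354A (zh) * 2018-09-25 2019-01-22 平安科技(深圳)有限公司 Abnormal traffic analysis method and apparatus based on model tree algorithm, and electronic device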

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
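CN107292186B (zh) * 2016-03-31 2021-01-12 阿里巴巴集团控股有限公司 Random-forest-based model training method and apparatus
CN108243271A (zh) * 2016-12-23 2018-07-03 北京安云世纪科技有限公司 Method, apparatus, and mobile device for traffic control
CN108287996A (zh) * 2018-01-08 2018-07-17 北京工业大学 Malicious code obfuscation feature cleaning method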

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
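CN113220741A (zh) * 2021-04-29 2021-08-06 北京华艺世嘉网络有限公司 Internet advertisement false traffic identification method, system, device, and storage medium
CN113220741B (zh) * 2021-04-29 2024-04-05 北京华艺世嘉网络有限公司 Internet advertisement false traffic identification method, system, device, and storage medium
CN113837318A (zh) * 2021-10-20 2021-12-24 北京明略软件系统有限公司 Method and apparatus for determining a traffic determination scheme, electronic device, and storage medium
CN115795482A (zh) * 2023-01-06 2023-03-14 杭州中电安科现代科技有限公司 Management method, apparatus, device, and medium for industrial control equipment security
CN117927459A (zh) * 2024-03-25 2024-04-26 陕西中环机械有限责任公司 Grouting flow optimization control method for a grouting pump
CN117927459B (zh) * 2024-03-25 2024-06-11 陕西中环机械有限责任公司 Grouting flow optimization control method for a grouting pump
CN117991172A (zh) * 2024-04-03 2024-05-07 山东德源电力科技股份有限公司 Voltage transformer with fault identification function
CN117991172B (zh) * 2024-04-03 2024-06-11 山东德源电力科技股份有限公司 Voltage transformer with fault identification function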

Also Published As

Publication number Publication date
CN109257354A (zh) 2019-01-22
CN109257354B (zh) 2021-11-12

Similar Documents

Publication Publication Date Title
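WO2020062803A1 (zh) Abnormal traffic analysis method, apparatus, electronic device, and non-volatile readable storage medium based on model tree algorithm
US11367075B2 (en) Method, apparatus and electronic device for identifying risks pertaining to transactions to be processed
EP2960823B1 (en) Method, device and system for managing authority
CN107102746B (zh) Candidate word generation method and apparatus, and apparatus for candidate word generation
CN108629354B (zh) Target detection method and apparatus
WO2017020514A1 (zh) Picture scene determination method and apparatus, and server
US9787685B2 (en) Methods, devices and systems for managing authority
CN108614970B (zh) Virus program detection method, model training method, apparatus, and device
US10749881B2 (en) Comparing unsupervised algorithms for anomaly detection
JP2018528517A (ja) Method, apparatus, and system for detecting fraudulent software promotion
CN110717509B (zh) Data sample analysis method and apparatus based on tree splitting algorithm
CN110162956B (zh) Method and apparatus for determining associated accounts
CN111539443A (zh) Image recognition model training method and apparatus, and storage medium
US20180232665A1 (en) User score model training and calculation
CN111428032A (zh) Content quality evaluation method and apparatus, electronic device, and storage medium
WO2019001170A1 (zh) Method and apparatus for a smart device to execute a task
CN112884040B (zh) Training sample data optimization method, system, storage medium, and electronic device
US20240086736A1 (en) Fault detection and mitigation for aggregate models using artificial intelligence
CN111753539B (zh) Method and apparatus for identifying sensitive text
US11373038B2 (en) Method and terminal for performing word segmentation on text information, and storage medium
CN109800784B (zh) Neural-network-based contract checking method and apparatus
CN109525548B (zh) Cost-function-based white list updating method and apparatus, and electronic device
CN109284307B (zh) Traffic data clustering processing method and apparatus, and electronic device
CN109726550B (zh) Abnormal operation behavior detection method and apparatus, and computer-readable storage medium
CN116127353A (zh) Classification method, classification model training method, device, and medium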

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19866088

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19866088

Country of ref document: EP

Kind code of ref document: A1