WO2020258657A1 - Abnormality detection method and apparatus, computer device and storage medium - Google Patents

Abnormality detection method and apparatus, computer device and storage medium

Info

Publication number
WO2020258657A1
Authority
WO
WIPO (PCT)
Prior art keywords
detection model
combined
detection
data
negative
Prior art date
Application number
PCT/CN2019/117607
Other languages
French (fr)
Chinese (zh)
Inventor
黎立桂
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020258657A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • The present invention relates to the field of computer application technology. Specifically, the present invention relates to an abnormality detection method, device, computer equipment and storage medium.
  • Abnormal user behavior refers to behavior that violates social norms or the habits and standards of group behavior. As awareness of public safety and network security grows, there is an increasing focus on detecting abnormal behavior in crowd scenes, networks and other environments.
  • The detection of abnormal user behavior usually relies on matching against the characteristics of individual abnormal behavior, or on comparison with the characteristics of individual normal behavior.
  • The attributes of the samples are mostly nominal; only a few attributes, such as resolution, are numerical. Complicated text-based device data and hard-to-interpret nominal attribute data make it difficult to extract effective classification features, so a good anomaly detection model cannot be obtained, resulting in low anomaly detection accuracy.
  • The purpose of the present invention is to solve at least one of the above-mentioned technical defects by disclosing an abnormality detection method, device, computer equipment, and storage medium, which can comprehensively obtain cursor trigger data to accurately identify abnormal cursor trigger data.
  • An abnormality detection method, including:
  • Acquiring operation terminal data when the user performs registration or verification, where the operation terminal data is combined data including two or more of device type, system information, and IP address;
  • An abnormality detection device, including:
  • An obtaining module, configured to obtain operation terminal data when the user performs registration or verification, where the operation terminal data is combined data including two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;
  • A processing module, configured to input the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;
  • An execution module, configured to vote on the multiple sub-results in the combined result information according to preset rules to obtain final result information.
  • The present application discloses a computer device including a memory and a processor.
  • The memory stores computer-readable instructions.
  • The processor executes the steps of any of the foregoing abnormality detection methods.
  • The present application discloses a storage medium storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of any of the above abnormality detection methods.
  • The beneficial effects of the present invention are as follows: the abnormality detection method and device disclosed in this application decompose complex text-type device data, adopt an effective feature conversion method, and, taking the sample distribution into account, convert multiple hard-to-interpret nominal attributes into 0-1 binary combination features, generating a discriminative combination feature set and mining an effective classification feature set.
  • This feature set can be used for model training to obtain a better anomaly detection model.
  • Five algorithms are used to construct the detection models under the Bagging strategy, so that multiple models are built for anomaly detection. Naive Bayes gives a comprehensive probability from the overall distribution of the samples.
  • OneClassSVM and Isolation Forest each give detection results for the samples from the normal and the abnormal perspectives.
  • Figure 1 is a schematic diagram of an abnormality detection method of the present invention
  • Figure 2 is a flowchart of the training method of the combined detection model of the present invention.
  • Figure 3 is a flow chart of the method for obtaining sample data to construct a combined feature set according to the present invention
  • Figure 4 is a flowchart of the method for obtaining final result information according to the present invention.
  • Figure 5 is a schematic structural diagram of an abnormality detection device of the present invention.
  • Figure 6 is a block diagram of the basic structure of the computer equipment of the present invention.
  • "Terminal" and "terminal equipment" as used herein include both equipment that has only a wireless signal receiver without transmitting capability, and equipment that has receiving and transmitting hardware.
  • The latter is equipment with receiving and transmitting hardware capable of two-way communication over a two-way communication link.
  • Such equipment may include: cellular or other communication equipment, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) equipment, which can combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which can include a radio frequency receiver, pager, Internet/Intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • The "terminal" and "terminal equipment" used here may be portable, transportable, installed in vehicles (air, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed form at any location on the earth and/or in space.
  • The "terminal" and "terminal equipment" used here can also be a communication terminal, an Internet terminal, or a music/video playback terminal, such as a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions; it can also be a smart TV, a set-top box or another device.
  • the present invention discloses an abnormality detection method, including:
  • The technical solution of the present application is mainly used to detect abnormal behavior in user operations, especially to monitor abnormal operations during the verification process when the user registers a new account or logs in.
  • The operating terminal data includes a combination of at least two of device type, system information, and IP address.
  • The combination information can be three types of data (device type, system type, and IP address), four types of data (device type, system type, version number, and IP address), or five types of data (device type, system type, version number, resolution, and IP address); it can also be other data or any combination of these data.
  • The obtained operation terminal data is combined information that includes at least two of the device type, system information, and IP address. Inputting this combined information into the combined detection model for detection yields the corresponding combined result information.
  • The combined detection model includes at least two detection models, and the output results of the detection models are independent of each other. Therefore, at least two sets of result information are output for the combined information.
  • For example, the combination information includes three types: device type, system type, and IP address.
  • The combined detection model includes five detection models: A, B, C, D, and E.
  • The detection models are independent of each other, so five sets of result information are obtained for the combination of device type, system type and IP address. For example, the result information is (A1, A2, A3), (B1, B2, B3), (C1, C2, C3), (D1, D2, D3) and (E1, E2, E3), where the number 1 denotes the detection result for the device type, the number 2 the detection result for the system type, and the number 3 the detection result for the IP address.
  • The multiple detection models in the combined detection model each output a corresponding sub-result for the same operation terminal data to generate the combined result information; the sub-results in the combined result information are then voted on according to certain rules to obtain the final result information.
  • The rules disclosed here include, but are not limited to, selecting the sub-result that occurs most often as the final result.
  • This application obtains the user's operating terminal data and extracts multiple items from it as combined data for identification.
  • The combined data makes the judgment result more accurate.
  • The detection model for identifying the combined data is itself a combined detection model: different detection models, trained through a variety of training methods, recognize the same combined data, the final result information is obtained by voting, and whether the user's registration or verification is abnormal is judged comprehensively. This effectively avoids the one-sidedness of training a single detection model only on the normal samples that make up most of the data, reduces the inaccuracy of a single detection model caused by sample imbalance, and improves the accuracy of anomaly detection.
  • The detection models in the combined detection model include: the Naive Bayes detection model, the OneClassSVM detection model of the positive class, the OneClassSVM detection model of the negative class, the isolated forest classification and detection model of the positive class, and the isolated forest classification and detection model of the negative class.
  • The Naive Bayes detection model is a classification algorithm based on Bayes' theorem. It is a generative model: it models the joint probability P(x, c) directly to obtain the target probability value.
  • Bayes' theorem describes the probability of an event based on prior knowledge of conditions related to the event.
  • Bayes' theorem is expressed mathematically as:
    $$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
  • c represents a situation in which a random event occurs.
  • x stands for the evidence or condition, which generally refers to factors related to the random event.
  • P(c): the probability that c occurs, without considering relevant factors (prior probability).
  • P(x | c): the probability that condition x occurs given that event c has occurred (a conditional probability, often called the likelihood).
  • P(c | x): the probability that c occurs given that x has been observed (posterior probability); this is the target probability value.
  • P(x): the probability that x occurs (prior probability).
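As a worked illustration of the formula, the following minimal sketch computes P(c | x) from the three quantities defined above. The numbers are hypothetical and chosen only for the arithmetic; here c stands for "the registration is abnormal" and x for "the reported device type is unusual", neither of which is a value taken from this application.

```python
def posterior(prior_c: float, likelihood_x_given_c: float, evidence_x: float) -> float:
    """Bayes' theorem: P(c | x) = P(x | c) * P(c) / P(x)."""
    if evidence_x <= 0:
        raise ValueError("P(x) must be positive")
    return likelihood_x_given_c * prior_c / evidence_x


# Hypothetical probabilities (assumptions for illustration only).
p_c = 0.02              # P(c): prior probability of an abnormal registration
p_x_given_c = 0.60      # P(x | c): unusual device type given an abnormal registration
p_x_given_not_c = 0.05  # P(x | not c): unusual device type given a normal registration

# P(x) by the law of total probability.
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

p_c_given_x = posterior(p_c, p_x_given_c, p_x)
print(round(p_c_given_x, 4))
```

With these assumed inputs, observing x raises the probability of c from the 2% prior to a little under 20%.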
  • In the OneClassSVM detection model, the training data are divided only into positive samples and negative samples: those that meet the requirements are positive samples, and all others are negative samples.
  • One-Class SVM is able to capture the shape of the data set, so it performs better on strongly non-Gaussian data, such as two completely separate clusters. Strictly speaking, the one-class SVM is not an outlier detection algorithm but a novelty detection algorithm: its training set should not contain abnormal samples, because they may affect the selection of the boundary during training.
  • The OneClassSVM detection model includes the OneClassSVM detection model of the positive class and the OneClassSVM detection model of the negative class. The positive-class model is trained only on positive samples, while the negative-class model is trained only on negative samples.
  • The isolated forest (Isolation Forest) classification and detection model is a fast, ensemble-based anomaly detection method with linear time complexity and high accuracy; it is a state-of-the-art algorithm that meets the requirements of big data processing (it is applicable to continuous numerical data). An anomaly is defined as a point that is "more likely to be separated", which can be understood as a point that is sparsely distributed and far away from any high-density group. In statistical terms, a sparsely populated region of the data space is one where data occur with very low probability, so data falling in such regions can be considered abnormal.
  • The isolated forest classification and detection model likewise includes a positive-class model and a negative-class model: the positive-class isolated forest classification and detection model is trained on positive samples, and the negative-class isolated forest classification and detection model is trained on negative samples.
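To make the "more likely to be separated" idea concrete, here is a deliberately simplified one-dimensional sketch. It is not the full Isolation Forest algorithm (there is no subsampling and no score normalisation), and the data values, function names and tree count are all assumptions introduced for illustration. It counts how many random splits are needed before a point is alone on its side of the split.

```python
import random
from typing import List


def isolation_depth(point: float, data: List[float], rng: random.Random,
                    max_depth: int = 50) -> int:
    """Count the random splits needed to isolate `point` (must be in `data`)."""
    depth = 0
    current = data
    while len(current) > 1 and depth < max_depth:
        low, high = min(current), max(current)
        if low == high:
            break  # identical values cannot be separated any further
        split = rng.uniform(low, high)
        # Keep only the side of the split that still contains the point.
        if point < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth


def average_depth(point: float, data: List[float], trees: int = 200, seed: int = 0) -> float:
    """Average isolation depth over several independent random split sequences."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trees)) / trees


# Hypothetical data: a dense cluster in [0, 0.95] plus one far-away point.
data = [i / 20 for i in range(20)] + [100.0]
outlier_depth = average_depth(100.0, data)
inlier_depth = average_depth(0.5, data)
print(outlier_depth, inlier_depth)
```

The sparse point is usually cut off by the very first split, whereas a point inside the dense cluster needs several splits, which is the property the model uses to flag anomalies.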
  • the training method of the combined detection model composed of the above five detection models includes:
  • The positive samples disclosed above are data that are selected according to the intended recognition purpose and that meet that purpose.
  • This data can be expressed as text, numbers, strings, pictures, sounds, etc.
  • This application is aimed at detecting abnormal user input behavior, which is judged from the device type, system information, and IP address of the user's client. Therefore, in this application, positive samples are legal device types, system information and IP addresses; legal device types include, for example, mobile phones, PCs, tablets and computers.
  • If the login or registration information comes from one of the device types listed above, it is a positive sample.
  • By contrast, device data from a smart bracelet, for example, is a negative sample. These sample data are obtained through collection.
  • the method of obtaining sample data to construct a combined feature set includes:
  • The sample data are obtained in different ways, for example by a crawler algorithm, by device detection, and from the registration or verification information sent by the user.
  • Obtaining data by a crawler algorithm means writing a piece of crawler code that monitors the user's login and collects all of the user's terminal data during registration or verification.
  • The data collected in this process include the final registration and verification information as well as intermediate information, such as whether the data were intercepted during transmission.
  • Data obtained by device detection are the data recognized by the client itself: after the registration or verification information has been entered on the client through the input tool and before it is finally transmitted, it is the registration or verification information monitored by the input tool on the client.
  • The registration or verification information sent by the user is the information sent by the user through the client and received by the back-end server.
  • The data obtained through device detection are the original data input by the user.
  • The data obtained through the crawler algorithm are the data in transit from the client to the server.
  • The data sent by the user are the data received by the server.
  • The original data are thus monitored at three stages, from input through transmission to reception, which makes it possible to check data consistency. If the data compared are inconsistent at any stage, the data registered or verified by the user are abnormal.
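The three-stage consistency check can be sketched as a comparison of the same fields across the three acquisition stages. The field names, stage names and values below are hypothetical (the IP addresses come from documentation-reserved ranges); the sketch only illustrates the rule that any mismatch between stages marks the registration or verification data as abnormal.

```python
from typing import Dict, List

Record = Dict[str, str]


def inconsistent_fields(stages: Dict[str, Record]) -> List[str]:
    """Return the names of fields whose values differ between any two stages."""
    field_names = sorted({name for record in stages.values() for name in record})
    mismatched = []
    for name in field_names:
        values = {record.get(name) for record in stages.values()}
        if len(values) > 1:  # more than one distinct value means a mismatch
            mismatched.append(name)
    return mismatched


# Hypothetical snapshots of the same submission at input, in transit, and on receipt.
stages = {
    "device_detection": {"device_type": "mobile phone", "system_type": "Android",
                         "ip_address": "203.0.113.7"},
    "crawler": {"device_type": "mobile phone", "system_type": "Android",
                "ip_address": "203.0.113.7"},
    "server_received": {"device_type": "mobile phone", "system_type": "Android",
                        "ip_address": "198.51.100.9"},
}

mismatches = inconsistent_fields(stages)
is_abnormal = bool(mismatches)
print(mismatches, is_abnormal)
```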
  • The degree of support here is the probability that the data obtained by the above methods appear at the same time.
  • The degree of confidence indicates the credibility of the data obtained by the above methods.
  • Through verification, the accuracy of the sample data from each method can be obtained, and according to this accuracy a value is set for each method to represent its confidence. The higher the value, the more credible the data obtained in this way.
  • For example, the user registration or verification data are obtained through three methods: the crawler algorithm, device detection, and the registration or verification information sent by the user. Based on comparison and calculation over earlier data, a confidence level can be set for each of the three methods.
  • The confidence level of the data obtained through the crawler algorithm is A.
  • The confidence level of the data obtained through device detection is B.
  • The confidence level of the registration or verification information sent by the user is C.
  • Support formula: $\mathrm{Support}(A \rightarrow B) = P(A \cup B)$, where $A \cup B$ denotes A and B occurring together.
  • Support is the probability of A and B appearing at the same time. If that probability is small, the relationship between A and B is weak; if A and B appear together very frequently, A and B are closely related.
  • Confidence formula: Confidence(A->B) P(A
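As an illustration of the two measures, the sketch below counts co-occurrences over a small list of observations. It assumes the standard association-rule definitions, namely support as the fraction of observations in which A and B occur together and confidence as P(B | A), that is, support(A and B) divided by support(A). The confidence definition used here is the textbook one and is an assumption, as are the labels "A" and "B" and the observation list.

```python
from typing import FrozenSet, Sequence


def support(observations: Sequence[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of observations that contain every item in `items`."""
    hits = sum(1 for observed in observations if items <= observed)
    return hits / len(observations)


def confidence(observations: Sequence[FrozenSet[str]],
               antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """Assumed textbook definition: P(consequent | antecedent)."""
    base = support(observations, antecedent)
    if base == 0:
        return 0.0
    return support(observations, antecedent | consequent) / base


# Hypothetical observations: which events were seen together in each record.
observations = [
    frozenset({"A", "B"}),
    frozenset({"A", "B"}),
    frozenset({"A"}),
    frozenset({"B"}),
    frozenset(),
]

support_ab = support(observations, frozenset({"A", "B"}))
confidence_ab = confidence(observations, frozenset({"A"}), frozenset({"B"}))
print(support_ab, confidence_ab)
```

In this toy data A and B occur together in two of five observations, and B accompanies A in two of the three observations that contain A.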
  • The operating terminal data can be obtained in a variety of ways. Comparing the operating terminal data with the benchmark data and marking the result according to the first rule yields a set of feature data, which is a feature set.
  • The first rule is that data in the operating terminal data that are the same as the reference data are marked as 1, as positive samples, and data that differ from the reference data are marked as 0, as negative samples.
  • The multiple sets of operation terminal data above thus constitute a feature set consisting of 0s and 1s.
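The first rule can be sketched as a field-by-field comparison against the reference data. The field names, the reference record and the three acquired records below are hypothetical values introduced for illustration; only the marking rule itself (1 when equal to the reference, 0 otherwise) comes from the description.

```python
from typing import Dict, List

FIELDS = ["device_type", "system_type", "version_number", "resolution", "ip_address"]


def mark_against_reference(record: Dict[str, str], reference: Dict[str, str],
                           fields: List[str] = FIELDS) -> List[int]:
    """First rule: 1 where the record equals the reference value, 0 where it differs."""
    return [1 if record.get(name) == reference.get(name) else 0 for name in fields]


# Hypothetical reference (benchmark) data.
reference = {
    "device_type": "mobile phone",
    "system_type": "Android",
    "version_number": "10",
    "resolution": "1080x2340",
    "ip_address": "203.0.113.7",
}

# Hypothetical records from the three acquisition methods.
records = {
    "crawler": dict(reference),
    "device_detection": {**reference, "resolution": "720x1280"},
    "user_sent": {**reference, "system_type": "iOS", "ip_address": "198.51.100.9"},
}

feature_set = {method: mark_against_reference(record, reference)
               for method, record in records.items()}
print(feature_set)
```

Each record becomes a fixed-length 0-1 vector, which is the form the detection models are trained on.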
  • The Naive Bayes detection model learns to identify positive samples and negative samples at the same time; the OneClassSVM detection model of the positive class and the isolated forest classification and detection model of the positive class learn to identify positive samples; the OneClassSVM detection model of the negative class and the isolated forest classification and detection model of the negative class learn to identify negative samples.
  • The Naive Bayes detection model is a classification algorithm that recognizes both positive and negative samples. When data to be recognized are input, the output is either the positive class or the negative class: data that match the positive samples belong to the positive class and 1 is output, while data that match the negative samples belong to the negative class and -1 is output. Because the Naive Bayes detection model is trained on both positive and negative samples, both the positive and the negative outputs can be obtained more accurately.
  • The OneClassSVM detection model of the positive class is trained mainly on positive samples, so its positive-class output is more accurate; the OneClassSVM detection model of the negative class is trained mainly on negative samples, so its negative-class output is more accurate.
  • Likewise, the isolated forest classification and detection model of the positive class is more accurate in its positive-class output, and the isolated forest classification and detection model of the negative class is more accurate in its negative-class output.
  • The combination result information is voted on to obtain the final result information.
  • Voting on the multiple sub-results in the combination result information according to preset rules to obtain the final result information includes:
  • S3100: Voting on the multiple sub-results in the obtained combined result information according to the Bagging strategy.
  • The operating terminal data are obtained through the crawler algorithm, device detection, and the registration or verification information sent by users.
  • The data obtained by each method may be the same or different, which makes the operating terminal data diverse. The operating terminal data are input into the combined detection model for detection, and the combined result information is obtained.
  • The detection results of the models in the combined detection model are independent of each other. Different detection models have different training principles, and their training data may also differ. Each detection model has its own characteristics, so the sub-results may differ. The detection results of the different detection models are combined to obtain the combined result information, which is then voted on according to the Bagging strategy.
  • Bagging, also called bootstrap aggregating, is a technique of repeatedly sampling from the data with replacement according to a uniform probability distribution.
  • A base classifier is trained on the bootstrap sample set generated by each round of sampling; the trained classifiers then vote, and the test sample is assigned to the class with the most votes.
  • Each bootstrap sample set is as large as the original data. Because sampling is with replacement, some samples may appear multiple times in the same training set, while others may be left out.
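The sampling step of Bagging can be sketched in a few lines. The data, the seed and the number of rounds are hypothetical; the sketch only shows uniform sampling with replacement, in which every bootstrap sample set has the same size as the original data.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) items uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)   # fixed seed so that a run is repeatable
data = list(range(10))   # stand-in for the original training samples
samples = [bootstrap_sample(data, rng) for _ in range(5)]

for sample in samples:
    left_out = sorted(set(data) - set(sample))
    print(sample, "left out:", left_out)
```

Printing the items that a round leaves out makes the last point above visible: a sample set of the same size typically repeats some items and omits others.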
  • The sub-result that occurs most often is selected as the final result.
  • For example, the acquired operating terminal data are: device type, system type, version number, resolution, and IP address.
  • The combined result information obtained by the five detection models is as follows:

    Detection model                                                  Data 1  Data 2  Data 3  Data 4  Data 5
    Naive Bayes detection model                                      1       1       0       0       1
    Positive OneClassSVM detection model                             1       0       1       0       1       1
    Negative class OneClassSVM detection model                       1       1       0       1       1
    Isolated forest classification and detection model               0       0       0       0       1
    Negative classification and detection model of isolated forest   1       1       0       1       0
    Final results                                                    1       1       0       0       1

  • The five detection models output five sets of data. Each set is formed by comparing the operating terminal data with the benchmark data and marking the result according to the first rule, so every set has the same form, namely values of 0 or 1, which makes comparative voting straightforward. From the five sets of data above, it can be seen that for data 1 the value "1" occurs most often, so the final result of data 1 is "1"; likewise, the final result of data 2 is "1", of data 3 is "0", of data 4 is "0", and of data 5 is "1". The final result is therefore "1, 1, 0, 0, 1".
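The voting step for this example can be sketched as a column-wise majority vote. The rows below follow the values listed above; for the positive OneClassSVM model, which is listed with six values, the sketch assumes the first five, and that choice is an assumption rather than something the description states. The function name is also hypothetical.

```python
from typing import List, Sequence


def majority_vote(sub_results: Sequence[Sequence[int]]) -> List[int]:
    """Column-wise majority vote over 0/1 sub-results (a tie falls to 0)."""
    n_models = len(sub_results)
    return [1 if 2 * sum(column) > n_models else 0 for column in zip(*sub_results)]


sub_results = [
    [1, 1, 0, 0, 1],  # Naive Bayes detection model
    [1, 0, 1, 0, 1],  # positive OneClassSVM model (assumed: first five listed values)
    [1, 1, 0, 1, 1],  # negative OneClassSVM model
    [0, 0, 0, 0, 1],  # isolated forest classification and detection model
    [1, 1, 0, 1, 0],  # negative isolated forest classification and detection model
]

final_result = majority_vote(sub_results)
print(final_result)  # [1, 1, 0, 0, 1]
```

With five binary voters a tie cannot occur, so the tie rule only matters if the number of detection models is even.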
  • This application discloses an abnormality detection device, including:
  • The obtaining module 1000 is configured to obtain operation terminal data when a user performs registration or verification, where the operation terminal data is combined data including two or more of device type, system information and IP address, and the system information includes the system type, version number, and resolution;
  • The processing module 2000 is configured to input the operating terminal data into the combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each of the detection models outputs a corresponding sub-result, and the plurality of sub-results generate the combined result information;
  • The execution module 3000 is configured to vote on the multiple sub-results in the combined result information according to preset rules to obtain the final result information.
  • The detection models in the combined detection model include: the Naive Bayes detection model, the OneClassSVM detection model of the positive class, the OneClassSVM detection model of the negative class, the isolated forest classification and detection model of the positive class, and the isolated forest classification and detection model of the negative class. The Naive Bayes detection model learns to identify positive samples and negative samples at the same time; the OneClassSVM detection model of the positive class and the isolated forest classification and detection model of the positive class learn to identify positive samples; the OneClassSVM detection model of the negative class and the isolated forest classification and detection model of the negative class learn to identify negative samples.
  • The processing module further includes a feature set construction module, configured to obtain sample data to construct a combined feature set, wherein the combined feature set includes positive samples and negative samples.
  • The feature set construction module further includes: a sample acquisition module, configured to use operation terminal data obtained through at least two acquisition methods during user registration or verification as sample data, wherein the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from registration or verification information sent by users; a calculation module, configured to calculate the support and confidence of the sample data acquired by each acquisition method; a first selection module, configured to select the combination of operating terminal data with the highest support and confidence as the reference data; and a marking module, configured to mark the result of comparing the operating terminal data acquired by each acquisition method with the reference data according to the first rule to form a feature set.
  • The methods for obtaining operation terminal data when the user performs registration or verification include: obtaining by a crawler algorithm, obtaining by device detection, and obtaining from registration or verification information sent by the user.
  • The first rule is: data in the operating terminal data that are the same as the reference data are marked as 1, as positive samples, and data that differ from the reference data are marked as 0, as negative samples.
  • The execution module includes: a voting module, configured to vote on the multiple sub-results in the obtained combined result information according to a Bagging strategy; and a second selection module, configured to select the result information with the largest number of marks as the final result.
  • The above-mentioned abnormality detection device corresponds one-to-one with the abnormality detection method; its functions and execution principles are the same, so they are not repeated here.
  • Please refer to FIG. 5 for the basic structural block diagram of the computer equipment provided by the embodiment of the present invention.
  • The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus.
  • The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • The database may store control information sequences.
  • When the computer-readable instructions are executed by the processor, the processor can implement an abnormality detection method.
  • The processor of the computer equipment provides computing and control capabilities and supports the operation of the entire computer equipment.
  • Computer-readable instructions may be stored in the memory of the computer device, and when the computer-readable instructions are executed by the processor, the processor may execute an abnormality detection method.
  • The network interface of the computer device is used to connect to and communicate with the terminal.
  • FIG. 5 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution is applied.
  • A specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • the computer device receives the status information of the prompt behavior sent by the associated client, that is, whether the associated terminal opens the prompt and whether the lender closes the prompt task.
  • the corresponding preset instruction is sent to the associated terminal, so that the associated terminal can perform corresponding operations according to the preset instruction, thereby realizing effective supervision of the associated terminal.
  • the server side controls the associated terminal to continue ringing to prevent the prompt task of the associated terminal from being automatically terminated after a period of time.
  • the present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the abnormality detection method described in any of the above embodiments.
  • the computer program can be stored in a computer readable storage medium. When executed, it may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An abnormality detection method, comprising: acquiring operation terminal data when a user performs registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information and an IP address; inputting the operation terminal data into a combined detection model for detection, so as to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the combined result information is generated from a plurality of sub-results; and voting on the combined result information to obtain final result information. According to the method, a feature conversion method is used to convert, in conjunction with a sample distribution condition, a plurality of pieces of unreadable attribute data into 0-1 two-valued combined features, a distinguishing combined feature set is generated, a detection model is built under a Bagging policy, and whether the user who performs registration or verification is abnormal is determined more comprehensively, thereby improving the accuracy of abnormality detection.

Description

Abnormality detection method, apparatus, computer device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 28, 2019, with application number 201910575550.9 and the invention title "Abnormality detection method, apparatus, computer device and storage medium", the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of computer application technology. Specifically, the present invention relates to an abnormality detection method, apparatus, computer device and storage medium.
Background
Abnormal user behavior refers to "abnormal" behavior that violates social norms or the behavioral habits and standards of a group. In particular, as awareness of public safety and network security has grown, the detection of abnormal behavior in crowd scenes, networks and other environments has attracted increasing attention.
At present, abnormal user behavior is usually detected either by matching against the characteristics of individual abnormal behavior or by comparison against the characteristics of individual normal behavior. However, most sample attributes are nominal, and only a few, such as resolution, are numerical. With complex text-type device data and hard-to-interpret nominal attribute data, it is difficult to mine effective classification features, so a good abnormality detection model cannot be obtained and detection accuracy is low.
Summary of the invention
The purpose of the present invention is to solve at least one of the above technical defects by disclosing an abnormality detection method, apparatus, computer device and storage medium that can comprehensively obtain cursor trigger data so as to accurately identify abnormal cursor trigger data.
To achieve the above purpose, the present invention discloses an abnormality detection method, including:
acquiring operation terminal data when a user performs registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information and an IP address;
inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the combined result information is generated from the plurality of sub-results;
voting on the combined result information to obtain final result information.
In another aspect, the present application discloses an abnormality detection apparatus, including:
an acquisition module, configured to acquire operation terminal data when a user performs registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information and an IP address, and the system information includes a system type, a version number and a resolution;
a processing module, configured to input the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the combined result information is generated from the plurality of sub-results;
an execution module, configured to vote on the plurality of sub-results in the combined result information according to a preset rule to obtain final result information.
In another aspect, the present application discloses a computer device including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the abnormality detection method described in any of the above.
In another aspect, the present application discloses a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the abnormality detection method described in any of the above.
The beneficial effects of the present invention are as follows. The abnormality detection method and apparatus disclosed in this application decompose complex text-type device data and apply an effective feature conversion method that combines multiple hard-to-interpret nominal attribute data with the sample distribution and converts them into 0-1 binary combined features. This produces a discriminative combined feature set from which an effective classification feature set is mined; training on this feature set yields a better abnormality detection model. In addition, five algorithms are used to build the detection model under a Bagging strategy, so that multiple models are constructed for abnormality detection. Naive Bayes gives an overall probability based on the distribution of the whole sample, while OneClassSVM and Isolation Forest each give detection results for a sample from the normal and abnormal perspectives. Using the judgments of all five models makes it possible to judge more comprehensively whether a registering or verifying user is abnormal. This avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples and, to some extent, avoids inaccurate Naive Bayes classification caused by sample imbalance, thereby improving the accuracy of abnormality detection.
Additional aspects and advantages of the present invention will be given in part in the following description, and will become obvious from that description or be learned through practice of the present invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become obvious and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic diagram of an abnormality detection method of the present invention;
Figure 2 is a flowchart of the training method of the combined detection model of the present invention;
Figure 3 is a flowchart of the method of the present invention for obtaining sample data to construct a combined feature set;
Figure 4 is a flowchart of the method of the present invention for obtaining final result information;
Figure 5 is a schematic structural diagram of an abnormality detection apparatus of the present invention;
Figure 6 is a block diagram of the basic structure of the computer device of the present invention.
Detailed description
The embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and cannot be construed as limiting it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the term "comprising" used in the specification of the present invention indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
Those skilled in the art will understand that the terms "terminal" and "terminal device" used herein include both devices that have only a wireless signal receiver with no transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices, with a single-line display, with a multi-line display, or without a multi-line display; PCS (Personal Communications Service) devices, which can combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. The "terminal" and "terminal device" used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable and/or configured to operate locally and/or to operate in distributed form at any other location on Earth and/or in space. The "terminal" and "terminal device" used herein may also be a communication terminal, an Internet terminal or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with a music/video playback function, or a device such as a smart TV or set-top box.
Specifically, referring to FIG. 1, the present invention discloses an abnormality detection method, including:
S1000. Acquire operation terminal data when a user performs registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information and an IP address, and the system information includes a system type, a version number and a resolution;
The technical solution of the present application is mainly used to detect abnormal user operations, in particular to monitor abnormal operations during verification when a user registers a new account or logs in.
The data that a user sends from the client to the server during registration includes the account information and identity information being registered, and also carries the IP address of the device on which the client runs. By setting acquisition parameters, the device type and system information of that device can additionally be obtained. The device type here refers to the hardware, such as a mobile phone, tablet, computer terminal or other device, and the system information refers to the software running on that hardware, such as an iOS, OS, Windows or Android system; the system information further includes the specific system version number, the system resolution and similar information. In this application, the operation terminal data includes combined information of at least two of the device type, system information and IP address. For example, the combined information may be the three items device type, system type and IP address; the four items device type, system type, version number and IP address; the five items device type, system type, version number, resolution and IP address; or other data and any combination of these data.
S2000. Input the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the combined result information is generated from the plurality of sub-results;
As step S1000 shows, the acquired operation terminal data is combined information comprising at least two of the device type, system information and IP address. Inputting this combined information into the combined detection model yields the corresponding combined result information. In this application, the combined detection model includes at least two detection models whose outputs are independent of one another, so at least two sets of result information are output for the combined information. For example, suppose the combined information consists of the device type, system type and IP address, and the combined detection model includes five detection models A, B, C, D and E. Because each detection model is independent, five sets of result information are obtained for the combined information, for example (A1, A2, A3), (B1, B2, B3), (C1, C2, C3), (D1, D2, D3) and (E1, E2, E3), where index 1 denotes the detection result for the device type, index 2 the detection result for the system type, and index 3 the detection result for the IP address.
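As a minimal sketch of step S2000, the combined detection model can be pictured as a collection of independent detectors that each receive the same operation terminal data and return their own sub-result. The field names, detector names and toy rules below are hypothetical stand-ins chosen for illustration; they are not the trained detection models of this application.

```python
from typing import Callable, Dict

# Hypothetical operation terminal data: a combination of device type,
# system type and IP address (the IP is from a documentation-only range).
record = {"device_type": "phone", "system_type": "Android", "ip": "203.0.113.7"}

# Each detector is an independent function returning "normal" or "abnormal".
# The rules are assumptions made only for this illustration.
def detector_a(r: Dict[str, str]) -> str:
    return "normal" if r["device_type"] in {"phone", "tablet", "pc"} else "abnormal"

def detector_b(r: Dict[str, str]) -> str:
    return "normal" if r["system_type"] in {"iOS", "Android", "Windows"} else "abnormal"

def detector_c(r: Dict[str, str]) -> str:
    return "abnormal" if r["ip"].startswith("0.") else "normal"

def run_combined_model(r: Dict[str, str],
                       detectors: Dict[str, Callable[[Dict[str, str]], str]]) -> Dict[str, str]:
    """Feed the same record to every detector and collect the sub-results."""
    return {name: detect(r) for name, detect in detectors.items()}

detectors = {"A": detector_a, "B": detector_b, "C": detector_c}
combined_result = run_combined_model(record, detectors)
print(combined_result)
```

Keeping the detectors behind a common call signature means that a model can be added or removed without changing the code that collects the sub-results.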
S3000. Vote on the plurality of sub-results in the combined result information according to a preset rule to obtain final result information.
After the multiple detection models in the combined detection model have each output a sub-result for the same operation terminal data, the combined result information is generated; the sub-results in the combined result information are then voted on according to a given rule to obtain the final result information. The rules disclosed here include, but are not limited to, taking the sub-result that occurs most often as the final result.
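The majority rule mentioned above can be sketched as follows. This is only one possible voting rule, and the sub-result labels are hypothetical; the text does not say how ties are handled, so this sketch simply lets the label that was seen first win a tie.

```python
from collections import Counter

def majority_vote(sub_results):
    """Return the sub-result reported by the largest number of detection models."""
    counts = Counter(sub_results)
    # most_common keeps first-seen order among labels with equal counts.
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical sub-results from five detection models.
sub_results = ["abnormal", "normal", "abnormal", "abnormal", "normal"]
final_result = majority_vote(sub_results)
print(final_result)  # abnormal
```

With an odd number of models and two possible labels, as in the five-model embodiment, a tie cannot occur.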
This application obtains the user's operation terminal data and extracts several items from it as combined data for identification, which makes the judgment more accurate. The model that examines the combined data is itself a combined detection model: different detection models, trained in different ways, each examine the same combined data, and the final result information is obtained by voting. This judges more comprehensively whether a registering or verifying user is abnormal, avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples, reduces the inaccuracy that sample imbalance causes in a single detection model, and improves the accuracy of abnormality detection.
In one embodiment, the detection models in the combined detection model include: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest classification and detection model, and a negative-class Isolation Forest classification and detection model.
The Naive Bayes detection model is a classification algorithm based on Bayes' theorem. It is also a generative model: it models the joint probability P(x, c) directly in order to obtain the target probability. Bayes' theorem describes the probability of an event based on prior knowledge of conditions related to that event. Expressed as a formula:
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} = \frac{P(x, c)}{P(x)}$$
c denotes one possible outcome of a random event. x denotes the evidence or condition, that is, the factors related to the random event.
P(c|x): the probability that outcome c occurs given x (posterior probability).
P(c): the probability that outcome c occurs without considering related factors (prior probability).
P(x|c): the probability that condition x occurs given that outcome c is known to have occurred (likelihood).
P(x): the probability that x occurs (prior probability).
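A small worked example may help connect these four quantities. The numbers below are hypothetical and chosen only for illustration; c stands for "the registration is abnormal" and x for some observed evidence, such as an unusual device type.

```python
# Hypothetical probabilities, not values from this application.
p_c = 0.02              # P(c): prior probability that a registration is abnormal
p_x_given_c = 0.60      # P(x|c): probability of seeing evidence x when abnormal
p_x_given_not_c = 0.05  # P(x|not c): probability of seeing x when normal

# P(x) from the law of total probability.
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

# Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x).
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 4))  # 0.1967
```

Even though the evidence is twelve times more likely under the abnormal outcome, the low prior keeps the posterior below 20%, which is why the prior P(c) matters when samples are imbalanced.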
The OneClassSVM detection model treats the training data as containing only positive and negative samples: samples that meet the requirements are positive, and all others are negative. A One-Class SVM is able to capture the shape of the data set, so it performs better on strongly non-Gaussian data, for example two clearly separated clusters. Strictly speaking, a one-class SVM is not an outlier detection algorithm but a novelty detection algorithm: its training set should not contain abnormal samples, since they may distort the boundary chosen during training. In this application, the OneClassSVM detection models comprise a positive-class OneClassSVM detection model and a negative-class OneClassSVM detection model; the positive-class model is trained on positive samples only, and the negative-class model is trained on negative samples only.
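The idea of training one model per class can be sketched without an SVM. The example below is a deliberately simplified stand-in, a centroid-and-radius boundary rather than a One-Class SVM, and all feature vectors are hypothetical. It only illustrates that each one-class model sees samples of a single class and flags whatever falls outside the boundary it learned.

```python
import math

class CentroidOneClass:
    """Toy one-class model: learns a centroid and a radius from one class only.

    This is a simplified stand-in for a One-Class SVM, not an SVM itself.
    """

    def fit(self, samples):
        dims = len(samples[0])
        self.center = [sum(s[d] for s in samples) / len(samples) for d in range(dims)]
        # Radius = largest distance from the centroid to any training sample.
        self.radius = max(math.dist(s, self.center) for s in samples)
        return self

    def is_inlier(self, sample):
        """True if the sample lies inside the boundary learned from training."""
        return math.dist(sample, self.center) <= self.radius

# Hypothetical 0-1 feature vectors.
positive_samples = [(1, 1, 0), (1, 1, 1), (1, 0, 1)]  # normal users
negative_samples = [(0, 0, 0), (0, 0, 1)]             # abnormal users

positive_model = CentroidOneClass().fit(positive_samples)  # sees positives only
negative_model = CentroidOneClass().fit(negative_samples)  # sees negatives only

query = (0, 0, 0)
print(positive_model.is_inlier(query), negative_model.is_inlier(query))
```

The two models give complementary views of the same query: it falls outside the boundary learned from normal samples and inside the boundary learned from abnormal samples.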
The Isolation Forest classification and detection model is a fast, ensemble-based abnormality detection method with linear time complexity and high accuracy, and is a state-of-the-art algorithm that meets the requirements of big-data processing. It is suited to abnormality detection on continuous numerical data and defines an anomaly as an outlier that is "more likely to be separated", which can be understood as a point that is sparsely distributed and far from high-density groups. In statistical terms, a sparsely populated region of the data space is one where data occurs with low probability, so data falling in such regions can be considered abnormal. In this application, the Isolation Forest classification and detection models likewise comprise a positive-class model and a negative-class model; the positive-class Isolation Forest classification and detection model is trained on positive samples, and the negative-class Isolation Forest classification and detection model is trained on negative samples.
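The notion that outliers are "more likely to be separated" can be illustrated with a one-dimensional toy version of the isolation procedure. This is a simplified sketch rather than a full Isolation Forest, and the data values are hypothetical: a point is repeatedly cut off from the rest by random splits, and the number of splits needed to isolate it is averaged over many trials.

```python
import random

def isolation_depth(point, values, rng):
    """Count the random splits needed to isolate `point` from the other values."""
    current = list(values)
    depth = 0
    while len(current) > 1 and min(current) != max(current):
        split = rng.uniform(min(current), max(current))
        # Keep only the side of the split that still contains the point.
        if point < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth

def mean_depth(point, values, trials=200, seed=0):
    """Average isolation depth over several random trials."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, values, rng) for _ in range(trials)) / trials

# Hypothetical one-dimensional data with one obvious outlier (100).
values = [10, 11, 12, 13, 14, 15, 16, 17, 100]
outlier_depth = mean_depth(100, values)
inlier_depth = mean_depth(13, values)
print(outlier_depth, inlier_depth)
```

The outlier is usually isolated by the very first split, while a point inside the dense group always needs at least two splits, so a shorter average depth indicates a more anomalous point.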
In one embodiment, referring to FIG. 2, the training method of the combined detection model composed of the above five detection models includes:
S2100. Obtain sample data to construct a combined feature set, wherein the combined feature set includes positive samples and negative samples;
The positive samples mentioned above are data selected, according to the intended recognition purpose, because they satisfy that purpose. Such data may take the form of text, numbers or character strings, or of pictures, sound and so on. This application targets the detection of abnormal user input behavior, which is judged from information such as the device type, system information and IP address of the user's client. In this application, therefore, a positive sample refers to a legitimate device type, system information and IP address. For example, if the legitimate device types are mobile phone, PC, tablet and computer, then login and registration information identified as coming from one of these device types is a positive sample. If the login and registration information does not come from any of these device types but from a terminal that is not recognized as a legitimate device type, such as a smart bracelet, the device data of that smart bracelet is a negative sample. These sample data are obtained through collection.
In one embodiment, referring to FIG. 3, the method of obtaining sample data to construct a combined feature set includes:
S2110. Use operation terminal data from user registration or verification, acquired through at least two acquisition methods, as sample data, wherein the acquisition methods include acquisition by a crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user;
In one embodiment, the sample data comes from different acquisition methods, for example acquisition by a crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user. Acquisition by a crawler algorithm means writing crawler code that monitors the user's login and captures all operation terminal data produced during registration or verification; besides the final registration and verification information, the data collected this way also includes intermediate information, such as whether the data was intercepted in transit. Device detection refers to data recognized by the client itself: after the registration or verification information has been entered through an input tool on the client and before it is finally sent, the client's own input tool records it. The registration or verification information sent by the user is the information sent through the client and received by the back-end server. In other words, the data obtained by device detection is the original data entered by the user, the data obtained by the crawler algorithm is the data in transit from the client to the server, and the data sent by the user is the data as received by the server. Monitoring the same data at the three stages of input, transmission and reception makes it possible to check its consistency. If the data differs at any stage of the comparison, the user's registration or verification data is abnormal.
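The three-stage consistency check described above can be sketched as a field-by-field comparison of the records captured at input, in transit and on receipt. The channel names, field names and values are hypothetical, and the IP addresses come from documentation-only ranges.

```python
def find_inconsistent_fields(snapshots):
    """Return the fields whose values differ between acquisition channels."""
    fields = set().union(*(s.keys() for s in snapshots.values()))
    inconsistent = []
    for field in sorted(fields):
        observed = {s.get(field) for s in snapshots.values()}
        if len(observed) > 1:
            inconsistent.append(field)
    return inconsistent

# Hypothetical records of the same registration seen at three stages.
snapshots = {
    "device_detection": {"device_type": "phone", "system_type": "Android", "ip": "198.51.100.4"},
    "crawler":          {"device_type": "phone", "system_type": "Android", "ip": "198.51.100.4"},
    "server_received":  {"device_type": "phone", "system_type": "Android", "ip": "203.0.113.9"},
}

mismatches = find_inconsistent_fields(snapshots)
is_abnormal = bool(mismatches)
print(mismatches, is_abnormal)  # ['ip'] True
```

Returning the list of differing fields, rather than a single flag, also shows at which attribute the data changed between stages.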
S2120. Calculate the support and confidence of the sample data acquired by each acquisition method;
The support here reflects the probability that the data obtained by the above methods appear together, and the confidence indicates how trustworthy the data obtained by each method is. The accuracy of each acquisition method can be established through verification, and a value is assigned to each method according to that accuracy to represent its confidence; the higher the value, the more trustworthy the data obtained by that method. For example, in the above embodiment the user's registration or verification data is obtained in three ways: by a crawler algorithm, by device detection, and from the registration or verification information sent by the user. Based on earlier data comparison and measurement, a confidence can be set for each of the three methods, for example A for data obtained by the crawler algorithm, B for data obtained by device detection, and C for data obtained from the registration or verification information sent by the user. Once data has been obtained in step S2110, the corresponding confidence is matched according to the source of the data.
S2130. Select the combination of operating terminal data with the greatest support and confidence as the reference data;
The formula for support is: Support(A->B) = P(A U B), where A U B denotes the itemset containing both A and B. Support is therefore the probability that A and B occur together. If that probability is small, A and B have little to do with each other; if A and B occur together very frequently, A and B are strongly associated.
The formula for confidence is: Confidence(A->B) = P(B | A). Confidence indicates whether, and with what probability, B also occurs when A occurs. If the confidence is 100%, then in the classic shopping-basket setting A and B could be sold as a bundle. If the confidence is too low, the occurrence of A has little bearing on whether B occurs.
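As a hedged illustration of the two measures, the sketch below estimates them from a list of observations. It takes support as the fraction of observations containing both A and B, and confidence as the conditional probability of B given A, which is how the surrounding prose describes them. The function names and the toy data are assumptions, not part of the source.

```python
from typing import FrozenSet, Sequence

Observation = FrozenSet[str]


def support(observations: Sequence[Observation], items: Observation) -> float:
    """Fraction of observations that contain every item in `items`."""
    if not observations:
        return 0.0
    hits = sum(1 for obs in observations if items <= obs)
    return hits / len(observations)


def confidence(observations: Sequence[Observation],
               antecedent: Observation,
               consequent: Observation) -> float:
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    base = support(observations, antecedent)
    if base == 0.0:
        return 0.0
    return support(observations, antecedent | consequent) / base


# Toy observations: A appears three times, B three times, both together twice.
observations = [
    frozenset({"A", "B"}),
    frozenset({"A", "B"}),
    frozenset({"A"}),
    frozenset({"B"}),
]

a, b = frozenset({"A"}), frozenset({"B"})
print(support(observations, a | b))   # 0.5
print(confidence(observations, a, b))
```

In this toy set, A and B occur together in two of four observations, and B accompanies A in two of the three observations that contain A.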
In this application, because the operating terminal data comes from different acquisition methods, multiple groups of operating terminal data can be obtained. For every item in every group, the corresponding values are calculated with the support and confidence formulas. For each item, the data with the greatest support and confidence is selected, and the selected data is combined to form the reference data for this calculation.
S2140. Mark the result of comparing the operating terminal data acquired by each acquisition method with the reference data according to the first rule, to form a feature set.
Because operating terminal data can be obtained in several ways, comparing each group of operating terminal data with the reference data and marking the result according to the first rule yields a set of feature data; this set of feature data is the feature set.
In an embodiment, the first rule is: data in the operating terminal data that is the same as the reference data is marked 1 and treated as a positive sample, and data that differs from the reference data is marked 0 and treated as a negative sample. In this way, the multiple groups of operating terminal data form a feature set consisting of 0s and 1s.
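A minimal sketch of steps S2130 and S2140 follows. As a simplifying assumption, the reference value for each field is taken to be the value seen most often across acquisition methods, standing in for "greatest support and confidence"; the field names and helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Group = Dict[str, str]


def pick_reference(groups: List[Group]) -> Group:
    """Per field, keep the value that occurs most often across all groups.

    Assumption: frequency is used here as a stand-in for the
    support/confidence ranking described in the text.
    """
    fields = list(groups[0].keys())
    return {
        field: Counter(group[field] for group in groups).most_common(1)[0][0]
        for field in fields
    }


def mark(group: Group, reference: Group) -> List[int]:
    """First rule: 1 where the group matches the reference data, else 0."""
    return [1 if group[field] == reference[field] else 0 for field in reference]


# One group per acquisition method (crawler, device detection, user-submitted).
groups = [
    {"device_type": "phone", "system_type": "Android", "ip": "10.0.0.1"},
    {"device_type": "phone", "system_type": "Android", "ip": "10.0.0.1"},
    {"device_type": "phone", "system_type": "iOS", "ip": "10.0.0.9"},
]

reference = pick_reference(groups)
feature_set = [mark(group, reference) for group in groups]
print(feature_set)  # [[1, 1, 1], [1, 1, 1], [1, 0, 0]]
```

The resulting rows of 0s and 1s are what the later detection models would consume.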
S2200. The Naive Bayes detection model learns to recognize both positive and negative samples; the positive-class OneClassSVM detection model and the positive-class Isolation Forest classification and detection model learn to recognize positive samples; the negative-class OneClassSVM detection model and the negative-class Isolation Forest classification and detection model learn to recognize negative samples.
The Naive Bayes detection model is a classification algorithm that is trained to recognize positive samples and negative samples. When data to be recognized is input, the output is either the positive class or the negative class: data that resembles the positive samples is classified as positive and the output is 1, and data that resembles the negative samples is classified as negative and the output is -1. Because the Naive Bayes detection model is trained on both positive and negative samples, it can produce both positive-class and negative-class outputs fairly accurately. The positive-class OneClassSVM detection model is trained mainly on positive samples, so its output is more accurate for the positive class; the negative-class OneClassSVM detection model is trained mainly on negative samples, so its output is more accurate for the negative class. Likewise, the positive-class Isolation Forest classification and detection model is more accurate for the positive class, and the negative-class Isolation Forest classification and detection model is more accurate for the negative class.
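The division of training data among the five models can be sketched as below. This only shows which samples each model would see; it does not train anything. The dictionary keys, the per-row labels and the function name are hypothetical.

```python
from typing import Dict, List, Sequence

Row = List[int]


def route_training_data(rows: Sequence[Row], labels: Sequence[int]) -> Dict[str, object]:
    """Decide which samples each of the five detection models is trained on.

    Assumption: every feature row carries a label, 1 for a positive sample
    and 0 for a negative sample.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    positives = [row for row, label in zip(rows, labels) if label == 1]
    negatives = [row for row, label in zip(rows, labels) if label == 0]
    return {
        # Naive Bayes sees both classes together with their labels.
        "naive_bayes": (list(rows), list(labels)),
        # Positive-class one-class models see only positive samples.
        "one_class_svm_positive": positives,
        "isolation_forest_positive": positives,
        # Negative-class one-class models see only negative samples.
        "one_class_svm_negative": negatives,
        "isolation_forest_negative": negatives,
    }


rows = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 0, 0]
plan = route_training_data(rows, labels)
print(plan["one_class_svm_positive"])  # [[1, 1, 1], [1, 0, 1]]
print(plan["one_class_svm_negative"])  # [[0, 0, 1], [0, 0, 0]]
```

The source does not name a library. If scikit-learn were used, `OneClassSVM` and `IsolationForest` would be plausible choices, and both return +1 or -1 from `predict`, which matches the 1 / -1 output convention described above; a negative-class model's +1 would then mean "resembles the negative samples".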
After the combined result information has been obtained in step S2000, a vote is taken on it to obtain the final result information. Specifically, referring to FIG. 4, the method of voting on the multiple sub-results in the combined result information according to a preset rule to obtain the final result information includes:
S3100. Vote on the multiple sub-results in the obtained combined result information according to the Bagging strategy;
S3200. Select the result with the largest number of marks as the final result.
In this application, the operating terminal data is obtained by crawler algorithm, by device detection, and from the registration or verification information sent by the user. Because there are several acquisition methods, and the data obtained by each may be the same or different, the operating terminal data is diverse. After the operating terminal data is input into the combined detection model for detection, the combined result information is obtained. The detection results of the models in the combined detection model are independent of one another: different detection models are trained on different principles and possibly on different data, and each has its own characteristics, so their results may differ. Combining the detection results of the different detection models gives the combined result information, which is then voted on according to the Bagging strategy.
Bagging, also called bootstrap aggregating, is a technique that repeatedly samples from the data (with replacement) according to a uniform probability distribution. A base classifier is trained on each bootstrap sample set produced by the sampling; the trained classifiers then vote, and the test sample is assigned to the class that receives the most votes. Each bootstrap sample set is the same size as the original data. Because sampling is with replacement, some samples may appear several times in the same training set and others may be left out.
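The sampling step of Bagging can be sketched in a few lines. This is an illustration of sampling with replacement only, under the assumption that a seeded `random.Random` is an acceptable source of uniform draws; the function name is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) items uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
data = list(range(10))

# One bootstrap sample set per base classifier (three here, for illustration).
sample_sets = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in sample_sets:
    left_out = sorted(set(data) - set(sample))
    print(sample, "left out:", left_out)
```

Each sample set has the same size as the original data, and the printed "left out" lists show which items a given sample set happened to miss.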
The combined result information is compared, and the value that occurs most often is selected as the final result. For example, in an embodiment, suppose the acquired operating terminal data is: device type, system type, version number, resolution and IP address. In the step of constructing the combined feature set, by
Suppose that the five detection models (the Naive Bayes detection model, the positive-class and negative-class OneClassSVM detection models, and the positive-class and negative-class Isolation Forest classification and detection models) produce the following combined result information:

Detection model                                                      Data 1  Data 2  Data 3  Data 4  Data 5
Naive Bayes detection model                                          1       1       0       0       1
Positive-class OneClassSVM detection model                           1       0       1       0       1
Negative-class OneClassSVM detection model                           1       1       0       1       1
Positive-class Isolation Forest classification and detection model   0       0       0       0       1
Negative-class Isolation Forest classification and detection model   1       1       0       1       0
Final result                                                         1       1       0       0       1
In the above example, the five detection models each output one group of data. Each group is formed by comparing the output for the operating terminal data with the reference data and marking the result according to the first rule, so all groups share the same form, namely values of 0 or 1, which makes a comparative vote straightforward. From the five groups it can be seen that for data 1 the value "1" occurs most often, so the final result for data 1 is "1". Likewise, the final result for data 2 is "1", for data 3 is "0", for data 4 is "0", and for data 5 is "1". The final result is therefore "1, 1, 0, 0, 1".
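The vote in this example can be reproduced with a short sketch. The five rows are the model outputs from the example; the dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
from collections import Counter
from typing import Dict, List

# Outputs of the five detection models for Data 1..Data 5, as in the example.
model_outputs: Dict[str, List[int]] = {
    "naive_bayes": [1, 1, 0, 0, 1],
    "one_class_svm_positive": [1, 0, 1, 0, 1],
    "one_class_svm_negative": [1, 1, 0, 1, 1],
    "isolation_forest_positive": [0, 0, 0, 0, 1],
    "isolation_forest_negative": [1, 1, 0, 1, 0],
}


def majority_vote(outputs: Dict[str, List[int]]) -> List[int]:
    """For each data item, return the value that most models output."""
    columns = zip(*outputs.values())
    return [Counter(column).most_common(1)[0][0] for column in columns]


final_result = majority_vote(model_outputs)
print(final_result)  # [1, 1, 0, 0, 1]
```

With five binary voters there is always a strict majority, so no tie-breaking rule is needed in this case.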
In the technical solution of this application, complex text-type device data is decomposed and an effective feature conversion method is applied: multiple nominal attributes that are hard to interpret are combined with the sample distribution and converted into 0-1 binary combined features. This produces a discriminative combined feature set, from which an effective classification feature set is extracted; training on this feature set yields a better anomaly detection model. In addition, five algorithms are used to build the detection models under the Bagging strategy, so that multiple models are used for anomaly detection. Naive Bayes gives an overall probability based on the distribution of all samples, while OneClassSVM and Isolation Forest give detection results from the normal and the abnormal perspective respectively. Using the decisions of all five models makes it possible to judge more comprehensively whether a registering or verifying user is abnormal. This avoids the one-sidedness of training a single detection model only on the more plentiful normal samples, reduces the inaccuracy of Naive Bayes classification caused by imbalanced samples, and improves the accuracy of anomaly detection.
In another aspect, referring to FIG. 5, this application discloses an anomaly detection apparatus, including:
Acquisition module 1000: configured to acquire the operating terminal data produced when a user registers or verifies, where the operating terminal data is combined data including two or more of device type, system information and IP address, and the system information includes system type, version number and resolution; processing module 2000: configured to input the operating terminal data into a combined detection model for detection to obtain combined result information, where the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results form the combined result information; execution module 3000: configured to vote on the multiple sub-results in the combined result information according to a preset rule to obtain final result information.
Optionally, the detection models in the combined detection model include: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest classification and detection model, and a negative-class Isolation Forest classification and detection model. The Naive Bayes detection model learns to recognize both positive and negative samples; the positive-class OneClassSVM detection model and the positive-class Isolation Forest classification and detection model learn to recognize positive samples; the negative-class OneClassSVM detection model and the negative-class Isolation Forest classification and detection model learn to recognize negative samples.
Optionally, the processing module further includes: a feature set construction module, configured to acquire sample data to construct a combined feature set, where the combined feature set includes positive samples and negative samples.
Optionally, the feature set construction module further includes: a sample acquisition module, configured to take as sample data the operating terminal data acquired through at least two acquisition methods when a user registers or verifies, where the acquisition methods include acquisition by crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user; a calculation module, configured to calculate the support and confidence of the sample data acquired by each acquisition method; a first selection module, configured to select the combination of operating terminal data with the greatest support and confidence as the reference data; and a marking module, configured to mark the result of comparing the operating terminal data acquired by each acquisition method with the reference data according to the first rule, to form a feature set.
Optionally, the methods of acquiring the operating terminal data when the user registers or verifies include: acquisition by crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user.
Optionally, the first rule is: data in the operating terminal data that is the same as the reference data is marked 1 and treated as a positive sample, and data that differs from the reference data is marked 0 and treated as a negative sample.
Optionally, the execution module includes: a voting module, configured to vote on the multiple sub-results in the obtained combined result information according to the Bagging strategy; and a second selection module, configured to select the result with the largest number of marks as the final result.
Because the anomaly detection apparatus described above corresponds one-to-one to the anomaly detection method, and its functions and operating principles are the same, the details are not repeated here.
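The relationship between the acquisition, processing and execution modules can be sketched as a small class. This is an illustrative arrangement only: the class name, field names and the three toy detectors are hypothetical stand-ins, not the trained models described in this application.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Detector = Callable[[Dict[str, str]], int]

# Hypothetical names for the fields the acquisition module keeps.
KEPT_FIELDS = ("device_type", "system_type", "version", "resolution", "ip")


class AnomalyDetectionApparatus:
    def __init__(self, detectors: Sequence[Detector]) -> None:
        if len(detectors) < 2:
            raise ValueError("the combined model needs at least two detectors")
        self.detectors = list(detectors)

    def acquire(self, raw: Dict[str, str]) -> Dict[str, str]:
        """Acquisition module: keep only the operating terminal fields."""
        return {key: raw[key] for key in KEPT_FIELDS if key in raw}

    def process(self, data: Dict[str, str]) -> List[int]:
        """Processing module: collect one sub-result per detection model."""
        return [detector(data) for detector in self.detectors]

    def execute(self, sub_results: List[int]) -> int:
        """Execution module: return the most common sub-result."""
        return Counter(sub_results).most_common(1)[0][0]

    def detect(self, raw: Dict[str, str]) -> int:
        return self.execute(self.process(self.acquire(raw)))


# Toy rule-based detectors, used here only so the example runs.
toy_detectors: List[Detector] = [
    lambda d: 1 if d.get("device_type") == "phone" else 0,
    lambda d: 1 if d.get("ip", "").startswith("10.") else 0,
    lambda d: 1 if d.get("system_type") == "Android" else 0,
]

apparatus = AnomalyDetectionApparatus(toy_detectors)
raw = {"device_type": "phone", "system_type": "Android",
       "ip": "203.0.113.7", "extra": "ignored"}
print(apparatus.process(apparatus.acquire(raw)))  # [1, 0, 1]
print(apparatus.detect(raw))                      # 1
```

An odd number of binary detectors is used so the vote cannot tie; an even number would need an explicit tie-breaking rule, which the source does not describe.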
For a block diagram of the basic structure of the computer device provided by the embodiment of the present invention, please refer to FIG. 5.
The computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store control information sequences, and when the computer-readable instructions are executed by the processor, they cause the processor to implement an anomaly detection method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform an anomaly detection method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
The computer device receives the status information of the prompt behavior sent by the associated client, that is, whether the associated terminal has enabled the prompt and whether the lender has closed the prompt task. By verifying whether these task conditions are met, it sends a corresponding preset instruction to the associated terminal so that the associated terminal can perform the corresponding operation according to that instruction, thereby achieving effective supervision of the associated terminal. In addition, when the prompt information state differs from the preset state instruction, the server controls the associated terminal to keep ringing, which prevents the prompt task of the associated terminal from terminating automatically after running for a period of time.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the anomaly detection method described in any of the above embodiments.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM), among others.
It should be understood that although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be performed at different times, and they are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above describes only some embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (20)

  1. An anomaly detection method, characterized in that it comprises:
    acquiring operating terminal data produced when a user registers or verifies, wherein the operating terminal data is combined data including two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;
    inputting the operating terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results form the combined result information;
    voting on the multiple sub-results in the combined result information according to a preset rule to obtain final result information.
  2. The anomaly detection method according to claim 1, characterized in that the detection models in the combined detection model include: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest classification and detection model, and a negative-class Isolation Forest classification and detection model.
  3. The anomaly detection method according to claim 2, characterized in that the training method of the combined detection model comprises:
    acquiring sample data to construct a combined feature set, wherein the combined feature set includes positive samples and negative samples;
    the Naive Bayes detection model learning to recognize both positive samples and negative samples;
    the positive-class OneClassSVM detection model and the positive-class Isolation Forest classification and detection model learning to recognize positive samples;
    the negative-class OneClassSVM detection model and the negative-class Isolation Forest classification and detection model learning to recognize negative samples.
  4. The anomaly detection method according to claim 3, characterized in that the method of acquiring sample data to construct a combined feature set comprises:
    taking as sample data the operating terminal data acquired through at least two acquisition methods when a user registers or verifies, wherein the acquisition methods include acquisition by crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user;
    calculating the support and confidence of the sample data acquired by each acquisition method;
    selecting the combination of operating terminal data with the greatest support and confidence as the reference data;
    marking the result of comparing the operating terminal data acquired by each acquisition method with the reference data according to a first rule, to form a feature set.
  5. The anomaly detection method according to claim 4, characterized in that the first rule is: data in the operating terminal data that is the same as the reference data is marked 1 and treated as a positive sample, and data that differs from the reference data is marked 0 and treated as a negative sample.
  6. The anomaly detection method according to claim 4, characterized in that the method of voting on the multiple sub-results in the combined result information according to a preset rule to obtain the final result information comprises:
    voting on the multiple sub-results in the obtained combined result information according to the Bagging strategy;
    selecting the result with the largest number of marks as the final result.
  7. An anomaly detection apparatus, characterized in that it comprises:
    an acquisition module, configured to acquire operating terminal data produced when a user registers or verifies, wherein the operating terminal data is combined data including two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;
    a processing module, configured to input the operating terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results form the combined result information;
    an execution module, configured to vote on the multiple sub-results in the combined result information according to a preset rule to obtain final result information.
  8. The anomaly detection apparatus according to claim 7, characterized in that the detection models in the combined detection model include: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest classification and detection model, and a negative-class Isolation Forest classification and detection model.
  9. The anomaly detection apparatus according to claim 8, characterized in that the processing module further comprises:
    a feature set construction module, configured to acquire sample data to construct a combined feature set, wherein the combined feature set includes positive samples and negative samples;
    the Naive Bayes detection model learns to recognize both positive samples and negative samples; the positive-class OneClassSVM detection model and the positive-class Isolation Forest classification and detection model learn to recognize positive samples; the negative-class OneClassSVM detection model and the negative-class Isolation Forest classification and detection model learn to recognize negative samples.
  10. The anomaly detection apparatus according to claim 9, characterized in that the feature set construction module further comprises:
    a sample acquisition module, configured to take as sample data the operating terminal data acquired through at least two acquisition methods when a user registers or verifies, wherein the acquisition methods include acquisition by crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user;
    a calculation module, configured to calculate the support and confidence of the sample data acquired by each acquisition method;
    a first selection module, configured to select the combination of operating terminal data with the greatest support and confidence as the reference data;
    a marking module, configured to mark the result of comparing the operating terminal data acquired by each acquisition method with the reference data according to a first rule, to form a feature set.
  11. The anomaly detection apparatus according to claim 10, characterized in that the first rule is: data in the operating terminal data that is the same as the reference data is marked 1 and treated as a positive sample, and data that differs from the reference data is marked 0 and treated as a negative sample.
  12. The anomaly detection apparatus according to claim 10, characterized in that the execution module comprises:
    a voting module, configured to vote on the multiple sub-results in the obtained combined result information according to the Bagging strategy;
    a second selection module, configured to select the result with the largest number of marks as the final result.
  13. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    Acquiring operation terminal data of a user during registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information, and an IP address, and the system information comprises a system type, a version number, and a resolution;
    Inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results form the combined result information;
    Voting on the multiple sub-results in the combined result information according to a preset rule to obtain final result information.
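The three steps of claim 13 can be sketched end to end. The detectors below are toy stand-ins written as plain functions, not the trained models of the later claims, and every name here (`detect`, `known_device`, `private_ip`, `known_system`, the record fields) is hypothetical. Majority voting is assumed as the preset rule.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Detector = Callable[[Record], int]  # returns 1 (normal) or 0 (abnormal)


def detect(record: Record, detectors: List[Detector]) -> Tuple[List[int], int]:
    """Run every detector, collect the sub-results, and vote on the final result."""
    if len(detectors) < 2:
        raise ValueError("the combined model needs two or more detection models")
    combined = [detector(record) for detector in detectors]  # combined result information
    final = Counter(combined).most_common(1)[0][0]           # assumed preset rule: majority
    return combined, final


# Toy stand-in detectors (assumptions for illustration only).
def known_device(r: Record) -> int:
    return 1 if r.get("device_type") in {"phone", "pc"} else 0


def private_ip(r: Record) -> int:
    return 1 if r.get("ip", "").startswith("10.") else 0


def known_system(r: Record) -> int:
    return 1 if r.get("system_type") in {"Android", "iOS", "Windows"} else 0


detectors = [known_device, private_ip, known_system]
record = {"device_type": "phone", "system_type": "Android", "ip": "8.8.8.8"}
print(detect(record, detectors))  # ([1, 0, 1], 1)
```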
  14. The computer device according to claim 13, wherein the detection models in the combined detection model comprise: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class isolation forest classification and detection model, and a negative-class isolation forest classification and detection model.
  15. The computer device according to claim 14, wherein the computer-readable instructions, when executed by the processor, cause the processor to perform the following steps:
    Acquiring sample data to construct a combined feature set, wherein the combined feature set comprises positive samples and negative samples;
    The Naive Bayes detection model learns to recognize both positive samples and negative samples;
    The positive-class OneClassSVM detection model and the positive-class isolation forest classification and detection model learn to recognize positive samples;
    The negative-class OneClassSVM detection model and the negative-class isolation forest classification and detection model learn to recognize negative samples.
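The routing of samples to the five models can be sketched without committing to a particular machine-learning library. The sample layout (a list of 0/1 marks plus a label, with 1 for positive and 0 for negative) and the dictionary keys are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[int], int]  # (feature marks, label); label 1 = positive, 0 = negative


def split_training_sets(samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Assign each model the subset of the combined feature set it learns from."""
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    return {
        "naive_bayes": positives + negatives,    # learns both classes
        "one_class_svm_positive": positives,     # one-class models see a single class
        "isolation_forest_positive": positives,
        "one_class_svm_negative": negatives,
        "isolation_forest_negative": negatives,
    }


samples = [([1, 1, 1], 1), ([1, 0, 1], 1), ([0, 0, 0], 0)]
sets = split_training_sets(samples)
print({name: len(subset) for name, subset in sets.items()})
```

Each subset would then be passed to the fitting routine of whichever implementation is used for the corresponding model.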
  16. The computer device according to claim 15, wherein the computer-readable instructions, when executed by the processor, cause the processor to perform the following steps:
    Taking, as sample data, the operation terminal data of the user during registration or verification acquired by at least two acquisition methods, wherein the acquisition methods comprise acquisition by a crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user;
    Calculating the support and confidence of the sample data acquired by each acquisition method;
    Selecting the combination of operation terminal data with the greatest support and confidence as the reference data;
    Marking, according to the first rule, the result of comparing the operation terminal data acquired by each acquisition method with the reference data, so as to form a feature set.
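The claim does not define support and confidence, so the sketch below assumes the usual association-rule definitions: the support of a combination is the fraction of records equal to it, and the confidence of the rule "device type implies system type" is the combination count divided by the device-type count. The two-field combination, the function `pick_reference`, and ranking by support first and confidence second are all assumptions.

```python
from collections import Counter
from typing import List, Tuple

Combo = Tuple[str, str]  # hypothetical combination: (device_type, system_type)


def pick_reference(records: List[Combo]) -> Tuple[Combo, float, float]:
    """Return the combination with the greatest (support, confidence) and both values."""
    if not records:
        raise ValueError("at least one record is required")
    total = len(records)
    combo_counts = Counter(records)
    device_counts = Counter(device for device, _ in records)

    def score(combo: Combo) -> Tuple[float, float]:
        support = combo_counts[combo] / total
        confidence = combo_counts[combo] / device_counts[combo[0]]
        return support, confidence

    best = max(combo_counts, key=score)  # compares support first, then confidence
    support, confidence = score(best)
    return best, support, confidence


# Hypothetical records pooled from several acquisition methods.
records = [("phone", "Android"), ("phone", "Android"), ("phone", "iOS"), ("pc", "Windows")]
best, support, confidence = pick_reference(records)
print(best, support)  # ('phone', 'Android') 0.5
```

The selected combination would serve as the reference data against which each acquired record is marked.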
  17. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    Acquiring operation terminal data of a user during registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information, and an IP address, and the system information comprises a system type, a version number, and a resolution;
    Inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results form the combined result information;
    Voting on the multiple sub-results in the combined result information according to a preset rule to obtain final result information.
  18. The storage medium storing computer-readable instructions according to claim 17, wherein the detection models in the combined detection model comprise: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class isolation forest classification and detection model, and a negative-class isolation forest classification and detection model.
  19. The storage medium storing computer-readable instructions according to claim 18, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    Acquiring sample data to construct a combined feature set, wherein the combined feature set comprises positive samples and negative samples;
    The Naive Bayes detection model learns to recognize both positive samples and negative samples;
    The positive-class OneClassSVM detection model and the positive-class isolation forest classification and detection model learn to recognize positive samples;
    The negative-class OneClassSVM detection model and the negative-class isolation forest classification and detection model learn to recognize negative samples.
  20. The storage medium storing computer-readable instructions according to claim 19, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    Taking, as sample data, the operation terminal data of the user during registration or verification acquired by at least two acquisition methods, wherein the acquisition methods comprise acquisition by a crawler algorithm, acquisition by device detection, and acquisition from the registration or verification information sent by the user;
    Calculating the support and confidence of the sample data acquired by each acquisition method;
    Selecting the combination of operation terminal data with the greatest support and confidence as the reference data;
    Marking, according to the first rule, the result of comparing the operation terminal data acquired by each acquisition method with the reference data, so as to form a feature set.
PCT/CN2019/117607 2019-06-28 2019-11-12 Abnormality detection method and apparatus, computer device and storage medium WO2020258657A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910575550.9 2019-06-28
CN201910575550.9A CN110443274B (en) 2019-06-28 2019-06-28 Abnormality detection method, abnormality detection device, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020258657A1 true WO2020258657A1 (en) 2020-12-30

Family

ID=68428777

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117607 WO2020258657A1 (en) 2019-06-28 2019-11-12 Abnormality detection method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110443274B (en)
WO (1) WO2020258657A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112817452A (en) * 2021-01-28 2021-05-18 Oppo广东移动通信有限公司 Sample data acquisition method and device, electronic equipment and storage medium
CN112905488A (en) * 2021-03-30 2021-06-04 平安国际智慧城市科技股份有限公司 Link testing method and device, computer equipment and storage medium
CN113537642A (en) * 2021-08-20 2021-10-22 日月光半导体制造股份有限公司 Product quality prediction method, device, electronic equipment and storage medium
CN113627551A (en) * 2021-08-17 2021-11-09 平安普惠企业管理有限公司 Multi-model-based certificate classification method, device, equipment and storage medium
CN114266313A (en) * 2021-12-23 2022-04-01 国网天津市电力公司营销服务中心 Line loss detection system based on random forest model
CN115134153A (en) * 2022-06-30 2022-09-30 中国电信股份有限公司 Safety evaluation method and device and model training method and device
CN117150403A (en) * 2023-08-22 2023-12-01 国网湖北省电力有限公司营销服务中心(计量中心) Decision node behavior anomaly detection method and system

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956143A (en) * 2019-12-03 2020-04-03 交控科技股份有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN112906727B (en) * 2019-12-04 2024-08-30 天翼云科技有限公司 Method and system for real-time on-line detection of virtual machine state
CN110969514A (en) * 2019-12-04 2020-04-07 重庆特斯联智慧科技股份有限公司 Renting security method and system based on Internet of things
CN111259985B (en) * 2020-02-19 2023-06-30 腾讯云计算(长沙)有限责任公司 Classification model training method and device based on business safety and storage medium
CN111707355A (en) * 2020-06-19 2020-09-25 浙江讯飞智能科技有限公司 Equipment operation state detection method, device, equipment and storage medium
CN111783883A (en) * 2020-06-30 2020-10-16 平安普惠企业管理有限公司 Abnormal data detection method and device
CN112541536A (en) * 2020-12-09 2021-03-23 长沙理工大学 Under-sampling classification integration method, device and storage medium for credit scoring
CN113657461A (en) * 2021-07-28 2021-11-16 北京宝兰德软件股份有限公司 Log anomaly detection method, system, device and medium based on text classification
CN114065187B (en) * 2022-01-18 2022-04-08 中诚华隆计算机技术有限公司 Abnormal login detection method and device, computing equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107612938A (en) * 2017-10-27 2018-01-19 朱秋华 A kind of network user's anomaly detection method, device, equipment and storage medium
CN108881194A (en) * 2018-06-07 2018-11-23 郑州信大先进技术研究院 Enterprises user anomaly detection method and device
US20190034836A1 (en) * 2015-08-31 2019-01-31 International Business Machines Corporation Automatic generation of training data for anomaly detection using other user's data samples
CN110166462A (en) * 2019-05-25 2019-08-23 深圳市元征科技股份有限公司 Access control method, system, electronic equipment and computer storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10915558B2 (en) * 2017-01-25 2021-02-09 General Electric Company Anomaly classifier
CN107294993B (en) * 2017-07-05 2021-02-09 重庆邮电大学 WEB abnormal traffic monitoring method based on ensemble learning
CN109032829B (en) * 2018-07-23 2020-12-08 腾讯科技(深圳)有限公司 Data anomaly detection method and device, computer equipment and storage medium
CN109936561B (en) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 User request detection method and device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112817452A (en) * 2021-01-28 2021-05-18 Oppo广东移动通信有限公司 Sample data acquisition method and device, electronic equipment and storage medium
CN112905488A (en) * 2021-03-30 2021-06-04 平安国际智慧城市科技股份有限公司 Link testing method and device, computer equipment and storage medium
CN113627551A (en) * 2021-08-17 2021-11-09 平安普惠企业管理有限公司 Multi-model-based certificate classification method, device, equipment and storage medium
CN113537642A (en) * 2021-08-20 2021-10-22 日月光半导体制造股份有限公司 Product quality prediction method, device, electronic equipment and storage medium
CN114266313A (en) * 2021-12-23 2022-04-01 国网天津市电力公司营销服务中心 Line loss detection system based on random forest model
CN115134153A (en) * 2022-06-30 2022-09-30 中国电信股份有限公司 Safety evaluation method and device and model training method and device
CN117150403A (en) * 2023-08-22 2023-12-01 国网湖北省电力有限公司营销服务中心(计量中心) Decision node behavior anomaly detection method and system
CN117150403B (en) * 2023-08-22 2024-05-28 国网湖北省电力有限公司营销服务中心(计量中心) Decision node behavior anomaly detection method and system

Also Published As

Publication number Publication date
CN110443274A (en) 2019-11-12
CN110443274B (en) 2024-05-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19935538; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19935538; Country of ref document: EP; Kind code of ref document: A1)