WO2020258657A1

WO2020258657A1 - Abnormality detection method and apparatus, computer device and storage medium

Info

Publication number: WO2020258657A1
Application number: PCT/CN2019/117607
Authority: WO
Inventors: 黎立桂
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-06-28
Filing date: 2019-11-12
Publication date: 2020-12-30
Also published as: CN110443274A; CN110443274B

Abstract

An abnormality detection method, comprising: acquiring operation terminal data when a user performs registration or verification, wherein the operation terminal data is combined data comprising two or more of a device type, system information and an IP address; inputting the operation terminal data into a combined detection model for detection, so as to obtain combined result information, wherein the combined detection model comprises two or more detection models, each detection model outputs a corresponding sub-result, and the combined result information is generated from a plurality of sub-results; and voting on the combined result information to obtain final result information. According to the method, a feature conversion method is used to convert, in conjunction with a sample distribution condition, a plurality of pieces of unreadable attribute data into 0-1 two-valued combined features, a distinguishing combined feature set is generated, a detection model is built under a Bagging policy, and whether the user who performs registration or verification is abnormal is determined more comprehensively, thereby improving the accuracy of abnormality detection.

Description

Anomaly detection method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 201910575550.9, filed with the Chinese Patent Office on June 28, 2019 and entitled "Anomaly detection method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.

Technical field

The present invention relates to the field of computer application technology. Specifically, the present invention relates to an abnormality detection method, device, computer equipment and storage medium.

Background

Abnormal user behavior refers to behavior that violates social norms or the habits and standards of group behavior. As awareness of public safety and network security improves, increasing attention is being paid to detecting abnormal behavior in crowd scenes, networks and other environments.

At present, the detection of user behavior abnormality usually performs matching detection based on the characteristics of individual abnormal behavior, or comparison detection based on the characteristics of individual normal behavior. However, since the attributes of the samples are basically nominal attributes, only a few attributes such as resolution are numerical. Complicated text-based device data and incomprehensible nominal attribute data make it difficult to dig out effective classification features, and thus a good anomaly detection model cannot be obtained, resulting in low anomaly detection accuracy.

Summary of the invention

The purpose of the present invention is to solve at least one of the above-mentioned technical defects, and to disclose an abnormality detection method, device, computer equipment, and storage medium, which can comprehensively obtain cursor trigger data to accurately identify abnormal cursor trigger data.

In order to achieve the above objective, the present invention discloses an abnormality detection method, including:

Obtaining operation terminal data when the user performs registration or verification, where the operation terminal data is combined data including two or more of device type, system information, and IP address;

Inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

Voting on the combined result information to obtain the final result information.

On the other hand, the present application discloses an abnormality detection device, including:

Obtaining module: configured to obtain operation terminal data when the user performs registration or verification, where the operation terminal data is combined data including two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;

Processing module: configured to input the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

Execution module: configured to perform voting on multiple sub-results in the combined result information according to preset rules to obtain final result information.

On the other hand, the present application discloses a computer device including a memory and a processor. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the processor, the processor executes the steps of any of the foregoing abnormality detection methods.

On the other hand, the present application discloses a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of any of the above abnormality detection methods.

The beneficial effects of the present invention are as follows. The abnormality detection method and device disclosed in this application decompose complex text-type device data and adopt an effective feature conversion method that, in combination with the sample distribution, converts multiple hard-to-interpret nominal attributes into 0-1 binary combined features. This generates a discriminative combined feature set, that is, an effective classification feature set that can be used for model training to obtain a better anomaly detection model. At the same time, five algorithms are used to construct the detection models under the Bagging strategy. Naive Bayes gives a comprehensive probability from the overall distribution of the samples, while OneClassSVM and Isolation Forest each give detection results for the samples from the normal and the abnormal perspective. Using the five judgment results, it can be judged more comprehensively whether a user performing registration or verification is abnormal. This effectively avoids the one-sidedness of training a single detection model using only the abundant normal samples, mitigates to a certain extent the inaccuracy of Naive Bayes classification caused by imbalanced samples, and improves the accuracy of anomaly detection.

The additional aspects and advantages of the present invention will be partly given in the following description, which will become obvious from the following description, or be understood through the practice of the present invention.

Description of the drawings

The above and/or additional aspects and advantages of the present invention will become obvious and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Figure 1 is a schematic diagram of an abnormality detection method of the present invention;

Figure 2 is a flowchart of the training method of the combined detection model of the present invention;

Figure 3 is a flow chart of the method for obtaining sample data to construct a combined feature set according to the present invention;

Figure 4 is a flowchart of the method for obtaining final result information according to the present invention;

Figure 5 is a schematic structural diagram of an abnormality detection device of the present invention;

Figure 6 is a block diagram of the basic structure of the computer equipment of the present invention.

Detailed description

The embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals indicate the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, and are only used to explain the present invention, and cannot be construed as limiting the present invention.

Those skilled in the art can understand that, unless specifically stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the specification of the present invention refers to the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more associated listed items.

Those skilled in the art can understand that, unless otherwise defined, all terms (including technical terms and scientific terms) used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.

Those skilled in the art can understand that the terms "terminal" and "terminal equipment" used herein include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such equipment may include: cellular or other communication devices, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which can combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which can include radio frequency receivers, pagers, Internet/Intranet access, web browsers, notepads, calendars, and/or GPS (Global Positioning System) receivers; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. The "terminal" and "terminal equipment" used here may be portable, transportable, installed in vehicles (air, sea and/or land), or suitable and/or configured to operate locally and/or in a distributed form at any location on the earth and/or in space. The "terminal" and "terminal equipment" used here can also be communication terminals, Internet terminals, or music/video playback terminals, such as a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback function, or devices such as a smart TV or set-top box.

Specifically, please refer to FIG. 1. The present invention discloses an abnormality detection method, including:

S1000. Obtain operation terminal data when the user performs registration or verification, where the operation terminal data is combined data including two or more of device type, system information, and IP address, and the system information includes system type, version number and resolution;

The technical solution of the present application is mainly used to detect abnormal user operations, in particular to monitor abnormal operations during verification when the user registers a new account or logs in.

The data that the user sends to the server through the client when registering includes the user's registered account information and identity information, and also carries the IP address of the device where the client is located. By setting acquisition parameters, the device type and system information of the device where the client is located can also be obtained. The device type here refers to the hardware of the device, such as a mobile phone, tablet, computer terminal or other device, and the system information describes the software running on this hardware, such as the iOS, OS, Windows, or Android system. The system information further includes the specific system version number, the system resolution and other information. In this application, the operation terminal data includes a combination of at least two of device type, system information, and IP address. For example, the combination can consist of three types of data (device type, system type, and IP address), four types of data (device type, system type, version number, and IP address), or five types of data (device type, system type, version number, resolution, and IP address); it can also include other data and any combination of these data.
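As an illustration of the combined data described in step S1000, the following minimal Python sketch models one record of operation terminal data and checks that it combines at least two of the three categories (device type, system information, IP address). The class name, field names, and sample values are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OperationTerminalData:
    """One record of operation terminal data (hypothetical field names)."""
    device_type: Optional[str] = None
    system_type: Optional[str] = None      # part of "system information"
    version_number: Optional[str] = None   # part of "system information"
    resolution: Optional[str] = None       # part of "system information"
    ip_address: Optional[str] = None

    def categories(self) -> List[str]:
        """List which of the three categories are present in this record."""
        present = []
        if self.device_type is not None:
            present.append("device type")
        if any(v is not None for v in
               (self.system_type, self.version_number, self.resolution)):
            present.append("system information")
        if self.ip_address is not None:
            present.append("IP address")
        return present

    def is_valid_combination(self) -> bool:
        """The method requires two or more categories to be combined."""
        return len(self.categories()) >= 2


record = OperationTerminalData(
    device_type="mobile phone", system_type="Android", ip_address="203.0.113.7"
)
print(record.categories(), record.is_valid_combination())
```

A record that carries only system information, such as a system type plus a version number, counts as a single category in this sketch and would therefore not qualify as combined data.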

S2000. Input the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each of the detection models outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

It can be seen from step S1000 that the obtained operation terminal data is combined information that includes at least two of device type, system information, and IP address. Inputting this combined information into the combined detection model for detection yields the corresponding combined result information. In this application, the combined detection model includes at least two detection models, and the output results of the detection models are independent of each other, so at least two sets of result information are output for the combined information. For example, suppose the combined information includes three types of data (device type, system type, and IP address) and the combined detection model includes five detection models A, B, C, D, and E. Because the detection models are independent of each other, five sets of result information are obtained for the combination of device type, system type, and IP address, for example (A1, A2, A3), (B1, B2, B3), (C1, C2, C3), (D1, D2, D3), and (E1, E2, E3), where the number 1 denotes the detection result for the device type, the number 2 the detection result for the system type, and the number 3 the detection result for the IP address.

S3000. Voting a plurality of sub-results in the combined result information according to preset rules to obtain final result information.

The multiple detection models in the combined detection model each output a corresponding sub-result for the same operation terminal data, and these sub-results form the combined result information. The sub-results in the combined result information are then voted on according to certain rules to obtain the final result information. These rules include, but are not limited to, selecting the sub-result that occurs the largest number of times as the final result.

This application obtains the user's operation terminal data and extracts multiple items from it as combined data for identification; using combined data makes the judgment result more accurate. In addition, the detection model that identifies the combined data is itself a combined detection model: different detection models trained by a variety of training methods recognize the same combined data, and the final result information is obtained by voting. This gives a comprehensive judgment of whether the user performing registration or verification is abnormal, which effectively avoids the one-sidedness of training a single detection model using only the abundant normal samples, reduces the inaccuracy of a single detection model caused by sample imbalance, and improves the accuracy of anomaly detection.

In an embodiment, the detection model in the combined detection model includes: Naive Bayes detection model, OneClassSVM detection model of positive class, OneClassSVM detection model of negative class, isolated forest classification and detection model of positive class, and isolated forest classification and detection model of negative class.

The Naive Bayes detection model is a classification algorithm based on Bayes' theorem. It is also a generative model: it obtains the target probability by directly modeling the joint probability P(x,c). Bayes' theorem describes the probability of an event based on prior knowledge of conditions related to that event. Expressed as a formula:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} = \frac{P(x, c)}{P(x)}$$

c represents the occurrence of a random event. x represents the evidence or condition, which generally refers to factors related to the random event.

P(c|x): the probability that event c occurs given condition x (posterior probability).

P(c): the probability that event c occurs without considering the relevant factors (prior probability).

P(x|c): the probability that condition x occurs given that event c has occurred (likelihood).

P(x): the probability that condition x occurs (evidence, or marginal probability).
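To make the roles of these four probabilities concrete, the following short Python sketch evaluates Bayes' theorem in its standard form, P(c|x) = P(x|c) P(c) / P(x), for a binary event c. The function name and all numbers are illustrative assumptions, not values from the application.

```python
def posterior(prior_c: float,
              likelihood_x_given_c: float,
              likelihood_x_given_not_c: float) -> float:
    """Return P(c|x) = P(x|c) * P(c) / P(x) for a binary event c."""
    # P(x) follows from the law of total probability over c and not-c.
    evidence = (likelihood_x_given_c * prior_c
                + likelihood_x_given_not_c * (1.0 - prior_c))
    return likelihood_x_given_c * prior_c / evidence


# Illustrative numbers: 1% of registrations are abnormal (c); a given device
# feature x is seen in 90% of abnormal and 10% of normal registrations.
p = posterior(prior_c=0.01, likelihood_x_given_c=0.9, likelihood_x_given_not_c=0.1)
print(round(p, 4))
```

With these assumed numbers the posterior is 1/12, about 8.3%: even a feature strongly associated with abnormal registrations gives a modest posterior when the prior is small, which is one way sample imbalance affects a Naive Bayes classifier.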

The OneClassSVM detection model is trained on data from a single class. The data is divided into positive and negative samples: those that meet the requirements are positive samples, and all others that do not meet the requirements are negative samples. The One-Class SVM is able to capture the shape of the data set, so it works well on strongly non-Gaussian data, such as two completely separate clusters. Strictly speaking, the One-Class SVM is not an outlier detection algorithm but a novelty detection algorithm: its training set should not contain abnormal samples, since these may affect the selection of the boundary during training. In this application, the OneClassSVM detection model includes the OneClassSVM detection model of the positive class and the OneClassSVM detection model of the negative class. The OneClassSVM detection model of the positive class is trained only on positive samples, while the OneClassSVM detection model of the negative class is trained only on negative samples.

The isolated forest classification and detection model (Isolation Forest) is a fast, ensemble-based anomaly detection method with linear time complexity and high accuracy. It is a state-of-the-art algorithm that meets the requirements of big data processing and is applicable to continuous numerical data. An anomaly is defined as a point that is "more likely to be separated", which can be understood as a point that is sparsely distributed and far away from any high-density group. In statistical terms, a sparsely populated region of the data space is one where data occurs with very low probability, so data falling in such regions can be considered abnormal. In this application, the isolated forest classification and detection model likewise includes the isolated forest classification and detection model of the positive class and that of the negative class: the former is trained on positive samples, and the latter is trained on negative samples.
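The idea that sparse points are "more likely to be separated" can be illustrated with a deliberately simplified, one-dimensional sketch in Python. It is not the full Isolation Forest algorithm (there is no subsampling and no multi-dimensional feature selection) and it is not the implementation used in the application; the function names and data are assumptions chosen only to show that an isolated point needs fewer random splits than a point inside a dense group.

```python
import random


def isolation_depth(values, target, rng, depth=0, max_depth=50):
    """Number of random splits needed to isolate `target` along one tree path."""
    lo, hi = min(values), max(values)
    if len(values) <= 1 or lo == hi or depth >= max_depth:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points on the same side of the split as the target.
    same_side = [v for v in values if (v < split) == (target < split)]
    return isolation_depth(same_side, target, rng, depth + 1, max_depth)


def average_depth(values, target, n_trees=200, seed=0):
    """Average isolation depth of `target` over many random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees


dense_group = [i / 100 for i in range(50)]   # 50 points between 0.00 and 0.49
values = dense_group + [10.0]                # one far-away, sparse point

outlier_depth = average_depth(values, 10.0)
inlier_depth = average_depth(values, 0.25)
print(outlier_depth < inlier_depth)
```

A shorter average depth corresponds to a higher anomaly score. A production system would more likely rely on an existing library implementation than on a sketch like this one.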

In one embodiment, referring to FIG. 2, the training method of the combined detection model composed of the above five detection models includes:

S2100. Obtain sample data to construct a combined feature set, where the combined feature set includes a positive sample and a negative sample;

The positive samples mentioned above are data selected according to the recognition purpose that satisfy that purpose. Such data can take the form of text, numbers, strings, pictures, sounds, and so on. This application is directed at detecting abnormal user input behavior, which is judged from the device type, system information, and IP address of the user's client. Therefore, in this application, a positive sample refers to a legal device type, system information, and IP address; for example, legal device types include mobile phone, PC, tablet and computer. When the login or registration information is recognized as coming from one of the above device types, it is a positive sample. When the login or registration information does not come from any of mobile phone, PC, tablet or computer, but from a terminal that is not recognized as a legal device type, such as a smart bracelet, the device data of the smart bracelet is a negative sample. These sample data are obtained through collection.

In an embodiment, referring to FIG. 3, the method of obtaining sample data to construct a combined feature set includes:

S2110. Use as sample data the operation terminal data acquired during user registration or verification through at least two acquisition methods, where the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from the registration or verification information sent by the user;

In an embodiment, the sample data comes from different acquisition methods, for example a crawler algorithm, device detection, and the registration or verification information sent by the user. Acquisition by crawler algorithm means writing crawler code that monitors the user's login and obtains all terminal data of the user during registration or verification. The data collected in this process includes the final registration and verification information as well as intermediate information, such as whether the data was intercepted during transmission. Device detection yields the data recognized by the client itself: after the registration or verification information is entered on the client through the input tool and before it is finally transmitted, it is the registration or verification information monitored by the input tool on the client. The registration or verification information sent by the user is the information sent by the user through the client and received by the back-end server. In other words, the data obtained through device detection is the original data input by the user, the data obtained through the crawler algorithm is the data in transit from the client to the server, and the data sent by the user is the data received by the server. The original data is thus monitored at three stages, from input through transmission to reception, which makes it possible to check data consistency. If the data obtained at any stage is inconsistent with the others, the data registered or verified by the user is abnormal.
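The three-stage consistency check described above can be sketched as follows in Python. The sketch assumes each acquisition method yields a flat dictionary of field values; the function name, dictionary keys, and sample values are hypothetical.

```python
def find_inconsistent_fields(device_detected, crawled, server_received):
    """Return the sorted names of fields whose values differ across the stages."""
    stages = (device_detected, crawled, server_received)
    all_fields = set().union(*(stage.keys() for stage in stages))
    # A field is inconsistent if the stages do not all report the same value
    # (a field missing from one stage counts as a differing value).
    return sorted(f for f in all_fields
                  if len({stage.get(f) for stage in stages}) > 1)


device_detected = {"device_type": "mobile phone", "ip_address": "203.0.113.7"}
crawled         = {"device_type": "mobile phone", "ip_address": "203.0.113.7"}
server_received = {"device_type": "mobile phone", "ip_address": "198.51.100.9"}

mismatches = find_inconsistent_fields(device_detected, crawled, server_received)
print(mismatches)  # a non-empty list suggests the registration data is abnormal
```

Here the IP address seen by the server differs from the one observed at the earlier stages, so the record would be flagged under the rule that any inconsistency between stages indicates abnormal data.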

S2120. Calculate the support and confidence of the sample data acquired by each acquisition method;

The support here reveals the probability that the data obtained by the above methods appear at the same time. The confidence indicates the credibility of the data obtained by each method. Through verification, the accuracy of each kind of sample data can be obtained, and according to this accuracy a value is set for each method to represent its confidence; the higher the value, the more credible the data obtained by that method. For example, in the above embodiment, the user registration or verification data is obtained through three methods: acquisition through a crawler algorithm, acquisition through device detection, and acquisition from the registration or verification information sent by the user. Through comparison and calculation on previous data, a confidence level can be set for each of the three methods. For example, the confidence level of the data obtained through the crawler algorithm is A, the confidence level of the data obtained through device detection is B, and the confidence level of the registration or verification information sent by the user is C. When data is obtained in step S2110, the corresponding confidence is matched according to the source of the data.

S2130. Select the combination of the operation terminal data with the greatest support and confidence as reference data;

The formula for support is: Support(A->B)=P(A U B), where A U B denotes A and B occurring together. Support reveals the probability of A and B appearing at the same time. If this probability is small, the relationship between A and B is weak; if A and B appear together very frequently, A and B are strongly related.

The formula for confidence is: Confidence(A->B)=P(B | A). Confidence reveals whether B will also appear when A appears, or how likely it is to appear. If the confidence is 100%, then B always appears whenever A appears (in a retail setting, A and B could be sold as a bundle). If the confidence is too low, the appearance of A has little to do with whether B appears.
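A minimal Python sketch of these two measures follows, using the standard association-rule definitions: support is the fraction of records in which A and B occur together, and confidence is the conditional probability of B given A, which matches the description of "whether B will also appear when A appears". The records and function names are illustrative assumptions.

```python
def support(records, items):
    """Fraction of records that contain every (field, value) pair in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A).

    Assumes the antecedent occurs in at least one record.
    """
    joint = support(records, set(antecedent) | set(consequent))
    return joint / support(records, antecedent)


# Each record is a set of (field, value) pairs observed together.
records = [
    {("device", "phone"), ("system", "Android")},
    {("device", "phone"), ("system", "Android")},
    {("device", "phone"), ("system", "iOS")},
    {("device", "PC"), ("system", "Windows")},
]
A = {("device", "phone")}
B = {("system", "Android")}

print(support(records, A | B))    # A and B together: 2 of 4 records
print(confidence(records, A, B))  # B given A: 2 of the 3 records containing A
```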

In this application, because the above operation terminal data comes from different acquisition methods, multiple sets of operation terminal data can be obtained, and each item of each set is evaluated according to the formulas for support and confidence. For each item, the data with the largest support and confidence is selected, and the selected items are combined to form the benchmark data for this calculation.
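One possible reading of this selection step is sketched below in Python. It assumes that the support of a value is the number of acquisition methods that report it, and that ties are broken by the highest confidence among the methods reporting it. The application does not specify how support and confidence are combined, so this ordering, the function name, and the confidence values (stand-ins for A, B and C) are all assumptions.

```python
from collections import defaultdict


def select_benchmark(observations, method_confidence):
    """Pick, per field, the value with the largest (support, confidence).

    observations: {method name: {field: value}}
    method_confidence: {method name: confidence assigned to that method}
    """
    # scores[field][value] = [support count, best confidence seen]
    scores = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for method, record in observations.items():
        for field, value in record.items():
            entry = scores[field][value]
            entry[0] += 1
            entry[1] = max(entry[1], method_confidence[method])
    return {field: max(values, key=lambda v: tuple(values[v]))
            for field, values in scores.items()}


observations = {
    "crawler":          {"device_type": "phone", "ip_address": "203.0.113.7"},
    "device_detection": {"device_type": "phone", "ip_address": "203.0.113.7"},
    "user_submission":  {"device_type": "phone", "ip_address": "198.51.100.9"},
}
# Hypothetical confidence values standing in for A, B and C.
method_confidence = {"crawler": 0.8, "device_detection": 0.9, "user_submission": 0.7}

benchmark = select_benchmark(observations, method_confidence)
print(benchmark)
```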

S2140. Mark the comparison result of the operation terminal data acquired by each of the acquisition methods with the reference data according to the first rule to form a feature set.

Since the operation terminal data can be obtained in a variety of ways, comparing each set of operation terminal data with the benchmark data and marking the result according to the first rule yields a set of feature data; this set of feature data is a feature set.

In an embodiment, the first rule is that the data in the operating terminal data that is the same as the reference data is marked as 1 as a positive sample, and the data that is different from the reference data is marked as 0 as a negative sample. In this way, the above multiple sets of operation terminal data constitute a feature set consisting of 0 or 1.
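The first rule can be sketched in a few lines of Python: each field of a record is compared with the benchmark data and marked 1 if equal and 0 otherwise, giving a 0-1 feature vector. The field names and sample values are hypothetical, and the sketch assumes every field is present in both the record and the benchmark.

```python
FIELDS = ["device_type", "system_type", "version_number", "resolution", "ip_address"]


def mark_against_benchmark(record, benchmark, fields=FIELDS):
    """Mark each field 1 if it equals the benchmark value, otherwise 0."""
    return [1 if record[f] == benchmark[f] else 0 for f in fields]


benchmark = {
    "device_type": "phone", "system_type": "Android", "version_number": "9.0",
    "resolution": "1080x2340", "ip_address": "203.0.113.7",
}
record = {
    "device_type": "phone", "system_type": "Android", "version_number": "8.1",
    "resolution": "1080x2340", "ip_address": "198.51.100.9",
}

features = mark_against_benchmark(record, benchmark)
print(features)  # [1, 1, 0, 1, 0]
```

Applying the same function to the record from every acquisition method gives the rows of the 0-1 feature set.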

S2200. The Naive Bayes detection model learns to identify positive samples and negative samples at the same time; the OneClassSVM detection model of the positive class and the isolated forest classification and detection model of the positive class learn to identify positive samples; the OneClassSVM detection model of the negative class and the isolated forest classification and detection model of the negative class learn to identify negative samples.

The Naive Bayes detection model is a classification algorithm that recognizes both positive samples and negative samples. For example, when data to be recognized is input, the output is either the positive class or the negative class: when the data matches the positive samples, it belongs to the positive class and 1 is output; when it matches the negative samples, it belongs to the negative class and -1 is output. Because the Naive Bayes detection model is trained on both positive and negative samples, both its positive and its negative outputs can be obtained relatively accurately. The OneClassSVM detection model of the positive class is trained mainly on positive samples, so its positive-class output is more accurate, and the OneClassSVM detection model of the negative class is trained mainly on negative samples, so its negative-class output is more accurate. Similarly, the isolated forest classification and detection model of the positive class is more accurate for positive-class output, and the isolated forest classification and detection model of the negative class is more accurate for negative-class output.

After the combined result information is obtained through the above step S2000, the combined result information is voted on to obtain the final result information. Specifically, referring to FIG. 4, voting on the multiple sub-results in the combined result information according to preset rules to obtain the final result information includes:

S3100. Vote on the multiple sub-results in the obtained combined result information according to the Bagging strategy;

S3200. The result information with the largest number of marks is selected as the final result.

In this application, the operation terminal data is obtained in several ways: through a crawler algorithm, through device detection, and from the registration or verification information sent by the user. The data obtained by each method may be the same or different, which leads to diversity in the operation terminal data. The operation terminal data is input into the combined detection model for detection to obtain the combined result information. The detection models in the combined detection model produce results independently of each other; different detection models have different training principles, and their training data may also differ. Because each detection model has its own characteristics, the sub-results obtained may differ. The detection results obtained by the different detection models are combined into the combined result information, which is then voted on according to the Bagging strategy.

Bagging, also called bootstrap aggregating, is a technique of repeatedly sampling (with replacement) from the data according to a uniform probability distribution. A base classifier is trained on each bootstrap sample set generated in this way; the trained classifiers then vote, and the test sample is assigned to the class with the most votes. Each bootstrap sample set is as large as the original data. Because sampling is with replacement, some samples may appear multiple times in the same training set, and some may be left out.
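The general Bagging procedure described here can be sketched as follows in Python. The base classifier is a deliberately tiny one-dimensional threshold rule chosen only for illustration; it is not one of the five detection models of the application, and all names and data are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Tiny base classifier: threshold halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:            # a bootstrap sample may miss one class
        label = 1 if pos else 0
        return lambda x: label
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


def bagging_predict(classifiers, x):
    """Assign x to the class that receives the most votes."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
rng = random.Random(0)
classifiers = [train_stump(bootstrap_sample(data, rng)) for _ in range(11)]

print(bagging_predict(classifiers, 0.95), bagging_predict(classifiers, 0.05))
```

An odd number of base classifiers is used so that a two-class vote cannot end in a tie.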

By comparing the combined result information, the one with the most identical data is selected as the final result. For example, in an embodiment, it is assumed that the acquired operating terminal data are: device type, system type, version number, resolution, and IP address. In the step of constructing a combined feature set, pass

Assuming that the Naive Bayes detection model, the OneClassSVM detection model for positive and negative classes, and the isolated forest classification and detection model for positive and negative classes are used, the combined result information obtained by the five detection models is as follows:

	Data 1	Data 2	Data 3	Data 4	Data 5
Naive Bayes detection model	1	1	0	0	1
Positive-class OneClassSVM detection model	1	0	1	0	1
Negative-class OneClassSVM detection model	1	1	0	1	1
Positive-class Isolation Forest detection model	0	0	0	0	1
Negative-class Isolation Forest detection model	1	1	0	1	0
Final result	1	1	0	0	1

In the above example, the five detection models output five sets of data. Each set is formed by comparing the operating terminal data with the reference data and marking the result according to the first rule, so every set has a uniform form, namely values of 0 or 1, which makes comparison and voting straightforward. From the five sets of data it can be seen that, for data 1, "1" occurs most often, so the final result of data 1 is "1". Likewise, the final result of data 2 is "1", of data 3 is "0", of data 4 is "0", and of data 5 is "1", so the final result is "1, 1, 0, 0, 1".
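
The vote in this example can be reproduced with a short sketch. The values are taken from the table above; the dictionary keys and the function name `majority_vote` are illustrative assumptions, and the fourth row is read as the positive-class Isolation Forest model.

```python
from collections import Counter

# Sub-results from the table above: one row per detection model,
# one column per data item (Data 1 .. Data 5).
sub_results = {
    "naive_bayes":      [1, 1, 0, 0, 1],
    "ocsvm_positive":   [1, 0, 1, 0, 1],
    "ocsvm_negative":   [1, 1, 0, 1, 1],
    "iforest_positive": [0, 0, 0, 0, 1],
    "iforest_negative": [1, 1, 0, 1, 0],
}

def majority_vote(rows):
    """Return, for each column, the label that most models voted for."""
    final = []
    for column in zip(*rows):
        label, _count = Counter(column).most_common(1)[0]
        final.append(label)
    return final

final_results = majority_vote(sub_results.values())
print(final_results)  # [1, 1, 0, 0, 1]
```

With five models and binary labels a tie cannot occur, which is one practical reason to use an odd number of detection models.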

In the technical solution of this application, complex text-based device data is decomposed and an effective feature conversion method is applied: multiple nominal attributes that are not directly interpretable are combined with the sample distribution and transformed into 0-1 binary combined features. This generates a combined feature set with distinguishing power, from which an effective classification feature set is mined; the feature set can then be used for model training to obtain a better anomaly detection model. At the same time, five algorithms are used to construct detection models under the Bagging strategy. Naive Bayes gives a comprehensive probability based on the overall distribution of the samples, while OneClassSVM and Isolation Forest each give detection results from both the normal and the abnormal perspective. Using the five judgment results, whether the user performing registration or verification is abnormal can be judged more comprehensively. This effectively avoids the one-sidedness of training a single detection model using only the abundant normal samples, reduces the inaccuracy of Naive Bayes classification caused by unbalanced samples, and improves the accuracy of anomaly detection.

In another aspect, referring to FIG. 5, this application discloses an abnormality detection device, including:

Obtaining module 1000: configured to acquire operating terminal data when a user performs registration or verification, where the operating terminal data is combined data including two or more of device type, system information, and IP address, and the system information includes the system type, version number, and resolution; processing module 2000: configured to input the operating terminal data into the combined detection model for detection to obtain combined result information, where the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results form the combined result information; execution module 3000: configured to vote on the multiple sub-results in the combined result information according to preset rules to obtain the final result information.

Optionally, the detection models in the combined detection model include: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest detection model, and a negative-class Isolation Forest detection model. The Naive Bayes detection model learns to identify positive samples and negative samples at the same time; the positive-class OneClassSVM detection model and the positive-class Isolation Forest detection model learn to identify positive samples; the negative-class OneClassSVM detection model and the negative-class Isolation Forest detection model learn to identify negative samples.
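
One way to picture this division of training data is the sketch below. It does not train any model; it only shows which samples each of the five models would see. The dictionary keys, the function names, and in particular the mapping from a one-class model's inlier/outlier decision to a 0-1 sub-result are assumptions for illustration, since the application does not spell them out.

```python
def split_training_sets(samples, labels):
    """Assign each of the five models the samples it learns from.

    Labels follow the document's convention: 1 = positive sample,
    0 = negative sample.
    """
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    return {
        "naive_bayes": list(samples),   # learns both classes, using the labels
        "ocsvm_positive": positives,    # one-class models see a single class
        "iforest_positive": positives,
        "ocsvm_negative": negatives,
        "iforest_negative": negatives,
    }

def to_sub_result(is_inlier, trained_on):
    """Map a one-class model's inlier/outlier decision to a 0-1 sub-result.

    Assumption: a sample that resembles the class the model was trained on
    receives that class's label; otherwise it receives the other label.
    """
    if trained_on == "positive":
        return 1 if is_inlier else 0
    return 0 if is_inlier else 1

# Hypothetical toy data.
samples = ["a", "b", "c", "d"]
labels = [1, 0, 1, 1]
training_sets = split_training_sets(samples, labels)
print(training_sets["ocsvm_positive"], training_sets["iforest_negative"])
```

Under this mapping, all five sub-results share the same 0-1 scale and can be voted on directly.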

Optionally, the processing module further includes: a feature set construction module: configured to obtain sample data to construct a combined feature set, wherein the combined feature set includes positive samples and negative samples.

Optionally, the feature set construction module further includes: a sample acquisition module, configured to obtain, through at least two acquisition methods, operating terminal data during user registration or verification as sample data, where the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from registration or verification information sent by the user; a calculation module, configured to calculate the support and confidence of the sample data acquired by each acquisition method; a first selection module, configured to select the combination of operating terminal data with the highest support and confidence as the reference data; a marking module, configured to mark the result of comparing the operating terminal data acquired by each acquisition method with the reference data according to the first rule to form a feature set.

Optionally, the method for obtaining operation terminal data when the user performs registration or verification includes: obtaining by crawling algorithm, obtaining by device detection, and obtaining from registration or verification information sent by the user.

Optionally, the first rule is: the data in the operating terminal data that is the same as the reference data is marked as 1, as a positive sample, and the data that is different from the reference data is marked as 0, as a negative sample.
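
The selection of reference data and the first rule can be sketched as follows. This is a simplified illustration: the reference is chosen by support alone (the fraction of records sharing a full field combination), the confidence calculation is omitted, and the records, field order, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records; field order is
# (device type, system type, version number, resolution, IP address).
records = [
    ("phone", "Android", "10", "1080x2340", "10.0.0.1"),
    ("phone", "Android", "10", "1080x2340", "10.0.0.1"),
    ("phone", "Android", "9", "1080x2340", "10.0.0.7"),
]

def support(rows):
    """Fraction of records in which each full field combination occurs."""
    counts = Counter(rows)
    total = len(rows)
    return {combo: n / total for combo, n in counts.items()}

def mark(record, reference):
    """First rule: 1 where a field equals the reference value, else 0."""
    return [1 if value == ref else 0 for value, ref in zip(record, reference)]

supports = support(records)
reference = max(supports, key=supports.get)  # combination with highest support
features = [mark(r, reference) for r in records]
print(features)
```

Each record becomes a 0-1 vector with one entry per field, which is the uniform form the detection models and the vote operate on.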

Optionally, the execution module includes: a voting module, configured to vote on the multiple sub-results in the obtained combined result information according to the Bagging strategy; a second selection module, configured to select the result information with the most marks as the final result.

Since the above abnormality detection device corresponds one-to-one to the abnormality detection method, and its functions and execution principles are the same, the details are not repeated here.

Please refer to FIG. 5 for the basic structural block diagram of the computer equipment provided by the embodiment of the present invention.

The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions. The database may store control information sequences. When the computer-readable instructions are executed by the processor, the processor can implement an anomaly detection method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. Computer-readable instructions may be stored in the memory of the computer device, and when the computer-readable instructions are executed by the processor, the processor may execute an abnormality detection method. The network interface of the computer device is used to connect to and communicate with the terminal. Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

The computer device receives the status information of the prompt behavior sent by the associated client, that is, whether the associated terminal opens the prompt and whether the lender closes the prompt task. By verifying whether the above-mentioned task conditions are fulfilled, the corresponding preset instruction is sent to the associated terminal, so that the associated terminal can perform corresponding operations according to the preset instruction, thereby realizing effective supervision of the associated terminal. At the same time, when the prompt information state is different from the preset state command, the server side controls the associated terminal to continue ringing to prevent the prompt task of the associated terminal from being automatically terminated after a period of time.

The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the abnormality detection method described in any of the above embodiments.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), etc.

It should be understood that, although the steps in the flowcharts of the drawings are shown in the sequence indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and they are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The above are only some of the embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims

An anomaly detection method, characterized in that it comprises:

Obtaining the operating terminal data of the user during registration or verification, where the operating terminal data is a combination of two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;

Inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

Voting is performed on multiple sub-results in the combined result information according to a preset rule to obtain final result information.
The abnormality detection method according to claim 1, wherein the detection model in the combined detection model comprises: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest detection model, and a negative-class Isolation Forest detection model.
The anomaly detection method according to claim 2, wherein the training method of the combined detection model comprises:

Acquiring sample data to construct a combined feature set, wherein the combined feature set includes a positive sample and a negative sample;

The Naive Bayes detection model simultaneously learns the recognition of positive samples and negative samples;

The positive-class OneClassSVM detection model and the positive-class Isolation Forest detection model learn the identification of positive samples;

The negative-class OneClassSVM detection model and the negative-class Isolation Forest detection model learn the identification of negative samples.
The abnormality detection method according to claim 3, wherein the method of obtaining sample data to construct a combined feature set comprises:

Using at least two acquisition methods to obtain operating terminal data during user registration or verification as sample data, where the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from registration or verification information sent by the user;

Calculate the support and confidence of the sample data acquired by each acquisition method;

Selecting the combination of the operation terminal data with the greatest support and confidence as the reference data;

The comparison result of the operation terminal data acquired by each acquisition method and the reference data is marked according to the first rule to form a feature set.
The abnormality detection method according to claim 4, wherein the first rule is: the data in the operating terminal data that is the same as the reference data is marked as 1, as a positive sample, and the data that is different from the reference data is marked as 0, as a negative sample.
The abnormality detection method according to claim 4, wherein the method of voting on multiple sub-results in the combined result information according to preset rules to obtain the final result information comprises:

Voting on the multiple sub-results in the obtained combined result information according to the Bagging strategy;

The result information with the most marks is selected as the final result.
An abnormality detection device, characterized in that it comprises:

Obtaining module: configured to acquire operating terminal data when a user performs registration or verification, wherein the operating terminal data is combined data including two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;

Processing module: configured to input the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each of the detection models outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

Execution module: configured to perform voting on multiple sub-results in the combined result information according to preset rules to obtain final result information.
The abnormality detection device according to claim 7, wherein the detection model in the combined detection model comprises: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest detection model, and a negative-class Isolation Forest detection model.
The abnormality detection device according to claim 8, wherein the processing module further comprises:

Feature set construction module: configured to obtain sample data to construct a combined feature set, wherein the combined feature set includes positive samples and negative samples;

The Naive Bayes detection model learns to identify positive samples and negative samples at the same time; the positive-class OneClassSVM detection model and the positive-class Isolation Forest detection model learn to identify positive samples; the negative-class OneClassSVM detection model and the negative-class Isolation Forest detection model learn to identify negative samples.
The abnormality detection device according to claim 9, wherein the feature set construction module further comprises:

Sample acquisition module: configured to obtain, through at least two acquisition methods, operating terminal data during user registration or verification as sample data, wherein the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from registration or verification information sent by the user;

Calculation module: configured to perform calculations on the support and confidence of the sample data acquired by each acquisition method;

The first selection module: configured to perform selection of the combination of the operation terminal data with the greatest support and confidence as the reference data;

The marking module is configured to mark the comparison result of the operation terminal data obtained by each of the acquisition methods and the reference data according to the first rule to form a feature set.
The abnormality detection device according to claim 10, wherein the first rule is: the data in the operating terminal data that is the same as the reference data is marked as 1, as a positive sample, and the data that is different from the reference data is marked as 0, as a negative sample.
The abnormality detection device according to claim 10, wherein the execution module comprises:

Voting module: configured to perform voting for multiple sub-results in the obtained combined result information according to the Bagging strategy;

The second selection module: configured to select the result information with the most marks as the final result.
A computer device includes a memory and a processor, and computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor executes the following steps:

Obtaining the operating terminal data of the user during registration or verification, where the operating terminal data is a combination of two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;

Inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

Voting is performed on multiple sub-results in the combined result information according to a preset rule to obtain final result information.
The computer device according to claim 13, wherein the detection model in the combined detection model comprises: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest detection model, and a negative-class Isolation Forest detection model.
The computer device according to claim 14, wherein when the computer-readable instructions are executed by the processor, the processor is caused to perform the following steps:

Acquiring sample data to construct a combined feature set, wherein the combined feature set includes a positive sample and a negative sample;

The Naive Bayes detection model simultaneously learns the recognition of positive samples and negative samples;

The positive-class OneClassSVM detection model and the positive-class Isolation Forest detection model learn the identification of positive samples;

The negative-class OneClassSVM detection model and the negative-class Isolation Forest detection model learn the identification of negative samples.
The computer device according to claim 15, wherein when the computer-readable instructions are executed by the processor, the processor is caused to perform the following steps:

Using at least two acquisition methods to obtain operating terminal data during user registration or verification as sample data, where the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from registration or verification information sent by the user;

Calculate the support and confidence of the sample data acquired by each acquisition method;

Selecting the combination of the operation terminal data with the greatest support and confidence as the reference data;

The comparison result of the operation terminal data acquired by each acquisition method and the reference data is marked according to the first rule to form a feature set.
A storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Obtaining the operating terminal data of the user during registration or verification, where the operating terminal data is a combination of two or more of device type, system information and IP address, and the system information includes system type, version number and resolution;

Inputting the operation terminal data into a combined detection model for detection to obtain combined result information, wherein the combined detection model includes two or more detection models, each detection model outputs a corresponding sub-result, and the multiple sub-results generate the combined result information;

Voting is performed on multiple sub-results in the combined result information according to a preset rule to obtain final result information.
The storage medium storing computer-readable instructions according to claim 17, wherein the detection model in the combined detection model comprises: a Naive Bayes detection model, a positive-class OneClassSVM detection model, a negative-class OneClassSVM detection model, a positive-class Isolation Forest detection model, and a negative-class Isolation Forest detection model.
The storage medium storing computer-readable instructions according to claim 18, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:

Acquiring sample data to construct a combined feature set, wherein the combined feature set includes a positive sample and a negative sample;

The Naive Bayes detection model simultaneously learns the recognition of positive samples and negative samples;

The positive-class OneClassSVM detection model and the positive-class Isolation Forest detection model learn the identification of positive samples;

The negative-class OneClassSVM detection model and the negative-class Isolation Forest detection model learn the identification of negative samples.
The storage medium storing computer-readable instructions according to claim 19, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Using at least two acquisition methods to obtain operating terminal data during user registration or verification as sample data, where the acquisition methods include acquisition through a crawler algorithm, acquisition through device detection, and acquisition from registration or verification information sent by the user;

Calculate the support and confidence of the sample data acquired by each acquisition method;

Selecting the combination of the operation terminal data with the greatest support and confidence as the reference data;

The comparison result of the operation terminal data acquired by each acquisition method and the reference data is marked according to the first rule to form a feature set.