CN110166462B - Access control method, system, electronic device and computer storage medium - Google Patents

Access control method, system, electronic device and computer storage medium

Info

Publication number
CN110166462B
Authority
CN
China
Prior art keywords
data
user
detection
model
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910442550.1A
Other languages
Chinese (zh)
Other versions
CN110166462A (en)
Inventor
刘新
潘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Launch Technology Co Ltd
Original Assignee
Shenzhen Launch Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Launch Technology Co Ltd
Priority to CN201910442550.1A
Publication of CN110166462A
Application granted
Publication of CN110166462B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

Abstract

The application provides an access control method, an access control system, electronic equipment and a computer storage medium. The method comprises the following steps: receiving an access request of a user; inputting the equipment information into a detection model to obtain a detection result of the user, wherein the detection model is obtained by model fusion of a first detection model, a second detection model and a third detection model, and the detection result comprises a normal user or an abnormal user; and executing a corresponding access strategy according to the detection result.

Description

Access control method, system, electronic device and computer storage medium
Technical Field
The present application relates to the field of computers, and in particular, to an access control method, system, electronic device, and computer storage medium.
Background
The internet is full of crawlers. If users can crawl website content at will with crawler tools, that content can be acquired and exploited by others at low cost, causing losses to the website. Frequent crawler requests also increase the traffic load and degrade the experience of users who browse normally. Therefore, for the sake of internet security, a website usually performs abnormal user detection to detect abnormal users such as crawlers and key-press automation tools ("key sprites"), thereby ensuring the security of website data and the stability of the system.
Existing technical schemes for detecting abnormal users include verification-code-based detection and crawler-tool detection. The verification-code scheme collects behavior data during user verification, such as click time and mouse drag trajectory, and judges the user type from that data; this approach has a high error rate and easily misjudges real users as abnormal users, which harms the user experience. The crawler detection scheme usually judges whether a request comes from a crawler according to the IP, the parameters carried in the request header, and the request frequency, and then applies a corresponding anti-crawler strategy to reject the request. However, crawler developers can effectively counter many anti-crawler strategies by changing IPs, randomizing request intervals, or simulating a browser, and can develop targeted crawlers against the anti-crawler strategy of a specific website. As a result, website security maintenance becomes complex, anti-crawler strategies must be updated in real time, and the development burden on developers increases.
Disclosure of Invention
The application provides an access control method, an access control system, electronic equipment and a computer storage medium, which are used for solving the problems that when a website detects abnormal users, the detection error rate is high, detection strategies need to be developed directionally, and the like.
In a first aspect, an access control method is provided, the method including:
receiving an access request of a user, wherein the access request comprises equipment information of the user, and the equipment information comprises equipment type, system information and address information when the user accesses and verifies;
inputting the equipment information into a detection model to obtain a detection result of the user, wherein the detection model is obtained by model fusion of a first detection model, a second detection model and a third detection model, and the detection result comprises a normal user or an abnormal user;
and executing a corresponding access strategy according to the detection result.
Optionally, the first detection model is obtained by performing supervised training on a first sample set, the second detection model is obtained by performing unsupervised training on a second sample set, the third detection model is obtained by performing unsupervised training on a third sample set, the first sample set includes the second sample set and the third sample set, the first sample set includes known device information and known detection results, wherein the first sample set is obtained by converting data in the first data set into binary feature data for model training through feature engineering; the second sample set is obtained by converting data in the second data set into binary characteristic data for model training through characteristic engineering; the third sample set is obtained by converting data in the third data set into binary characteristic data for model training through characteristic engineering; inputting the device information into a detection model, and obtaining a detection result of the user includes: converting the equipment information into binary characteristic data through characteristic engineering; and inputting the binary characteristic data into a detection model to obtain a detection result of the user.
Optionally, before receiving the access request of the user, the method further includes: receiving a simulation access request of a simulation normal user and a simulation abnormal user, wherein the simulation access request comprises equipment information of the simulation normal user and equipment information of the simulation abnormal user; and taking the equipment information of the simulated normal user as a second data set, and taking the equipment information of the simulated abnormal user as a third data set.
Optionally, the device information includes a device type, system information, and address information when the user performs access verification, and before receiving an access request from the user, the method further includes: acquiring equipment information data during user access verification by a front-end point burying method, wherein the equipment information data comprises equipment type data, system information data and address information data; acquiring real system information and real address information corresponding to the equipment type data through a third-party platform; matching the real system information with the system information data, and matching the real address information with the address information data; under the condition that the real system information is inconsistent with the system information data, taking the device information data corresponding to the system information data as the third data set; under the condition that the real address information is inconsistent with the address information data, taking the equipment information data corresponding to the address information data as the third data set; and under the condition that the real system information is consistent with the system information data and the real address information is also consistent with the address information data, taking the equipment information data corresponding to the system information data as the second data set.
Optionally, before receiving the access request of the user, the method further includes: taking the second data set as the known equipment information in the first data set, wherein the corresponding known detection result is a normal user; and taking the third data set as the known equipment information in the first data set, wherein the corresponding known detection result is an abnormal user.
Optionally, the first detection model is a binary classification model obtained by supervised training of a Gaussian naive Bayes model using the first sample set; the second detection model is a one-class classification model obtained by unsupervised training of a one-class support vector machine (One-Class SVM) model using the second sample set; the third detection model is a one-class classification model obtained by unsupervised training of a One-Class SVM model using the third sample set, wherein the first detection model, the second detection model and the third detection model are trained in parallel following the bagging approach of repeated sampling.
Optionally, the model fusion comprises: and performing fusion calculation on a first detection result, a second detection result and a third detection result by adopting a first voting criterion to obtain a detection result, wherein the first detection result is the detection result of the first detection model, the second detection result is the detection result of the second detection model, and the third detection result is the detection result of the third detection model.
Optionally, the executing the corresponding access policy according to the detection result includes: if the detection result is an abnormal user, rejecting the access request of the user or requiring the user to re-verify; and responding to the access request of the user under the condition that the detection result is a normal user.
In a second aspect, there is provided an access control apparatus, the apparatus comprising:
a receiving unit, configured to receive an access request of a user, where the access request includes device information of the user, and the device information includes a device type, system information, and address information when the user performs access verification;
the detection unit is used for inputting the equipment information into a detection model to obtain a detection result of the user, wherein the detection model is obtained by model fusion of a first detection model, a second detection model and a third detection model, and the detection result comprises a normal user or an abnormal user;
and the execution unit is used for executing the corresponding access strategy according to the detection result.
Optionally, the first detection model is obtained by performing supervised training on a first sample set, the second detection model is obtained by performing unsupervised training on a second sample set, the third detection model is obtained by performing unsupervised training on a third sample set, the first sample set includes the second sample set and the third sample set, the first sample set includes known device information and known detection results, wherein the first sample set is obtained by converting data in the first data set into binary feature data for model training through feature engineering; the second sample set is obtained by converting data in the second data set into binary characteristic data for model training through characteristic engineering; the third sample set is obtained by converting data in the third data set into binary characteristic data for model training through characteristic engineering; inputting the device information into a detection model, and obtaining a detection result of the user includes: converting the equipment information into binary characteristic data through characteristic engineering; and inputting the binary characteristic data into a detection model to obtain a detection result of the user.
Optionally, the apparatus further includes a sample obtaining unit, configured to receive, before receiving an access request of a user, a simulated access request for simulating a normal user and a simulated abnormal user, where the simulated access request includes device information of the simulated normal user and device information of the simulated abnormal user; the sample acquisition unit is used for taking the equipment information of the simulated normal user as a second data set and taking the equipment information of the simulated abnormal user as a third data set.
Optionally, the device information includes a device type, system information, and address information during the user access verification, and the sample obtaining unit is further configured to obtain device information data during the user access verification by a front-end point burying method before receiving an access request of a user, where the device information data includes device type data, system information data, and address information data; acquiring real system information and real address information corresponding to the equipment type data through a third-party platform; matching the real system information with the system information data, and matching the real address information with the address information data; under the condition that the real system information is inconsistent with the system information data, taking the device information data corresponding to the system information data as the third data set; under the condition that the real address information is inconsistent with the address information data, taking the equipment information data corresponding to the address information data as the third data set; and under the condition that the real system information is consistent with the system information data and the real address information is also consistent with the address information data, taking the equipment information data corresponding to the system information data as the second data set.
Optionally, the sample obtaining unit is further configured to, before receiving an access request of a user, use the second data set as known device information in the first data set, where a corresponding known detection result is a normal user;
the sample acquisition unit is further configured to use the third data set as known device information in the first data set, and a corresponding known detection result is an abnormal user.
Optionally, the first detection model is a binary classification model obtained by supervised training of a Gaussian naive Bayes model using the first sample set; the second detection model is a one-class classification model obtained by unsupervised training of a one-class support vector machine (One-Class SVM) model using the second sample set; the third detection model is a one-class classification model obtained by unsupervised training of a One-Class SVM model using the third sample set, wherein the first detection model, the second detection model and the third detection model are trained in parallel following the bagging approach of repeated sampling.
Optionally, the apparatus further includes a model fusion unit, configured to perform fusion calculation on a first detection result, a second detection result, and a third detection result by using a first voting criterion, so as to obtain a detection result, where the first detection result is a detection result of the first detection model, the second detection result is a detection result of the second detection model, and the third detection result is a detection result of the third detection model.
Optionally, the execution unit is configured to, after the detection result of the user is obtained, reject the access request of the user or require the user to re-verify if the detection result is an abnormal user; the execution unit is further configured to respond to the access request of the user if the detection result is a normal user.
In a third aspect, an embodiment of the present application provides a server, including a processor, an input device, an output device, and a memory, where the memory is used to store a computer program that supports a terminal to execute the above method, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
According to the access control method, system, electronic device and computer storage medium, an access request of a user is received, wherein the access request comprises device information of the user, and the device information comprises the device type, system information and address information at the time of access verification; the device information is input into a detection model to obtain a detection result of the user; and a corresponding access strategy is executed according to the detection result. The method uses feature engineering and machine learning algorithms: three detection models are built from three sample sets that share the same features, and the three models are fused into one detection model through model fusion. This improves the prediction result, avoids the one-sidedness of a single detection model, and reduces the classification inaccuracy caused by sample imbalance, so that the accuracy of abnormal user detection is greatly improved. In addition, as the number of samples grows, the detection model can keep learning more abnormal user features, which addresses the problem that developers must develop targeted detection strategies for specific abnormal user scenarios.
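The conversion of device information into binary feature data mentioned above is not specified further in this section. As a minimal sketch, assuming a simple one-hot encoding over categorical fields (the field names `device_type` and `system`, and both function names, are hypothetical), it could look like this:

```python
from typing import Dict, List, Sequence, Tuple

Vocabulary = List[Tuple[str, str]]


def fit_vocabulary(records: Sequence[Dict[str, str]],
                   fields: Sequence[str]) -> Vocabulary:
    """Collect every (field, value) pair seen in the training records."""
    vocabulary: Vocabulary = []
    for field in fields:
        # Sort values so the feature order is deterministic.
        for value in sorted({record[field] for record in records}):
            vocabulary.append((field, value))
    return vocabulary


def to_binary_features(record: Dict[str, str],
                       vocabulary: Vocabulary) -> List[int]:
    """One-hot encode a record; unseen values yield all zeros for that field."""
    return [1 if record.get(field) == value else 0
            for field, value in vocabulary]


training = [
    {"device_type": "Vivo X20", "system": "Android"},
    {"device_type": "Macbook Air", "system": "Linux"},
]
vocab = fit_vocabulary(training, ["device_type", "system"])
features = to_binary_features(training[0], vocab)
```

The same vocabulary would need to be reused at prediction time so that training samples and incoming requests share one feature layout.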
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an access control method provided in the present application;
Fig. 2 is a schematic structural diagram of a detection model provided in the present application;
Fig. 3 is a schematic flow chart of a sample acquisition method provided in the present application;
Fig. 4 is a schematic flow chart of an access control method provided in the present application;
Fig. 5 is a schematic structural diagram of an access control device provided in the present application;
Fig. 6 is a schematic diagram of a server structure provided in the present application.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings by way of specific embodiments. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted in different instances or may be replaced by other methods. In some instances, certain operations related to the present application are not shown or described in the specification, so that the core of the present application is not obscured by excessive description. Those skilled in the art do not need these related operations described in detail; they can fully understand them from the description in the specification and the general technical knowledge in the field.
Fig. 1 is a schematic flowchart of an access control method provided in the present application. As can be seen from fig. 1, the access control method provided in the present application includes the following steps:
S101: Receiving an access request of a user.
In this embodiment, the access request includes device information of the user, where the device information includes a device type, system information, and address information at the time of access verification. The device type may be the specific model of the device used during access verification, such as iPhone XS, Vivo X20, or MacBook Air. The system information may include the system type of the device, such as iOS, Android, Windows, or Linux, and the system version, such as iOS 12.3, Android 4.1, or Windows 10. It may further include device parameters such as resolution, screen pixel density, and CPU model, for example a resolution of 1920 × 1080, a screen pixel density of 401 ppi, a CPU model of Apple a + M9, and a frequency of 2.1 GHz.
In this embodiment, the device information of the user may be obtained through front-end event tracking (embedded points). It should be understood that a developer may use a variety of front-end tracking methods to insert monitoring logic into a front-end page. One is code-based tracking, in which monitoring logic is inserted at the places in the front-end page code that need to be monitored. Another is full tracking, in which the front end automatically collects all events and reports the tracking data, and the back end filters and computes the useful data. A third is a strategy that combines the two, for example 30% code-based tracking and 70% full tracking, with different tracking schemes selected for different scenarios. In a specific implementation, the instrumented front-end page can obtain the device information of the user with a JavaScript script. It should be understood that the foregoing examples are merely illustrative; the front-end tracking method may also include other tracking methods capable of acquiring the user's device information, which is not limited herein.
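To make the shape of the device information concrete, the following is a minimal sketch of how a server might represent and extract it from a decoded request payload. The payload layout, the field names, and the function `parse_access_request` are all hypothetical; the specification does not define a wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Device information carried by an access request (hypothetical fields)."""
    device_type: str      # specific device model, e.g. "Vivo X20"
    system_type: str      # e.g. "Android"
    system_version: str   # e.g. "4.1"
    resolution: str       # e.g. "1920x1080"
    ip_address: str       # address information at verification time


def parse_access_request(payload: dict) -> DeviceInfo:
    """Extract device information from a decoded request payload.

    Missing keys raise KeyError so that incomplete requests are not
    silently passed on to the detection model.
    """
    device = payload["device"]
    return DeviceInfo(
        device_type=device["type"],
        system_type=device["system"]["type"],
        system_version=device["system"]["version"],
        resolution=device["resolution"],
        ip_address=payload["ip"],
    )


request = {
    "ip": "116.24.67.65",
    "device": {
        "type": "Vivo X20",
        "system": {"type": "Android", "version": "4.1"},
        "resolution": "1920x1080",
    },
}
info = parse_access_request(request)
```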
S102: Inputting the device information into a detection model to obtain a detection result of the user.
In an embodiment of the present application, the detection model is obtained by model fusion of a first detection model, a second detection model, and a third detection model. The first detection model is obtained by supervised training on a first sample set, the second detection model by unsupervised training on a second sample set, and the third detection model by unsupervised training on a third sample set. The first sample set includes the second sample set and the third sample set, and contains known device information and known detection results. Fig. 2 is a schematic structural diagram of a detection model provided in the present application. As can be seen from Fig. 2, the detection model is formed by fusing three models: the first detection model is trained on all samples (the first sample set), the second detection model on normal-user samples (the second sample set), and the third detection model on abnormal-user samples (the third sample set); the first sample set is composed of the second and third sample sets. Because the detection model fuses three detection models, in the prediction stage the first detection model gives an overall detection probability, while the second and third detection models give detection results from the perspectives of normality and abnormality respectively. Combining the judgments of the three models allows abnormal user detection to be carried out comprehensively and avoids the one-sidedness of a single detection model trained only on normal samples, which are more plentiful.
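The fusion of the three detection results can be sketched as follows. This section does not define the voting criterion, so the sketch assumes a plain majority vote; it also assumes that a one-class model reports whether a sample is an inlier of the class it was trained on. The names `one_class_to_label` and `fuse_by_majority` are hypothetical.

```python
from collections import Counter

NORMAL = "normal"
ABNORMAL = "abnormal"


def one_class_to_label(is_inlier: bool, trained_on: str) -> str:
    """Map a one-class model's inlier/outlier output to a user label.

    Assumption: an inlier belongs to the class the model was trained on,
    and an outlier belongs to the other class.
    """
    if trained_on not in (NORMAL, ABNORMAL):
        raise ValueError(f"unknown training class: {trained_on!r}")
    if is_inlier:
        return trained_on
    return ABNORMAL if trained_on == NORMAL else NORMAL


def fuse_by_majority(first: str, second: str, third: str) -> str:
    """Return the label chosen by at least two of the three models."""
    votes = Counter([first, second, third])
    unknown = set(votes) - {NORMAL, ABNORMAL}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # With three votes over two labels, one label always has a strict majority.
    return votes.most_common(1)[0][0]


# Example: the binary model says abnormal, the normal-trained one-class model
# sees an outlier, and the abnormal-trained one-class model also sees an outlier.
first_result = ABNORMAL
second_result = one_class_to_label(is_inlier=False, trained_on=NORMAL)
third_result = one_class_to_label(is_inlier=False, trained_on=ABNORMAL)
fused = fuse_by_majority(first_result, second_result, third_result)
```

A weighted vote or a probability average would be equally compatible with the text; majority voting is used here only because it is the simplest criterion for three voters.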
S103: Executing a corresponding access policy according to the detection result.
In this embodiment of the present application, the detection result is either an abnormal user or a normal user, and executing the corresponding access policy according to the detection result includes: if the detection result is an abnormal user, rejecting the access request of the user or requiring the user to re-verify; if the detection result is a normal user, responding to the access request of the user. In short, whether to respond to the user's request is decided by the detection result given by the detection model. Optionally, to improve accuracy, when the probability of an abnormal user in the detection result does not reach a preset threshold, for example when the final detection result indicates an abnormal user with a probability of only 52% while the preset threshold is 70%, a verification request may be sent to the user again to require re-verification.
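The access policy described above can be sketched as a small decision function. The 70% default mirrors the example threshold in the text; the `Action` enum and the function name `choose_action` are hypothetical, and treating "at or above the threshold" as the rejection condition is an assumption.

```python
from enum import Enum


class Action(Enum):
    RESPOND = "respond"      # serve the access request
    REVERIFY = "reverify"    # send a new verification request to the user
    REJECT = "reject"        # refuse the access request


def choose_action(is_abnormal: bool,
                  abnormal_probability: float,
                  threshold: float = 0.70) -> Action:
    """Pick an access policy from the fused detection result."""
    if not 0.0 <= abnormal_probability <= 1.0:
        raise ValueError("abnormal_probability must be within [0, 1]")
    if not is_abnormal:
        return Action.RESPOND
    if abnormal_probability >= threshold:
        return Action.REJECT
    # Abnormal, but below the confidence threshold: ask for re-verification.
    return Action.REVERIFY


low_confidence = choose_action(True, 0.52)
high_confidence = choose_action(True, 0.90)
normal_user = choose_action(False, 0.10)
```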
The following describes the training method of the detection model in detail, wherein the training of the detection model can be divided into three stages, namely a sample acquisition stage, a feature engineering stage, and a model training stage.
The sample acquisition phase is explained first.
It can be understood that before model training, a large amount of data is acquired and converted into a sample set for model training. It can be understood that, since the model is used for detecting an abnormal user, the sample should include known device information and a corresponding known detection stage, and such data can be directly obtained in the network by using a crawler as a picture sample, and a developer needs to obtain the device information by himself and mark a corresponding abnormal user label and a corresponding normal user label for the device information. Therefore, two methods are adopted for sample acquisition in the sample acquisition stage. One is a method of simulation verification, and the other is a method of comparing accurate data.
In this embodiment of the present application, the method for simulating authentication means that, before the receiving of the access request of the user, the method further includes: receiving a simulation access request of a simulation normal user and a simulation abnormal user, wherein the simulation access request comprises equipment information of the simulation normal user and equipment information of the simulation abnormal user; and taking the equipment information of the simulated normal user as a second data set, and taking the equipment information of the simulated abnormal user as a third data set. For example, the developer may perform normal registration verification using a mobile phone, and place the device information acquired at the front end at this time as a sample of a normal user into the second data set. An automatic verification tool such as a key puck can also be used for performing registration verification of the abnormality, and the device information acquired at the front end at this time is taken as a sample of the abnormal user and is put into the third data set. It can be understood that the device information and the corresponding tag obtained by the method for simulation verification are accurate tags and tags without errors, but a large amount of experiments are required for simulation verification, and much time is consumed, so that another method for obtaining sample data is provided by the application.
In this embodiment of the present application, the method for comparing accurate data means that before receiving an access request from a user, the method further includes: acquiring equipment information data during user access verification by a front-end point burying method, wherein the equipment information data comprises equipment type data, system information data and address information data; acquiring real system information and real address information corresponding to the equipment type data through a third-party platform; matching the real system information with the system information data, and matching the real address information with the address information data; under the condition that the real system information is inconsistent with the system information data, taking the device information data corresponding to the system information data as the third data set; under the condition that the real address information is inconsistent with the address information data, taking the equipment information data corresponding to the address information data as the third data set; and under the condition that the real system information is consistent with the system information data and the real address information is also consistent with the address information data, taking the equipment information data corresponding to the system information data as the second data set. 
In short, according to the current device type of the user obtained by the embedded point, the real system information and the address type corresponding to the device type are crawled by a web crawler and compared with the device type obtained by the embedded point, if the real system information and the address type are consistent, the device information is not tampered by the user, so that the device information is a normal user, the device information can be used as a sample of the normal user and put into the second data set, and if the real system information and the address type are inconsistent, the device information is tampered by the user, the device information is used as a sample of the abnormal user and put into the third data set.
For example, FIG. 3 shows how sample data may be obtained by the method of comparing accurate data. As shown in FIG. 3, assume that the device data obtained through the front-end embedded points gives a device type of iPhone 6, system information of Linux, and an IP address of 116.24.67.65, while the system type of the iPhone 6 obtained from a third party such as a web crawler is iOS, and the user's historical IP address obtained from historical access data is 116.24.67.65. Comparing the device data obtained in the two ways shows that the device type and the IP address are consistent but the system type differs. This may be because the user performed verification after modifying the IP address and the device type with an automatic verification tool, so the device data can be labeled as data of an abnormal user and put into the third data set. It should be understood that acquiring data by comparison is simple and fast and yields a large amount of sample data quickly, but sample labels obtained merely by checking whether the data are consistent are less accurate than those obtained by simulation verification. Developers can therefore choose the sample acquisition method according to the specific application scenario, for example obtaining part of the samples by comparing accurate data and part by simulation verification. The above examples are merely illustrative and are not intended as specific limitations.
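The comparison in the FIG. 3 example can be sketched as follows. This is only an illustrative sketch: the names `DeviceInfo` and `label_by_comparison` are hypothetical and do not come from the application, and the case-insensitive string comparison is an assumption about how "consistent" might be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Device information reported by the front-end embedded points (hypothetical type)."""
    device_type: str
    system: str
    address: str


def label_by_comparison(reported: DeviceInfo, real_system: str, real_address: str) -> str:
    """Return 'normal' (second data set) or 'abnormal' (third data set).

    Assumption: two values are "consistent" when they match after trimming
    whitespace and, for the system type, ignoring letter case.
    """
    system_ok = reported.system.strip().lower() == real_system.strip().lower()
    address_ok = reported.address.strip() == real_address.strip()
    return "normal" if (system_ok and address_ok) else "abnormal"


# Values taken from the FIG. 3 example: the reported system type is Linux,
# while the third-party lookup says an iPhone 6 runs iOS.
fig3_record = DeviceInfo("iPhone 6", "Linux", "116.24.67.65")
fig3_label = label_by_comparison(fig3_record, real_system="iOS", real_address="116.24.67.65")

# A record whose system type and address both match is kept as a normal sample.
untampered = DeviceInfo("iPhone 6", "iOS", "116.24.67.65")
untampered_label = label_by_comparison(untampered, "iOS", "116.24.67.65")

print(fig3_label, untampered_label)
```

A mismatch in either field is enough to send the record to the third data set, which mirrors the two separate "inconsistent" conditions in the embodiment.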
In this embodiment of the present application, before the access request of the user is received, the method further includes: taking the second data set as known device information in the first data set, with the corresponding known detection result being a normal user; and taking the third data set as known device information in the first data set, with the corresponding known detection result being an abnormal user. That is, the first data set contains sample data of both normal users and abnormal users, the second data set contains only data of normal users, and the third data set contains only data of abnormal users. Training three different models on the three data sets allows user anomaly detection to be carried out comprehensively and effectively avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples.
Next, the feature extraction stage is explained.
It can be understood that the data used for model training determines the upper limit of machine learning, and even a perfect algorithm can only approach that upper limit as closely as possible. Therefore, after the first data set, the second data set, and the third data set are obtained in the sample acquisition stage, feature engineering is used to convert them into sample data for model training. Better training features bring the machine learning model closer to the upper limit and greatly improve the performance of the finally trained detection model.
In this embodiment of the application, the first sample set is obtained by converting the data in the first data set into binary feature data for model training through feature engineering; the second sample set is obtained by converting the data in the second data set into binary feature data for model training through feature engineering; the third sample set is obtained by converting the data in the third data set into binary feature data for model training through feature engineering; and inputting the device information into the detection model to obtain the detection result of the user includes: converting the device information into binary feature data through feature engineering; and inputting the binary feature data into the detection model to obtain the detection result of the user. It can be understood that, because the data used in training is binary feature data, the device information obtained in step S101 must likewise be converted into binary feature data in the prediction stage before it is input into the detection model for detection.
In specific implementation, the feature engineering may convert data of different specifications into a unified specification through a non-dimensionalization idea, for example, may convert a nominal attribute in the device information into a binary feature, and may specifically perform feature mining on the first data set, the second data set, and the third data set through the following three ways.
(I) Marking with a value point as the reference
Marking with a value point as the reference means that, for part of the device information, a distinctive value point is selected according to the distribution of the data of normal users and abnormal users; with that value point as the reference, a sample that meets the value point is marked 0 and a sample that does not is marked 1. For example, for a normal user who performs registration verification with the IE browser, the device resolution is generally higher than 1200 × 780, whereas for an abnormal user who performs automatic verification in the background on a server, the device resolution is lower than this value. Therefore 1200 × 780 can be used as a distinctive value point: a sample above this resolution is marked 0, and a sample below it is marked 1. It should be understood that a value point is a specific number, so this method applies only to numerical features; nominal attributes such as device type and system type cannot be converted into binary features in this way.
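Marking against a value point can be sketched as below. The function names are hypothetical, and the sketch makes two assumptions the text does not settle: a resolution counts as "above" the value point only when both width and height reach it, and a resolution exactly equal to the value point is marked 0.

```python
def parse_resolution(text: str) -> tuple:
    """Parse strings such as '1200*780', '1200x780' or '1200 × 780' into (width, height)."""
    normalized = text.lower().replace("×", "*").replace("x", "*")
    width, height = normalized.split("*")
    return int(width), int(height)


def mark_by_value_point(resolution: str, value_point: str = "1200*780") -> int:
    """Return 0 when the resolution reaches the value point, otherwise 1.

    Assumption: both width and height must reach the value point.
    """
    width, height = parse_resolution(resolution)
    ref_width, ref_height = parse_resolution(value_point)
    return 0 if (width >= ref_width and height >= ref_height) else 1


high_resolution_mark = mark_by_value_point("1920*1080")
low_resolution_mark = mark_by_value_point("800 × 600")
print(high_resolution_mark, low_resolution_mark)
```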
(II) Marking with a value subset as the reference
Marking with a value subset as the reference means that, for part of the device information, a distinctive value subset is selected according to the distribution of the data of normal users and abnormal users; with that subset as the reference, samples belonging to the subset are marked 0 and samples not belonging to it are marked 1. For example, for the Linux system, the normal version types are CentOS 6, CentOS 7, Ubuntu, Red Hat, and so on, so the set of normal version types can be used as the distinctive value subset for normal data objects. Samples belonging to this subset of version types are marked 0, and samples not belonging to it are marked 1. In this way, nominal attributes such as the version type are converted into binary features.
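A minimal sketch of subset-based marking follows. The constant `NORMAL_LINUX_VERSIONS` and the function name are hypothetical, the listed versions are just the ones named in the example, and case-insensitive matching is an assumption.

```python
# Hypothetical value subset built from the version types named in the example.
NORMAL_LINUX_VERSIONS = frozenset({"centos 6", "centos 7", "ubuntu", "red hat"})


def mark_by_value_subset(version: str, subset: frozenset = NORMAL_LINUX_VERSIONS) -> int:
    """Return 0 when the version belongs to the subset, otherwise 1."""
    return 0 if version.strip().lower() in subset else 1


known_version_mark = mark_by_value_subset("CentOS 7")
unknown_version_mark = mark_by_value_subset("SomeOtherDistro 1.0")
print(known_version_mark, unknown_version_mark)
```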
(III) Marking with a value set as the reference
Marking with a value set as the reference means that, for part of the device information, a distinctive value set is selected according to the distribution of the data of normal users and abnormal users; with that value set as the reference, samples belonging to the value set are marked 0 and samples not belonging to it are marked 1. That is, methods (I) and (II) each mine a single feature, such as the resolution or the system type, whereas method (III) mines a combination of several features. For example, the device type (PC type, mobile phone model, and so on) and the resolution can together form a distinctive value set. Assume that the device type currently acquired in step S101 is iPhone 7, and that among the normal-user samples the support and confidence of the 1920 × 1080 resolution corresponding to the iPhone 7 are large (the pair occurs frequently). This pair of values can then distinguish abnormal data from normal data: a sample equal to the pair is marked 0, and a sample not equal to it is marked 1. Features of different specifications are thus non-dimensionalized into uniform binary features to facilitate the subsequent sample training.
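The support and confidence of a (device type, resolution) pair can be computed as in the sketch below. The sample data, the thresholds `min_support` and `min_confidence`, and all function names are hypothetical; the application does not state how large the support and confidence must be.

```python
def support_and_confidence(samples, device, resolution):
    """samples: list of (device_type, resolution) pairs observed for normal users.

    support    = share of all samples that are exactly (device, resolution)
    confidence = share of samples with this device that also have this resolution
    """
    pair_count = sum(1 for s in samples if s == (device, resolution))
    device_count = sum(1 for d, _ in samples if d == device)
    support = pair_count / len(samples) if samples else 0.0
    confidence = pair_count / device_count if device_count else 0.0
    return support, confidence


def frequent_pairs(samples, min_support=0.5, min_confidence=0.8):
    """Collect the pairs whose support and confidence both reach the (assumed) thresholds."""
    selected = set()
    for device, resolution in set(samples):
        support, confidence = support_and_confidence(samples, device, resolution)
        if support >= min_support and confidence >= min_confidence:
            selected.add((device, resolution))
    return selected


def mark_by_value_set(sample, value_set):
    """Return 0 when the (device, resolution) pair is in the value set, otherwise 1."""
    return 0 if sample in value_set else 1


# Made-up normal-user samples, for illustration only.
normal_samples = (
    [("iPhone 7", "1920*1080")] * 8
    + [("iPhone 7", "1334*750")]
    + [("PC", "1920*1080")]
)
value_set = frequent_pairs(normal_samples)
print(value_set)
print(mark_by_value_set(("iPhone 7", "1920*1080"), value_set))
print(mark_by_value_set(("iPhone 7", "800*600"), value_set))
```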
It should be noted that the value point, the value subset, and the value set may be obtained by analyzing the normal-user samples in the first sample set; in the example above, the set of normal version types of the Linux system is used as the distinctive value subset for normal data objects. They may also be obtained by statistical analysis of the distribution of normal-user and abnormal-user samples. For example, assume that there are currently 12 labeled samples, where "1" denotes a normal user and "-1" denotes an abnormal user. The normal samples are A, B, C, E, H, J, K, and L, the abnormal samples are D, F, G, and I, and the proportion of abnormal samples is 33.3%. Table 1 shows the attributes of the 12 samples.
TABLE 1 Sample data and corresponding labels
Sample   Resolution   Operating system   Label
A        1200*456     Windows            1
B        1900*720     Windows            1
C        900*600      Windows            1
D        1980*780     Windows            -1
E        1600*900     Windows            1
F        800*600      Windows            -1
G        800*600      Windows            -1
H        1280*720     Windows            1
I        400*300      Windows            -1
J        1200*456     Windows            1
K        1980*780     Windows            1
L        1980*780     Windows            1
It can be understood that, on a Windows system, a low resolution is likely to indicate a crawler server. Therefore, for the combined feature formed by the Windows system and the resolution, the value set can be determined by analyzing the sample distribution of normal users and abnormal users. For example, if a Windows system with a resolution of 800 × 600 is selected as the distinctive value set, the samples at or below that resolution are F, G, and I, so the proportion of abnormal samples is 25% (3 of 12). It should be understood that an abnormal rate of about 3% is considered a reasonable range, so a resolution of 800 × 600 is not suitable as the distinctive point of the combined feature. If the resolution is instead taken as 400 × 300, the abnormal proportion is 8.3% (1 of 12), so the distinctive value set selected from the distribution of the 12 samples is a Windows system with a resolution of 400 × 300. It should be noted that, for each sample, the corresponding feature is obtained from the distribution of resolution values: if the resolution is higher than the distinctive value, the feature value is 1; otherwise, the feature value is 0.
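The two proportions in this example can be reproduced from Table 1 with the short sketch below. It assumes that a sample falls inside the candidate value set when both its width and its height are at or below the candidate resolution; the application does not spell out this comparison rule, and the function names are hypothetical.

```python
# Resolution and label of each sample, copied from Table 1 (all on Windows).
TABLE_1 = {
    "A": ("1200*456", 1), "B": ("1900*720", 1), "C": ("900*600", 1),
    "D": ("1980*780", -1), "E": ("1600*900", 1), "F": ("800*600", -1),
    "G": ("800*600", -1), "H": ("1280*720", 1), "I": ("400*300", -1),
    "J": ("1200*456", 1), "K": ("1980*780", 1), "L": ("1980*780", 1),
}


def flagged_samples(candidate: str) -> list:
    """Names of samples whose width and height are both at or below the candidate."""
    max_width, max_height = (int(v) for v in candidate.split("*"))
    names = []
    for name, (resolution, _label) in TABLE_1.items():
        width, height = (int(v) for v in resolution.split("*"))
        if width <= max_width and height <= max_height:
            names.append(name)
    return sorted(names)


def flagged_ratio(candidate: str) -> float:
    """Share of the 12 samples that the candidate resolution would flag."""
    return len(flagged_samples(candidate)) / len(TABLE_1)


for candidate in ("800*600", "400*300"):
    print(candidate, flagged_samples(candidate), f"{flagged_ratio(candidate):.1%}")
```

Under this rule the 800 × 600 candidate flags F, G, and I (25%), and the 400 × 300 candidate flags only I (about 8.3%), matching the figures in the text.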
It should be understood that the feature mining method used in the feature engineering is only used for illustration, and the present application may also convert the nominal attribute in the device information into sample data of a binary feature or other sample data with a better training data feature by using other feature mining methods, and the present application is not limited in particular.
Third, the model training phase is explained.
In this embodiment of the present application, the first detection model is a binary classification model obtained by supervised training of a Gaussian naive Bayes (Naive Bayes) model on the first sample set; the second detection model is a one-class classification model obtained by unsupervised training of a one-class support vector machine (One-Class Support Vector Machine, One-Class SVM) model on the second sample set; and the third detection model is a one-class classification model obtained by unsupervised training of a One-Class SVM model on the third sample set, wherein the first, second, and third detection models are trained in parallel based on the bagging idea of repeated sampling. That is, the trained first detection model is a binary classifier, and the trained second and third detection models are one-class classifiers. During training, the first detection model learns the features of positive and negative samples (abnormal-user samples and normal-user samples) simultaneously, and during prediction it gives a comprehensive probability of whether each input is abnormal. The second and third detection models each learn the features of samples of a single class only, and during prediction each gives a one-sided probability by judging whether the input has the features of its own class.
By learning the features of the samples from multiple angles with the three models, whether a user performing registration verification is abnormal can be judged comprehensively from the models' predictions. This effectively avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples, reduces the classification inaccuracy of the naive Bayes model caused by sample imbalance, and greatly improves the accuracy of the final detection result.
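To make the role of the first detection model concrete, the sketch below implements a small Gaussian naive Bayes classifier in plain Python and trains it on made-up binary features. It is an illustration under stated assumptions, not the application's implementation: the class name, the variance smoothing constant, and the toy data are all hypothetical. The one-class SVM models are not sketched here; in practice a library implementation such as scikit-learn's `GaussianNB` and `OneClassSVM` would normally be used.

```python
import math


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes for small dense feature vectors (illustrative only)."""

    def fit(self, X, y, smoothing=1e-3):
        self.classes = sorted(set(y))
        self.priors, self.means, self.variances = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            columns = list(zip(*rows))
            self.priors[c] = len(rows) / len(X)
            self.means[c] = [sum(col) / len(col) for col in columns]
            # Smoothing keeps the variance positive when a binary feature is constant.
            self.variances[c] = [
                sum((v - m) ** 2 for v in col) / len(col) + smoothing
                for col, m in zip(columns, self.means[c])
            ]
        return self

    def _log_joint(self, x, c):
        total = math.log(self.priors[c])
        for v, m, var in zip(x, self.means[c], self.variances[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    def predict_proba(self, x):
        """Return {class: posterior probability} for one feature vector."""
        logs = {c: self._log_joint(x, c) for c in self.classes}
        top = max(logs.values())
        weights = {c: math.exp(v - top) for c, v in logs.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


# Toy binary features (e.g. outputs of the three marking methods); 1 = normal, -1 = abnormal.
X = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0],
     [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

first_model = GaussianNaiveBayes().fit(X, y)
suspicious = first_model.predict_proba([1, 1, 1])
ordinary = first_model.predict_proba([0, 0, 0])
print(suspicious, ordinary)
```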
In this embodiment of the present application, after the first detection model, the second detection model, and the third detection model are trained on the first sample set, the second sample set, and the third sample set respectively, the three models are fused to obtain the final detection model. The model fusion includes: performing a fusion calculation on a first detection result, a second detection result, and a third detection result using a first voting criterion to obtain the detection result, wherein the first detection result is the detection result of the first detection model, the second detection result is the detection result of the second detection model, and the third detection result is the detection result of the third detection model. In a specific implementation, the first, second, and third detection models may be constructed during training based on a bagging strategy, and the final prediction result is determined by the first voting criterion. The first voting criterion may be determined according to the sample distribution in the training stage, for example voting by a weighted average, and is not specifically limited in the present application.
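A weighted-average form of the first voting criterion might look like the sketch below. The weights are purely illustrative assumptions (the application only says they may be chosen from the sample distribution), and the sketch assumes each model's output has already been expressed as a probability that the user is abnormal.

```python
def fuse_weighted(p_first, p_second, p_third, weights=(0.5, 0.25, 0.25)):
    """Weighted average of three abnormal-user probabilities.

    p_first, p_second, p_third: outputs of the first, second and third
    detection models, each assumed to lie in [0, 1].
    weights: hypothetical per-model weights; they are normalized by their sum.
    """
    probabilities = (p_first, p_second, p_third)
    total_weight = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total_weight


fused = fuse_weighted(0.8, 0.6, 0.9)
print(round(fused, 3))
```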
It should be understood that the bagging strategy means random sampling with replacement from an original training set containing m samples, that is, a sample that has been drawn may still be drawn in the next draw. T sampling sets are finally obtained, each containing m samples. Because the sampling is random, each sampling set differs from the original training set and from the other sampling sets. The sampling sets are used to train the first detection model, the second detection model, and the third detection model respectively, and the results of the different models are finally fused by a regression (averaging) method, for example by fusing the prediction results of the three models with a weighted average, so that a detection model with stronger classification performance than any of the first, second, and third detection models is obtained. Existing anti-crawler approaches can only formulate a specific anti-crawler strategy from user request data. With the access control method provided by the present application, it is only necessary to learn the distinguishing rules between abnormal-user data and normal-user data autonomously in advance through feature engineering and machine learning algorithms, after which the same detection model can identify diverse abnormal scenarios. The prediction results can also be fed back to update the detection model continuously, so that more complex cases are covered as the sample size grows.
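Sampling with replacement as described above can be sketched in a few lines. The function name and the fixed seed are assumptions added for reproducibility; m and T follow the notation in the text.

```python
import random


def bootstrap_sets(training_set, num_sets, seed=0):
    """Draw num_sets (T) sampling sets, each of size m = len(training_set), with replacement."""
    rng = random.Random(seed)
    m = len(training_set)
    return [[rng.choice(training_set) for _ in range(m)] for _ in range(num_sets)]


original = ["A", "B", "C", "D", "E", "F"]
sampling_sets = bootstrap_sets(original, num_sets=3)
for sampling_set in sampling_sets:
    # The same sample may appear several times, and some samples may be missing.
    print(sampling_set)
```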
Fig. 4 is a schematic flow chart of an access control method provided in the present application. As can be seen from Fig. 4, after the device data of the user is acquired in step S101, the data of multiple dimensions (text types) in the device data are first converted by feature engineering into numerical features that can characterize whether a data object is abnormal. The features are then input into the three detection models. The first detection model is trained on the first sample set, which consists of normal-user samples and abnormal-user samples, and gives a comprehensive probability based on the classification features it has learned over the overall distribution. The second detection model is trained on the second sample set, which consists of normal users, and learns the data features of normal users. The third detection model is trained on the third sample set, which consists of abnormal users, and learns the data features of abnormal users. A comprehensive prediction result is then given from the judgments of the three models. This effectively avoids the one-sidedness of a single detection model trained on the more plentiful normal samples, mitigates the inaccurate classification of the first detection model caused by unbalanced samples, and further improves the accuracy of abnormal-user detection.
In this embodiment of the present application, the detection result is either an abnormal user or a normal user, and after the detection result of the user is obtained, the method further includes: if the detection result is an abnormal user, rejecting the access request of the user or requiring the user to verify again; and if the detection result is a normal user, responding to the access request of the user. In short, whether to respond to the user's request is decided by the detection result given by the detection model. Optionally, to improve detection accuracy, when the probability of an abnormal user in the detection result does not reach a preset threshold, for example when the final detection result indicates only a 52% probability of an abnormal user while the preset threshold is 70% or more, a verification request may be sent to the user again, requiring the user to verify again.
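One possible reading of this access policy is sketched below. The 70% threshold comes from the example in the text, while the 50% boundary between "normal" and "uncertain", the function name, and the returned action strings are assumptions made for illustration.

```python
def access_policy(abnormal_probability: float, threshold: float = 0.70) -> str:
    """Map the fused abnormal-user probability to an action.

    Assumptions: at or above the threshold the request is rejected; between
    50% and the threshold the user is asked to verify again; otherwise the
    request is answered normally.
    """
    if abnormal_probability >= threshold:
        return "reject"
    if abnormal_probability > 0.5:
        return "reverify"
    return "respond"


for probability in (0.90, 0.52, 0.10):
    print(probability, access_policy(probability))
```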
In the above method, an access request of a user is received, where the access request includes device information of the user, and the device information includes the device type, system information, and address information at the time of the user's access verification; the device information is input into the detection model to obtain a detection result for the user, and a corresponding access policy is executed according to the detection result. With this method, three detection models are constructed from three sample sets with the same features through feature engineering and machine learning algorithms, and the three detection models are fused into one detection model through model fusion. The prediction result is therefore good: the one-sidedness of a single detection model is avoided, the classification inaccuracy caused by sample imbalance is reduced, and the accuracy of abnormal-user detection is greatly improved. As the sample size grows, the detection model can continue to learn more abnormal-user features, which removes the need for developers to develop targeted detection strategies for specific abnormal-user scenarios.
Fig. 5 is a schematic structural diagram of an access control device provided in the present application. As can be seen from fig. 5, the access control apparatus 500 provided in the present application includes a receiving unit 510, a detecting unit 520, and an executing unit 530, wherein,
the receiving unit 510 is configured to receive an access request of a user, where the access request includes device information of the user, and the device information includes a device type, system information, and address information when the user performs access verification;
the detection unit 520 is configured to input the device information into a detection model, so as to obtain a detection result of the user, where the detection model is obtained by model fusion of a first detection model, a second detection model, and a third detection model, the first detection model is obtained by performing supervised training on a first sample set, the second detection model is obtained by performing unsupervised training on a second sample set, the third detection model is obtained by performing unsupervised training on a third sample set, the first sample set includes the second sample set and the third sample set, and the first sample set includes known device information and a known detection result.
The executing unit 530 is configured to execute a corresponding access policy according to the detection result.
In this embodiment, the access request received by the receiving unit 510 includes device information of the user, where the device information includes the device type, system information, and address information at the time of the user's access verification. The device type may be the specific model of the device used for access verification, such as iPhone XS, Vivo X20, or MacBook Air. The system information may include the system type of the device, such as iOS, Android, Windows, or Linux, and the system version, such as iOS 12.3, Android 4.1, or Windows 10, and may further include device parameter information such as resolution, screen pixel density, and CPU model, for example a resolution of 1920 × 1080, a screen pixel density of 401 ppi, and a CPU model of Apple a + M9 with a frequency of 2.1 GHz.
In this embodiment, the device information of the user may be obtained through front-end embedded points (event tracking). It should be understood that a developer may use any of several front-end embedded-point methods to insert monitoring logic into a front-end page. For example, with command embedding, monitoring logic is inserted at the places to be monitored in the front-end page code. With full embedding, the front end automatically collects all events and reports the embedded-point data, and the back end filters and computes the useful data. A strategy combining the two may also be used, for example 30% command embedding and 70% full embedding, with different embedding schemes selected for different scenarios. In a specific implementation, the front-end page with embedded points can obtain the device information of the user by using a JavaScript script. It should be understood that the foregoing examples are merely illustrative, and the front-end embedding may also take other forms capable of acquiring the user's device information, which is not limited herein.
In this embodiment, the detection model used by the detection unit 520 is obtained by model fusion of a first detection model, a second detection model, and a third detection model. The first detection model is obtained by supervised training on a first sample set, the second detection model is obtained by unsupervised training on a second sample set, and the third detection model is obtained by unsupervised training on a third sample set; the first sample set includes the second sample set and the third sample set, and the first sample set includes known device information and known detection results. Fig. 2 is a schematic structural diagram of a detection model provided in the present application. As can be seen from Fig. 2, the detection model is formed by fusing three models: the first detection model is trained on all samples, that is, the first sample set; the second detection model is trained on the samples of normal users, that is, the second sample set; and the third detection model is trained on the samples of abnormal users, that is, the third sample set. The first sample set is composed of the second sample set and the third sample set. It can be understood that, because the detection model is obtained by fusing three detection models, in the prediction stage the first detection model gives a comprehensive detection probability, while the second and third detection models give detection results from the normal and abnormal perspectives respectively. By combining the judgments of the three detection models, user anomaly detection can be carried out comprehensively, and the one-sidedness of a single detection model trained only on the more plentiful normal samples is effectively avoided.
The method for training the detection model is described in detail below with reference to fig. 3, wherein the training of the detection model can be divided into three stages, namely, a sample acquisition stage, a feature engineering stage, and a model training stage.
The sample acquisition phase is explained first.
It can be understood that a large amount of data must be acquired and converted into a sample set before model training. Since the model is used to detect abnormal users, each sample should include known device information and the corresponding known detection result. Unlike picture samples, such data cannot be obtained directly from the network with a crawler; developers need to obtain the device information themselves and label it as belonging to an abnormal user or a normal user. Therefore, two methods are used in the sample acquisition stage: one is the method of simulation verification, and the other is the method of comparing accurate data.
In this embodiment of the application, for the method of simulation verification, the apparatus further includes a sample obtaining unit 540. The receiving unit 510 is further configured to receive, before receiving an access request of a user, simulated access requests of a simulated normal user and a simulated abnormal user, where the simulated access requests include device information of the simulated normal user and device information of the simulated abnormal user; the sample obtaining unit 540 is configured to use the device information of the simulated normal user as a second data set and the device information of the simulated abnormal user as a third data set. For example, a developer may perform normal registration verification with a mobile phone, and the device information acquired at the front end at that time is put into the second data set as a normal-user sample. An automatic verification tool such as a key puck may also be used to perform abnormal registration verification, and the device information acquired at the front end at that time is put into the third data set as an abnormal-user sample. It can be understood that the device information and labels obtained by simulation verification are accurate and error-free, but simulation verification requires a large number of experiments and consumes a lot of time, so the present application provides another method for obtaining sample data.
In this embodiment of the present application, for the method of comparing accurate data, the receiving unit 510 is further configured to acquire, before receiving an access request of a user, device information data generated during user access verification through front-end embedded points (event tracking), where the device information data includes device type data, system information data, and address information data; the sample obtaining unit 540 is further configured to acquire, through a third-party platform, real system information and real address information corresponding to the device type data; the sample obtaining unit 540 is further configured to match the real system information against the system information data, and to match the real address information against the address information data; the sample obtaining unit 540 is further configured to, when the real system information is inconsistent with the system information data, place the device information data corresponding to the system information data into the third data set; the sample obtaining unit 540 is further configured to, when the real address information is inconsistent with the address information data, place the device information data corresponding to the address information data into the third data set; and the sample obtaining unit 540 is further configured to, when the real system information is consistent with the system information data and the real address information is also consistent with the address information data, place the device information data corresponding to the system information data into the second data set.
In short, starting from the user's current device type obtained through the embedded points, the real system information and real address information corresponding to that device type are obtained, for example by a web crawler, and compared with the system information and address information obtained through the embedded points. If they are consistent, the device information has not been tampered with by the user, so the user is treated as a normal user and the device information is put into the second data set as a normal-user sample. If they are inconsistent, the device information has been tampered with by the user, and it is put into the third data set as an abnormal-user sample.
For example, FIG. 3 shows how sample data may be obtained by the method of comparing accurate data. As shown in FIG. 3, assume that the device data obtained through the front-end embedded points gives a device type of iPhone 6, system information of Linux, and an IP address of 116.24.67.65, while the system type of the iPhone 6 obtained from a third party such as a web crawler is iOS, and the user's historical IP address obtained from historical access data is 116.24.67.65. Comparing the device data obtained in the two ways shows that the device type and the IP address are consistent but the system type differs. This may be because the user performed verification after modifying the IP address and the device type with an automatic verification tool, so the device data can be labeled as data of an abnormal user and put into the third data set. It should be understood that acquiring data by comparison is simple and fast and yields a large amount of sample data quickly, but sample labels obtained merely by checking whether the data are consistent are less accurate than those obtained by simulation verification. Developers can therefore choose the sample acquisition method according to the specific application scenario, for example obtaining part of the samples by comparing accurate data and part by simulation verification. The above examples are merely illustrative and are not intended as specific limitations.
In this embodiment of the application, the sample obtaining unit 540 is further configured to, before an access request of a user is received, use the second data set as known device information in the first data set, with the corresponding known detection result being a normal user; the sample obtaining unit 540 is further configured to use the third data set as known device information in the first data set, with the corresponding known detection result being an abnormal user. That is, the first data set contains sample data of both normal users and abnormal users, the second data set contains only data of normal users, and the third data set contains only data of abnormal users. Training three different models on the three data sets allows user anomaly detection to be carried out comprehensively and effectively avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples.
Next, the feature extraction stage is explained.
It can be understood that the data used for model training determines the upper limit of machine learning, and the perfect algorithm only approaches the upper limit as much as possible, so that after the first data set, the second data set and the third data set are obtained at the sample stage, the data sets need to be converted into sample data of model training by using feature engineering, better training data features can be obtained, the machine learning model approaches the upper limit, and the performance of the finally trained detection model is greatly improved.
In the embodiment of the application, the first sample set is obtained by converting data in the first data set into binary feature data for model training through feature engineering; the second sample set is obtained by converting data in the second data set into binary characteristic data for model training through characteristic engineering; the third sample set is obtained by converting data in the third data set into binary characteristic data for model training through characteristic engineering; inputting the device information into a detection model, and obtaining a detection result of the user includes: converting the equipment information into binary characteristic data through characteristic engineering; and inputting the binary characteristic data into a detection model to obtain a detection result of the user. It can be understood that, since the data used in the training is binary feature data, the prediction stage should also be to convert the device information obtained in step S101 into binary feature data, and then input the binary feature data into the detection model for detection.
In specific implementation, the feature engineering may convert data of different specifications into a unified specification through a non-dimensionalization idea, for example, may convert a nominal attribute in the device information into a binary feature, and may specifically perform feature mining on the first data set, the second data set, and the third data set through the following three ways.
(I) Marking with a value point as the reference
Marking with a value point as the reference means that a distinguishing value point is selected for part of the device information according to the distribution of the data of normal users and abnormal users; with the value point as the reference, a sample equal to the value point is marked as 0 and a sample not equal to it is marked as 1. For example, for a normal user who performs registration verification with the IE browser, the device resolution is generally higher than 1200 × 780, whereas for an abnormal user who performs automatic verification in the background on a server, the device resolution is lower than this value. Therefore 1200 × 780 can be used as a distinguishing value point: a sample with a resolution higher than this is marked as 0, and a sample with a lower resolution is marked as 1. It should be understood that a value point is a specific number, so this method applies only to numerical features; nominal attributes such as the device type and the system type cannot be converted into binary features with it.
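A minimal sketch of the value-point marking for the resolution example is given below. The function names are hypothetical, and two details are assumptions not stated in the text: resolutions are compared by total pixel count, and a resolution exactly equal to the value point is treated as normal (0).

```python
def parse_resolution(text: str) -> tuple[int, int]:
    """Parse strings such as '1200x780' or '1200 × 780' into (width, height)."""
    width, height = text.lower().replace("×", "x").split("x")
    return int(width), int(height)


def mark_by_value_point(resolution: str, value_point: str = "1200x780") -> int:
    """Return 0 if the resolution is at or above the value point, else 1.

    Assumption: resolutions are ordered by pixel count (width * height).
    """
    w, h = parse_resolution(resolution)
    pw, ph = parse_resolution(value_point)
    return 0 if w * h >= pw * ph else 1


print(mark_by_value_point("1920 × 1080"))
print(mark_by_value_point("800x600"))
```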
(II) Marking with a value subset as the reference
Marking with a value subset as the reference means that a distinguishing value subset is selected for part of the device information according to the distribution of the data of normal users and abnormal users; with the value subset as the reference, samples belonging to the subset are marked as 0 and samples not belonging to it are marked as 1. For example, for the Linux system, the normal version types are CentOS 6, CentOS 7, Ubuntu, Red Hat, and so on, so the set of normal version types can be used as the distinguishing value subset of normal data objects. Samples belonging to this subset of version types are marked as 0, and samples not belonging to it are marked as 1. In this way, nominal attributes such as the version type are converted into binary features.
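The subset-based marking can be illustrated with a short sketch. The constant and function names are hypothetical, the subset contents follow the Linux example in the text, and the case-insensitive matching is an added assumption.

```python
# Hypothetical subset of "normal" Linux version types, taken from the example.
NORMAL_LINUX_VERSIONS = frozenset({"centos 6", "centos 7", "ubuntu", "red hat"})


def mark_by_value_subset(version: str, subset=NORMAL_LINUX_VERSIONS) -> int:
    """Return 0 if the version belongs to the value subset, otherwise 1."""
    return 0 if version.strip().lower() in subset else 1


print(mark_by_value_subset("CentOS 7"))
print(mark_by_value_subset("CustomLinux 0.1"))
```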
(III) Marking with a value set as the reference
Marking with a value set as the reference means that a distinguishing value set is selected for part of the device information according to the distribution of the data of normal users and abnormal users; with the value set as the reference, samples belonging to the value set are marked as 0 and samples not belonging to it are marked as 1. That is, methods (I) and (II) both perform feature mining on a single feature, such as the resolution or the system type, whereas method (III) performs feature mining on a combination of several features. For example, the device type (PC type, mobile phone model, etc.) and the resolution are taken together as a distinguishing value set. Assume that the device information currently acquired in step S101 is iPhone 7, and that in the samples of normal users the pairing of iPhone 7 with a resolution of 1920 × 1080 has high support and confidence (that is, it occurs frequently). This pair of values can then distinguish abnormal data from normal data: a sample equal to the pair is marked as 0, and a sample not equal to it is marked as 1. In this way, features of different specifications are non-dimensionalized into uniform binary features to facilitate the subsequent sample training.
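The combined-feature marking can be sketched as below. All names are hypothetical. As an assumption, "support" is approximated here as the fraction of normal samples in which a (device type, resolution) pair occurs, and the `min_support` cutoff of 0.5 is an arbitrary illustrative value.

```python
from collections import Counter


def normalize(device_type: str, resolution: str) -> tuple[str, str]:
    """Bring a (device type, resolution) pair into a canonical form."""
    res = resolution.lower().replace("×", "x").replace(" ", "")
    return device_type.strip().lower(), res


def build_value_set(normal_samples, min_support: float = 0.5) -> set:
    """Keep the pairs that occur in at least min_support of the normal samples."""
    pairs = [normalize(d, r) for d, r in normal_samples]
    counts = Counter(pairs)
    return {pair for pair, n in counts.items() if n / len(pairs) >= min_support}


def mark_by_value_set(device_type: str, resolution: str, value_set: set) -> int:
    """Return 0 if the pair belongs to the value set, otherwise 1."""
    return 0 if normalize(device_type, resolution) in value_set else 1


# Illustrative normal samples: the frequent pair dominates.
normal_samples = [("iPhone 7", "1920 × 1080")] * 3 + [("iPhone 7", "800x600")]
value_set = build_value_set(normal_samples)
print(mark_by_value_set("iPhone 7", "1920x1080", value_set))
print(mark_by_value_set("iPhone 7", "800x600", value_set))
```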
It should be noted that the value point, the value subset, and the value set may be obtained by analyzing the normal user samples in the first sample set; in the example above, the set of normal version types of the Linux system is used as the distinguishing value subset of normal data objects. They may also be obtained by statistical analysis of the distribution of normal user and abnormal user samples. For example, assume that there are currently 12 labeled samples, where "1" denotes an abnormal user and "0" denotes a normal user; the normal samples are A, B, C, E, H, J, K, the abnormal samples are D, F, G, I, and the abnormal samples account for 33.3% of the total. Table 1 shows the attributes of the 12 samples.
It can be understood that, for the Windows system, a low resolution is likely to indicate a crawler server, so for the combined feature formed by the Windows system and the resolution, the value set can be determined by analyzing the sample distribution of normal users and abnormal users. For example, suppose a Windows system with a resolution of 800 × 600 is chosen as the distinguishing value set. The normal samples are A, B, C, E, H, J, K and the abnormal samples are D, F, G, I, so the actual proportion of abnormal samples is 33.3%. The abnormal proportion calculated from this value set is more than 5% higher than the actual distribution, so a resolution of 800 × 600 is not suitable as the distinguishing value of the combined feature. If the resolution value is lowered to 400 × 300, the abnormal proportion is calculated to be 8.3%, so the distinguishing value set selected according to the distribution of the 12 samples is a Windows system with a resolution of 400 × 300. It should be noted that the feature of each sample is obtained from its resolution value: if the resolution is higher than the distinguishing value, the feature value is 1; otherwise, the feature value is 0.
It should be understood that the feature mining methods described above are merely illustrative. The present application may also use other feature mining methods to convert nominal attributes in the device information into binary feature sample data, or into other sample data with better training features, which is not specifically limited in the present application.
Third, the model training phase is explained.
In this embodiment of the present application, the first detection model is a binary classification model obtained by supervised training of a Gaussian naive Bayes (Naive Bayes) model on the first sample set; the second detection model is a one-class classification model obtained by unsupervised training of a one-class support vector machine (One-Class Support Vector Machine, One-Class SVM) model on the second sample set; and the third detection model is a one-class classification model obtained by unsupervised training of a One-Class SVM model on the third sample set, where the first, second, and third detection models are trained in parallel based on the bagging idea of repeated sampling. That is, the trained first detection model is a binary classification model, and the trained second and third detection models are one-class classification models. During training, the first detection model learns the features of positive and negative samples (abnormal user samples and normal user samples) at the same time, and during prediction it gives an overall probability that each input is abnormal. The second and third detection models each learn the features of samples of a single class only, and during prediction give a one-sided probability by judging whether the input has the features of that class.
Using three models to learn the sample features from multiple angles allows the abnormality of a user undergoing registration verification to be judged comprehensively from the models' predictions. This avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples, reduces the classification inaccuracy of the naive Bayes model caused by sample imbalance, and greatly improves the accuracy of the final detection result.
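To make the role of the first detection model concrete, the following is a dependency-free sketch of a Gaussian naive Bayes classifier trained on binary feature vectors. It is an illustration under stated assumptions, not the implementation used in the application: the class name `TinyGaussianNB`, the variance floor `var_floor`, and the toy data are all hypothetical, and the two one-class SVM models are not shown.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes (illustrative only)."""

    def fit(self, X, y, var_floor=1e-3):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # var_floor avoids zero variance on constant binary features
            # (an assumption of this sketch).
            variances = [
                sum((v - m) ** 2 for v in c) / len(rows) + var_floor
                for c, m in zip(cols, means)
            ]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_proba(self, row):
        log_post = {}
        for label, (log_prior, means, variances) in self.stats.items():
            ll = log_prior
            for v, m, var in zip(row, means, variances):
                ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            log_post[label] = ll
        top = max(log_post.values())  # subtract the max for numerical stability
        exp = {k: math.exp(v - top) for k, v in log_post.items()}
        total = sum(exp.values())
        return {k: v / total for k, v in exp.items()}


# Toy binary features: label 1 = abnormal user, label 0 = normal user.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 0, 0, 1, 1, 1]
model = TinyGaussianNB().fit(X, y)
print(model.predict_proba([1, 1, 1]))
```

The probability returned for label 1 plays the role of the overall abnormality probability described above.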
In this embodiment of the present application, after the first detection model, the second detection model, and the third detection model are trained on the first sample set, the second sample set, and the third sample set respectively, the three models need to be fused to obtain the final detection model. The apparatus therefore further includes a model fusion unit 550, which is configured to perform fusion calculation on a first detection result, a second detection result, and a third detection result using a first voting criterion to obtain the detection result, where the first detection result is the detection result of the first detection model, the second detection result is the detection result of the second detection model, and the third detection result is the detection result of the third detection model. In a specific implementation, the first, second, and third detection models may be constructed during model training based on a bagging strategy, and the final prediction result is determined based on the first voting criterion. The first voting criterion may be determined according to the sample distribution in the training stage, for example voting by a weighted average method, which is not specifically limited in the present application.
It should be understood that the bagging strategy refers to random sampling with replacement from an original training set containing m samples, that is, a sample that has been drawn may be drawn again in a later draw. In the end T training sets are obtained, each containing m samples. Because the sampling is random, each sampled set differs from the original training set and from the other sampled sets. The sampled sets are used to train the first detection model, the second detection model, and the third detection model respectively, and the results of the different models are finally fused by a regression (averaging) method; for example, a weighted average is used to fuse the prediction results of the three models, yielding a detection model with stronger classification performance than any of the first, second, or third detection models alone. Existing anti-crawler approaches can only formulate a corresponding anti-crawler strategy for each kind of user request data. With the access control method provided by the present application, the distinguishing rules between abnormal user data and normal user data only need to be learned autonomously in advance through feature engineering and machine learning algorithms, and the same detection model can then identify a variety of abnormal scenarios. The prediction results can also be fed back to the detection model so that it is continuously updated, and as the sample size grows the model can cover more complex situations.
FIG. 4 is a schematic flow chart of an access control method provided in the present application. As can be seen from FIG. 4, after the device data of a user is acquired in step S101, the data of multiple dimensions (text types) in the device data are first converted, through feature engineering, into numerical features that characterize whether a data object is abnormal. The features are then input into the three detection models. The first detection model is trained on the first sample set, which consists of normal user samples and abnormal user samples, and the classification features it learns over the overall distribution allow it to give an overall probability. The second detection model is trained on the second sample set, which consists of normal users, and learns the data features of normal users. The third detection model is trained on the third sample set, which consists of abnormal users, and learns the data features of abnormal users. The judgments of the three models are combined into a comprehensive prediction result, which avoids the one-sidedness of a single detection model trained only on the more plentiful normal samples, mitigates the inaccurate classification of the first detection model caused by unbalanced samples, and further improves the accuracy of abnormal user detection.
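A weighted-average vote of the kind mentioned above might look like the following sketch. Everything specific here is an assumption: the function names, the weights (0.5, 0.25, 0.25), the 0.5 decision threshold, and the convention that the second model's "looks normal" score is inverted to an abnormality score before averaging.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-model abnormality scores in [0, 1]."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def detect(first_abnormal, second_normal, third_abnormal,
           weights=(0.5, 0.25, 0.25), threshold=0.5):
    """Fuse the three model outputs into one label and probability.

    first_abnormal:  overall abnormality probability from the first model
    second_normal:   "looks like a normal user" score from the second model
    third_abnormal:  "looks like an abnormal user" score from the third model
    """
    scores = (first_abnormal, 1.0 - second_normal, third_abnormal)
    probability = fuse_scores(scores, weights)
    label = "abnormal" if probability >= threshold else "normal"
    return label, probability


print(detect(0.9, 0.1, 0.8))
print(detect(0.1, 0.9, 0.2))
```

In practice the weights would be chosen from the sample distribution in the training stage, as the text notes.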
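The sampling step of the bagging strategy (T sampled sets of m samples each, drawn with replacement) can be sketched as follows. The function name and the `seed` parameter are hypothetical and are included only to make the illustration reproducible.

```python
import random


def bootstrap_sets(samples, rounds, seed=None):
    """Draw `rounds` training sets, each of len(samples) items, with replacement."""
    rng = random.Random(seed)
    m = len(samples)
    return [[rng.choice(samples) for _ in range(m)] for _ in range(rounds)]


original = list(range(10))          # stand-in for m = 10 training samples
sets = bootstrap_sets(original, rounds=3, seed=0)
for s in sets:
    # Each set has m items; duplicates are expected because of replacement.
    print(len(s), len(set(s)))
```

Each sampled set would then be used to train one of the detection models before their predictions are averaged.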
In this embodiment of the present application, the detection result is either an abnormal user or a normal user. After the detection result of the user is obtained, the execution unit 530 is specifically configured to deny the user's access request or require the user to re-verify when the detection result is an abnormal user, and to respond to the user's access request when the detection result is a normal user. In short, whether to respond to the user's request is decided according to the detection result given by the detection model. Optionally, in order to improve detection accuracy, when the probability of an abnormal user in the detection result does not reach a preset threshold, for example when the final detection result gives a probability of only 52% that the user is abnormal while the preset threshold is 70% or more, a verification request may be sent to the user again to require re-verification, thereby improving the accuracy of the verification.
In the above apparatus, an access request of a user is received, where the access request includes device information of the user, and the device information includes the device type, system information, and address information at the time of the user's access verification; the device information is then input into the detection model to obtain the detection result of the user. The apparatus uses feature engineering and machine learning algorithms to construct three detection models, each from a sample set with a common characteristic, and fuses the three models into a single detection model through model fusion. The prediction result thus avoids the one-sidedness of a single detection model and suffers less from the classification inaccuracy caused by sample imbalance, which greatly improves the accuracy of abnormal user detection. In addition, as the sample size grows, the detection model can continue to learn more features of abnormal users, which removes the need for developers to develop a dedicated detection strategy for each specific abnormal user scenario.
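The access policy described above can be sketched as a small decision function. The function name and the return values are hypothetical. The 0.7 confidence threshold follows the 70% example in the text, while the 0.5 boundary between normal and abnormal is an assumption.

```python
def access_policy(abnormal_probability: float,
                  decision_threshold: float = 0.5,
                  confidence_threshold: float = 0.7) -> str:
    """Map the fused abnormality probability to an access decision.

    - below decision_threshold: treated as a normal user, request is served
    - between the two thresholds: abnormal but not confident, ask to re-verify
    - at or above confidence_threshold: abnormal, request is denied
    """
    if abnormal_probability < decision_threshold:
        return "allow"
    if abnormal_probability < confidence_threshold:
        return "reverify"
    return "deny"


print(access_policy(0.52))  # the 52% case from the text
print(access_policy(0.90))
print(access_policy(0.20))
```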
Referring to fig. 6, fig. 6 is a schematic structural diagram of a server provided in the present application. The access control method of this embodiment may be implemented in a cloud service cluster as shown in fig. 6, or may be implemented in a single server, which is not specifically limited in this application, where the cloud service cluster includes at least one computing node 610 and at least one storage node 620.
The compute node 610 includes one or more processors 611, a communications interface 612, and memory 613. The processor 611, the communication interface 612, and the memory 613 may be connected by a bus 614.
The processor 611 includes one or more general-purpose processors, which may be any type of device capable of Processing electronic instructions, including a Central Processing Unit (CPU), a microprocessor, a microcontroller, a main processor, a controller, and an ASIC (Application Specific Integrated Circuit), among others. It can be a dedicated processor for the compute node 610 only or can be shared with other compute nodes 610. Processor 611 executes various types of digitally stored instructions, such as software or firmware programs stored in memory 613, which enable computing node 610 to provide a wide variety of services. For example, the processor 611 can execute programs or process data to perform at least a portion of the methods discussed herein.
The communication interface 612 may be a wired interface (e.g., an Ethernet interface) for communicating with other computing nodes or users. When the communication interface 612 is a wired interface, it may use a protocol family on top of TCP/IP, such as the RAAS protocol, the Remote Function Call (RFC) protocol, the Simple Object Access Protocol (SOAP), the Simple Network Management Protocol (SNMP), the Common Object Request Broker Architecture (CORBA) protocol, distributed protocols, and so on.
The Memory 613 may include a Volatile Memory (Volatile Memory), such as a Random Access Memory (RAM); the Memory may also include a Non-Volatile Memory (Non-Volatile Memory), such as a Read-Only Memory (ROM), a Flash Memory (Flash Memory), a Hard Disk Drive (HDD), or a Solid-State Drive (SSD) Memory, which may also include a combination of the above types of memories.
Storage node 620 includes one or more processors 621, a communication interface 622, and memory 623. The processor 621, the communication interface 622, and the memory 623 may be connected by a bus 624.
The processor 621 includes one or more general-purpose processors, where a general-purpose processor may be any type of device capable of processing electronic instructions, including a CPU, a microprocessor, a microcontroller, a host processor, a controller, an ASIC, and so on. It can be a dedicated processor for the storage node 620 only or can be shared with other storage nodes 620. The processor 621 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 623, which enable the storage node 620 to provide a wide variety of services. For example, the processor 611 can perform sample acquisition, feature engineering, and model training to perform at least a portion of the methods discussed herein.
The communication interface 622 may be a wired interface (e.g., an ethernet interface) for communicating with other computing devices or users.
The storage node 620 includes one or more storage controllers 621 and a storage array 625. The storage controller 621 and the storage array 625 may be connected by a bus 626.
The storage controller 621 includes one or more general-purpose processors, where a general-purpose processor may be any type of device capable of processing electronic instructions, including a CPU, a microprocessor, a microcontroller, a host processor, a controller, an ASIC, and so on. It can be a dedicated processor for a single storage node 620 only or can be shared with the compute node 610 or other storage nodes 620. It is understood that in this embodiment each storage node includes one storage controller, and in other embodiments a plurality of storage nodes may share one storage controller, which is not limited herein.
The storage array 625 may include a plurality of memories. The memory may be a non-volatile memory, such as a ROM, a flash memory, an HDD, or an SSD, and may also include a combination of the above kinds of memory. For example, the storage array may be composed of a plurality of HDDs or a plurality of SSDs, or of both HDDs and SSDs. The plurality of memories are combined in various ways with the aid of the storage controller 621 to form a memory bank, thereby providing higher storage performance than a single memory and providing data backup. Alternatively, the storage array 625 may include one or more data centers. The plurality of data centers may be located at the same site or at different sites, which is not limited herein. The storage array 625 may store program code as well as program data. The program code comprises feature first detection model code and second detection model code, etc. The program data include the first sample set, the second sample set, and the third sample set, which the processor 611 uses to train and obtain the detection model.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, storage disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above describes only some embodiments of the present application. Based on the disclosure of the application documents, those skilled in the art can make various changes or modifications to the present application without departing from its spirit and scope.

Claims (10)

1. An access control method, characterized in that the method comprises:
receiving an access request of a user, wherein the access request comprises device information of the user;
inputting the device information into a detection model to obtain a detection result of the user, wherein the detection model is obtained by model fusion of a first detection model, a second detection model and a third detection model, the detection result comprises a normal user or an abnormal user, the first detection model is obtained by supervised training using a first sample set, the second detection model is obtained by unsupervised training using a second sample set, the third detection model is obtained by unsupervised training using a third sample set, the first sample set comprises the second sample set and the third sample set, the second sample set comprises device information of known normal users and corresponding known detection results, and the third sample set comprises device information of known abnormal users and corresponding known detection results;
and executing a corresponding access policy according to the detection result.
2. The method according to claim 1, wherein the first sample set is obtained by converting data in the first data set into binary feature data for model training through feature engineering;
the second sample set is obtained by converting data in the second data set into binary feature data for model training through feature engineering;
the third sample set is obtained by converting data in the third data set into binary feature data for model training through feature engineering;
inputting the device information into the detection model to obtain the detection result of the user comprises:
converting the device information into binary feature data through feature engineering;
and inputting the binary feature data into the detection model to obtain the detection result of the user.
3. The method of claim 2, wherein prior to receiving the user's access request, the method further comprises:
receiving simulated access requests of a simulated normal user and a simulated abnormal user, wherein the simulated access requests comprise device information of the simulated normal user and device information of the simulated abnormal user;
and taking the device information of the simulated normal user as a second data set, and taking the device information of the simulated abnormal user as a third data set.
4. The method of claim 3, wherein the device information comprises a device type, system information, and address information of the user during the access authentication, and before receiving the access request of the user, the method further comprises:
acquiring device information data during user access verification by a front-end event tracking (embedded point) method, wherein the device information data comprises device type data, system information data and address information data;
acquiring real system information and real address information corresponding to the device type data through a third-party platform;
matching the real system information with the system information data, and matching the real address information with the address information data;
under the condition that the real system information is inconsistent with the system information data, taking the device information data corresponding to the system information data as the third data set;
under the condition that the real address information is inconsistent with the address information data, taking the device information data corresponding to the address information data as the third data set;
and under the condition that the real system information is consistent with the system information data and the real address information is also consistent with the address information data, taking the device information data corresponding to the system information data as the second data set.
5. The method of claim 4, wherein prior to receiving the user's access request, the method further comprises:
taking the second data set as the known device information in the first data set, wherein the corresponding known detection result is a normal user;
and taking the third data set as the known device information in the first data set, wherein the corresponding known detection result is an abnormal user.
6. The method according to any one of claims 1 to 5,
the first detection model is a binary classification model obtained by supervised training of a Gaussian naive Bayes model using the first sample set;
the second detection model is a one-class classification model obtained by unsupervised training of a one-class support vector machine (One-Class SVM) model using the second sample set;
the third detection model is a one-class classification model obtained by unsupervised training of a one-class support vector machine (One-Class SVM) model using the third sample set, wherein the first detection model, the second detection model and the third detection model are trained in parallel based on the bagging idea of repeated sampling.
7. The method of claim 1, wherein the model fusion comprises:
and performing fusion calculation on a first detection result, a second detection result and a third detection result by adopting a first voting criterion to obtain a detection result, wherein the first detection result is the detection result of the first detection model, the second detection result is the detection result of the second detection model, and the third detection result is the detection result of the third detection model.
8. The method of claim 1, wherein the performing the corresponding access policy according to the detection result comprises:
if the detection result is an abnormal user, rejecting the access request of the user or requiring the user to re-verify;
and responding to the access request of the user under the condition that the detection result is a normal user.
9. An access control apparatus, characterized in that the apparatus comprises:
a receiving unit, configured to receive an access request of a user, wherein the access request comprises device information of the user, and the device information comprises a device type, system information, and address information at the time the user performs access verification;
a detection unit, configured to input the device information into a detection model to obtain a detection result of the user, wherein the detection model is obtained by model fusion of a first detection model, a second detection model and a third detection model; the detection result is a normal user or an abnormal user; the first detection model is obtained by supervised training on a first sample set; the second detection model is obtained by unsupervised training on a second sample set; the third detection model is obtained by unsupervised training on a third sample set; the first sample set comprises the second sample set and the third sample set; the second sample set comprises device information of known normal users and corresponding known detection results; and the third sample set comprises device information of known abnormal users and corresponding known detection results;
and an execution unit, configured to execute the corresponding access policy according to the detection result.
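The relationship between the three sample sets in claim 9 can be sketched as follows. This is a minimal illustration that assumes each sample carries the three device-information fields named in the claim plus a known label. The `Sample` type, its field names, and `build_sample_sets` are hypothetical, and no training algorithm is implied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    device_type: str   # e.g. phone or desktop (illustrative values)
    system_info: str   # operating system details
    address: str       # address information seen at access verification
    label: str         # known detection result: "normal" or "abnormal"


def build_sample_sets(
    samples: List[Sample],
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split labeled samples into the first, second and third sample sets."""
    second = [s for s in samples if s.label == "normal"]    # known normal users
    third = [s for s in samples if s.label == "abnormal"]   # known abnormal users
    first = second + third  # the first set comprises both of the others
    return first, second, third
```

In this reading, the first set would feed the supervised model, while the second and third sets would each feed one of the unsupervised models.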
10. A server comprising a processor, an input device, an output device, and a memory, wherein the memory is configured to store a computer program comprising program instructions, and wherein the processor is configured to invoke the program instructions to perform the method of any of claims 1-8.
CN201910442550.1A 2019-05-25 2019-05-25 Access control method, system, electronic device and computer storage medium Active CN110166462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910442550.1A CN110166462B (en) 2019-05-25 2019-05-25 Access control method, system, electronic device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910442550.1A CN110166462B (en) 2019-05-25 2019-05-25 Access control method, system, electronic device and computer storage medium

Publications (2)

Publication Number Publication Date
CN110166462A (en) 2019-08-23
CN110166462B (en) 2022-02-25

Family

ID=67632840

Family Applications (1)

Application Number Priority Date Filing Date Title
CN201910442550.1A Active CN110166462B (en) 2019-05-25 2019-05-25 Access control method, system, electronic device and computer storage medium

Country Status (1)

Country Link
CN (1) CN110166462B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443274A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 Method for detecting abnormality, device, computer equipment and storage medium
CN110708306B (en) * 2019-09-29 2022-07-12 贝壳找房(北京)科技有限公司 Data processing method, device and storage medium
CN110781173A (en) * 2019-10-12 2020-02-11 杭州城市大数据运营有限公司 Data identification method and device, computer equipment and storage medium
CN111428108A (en) * 2020-03-25 2020-07-17 山东浪潮通软信息科技有限公司 Anti-crawler method, device and medium based on deep learning
CN111783883A (en) * 2020-06-30 2020-10-16 平安普惠企业管理有限公司 Abnormal data detection method and device
CN115134102A (en) * 2021-03-24 2022-09-30 北京字节跳动网络技术有限公司 Abnormal access detection method and device, storage medium and electronic equipment
CN113542223A (en) * 2021-06-16 2021-10-22 杭州拼便宜网络科技有限公司 Equipment fingerprint-based crawler-resisting method
CN113591909A (en) * 2021-06-23 2021-11-02 北京智芯微电子科技有限公司 Abnormality detection method, abnormality detection device, and storage medium for power system
CN114039745A (en) * 2021-10-08 2022-02-11 中移(杭州)信息技术有限公司 Method, device and medium for identifying abnormal flow of website
CN114679320A (en) * 2022-03-29 2022-06-28 杭州安恒信息技术股份有限公司 Server protection method and device and readable storage medium
CN115269583B (en) * 2022-09-29 2022-12-16 南通君合云起信息科技有限公司 Unsupervised cleaning method for big data processing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025547B (en) * 2016-09-19 2020-10-16 创新先进技术有限公司 Payment channel detection method and device and terminal
CN108009058A (en) * 2017-11-17 2018-05-08 阿里巴巴集团控股有限公司 Warping apparatus recognition methods and device and electronic equipment
CN108038700A (en) * 2017-12-22 2018-05-15 上海前隆信息科技有限公司 Anti-fraud data analysis method and system
CN108681667A (en) * 2018-04-02 2018-10-19 阿里巴巴集团控股有限公司 Unit type recognition method, device and processing equipment
CN109547254B (en) * 2018-11-28 2022-03-15 湖北文理学院 Intrusion detection method and device, electronic equipment and storage medium
CN109635872B (en) * 2018-12-17 2020-08-04 上海观安信息技术股份有限公司 Identity recognition method, electronic device and computer program product

Also Published As

Publication number Publication date
CN110166462A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110166462B (en) Access control method, system, electronic device and computer storage medium
CN112417439B (en) Account detection method, device, server and storage medium
CN112395159B (en) Log detection method, system, device and medium
CN111191767B (en) Vectorization-based malicious traffic attack type judging method
CN106992994A (en) Automated monitoring method and system for cloud services
US20210126931A1 (en) System and a method for detecting anomalous patterns in a network
CN106209862A (en) Method and device for implementing account theft defense
CN111176953B (en) Abnormality detection and model training method, computer equipment and storage medium
CN112769605B (en) Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform
CN106485261A (en) Image recognition method and apparatus
CN114024884B (en) Test method, test device, electronic equipment and storage medium
CN112199671A (en) Artificial intelligence-based malicious data analysis method and device and electronic device
CN111741002A (en) Method and device for training network intrusion detection model
KR102291615B1 (en) Apparatus for predicting failure of communication network and method thereof
CN113918438A (en) Method and device for detecting server abnormality, server and storage medium
CN113778802A (en) Anomaly prediction method and device
CN110704614B (en) Information processing method and device for predicting user group type in application
CN109922083B (en) Network protocol flow control system
WO2020258509A1 (en) Method and device for isolating abnormal access of terminal device
CN107517474B (en) Network analysis optimization method and device
WO2023050670A1 (en) False information detection method and system, computer device, and readable storage medium
CN111950448B (en) High-voltage isolating switch fault state detection method and device based on machine vision
CN114915446A (en) Intelligent network security detection method fusing priori knowledge
CN111385342B (en) Internet of things industry identification method and device, electronic equipment and storage medium
CN114357849A (en) User behavior abnormity detection method, system and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant