CN112149743A

CN112149743A - Access control method, device, equipment and medium

Info

Publication number: CN112149743A
Application number: CN202011026304.7A
Authority: CN
Inventors: 姚吉; 范渊
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: DBAPPSecurity Co Ltd; Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2020-09-25
Filing date: 2020-09-25
Publication date: 2020-12-29

Abstract

The application discloses an access control method, device, equipment and medium. The method comprises: acquiring training set data, wherein the training set data is log data with unit type marks and the log data comprises corresponding IP addresses; training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a trained classification model; when log data to be classified is obtained, classifying it by using the trained classification model to obtain the unit type to which the IP address to be classified in that log data belongs; and controlling access requests sent by the IP address to be classified according to that unit type. In this way, the unit type to which an IP address belongs is identified with high accuracy, which improves the efficiency of risk control on the IP address according to its unit type.

Description

Access control method, device, equipment and medium

Technical Field

The present application relates to the field of network security technologies, and in particular, to a method, an apparatus, a device, and a medium for access control.

Background

In conventional IP (Internet Protocol) address portrait techniques, the unit (i.e., the organization) associated with an IP address is identified mainly by geolocating the IP address itself to obtain positioning information, and then finding the corresponding unit on a map based on that positioning information, thereby determining the unit type to which the IP address belongs.

The inventor has found that, in the prior art, the positioning information obtained from an IP address is itself inaccurate and of low precision. In cities with dense buildings, a specific residential compound or building generally cannot be determined; and even when positioning does reach a specific building, an ordinary office building houses many different units, so it cannot be determined which type of unit a given IP address corresponds to. As a result, the accuracy of identifying the unit type to which an IP address belongs is low, and so is the efficiency of risk control on the IP address.

Disclosure of Invention

In view of the above, an object of the present application is to provide an access control method, apparatus, device, and medium that identify the unit type to which an IP address belongs with high accuracy, so that access requests issued by the IP address can be controlled according to that unit type, thereby improving the efficiency of risk control on the IP address. The specific scheme is as follows:

In a first aspect, the present application discloses an access control method, including:

acquiring training set data, wherein the training set data is log data with unit type marks, and the log data comprises corresponding IP addresses;

training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a trained classification model;

when log data to be classified are obtained, classifying the log data to be classified by using the trained classification model to obtain a unit type to which an IP address to be classified belongs in the log data to be classified;

and controlling the access request sent by the IP address to be classified according to the unit type of the IP address to be classified.

Optionally, the acquiring training set data includes:

acquiring an IP address library, wherein the IP address library comprises different IP addresses and unit types corresponding to the IP addresses;

acquiring an access log set in an access log library of a target website;

marking the unit type of each log data in the access log set according to the IP address library;

and using the log data marked by the unit type as training set data.

Optionally, the marking of each log data in the access log set with a unit type according to the IP address library includes:

determining an IP address in current log data;

determining a target unit type corresponding to the IP address in the current log data from the IP address library according to the IP address in the current log data;

and marking the unit type to which the IP address in the current log data belongs as the target unit type.

Optionally, the training of the classification model constructed in advance based on the naive Bayes algorithm by using the training set data to obtain the trained classification model includes:

training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a classification model to be tested;

and testing the classification model to be tested, and obtaining the trained classification model when the test result of the classification model to be tested meets the preset requirement.

Optionally, the training of the classification model constructed in advance based on the naive Bayes algorithm by using the training set data to obtain the classification model to be tested includes:

extracting feature items of each log data in the training set data to obtain feature items corresponding to each log data in the training set data, wherein the feature items comprise access time, a domain name of a target website, an access terminal type, an IP address and a unit type to which the IP address belongs;

taking the access time, the domain name of the target website and the type of the access terminal as variables, and taking the unit type of the IP address as a classification result;

and inputting the feature items into a classification model constructed in advance based on a naive Bayes algorithm for training to obtain a classification model to be tested.

Optionally, the testing of the classification model to be tested includes:

testing the classification model to be tested by using test set data to obtain a test result;

and determining the classification accuracy corresponding to the classification model to be tested according to the test result.

Optionally, after determining the classification accuracy corresponding to the classification model to be tested according to the test result, the method further includes:

judging whether the classification accuracy is not less than a preset accuracy threshold;

and if the classification accuracy is not less than the preset accuracy threshold, taking the classification model to be tested as the trained classification model.

In a second aspect, the present application discloses an access control device, comprising:

the data acquisition module is used for acquiring training set data, wherein the training set data is log data with unit type marks, and the log data comprises corresponding IP addresses;

the model training module is used for training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a trained classification model;

the classification module is used for classifying the log data to be classified by utilizing the trained classification model when the log data to be classified are obtained, so as to obtain the unit type of the IP address to be classified in the log data to be classified;

and the risk control module is used for controlling the access request sent by the IP address to be classified according to the unit type of the IP address to be classified.

In a third aspect, the present application discloses an electronic device, comprising:

a memory and a processor;

wherein the memory is used for storing a computer program;

the processor is configured to execute the computer program to implement the access control method disclosed in the foregoing.

In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the access control method disclosed above.

It can be seen that, in the present application, training set data is obtained first, the training set data being log data with unit type marks that include corresponding IP addresses; a classification model constructed in advance based on a naive Bayes algorithm is then trained with the training set data to obtain a trained classification model; when log data to be classified is obtained, the trained classification model classifies it to obtain the unit type to which the IP address to be classified belongs; and access requests sent by that IP address are then controlled according to its unit type. Because the unit type is determined by a trained classification model rather than, as in the prior art, by geolocating the IP address and then looking up the unit on a map using the positioning information, the unit type to which an IP address belongs is identified more accurately, which improves the efficiency of risk control on the IP address according to its unit type.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of an access control method disclosed herein;

FIG. 2 is a flow chart of a specific access control method disclosed herein;

FIG. 3 is a schematic structural diagram of an access control device disclosed in the present application;

FIG. 4 is a schematic structural diagram of an electronic device disclosed in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Currently, the unit type to which an IP address belongs is identified by first geolocating the IP address to obtain positioning information, and then using that positioning information to find, on a map, the unit to which the IP address belongs. However, the positioning information obtained from an IP address is itself inaccurate and of low precision: in cities with dense buildings, a specific residential compound or building generally cannot be determined, and even when positioning reaches a specific building, an ordinary office building houses many units, so it cannot be determined which type of unit a given IP address corresponds to. The accuracy of identifying the unit to which an IP address belongs is therefore low. In view of this, the present application provides an access control method that identifies the unit type to which an IP address belongs with high accuracy and controls access requests issued by the IP address according to that unit type, thereby improving the efficiency of risk control on the IP address.

Referring to FIG. 1, an embodiment of the present application discloses an access control method, including:

Step S11: acquiring training set data, wherein the training set data is log data with unit type marks, and the log data comprises corresponding IP addresses.

In a specific implementation process, training set data needs to be acquired first, wherein the training set data is log data with unit type marks, and the log data includes a corresponding IP address.

Specifically, the acquiring of training set data includes: acquiring an IP address library, wherein the IP address library comprises different IP addresses and unit types corresponding to the IP addresses; acquiring an access log set in an access log library of a target website; marking the unit type of each log data in the access log set according to the IP address library; and using the log data marked by the unit type as training set data.

Specifically, IP addresses that are clearly associated with specific units are obtained from a third-party offline IP detail library to form the IP address library; the access log data generated in the access log library of the target website within a period of time is obtained to form the access log set; each log data in the access log set is marked with a unit type according to the IP address library; the log data with unit type marks is used as the training set data; and log data that could not be marked with a unit type is deleted.
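As a minimal sketch of the two collection steps just described, assuming hypothetical record shapes (`IpDetail`, `AccessLog`) and a half-open time window, since the source does not specify the format of the third-party library or of the access logs:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class IpDetail:
    """Hypothetical entry of a third-party offline IP detail library."""
    ip: str
    unit_type: Optional[str]  # None or "" when no specific unit is associated


@dataclass
class AccessLog:
    """Hypothetical entry of the target website's access log library."""
    ip: str
    time: datetime
    domain: str
    user_agent: str


def build_ip_library(details: Iterable[IpDetail]) -> dict[str, str]:
    """Keep only IPs clearly associated with a specific unit, mapped to their unit type."""
    return {d.ip: d.unit_type for d in details if d.unit_type}


def select_access_logs(logs: Iterable[AccessLog],
                       start: datetime, end: datetime) -> list[AccessLog]:
    """Collect the logs generated in the window [start, end) as the access log set."""
    return [log for log in logs if start <= log.time < end]
```

Entries whose unit type is missing never enter the library, so any log whose IP is absent from it will later be left unmarked and deleted.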

Marking each log data in the access log set with a unit type according to the IP address library comprises: determining the IP address in the current log data; determining, from the IP address library, the target unit type corresponding to that IP address; and marking the unit type to which the IP address in the current log data belongs as the target unit type. That is, the IP address is found in the current log data, the unit type to which it belongs is looked up in the IP address library to obtain the target unit type, and the current log data is then marked with that target unit type.
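The per-record marking and the deletion of unmarked records can be sketched as follows. `LogEntry`, `mark_log` and `build_training_set` are hypothetical names, the IP address library is assumed to be an in-memory mapping from IP string to unit type, and the unit type labels used in the test data are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical parsed access-log record."""
    ip: str
    time: str
    domain: str
    terminal: str
    unit_type: Optional[str] = None  # filled in by marking


def mark_log(entry: LogEntry, ip_library: dict[str, str]) -> LogEntry:
    """Mark one record with the target unit type its IP has in the IP address library."""
    target_unit_type = ip_library.get(entry.ip)  # None when the IP is not in the library
    return replace(entry, unit_type=target_unit_type)


def build_training_set(logs: list[LogEntry], ip_library: dict[str, str]) -> list[LogEntry]:
    """Mark every record, then delete the records that could not be marked."""
    marked = (mark_log(entry, ip_library) for entry in logs)
    return [entry for entry in marked if entry.unit_type is not None]
```

Returning new frozen records instead of mutating the originals keeps the raw access log set unchanged, which is convenient if the same logs are later reused as data to be classified.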

Step S12: training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a trained classification model.

It can be understood that, after the training set data is obtained, the classification model constructed in advance based on the naive Bayes algorithm can be trained with the training set data to obtain the trained classification model.

Specifically, the classification model constructed in advance based on the naive Bayes algorithm is trained with the training set data to obtain a classification model to be tested; the classification model to be tested is then tested, and when its test result meets the preset requirement, the trained classification model is obtained.
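For a naive Bayes classifier, training amounts to estimating class priors and per-variable conditional probabilities by counting over the training set data. The source does not give the estimator; the following is the standard relative-frequency form with add-one (Laplace) smoothing, shown as an assumption. Write $A_1$, $A_2$, $A_3$ for the access time, the target website domain name and the access terminal type, $C_j$ for the $j$-th unit type, $N$ for the number of marked log records, $N_j$ for the number marked with $C_j$, $N_{j,i,a}$ for the number of those whose $A_i$ equals $a$, and $K_i$ for the number of distinct values $A_i$ takes:

$$\hat{P}(C_j) = \frac{N_j}{N}, \qquad \hat{P}(A_i = a \mid C_j) = \frac{N_{j,i,a} + 1}{N_j + K_i}$$

Dropping the $+1$ and $+K_i$ terms gives the unsmoothed relative frequencies; the smoothing keeps a value never seen together with $C_j$ in training from forcing that class's probability to zero.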

Step S13: when the log data to be classified is obtained, classifying the log data to be classified by using the trained classification model to obtain the unit type to which the IP address to be classified in the log data to be classified belongs.

After the trained classification model is obtained, when log data to be classified is obtained, the trained classification model can be used to classify the log data to be classified to obtain the unit type to which the IP address to be classified in that log data belongs.

That is, when the log data to be classified is acquired, the log data to be classified is input into the trained classification model, and the output result of the trained classification model is the unit type to which the IP address in the log data to be classified belongs.

Step S14: controlling the access request sent by the IP address to be classified according to the unit type to which the IP address to be classified belongs.

After the unit type of the IP address to be classified in the log data to be classified is determined, the access request sent by the IP address to be classified can be controlled according to the unit type of the IP address to be classified. That is, the IP address risk control may be performed according to the unit type to which the IP address to be classified belongs.
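The source leaves the control policy open. One hedged way to picture Step S14 is a lookup table from unit type to an action; the unit type names, the three actions and the fallback for unknown types below are all hypothetical.

```python
from enum import Enum
from typing import Mapping


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"  # e.g. require additional verification
    DENY = "deny"


# Hypothetical policy table; the source does not say which unit types get which action.
POLICY: Mapping[str, Action] = {
    "government": Action.ALLOW,
    "enterprise": Action.ALLOW,
    "internet_cafe": Action.CHALLENGE,
    "hosting_provider": Action.DENY,
}


def decide_action(unit_type: str,
                  policy: Mapping[str, Action] = POLICY,
                  default: Action = Action.CHALLENGE) -> Action:
    """Map the unit type of the IP to be classified to an action for its access requests."""
    return policy.get(unit_type, default)
```

A gateway or firewall hook would then enforce the returned action on each request from that IP; that enforcement layer is outside this sketch.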

Referring to FIG. 2, an embodiment of the present application discloses a specific access control method, where the method includes:

Step S21: acquiring training set data, wherein the training set data is log data with unit type marks, and the log data comprises corresponding IP addresses.

Step S22: training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a classification model to be tested.

After the training set data is obtained, the classification model constructed in advance based on the naive Bayes algorithm can be trained with the training set data to obtain a classification model to be tested.

Specifically, feature items are extracted from each log data in the training set data, the feature items comprising the access time, the domain name of the target website, the access terminal type, the IP address, and the unit type to which the IP address belongs; the access time, the domain name of the target website, and the access terminal type are taken as variables, and the unit type to which the IP address belongs is taken as the classification result; and the feature items are input into the classification model constructed in advance based on the naive Bayes algorithm for training to obtain a classification model to be tested.

That is, the access time, the domain name of the target website, the access terminal type, the IP address, and the unit type to which the IP address belongs are first extracted from each log data in the training set data to obtain its feature items. The access time, the domain name of the target website, and the access terminal type are used as variables, and the unit type to which the IP address belongs is used as the classification result. The feature items are then input into the classification model constructed in advance based on the naive Bayes algorithm for training, yielding the classification model to be tested.
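A possible per-record feature extraction is sketched below. It assumes a hypothetical log record given as a dict with `time`, `domain`, `user_agent`, `ip` and `unit_type` keys and a hypothetical time format, buckets the access time to the hour of day (a discretization the source does not prescribe), and derives the access terminal type from the User-Agent string with a deliberately crude keyword heuristic.

```python
from datetime import datetime
from typing import Optional


def terminal_type(user_agent: str) -> str:
    """Crudely bucket a User-Agent string into a terminal type (hypothetical heuristic)."""
    ua = user_agent.lower()
    if any(k in ua for k in ("mobile", "android", "iphone")):
        return "mobile"
    if any(k in ua for k in ("curl", "python-requests", "bot")):
        return "script"
    return "pc"


def extract_features(log: dict) -> tuple[tuple[int, str, str], str, Optional[str]]:
    """Split one log record into (variables A1..A3, IP address, unit type C)."""
    hour = datetime.strptime(log["time"], "%Y-%m-%d %H:%M:%S").hour  # A1: access time, bucketed to the hour
    domain = log["domain"].lower()                                   # A2: target website domain name
    terminal = terminal_type(log["user_agent"])                      # A3: access terminal type
    # the unit type is absent for log data that is still to be classified
    return (hour, domain, terminal), log["ip"], log.get("unit_type")
```

The IP address is returned alongside the variables rather than inside them, matching the text: it identifies whose unit type is being predicted but is not itself a classification variable.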

Take the access time $A_1$, the target website domain name $A_2$, and the access terminal type $A_3$ of each piece of data in the training set data as variables, and the unit type $C$ as the classification result. To classify under the attributes $A_1$, $A_2$, $A_3$, the conditional probability of class $C_j$ is expressed as $P(C_j \mid A_1A_2A_3)$. According to the Bayesian formula:

$$P(C_j \mid A_1A_2A_3) = \frac{P(A_1A_2A_3 \mid C_j)\,P(C_j)}{P(A_1A_2A_3)}$$

where $P(C_j \mid A_1A_2A_3)$ is the probability that $C_j$ occurs given that $A_1$, $A_2$ and $A_3$ hold simultaneously; $P(A_1A_2A_3 \mid C_j)$ is the probability that $A_1$, $A_2$ and $A_3$ hold simultaneously given that $C_j$ occurs; $P(C_j)$ is the probability that $C_j$ occurs; and $P(A_1A_2A_3)$ is the probability that $A_1$, $A_2$ and $A_3$ hold simultaneously.

Step S23: testing the classification model to be tested, and obtaining the trained classification model when the test result of the classification model to be tested meets the preset requirement.

It can be understood that after the classification model to be tested is obtained, in order to ensure the accuracy of the classification model to be tested, the classification model to be tested needs to be tested first, so as to determine whether the classification accuracy of the classification model to be tested can reach a preset accuracy threshold.

Specifically, the classification model to be tested is tested with test set data to obtain a test result, and the classification accuracy of the classification model to be tested is determined from the test result. After the classification accuracy is determined, the method further comprises: judging whether the classification accuracy is not less than a preset accuracy threshold; and if so, taking the classification model to be tested as the trained classification model.
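The accuracy computation and the threshold check can be sketched independently of the model: `predict` is any callable mapping a feature tuple to a unit type. The function names and the default threshold of 0.9 are hypothetical; the source only says the threshold is preset.

```python
from typing import Callable, Hashable, Sequence


def classification_accuracy(predict: Callable[[tuple], Hashable],
                            X_test: Sequence[tuple],
                            y_test: Sequence[Hashable]) -> float:
    """Fraction of test records whose predicted unit type equals their marked unit type."""
    if len(X_test) != len(y_test) or not y_test:
        raise ValueError("test set must be non-empty with one label per record")
    correct = sum(1 for xs, c in zip(X_test, y_test) if predict(xs) == c)
    return correct / len(y_test)


def accept_model(predict: Callable[[tuple], Hashable],
                 X_test: Sequence[tuple],
                 y_test: Sequence[Hashable],
                 threshold: float = 0.9) -> bool:
    """Accept the model under test iff its accuracy is not less than the preset threshold."""
    return classification_accuracy(predict, X_test, y_test) >= threshold
```

The comparison uses `>=` so that an accuracy exactly equal to the threshold is accepted, matching "not less than" in the text.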

Step S24: when the log data to be classified is obtained, classifying the log data to be classified by using the trained classification model to obtain the unit type to which the IP address to be classified in the log data to be classified belongs.

Step S25: controlling the access request sent by the IP address to be classified according to the unit type to which the IP address to be classified belongs.

Referring to FIG. 3, an embodiment of the present application discloses an access control apparatus, including:

the data acquisition module 11 is configured to acquire training set data, where the training set data is log data with unit type marks, and the log data includes a corresponding IP address;

the model training module 12 is configured to train a classification model, which is constructed in advance based on a naive bayes algorithm, by using the training set data to obtain a trained classification model;

the classification module 13 is configured to, when log data to be classified is obtained, classify the log data to be classified by using the trained classification model to obtain the unit type to which the IP address to be classified in the log data to be classified belongs;

and the risk control module 14 is configured to control an access request sent by the IP address to be classified according to the unit type to which the IP address to be classified belongs.
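The four modules can be pictured as one object that wires injected callables together. This is an illustrative sketch: the class name, the method names and the callable-injection design are assumptions, not a structure prescribed by the source.

```python
from typing import Any, Callable, Hashable, Optional, Sequence

Predictor = Callable[[Any], Hashable]


class AccessControlDevice:
    """Hypothetical composition of modules 11-14; each module is passed in as a callable."""

    def __init__(self,
                 acquire: Callable[[], tuple[Sequence[Any], Sequence[Hashable]]],  # data acquisition module 11
                 train: Callable[[Sequence[Any], Sequence[Hashable]], Predictor],   # model training module 12
                 extract: Callable[[Any], Any],                                      # raw log -> model input
                 decide: Callable[[Hashable], str]) -> None:                        # risk control module 14
        self._acquire = acquire
        self._train = train
        self._extract = extract
        self._decide = decide
        self._predict: Optional[Predictor] = None

    def fit(self) -> "AccessControlDevice":
        """Acquire training set data and train the classification model."""
        X, y = self._acquire()
        self._predict = self._train(X, y)
        return self

    def classify(self, log: Any) -> Hashable:
        """Classification module 13: unit type of the IP address in one log record."""
        if self._predict is None:
            raise RuntimeError("call fit() before classify()")
        return self._predict(self._extract(log))

    def control(self, log: Any) -> str:
        """Risk control: decide how to handle the request behind this log record."""
        return self._decide(self.classify(log))
```

Injecting callables keeps each module replaceable, for example swapping the classifier without touching the risk control policy.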

Further, the data acquisition module 11 is configured to:

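acquiring an IP address library, wherein the IP address library comprises different IP addresses and unit types corresponding to the IP addresses;

acquiring an access log set in an access log library of a target website;

marking the unit type of each log data in the access log set according to the IP address library;

and using the log data marked by the unit type as training set data.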

Further, the data acquisition module 11 is configured to:

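determining an IP address in current log data;

determining a target unit type corresponding to the IP address in the current log data from the IP address library according to the IP address in the current log data;

and marking the unit type to which the IP address in the current log data belongs as the target unit type.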

Specifically, the model training module 12 is configured to:

Further, the model training module 12 is configured to:

Referring to FIG. 4, a schematic structural diagram of an electronic device 20 provided in an embodiment of the present application is shown; the electronic device 20 may specifically implement the steps of the access control method disclosed in the foregoing embodiments.

In general, the electronic device 20 in the present embodiment includes: a processor 21 and a memory 22.

The processor 21 may include one or more processing cores, such as a four-core processor or an eight-core processor. The processor 21 may be implemented in at least one hardware form among a DSP (digital signal processor), an FPGA (field-programmable gate array), and a PLA (programmable logic array). The processor 21 may also include a main processor and a coprocessor, where the main processor, also called a Central Processing Unit (CPU), processes data in an awake state, and the coprocessor is a low-power processor that processes data in a standby state. In some embodiments, the processor 21 may be integrated with a GPU (graphics processing unit) responsible for rendering and drawing the images to be displayed on the display screen. In some embodiments, the processor 21 may include an AI (artificial intelligence) processor for handling computing operations related to machine learning.

Memory 22 may include one or more computer-readable storage media, which may be non-transitory. Memory 22 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 22 is at least used for storing a computer program 221, which, when loaded and executed by the processor 21, implements the steps of the access control method disclosed in any of the foregoing embodiments.

In some embodiments, the electronic device 20 may further include a display 23, an input/output interface 24, a communication interface 25, a sensor 26, a power supply 27, and a communication bus 28.

Those skilled in the art will appreciate that the configuration shown in FIG. 4 is not limiting to electronic device 20 and may include more or fewer components than those shown.

Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the access control method disclosed in any of the foregoing embodiments.

For the specific process of the access control method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing detailed description is directed to an access control method, apparatus, device, and medium provided by the present application, and specific examples are applied in the present application to explain the principles and embodiments of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. An access control method, comprising:

2. The access control method of claim 1, wherein the obtaining training set data comprises:

acquiring an access log set in an access log library of a target website;

and using the log data marked by the unit type as training set data.

3. The access control method according to claim 2, wherein the marking of each log data in the access log set with a unit type according to the IP address library comprises:

determining an IP address in current log data;

4. The access control method according to any one of claims 1 to 3, wherein the training of the classification model constructed in advance based on the naive Bayes algorithm by using the training set data to obtain the trained classification model comprises:

5. The access control method according to claim 4, wherein the training a classification model constructed in advance based on a naive Bayes algorithm by using the training set data to obtain a classification model to be tested comprises:

6. The access control method of claim 4, wherein the testing of the classification model to be tested comprises:

7. The access control method according to claim 6, wherein after determining the classification accuracy corresponding to the classification model to be tested according to the test result, the method further comprises:

8. An access control apparatus, comprising:

9. An electronic device, comprising:

a memory and a processor;

wherein the memory is used for storing a computer program;

the processor is configured to execute the computer program to implement the access control method of any one of claims 1 to 7.

10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the access control method of any one of claims 1 to 7.