CN108234435A - An automatic detection method based on IP classification


Info

Publication number
CN108234435A
Authority
CN
China
Legal status (as listed by Google Patents; not a legal conclusion)
Pending
Application number
CN201611201889.5A
Other languages
Chinese (zh)
Inventor
周辉
唐亘
张克
Current Assignee (as listed by Google Patents; may be inaccurate)
Mdt Infotech Ltd Shanghai
Original Assignee
Mdt Infotech Ltd Shanghai
Application filed by Mdt Infotech Ltd Shanghai
Priority to CN201611201889.5A
Publication of CN108234435A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The present invention provides an automatic detection method based on IP classification, comprising: converting the IP address information obtained by a server into geographic information describing where the IP is used, and classifying the IP address information based on that geographic information to form clusters; performing window-behavior statistics over predetermined time dimensions on selected elements of the IP parameter set W and on the clusters, and outputting the behavior statistics as the current behavior time-distribution set C; assigning, per predetermined time unit, weights to the historical behavior statistics of each time unit and adding the statistics of identical elements or clusters to update the historical behavior distribution set D; classifying the behavior data in D with a clustering algorithm to generate cluster centers; computing the distance from the current behavior time-distribution set C to each cluster center and outputting the minimum distance; and comparing the minimum distance with a risk threshold to output a detection result. The present invention can accurately detect malicious domain names.

Description

An automatic detection method based on IP classification
Technical field
The present invention relates to the computer field, and in particular to an automatic detection method based on IP classification.
Background
With the rapid development of network technology and the arrival of the network era, the vast and rich resources of the network have brought great convenience to human society. However, as people's lives depend more and more on the network, profit-driven network security incidents keep emerging. Especially in recent years, botnets, domain-name amplification distributed denial-of-service (DDoS) attacks, web-page Trojan planting, and many other security incidents have seriously affected normal use of the network and caused great harm to all sectors of society, so detecting these events has become especially important. In addition, malicious registration and malicious applications on terminal websites and apps, carried out through certain domain names and IP addresses, pose great security risks to Internet service providers.
The domain name system is one of the key pieces of infrastructure of today's Internet, and a large number of network services depend on the domain name service. The domain name resolution service (hereinafter: DNS service) maps abstract IP addresses to easy-to-remember domain names, making it easier for Internet users to access network resources; it is one of the important infrastructure services of the Internet architecture. Because the domain name system does not inspect the service behaviors built on top of it, and the DNS service lacks any ability to detect malicious behavior, it is often exploited by malicious programs. To detect these malicious events, malicious domain names need to be detected.
Some existing technologies for detecting malicious domain names rely on blacklists and whitelists, restricting user access through explicit "allow" and "deny" rules to achieve "security". However, such methods usually produce large numbers of false positives and false negatives, and adapt poorly to different user environments and business scenarios.
Summary of the invention
The technical problem solved by the technical solution of the present invention is how to accurately detect malicious domain names.
To solve the above technical problem, the technical solution of the present invention provides an automatic detection method based on IP classification, including:
Obtaining a predefined data packet from the server side to obtain the elements of an IP parameter set W, where W includes at least IP address information;
Converting the IP address information into geographic information describing where the IP is used, and classifying the IP address information based on the geographic information to form clusters;
Performing window-behavior statistics over predetermined time dimensions on selected elements of W and on the clusters to output behavior statistics, obtaining the current behavior time-distribution set C;
Assigning, per predetermined time unit, weights to the historical behavior statistics under each time unit, and adding the statistics of identical elements or clusters, to update the historical behavior distribution set D;
Classifying the behavior data in D with a clustering algorithm to generate cluster centers;
Computing the distance from the current behavior time-distribution set C to each cluster center and outputting the minimum distance;
Comparing the minimum distance with a risk threshold to output a detection result.
Optionally, the data packet includes: the device information, network information, and account information of the party performing the IP behavior.
Optionally, the elements of the IP parameter set W further include: the IP value, the IP network segment, IP truncation information, and TCP protocol stack information;
The IP truncation information is obtained as follows:
Express the IP address as a 32-bit binary number;
Take the first n bits as the IP truncation information, where n is a natural number from 24 to 32;
The TCP protocol stack information includes tcpts, wscale, and the TCP source port.
Optionally, converting the IP address information into geographic information describing where the IP is used includes:
Converting the IP address information into usage-location information using an external IP geolocation database;
Identifying, by natural language recognition, at least some of the country, province, city, and street information in the usage-location information to form the fields of the geographic information;
Classifying the IP address information based on the geographic information to form clusters includes:
Setting categories based on the fields of the address information so as to classify the IP address information.
Optionally, the selected elements include the IP network segment and the IP truncation information in W.
Optionally, performing window-behavior statistics over predetermined time dimensions on the selected elements of W and on the clusters to output behavior statistics includes:
Monitoring window behaviors;
Counting the number of occurrences of the selected elements and clusters within the predetermined time dimension during the window behaviors.
Optionally, the window behavior is based on a sliding window or a fixed window.
Optionally, the predetermined time dimension is set either as a time period, or as a start time and an end time.
Optionally, the time unit is one day, and the behavior statistics are distributed by day.
Optionally, the weight k_j assigned to the historical behavior statistics of the j-th time unit is determined by the following formula:
$k_j = a^{j}\cdot\frac{a}{1-a}$, where a is a predetermined constant with 0 < a < 1.
Optionally, classifying the behavior data in D with a clustering algorithm to generate cluster centers includes:
Defining distribution vectors;
Setting the distance between each pair of distribution vectors in D to be the ordinary Euclidean distance;
Clustering on the pairwise distances between distribution vectors with the K-means algorithm, and determining the optimal number of clusters m and the m cluster centers with the elbow method, denoted {k1, k2, ..., km}.
Optionally, classifying the behavior data in D with a clustering algorithm to generate cluster centers further includes: updating an IP behavior clustering information library based on the generated cluster centers;
each cluster center is a cluster center recorded in the IP behavior clustering information library.
Optionally, computing the distance from C to each cluster center includes: computing the distance from each distribution vector in C to the corresponding cluster center;
The risk threshold is obtained as follows:
Establishing a probability distribution of the distances from the distribution vectors in C to the corresponding cluster centers;
Taking the median of this probability distribution as the risk threshold.
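The threshold rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: it reads the "intermediate value" of the distance distribution as the sample median, and the comparison direction in `detect` (flag when the minimum distance exceeds the threshold) is an assumption, since the text does not state it. Both function names are hypothetical.

```python
import statistics
from typing import Sequence


def risk_threshold(distances: Sequence[float]) -> float:
    """Median of the distances from the vectors in C to their cluster centers."""
    if not distances:
        raise ValueError("need at least one distance")
    return statistics.median(distances)


def detect(min_distance: float, threshold: float) -> bool:
    """Assumed rule: a vector far from every known center is flagged as risky."""
    return min_distance > threshold


# Made-up distances from five distribution vectors to their nearest centers.
dists = [0.8, 1.2, 3.5, 0.4, 2.0]
t = risk_threshold(dists)
print(t)               # 1.2
print(detect(3.5, t))  # True
```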
Optionally, the distribution vector is the number of occurrences of the selected element or cluster over the predetermined time dimensions.
Optionally, the method further includes:
Obtaining a risk level based on the detection result;
Determining an acceptable risk-level range according to an external risk request, and outputting the corresponding cluster results.
The beneficial effects of the technical solution of the present invention include at least the following:
The technical solution of the present invention can effectively detect IP address information with abnormal behavior: based on IP address information, it inspects the data packets generated during website or application operations and performs window-behavior statistical clustering on each IP parameter, thereby detecting abnormal IP address information and improving the accuracy of monitoring malicious IP addresses.
The technical solution of the present invention can also, for the parameter set of an IP address, evaluate the clustering of the parameter set with reference to historical parameter data, accumulate the historical parameter sets with time-unit-based weights, and compute a risk threshold and risk levels from the probability distribution of the clustering results, thereby quantitatively evaluating threatening IP address information and further improving the accuracy of monitoring malicious IP addresses.
The technical solution of the present invention can also divide IP address information into risk levels based on the above clustering results, so that third-party users can confirm the risk range applicable to them. The evaluation of malicious IP address information can thus be adapted to each third party's situation, which broadens the applicability of the technical solution and makes it compatible with multiple evaluation systems.
Description of the drawings
Other features, objects, and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flow diagram of an automatic detection method based on IP classification according to the technical solution of the present invention;
Fig. 2 is a flow diagram of another automatic detection method based on IP classification according to the technical solution of the present invention.
Detailed description of embodiments
To present the technical solution of the present invention more clearly, the invention is further described below with reference to the accompanying drawings.
Detecting malicious domain names is extremely important in today's society, where networks are ubiquitous. Many network application scenarios, such as bank loans, service registration, and e-commerce marketing, are based on IP addresses: users and merchants in these scenarios carry out network operations through IP addresses, and all such operations rely on network windows opened from an IP address. In network communication, therefore, an IP address is a very basic piece of information that every user or merchant has. Users and merchants normally perform network operations through IP addresses, but malicious users may use certain IP addresses to submit fraudulent applications to merchant services, and malicious merchants may use IP addresses to run deceptive promotions on a platform. Such behaviors can cause serious losses to the network environment and resource services, and waste money.
Because malicious users or merchants act through network IP addresses, during network application or promotion the network sends predetermined data packets to the server, triggered by specific user operations such as window events (for example, a login window). These data packets contain various information related to the IP address. The technical solution of the present invention classifies IP address information by inspecting the data packets received by the server, thereby achieving automatic detection of IP addresses.
As shown in Fig. 1, an automatic detection method based on IP classification includes the following steps:
Step S100: obtain a predefined data packet from the server side to obtain the IP parameter set W, where the elements of W include at least IP address information.
In this step, when a network user (here "user" covers both ordinary users of network services and merchants) performs operations such as opening a page of a network service, logging in, registering, or submitting an application, the network service application sends a predefined data packet to the server. The data packet is therefore predefined; in other embodiments, predefining the data packet can be performed as a separate additional step. The predefined data packet contains: the device information of the party performing the IP behavior (i.e., the above network operations); the network information; and the account information. The content of the account information can be chosen as the content structure of the data packet.
In step S100, the elements of the IP parameter set W specifically include the IP value, the IP network segment, IP truncation information, and TCP protocol stack information. The elements of W are derived from the IP address information: the IP address is expressed as a 32-bit binary number, and the first n bits of that number are recorded as ipSeg_n (n ranges from 24 to 32), which serves as the IP truncation information. The TCP protocol stack information includes tcpts, wscale, and the TCP source port. More specifically, tcpts is the timestamp information of the TCP protocol stack, an option of the TCP protocol that records the timestamp generated during the TCP handshake; wscale is the window scale factor of the TCP window scale option, used to enlarge the TCP advertised window; and the TCP source port is the communication port of the TCP source.
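A minimal sketch of deriving the ipSeg_n values (the first n bits of the 32-bit address) described above. It assumes the prefixes are kept as bit strings; the text does not say whether they are stored as strings or integers. It uses Python's standard `ipaddress` module, and the function name `ip_segments` is hypothetical.

```python
import ipaddress
from typing import Dict


def ip_segments(ip: str, n_min: int = 24, n_max: int = 32) -> Dict[str, str]:
    """Map 'ipSeg_n' to the first n bits of the IPv4 address, for n_min <= n <= n_max."""
    value = int(ipaddress.IPv4Address(ip))  # address as a 32-bit unsigned integer
    bits = format(value, "032b")            # zero-padded 32-character binary string
    return {f"ipSeg_{n}": bits[:n] for n in range(n_min, n_max + 1)}


segs = ip_segments("106.18.236.97")
print(segs["ipSeg_24"])  # 011010100001001011101100
```

Keeping the prefixes as strings makes them usable directly as dictionary keys when counting occurrences per segment in the later steps.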
In the technical solution of the present invention, the element information in W is obtained from the information content of the data packet. Therefore, besides the predefined content listed above, the data packet can take other forms; the purpose is to obtain from the packet content the basic information required for the elements of W, and the data packet format of the technical solution is not limited to the above examples.
With continued reference to Fig. 1, the automatic detection method based on IP classification according to the technical solution of the present invention further includes:
Step S101: convert the IP address information into geographic information describing where the IP is used, and classify the IP address information based on the geographic information to form clusters.
Specifically, in step S101, converting the IP address information into geographic information includes: converting the IP address information into usage-location information using an external IP geolocation database; and identifying, by natural language recognition, at least some of the country, province, city, and street information in the usage-location information to form the fields of the geographic information. Classifying the IP address information based on the geographic information to form clusters includes: setting categories based on the fields of the geographic information to classify the IP address information.
As an example of step S101, suppose the obtained IP address information is 106.18.236.97. Using an external IP geolocation database, it can be converted into the usage-location information "China Telecom, Hunan, China", so the fields of the resulting geographic information are "China", "Hunan Province", and "Telecom". Given this result, fields can be combined to define categories: for example, IP address information with the fields "China" and "Telecom" is grouped into one category. In other embodiments the field combination can be configured differently: IP address information with the fields "China" and "Hunan Province" can form one category, or IP address information with the fields "China", "Hunan Province", and "Telecom" can form one category. These examples show how fields of the geographic information are combined to determine the IP address categories.
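The field-combination classification in the example above can be sketched like this. The `GEO_DB` dictionary is a hypothetical stand-in for an external IP geolocation database: only the 106.18.236.97 entry follows the text, and the two 192.0.2.x records are made up on documentation addresses. `classify` is likewise a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for an external IP geolocation database.
GEO_DB: Dict[str, Dict[str, str]] = {
    "106.18.236.97": {"country": "China", "province": "Hunan Province", "isp": "Telecom"},
    "192.0.2.10": {"country": "China", "province": "Hunan Province", "isp": "Telecom"},  # made up
    "192.0.2.20": {"country": "China", "province": "Guangdong Province", "isp": "Telecom"},  # made up
}


def classify(ips: Iterable[str], fields: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[str]]:
    """Group IP addresses by the combination of geographic fields chosen as the category."""
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for ip in ips:
        geo = GEO_DB.get(ip)
        if geo is None:
            continue  # addresses the database cannot resolve are skipped in this sketch
        key = tuple(geo.get(field, "") for field in fields)
        clusters[key].append(ip)
    return dict(clusters)


ips = ["106.18.236.97", "192.0.2.10", "192.0.2.20"]
print(classify(ips, ("country", "isp")))       # one ("China", "Telecom") cluster
print(classify(ips, ("country", "province")))  # split into Hunan and Guangdong
```

Choosing a longer tuple of fields gives finer categories, which matches the text's point that a more detailed geolocation database allows more precise classes.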
Note that external IP geolocation databases differ in how accurately they map IP addresses to geographic information, so the accuracy of converting IP address information into usage-location information also differs. With a more complete external IP geolocation database, converting a known IP address may yield more detailed address information and correspondingly more geographic fields, allowing finer categories when defining IP address classes. Similarly, when the usage-location information is specific down to country, province, city, and street, the resulting fields include a country field, a province field, a city field, and a street field; the IP address categories can then be defined by the combination of the country, province, and city fields, or by the combination of the country, province, city, and street fields. The technical solution of the present invention does not restrict this.
With continued reference to Fig. 1, the automatic detection method based on IP classification according to the technical solution of the present invention further includes:
Step S102: perform window-behavior statistics over predetermined time dimensions on the selected elements of W and on the clusters to output behavior statistics, obtaining the current behavior time-distribution set C.
Specifically, the selected elements include the IP network segment and the IP truncation information in W, where the IP truncation information is the ipSeg_n described above. A window behavior is deemed to occur each time the server receives a data packet carrying the above IP address information. The statistics are counted in real time. The clusters correspond to the IP parameter set W. More specifically, performing window-behavior statistics over predetermined time dimensions on the selected elements of W and on the clusters to output behavior statistics includes: monitoring window behaviors; and counting the number of occurrences of the selected elements and clusters within the predetermined time dimension during the window behaviors.
In step S102, the window behavior is based on a sliding window or a fixed window. The predetermined time dimension is set either as a time period, or as a start time and an end time. Depending on the window type, the algorithm for counting window behaviors includes: counting the above objects (the IP network segment, ipSeg_n, and the address cluster corresponding to the IP) in sliding or fixed windows over one or more set time periods; or counting them in sliding or fixed windows over a past period defined by a start time and an end time.
A first example of the window-behavior statistics algorithm in the technical solution of the present invention is as follows:
For a selected element, such as the IP network segment, the statistics are: use 15, 30, 60, and 120 minutes as the time periods; count the occurrences of the IP network segment in sliding windows of 15, 30, 60, and 120 minutes; and count its occurrences in fixed windows of 15, 30, 60, and 120 minutes.
In another example, the window-behavior statistics algorithm of the technical solution of the present invention can also be as follows:
Set the past period to 15 minutes, with start time 1:00 and end time 1:15, or alternatively start time 1:15 and end time 1:30. The sliding-window occurrence count is the number of times the selected element (such as the IP network segment) appeared in the past 15 minutes; the fixed-window occurrence count is the number of times the given parameter (such as the IP network segment) appeared within a fixed 15-minute window.
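Both window types from the two examples can be sketched with event timestamps in minutes since midnight. This is an illustrative sketch, not the patented algorithm: it assumes a sliding window covers (now - w, now] and that fixed windows are aligned to multiples of their width (so the 15-minute windows are 1:00 to 1:15, 1:15 to 1:30, and so on). The event data and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def sliding_count(events: Iterable[int], now: int, width: int) -> int:
    """Occurrences in the trailing window (now - width, now]."""
    return sum(1 for t in events if now - width < t <= now)


def fixed_count(events: Iterable[int], start: int, end: int) -> int:
    """Occurrences in the fixed window [start, end)."""
    return sum(1 for t in events if start <= t < end)


def window_stats(events: List[int], now: int,
                 widths: Sequence[int] = (15, 30, 60, 120)) -> Dict[str, int]:
    """Sliding- and fixed-window counts for one selected element (e.g. a network segment)."""
    stats: Dict[str, int] = {}
    for w in widths:
        stats[f"sliding_{w}m"] = sliding_count(events, now, w)
        start = (now // w) * w  # fixed window of width w that contains `now`
        stats[f"fixed_{w}m"] = fixed_count(events, start, start + w)
    return stats


# Minutes since 0:00 at which one IP network segment was seen (made-up data).
seen = [5, 50, 58, 61, 64, 70]
print(window_stats(seen, now=72))  # "now" is 1:12
print(fixed_count(seen, 60, 75))   # fixed window 1:00 to 1:15 -> 3
```

The two counts diverge near window boundaries: at 1:12 the 15-minute sliding window still sees the 0:58 event, while the fixed 1:00 to 1:15 window does not.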
Based on the above, the current behavior time-distribution set C in step S102 is the statistical distribution of the occurrence counts of the selected IP address elements in sliding or fixed windows over multiple time dimensions.
With continued reference to Fig. 1, the automatic detection method based on IP classification according to the technical solution of the present invention further includes:
Step S103: assign, per predetermined time unit, weights to the historical behavior statistics under each time unit, and add the statistics of identical elements or clusters, to update the historical behavior distribution set D.
Specifically, in step S103 the time unit can be a day or any chosen number of hours. It is preferably a day, with the behavior statistics distributed by day: the historical behavior statistics are counted per day, for example today's counts, yesterday's counts, and so on. In this step, the historical behavior statistics are behavior statistics on a time dimension whose time unit is one day.
More specifically, updating D requires assigning weights, along the time dimension, to each element or cluster and its statistics. The technical solution preferably assigns different weights according to the position in the time series, and sums the counts of identical elements or clusters, thereby updating the historical behavior distribution set D. The idea behind the weighting is that older data receives a lower weight. The weight k_j assigned to an element or cluster in the j-th time unit is $k_j = a^{j}\cdot\frac{a}{1-a}$, where a is a predetermined constant with 0 < a < 1. Here j is the index of the time unit in the time series, j = 1 to N, where j = 1 is the most recent update and j = N is the initial update. In step S103, adding the statistics of identical elements or clusters to update D consists of a weighted addition of the distribution proportions of the same element or cluster over the historical time dimension under the time unit. The weighted sums update the records in D; that is, D holds, for each element or cluster, the weighted sum of its distribution proportions over the historical time dimension.
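A sketch of the weighting and accumulation step, under stated assumptions: the weight is computed exactly as the formula reads in the text, k_j = a^j · a/(1 - a), so the constant factor a/(1 - a) only rescales the decaying term a^j; each day's statistics are assumed to be stored as per-key distribution vectors of equal length; and the keys and counts in the usage lines are made up. `weight` and `update_history` are hypothetical names.

```python
from typing import Dict, List, Sequence


def weight(j: int, a: float) -> float:
    """k_j = a**j * a / (1 - a); j = 1 is the most recent time unit."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    return a ** j * a / (1 - a)


def update_history(daily: Sequence[Dict[str, List[float]]], a: float) -> Dict[str, List[float]]:
    """Weighted element-wise sum of per-day distribution vectors.

    daily[0] is the most recent day (j = 1) and daily[-1] the oldest (j = N);
    vectors under the same element or cluster key are added together.
    """
    history: Dict[str, List[float]] = {}
    for j, day in enumerate(daily, start=1):
        k = weight(j, a)
        for key, vec in day.items():
            acc = history.setdefault(key, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += k * value
    return history


days = [
    {"seg_A": [2, 0]},                   # most recent day (j = 1)
    {"seg_A": [4, 2], "seg_B": [1, 1]},  # the day before (j = 2)
]
print(update_history(days, a=0.5))  # {'seg_A': [2.0, 0.5], 'seg_B': [0.25, 0.25]}
```

Because a < 1, each step back in time multiplies the weight by a, which is what makes older days count less.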
With continued reference to Fig. 1, the automatic detection method based on IP classification according to the technical solution of the present invention further includes:
Step S104: classify the behavior data in D with a clustering algorithm to generate cluster centers.
The historical behavior distribution set D produced in step S103 is a distribution set of historical IP behavior times, in which each element or cluster is a vector with one dimension per time window. For example, when counting a day's distribution at one-hour granularity, the distribution vector of an element or cluster is a 24-dimensional vector, each dimension representing the number of occurrences of that element or cluster in the corresponding one-hour window. Therefore, classifying the behavior data in D with a clustering algorithm to generate cluster centers includes:
Defining distribution vectors;
Setting the distance between each pair of distribution vectors in D to the ordinary Euclidean distance; and
Clustering the distribution vectors by their pairwise distances with the K-means algorithm, and determining the optimal number of clusters m and the m cluster centers with the elbow method, denoted {k1, k2, ..., km}.
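A minimal sketch of the 24-dimensional hourly distribution vector and the ordinary Euclidean distance between two such vectors. It assumes each window behavior has already been reduced to its hour of day; `hourly_vector` is a hypothetical helper, while `math.dist` is the standard-library Euclidean distance (Python 3.8+).

```python
import math
from typing import Iterable, List


def hourly_vector(hours: Iterable[int]) -> List[int]:
    """24-dimensional distribution vector: occurrences per one-hour window of a day."""
    vec = [0] * 24
    for h in hours:
        if not 0 <= h < 24:
            raise ValueError(f"hour out of range: {h}")
        vec[h] += 1
    return vec


# Hour of day of each window behavior for two elements (made-up data).
u = hourly_vector([9, 9, 10, 23])
v = hourly_vector([9, 10, 10])
print(math.dist(u, v))  # Euclidean distance; here sqrt(1 + 1 + 1)
```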
A distribution vector is the number of occurrences of the selected element or cluster counted over the predetermined time dimension. The ordinary Euclidean distance can be computed by any existing method for the Euclidean distance between two vectors. The distribution vectors are the behavior data in D. In this step, the K-means algorithm takes as input the chosen number of clusters and a database containing a number of data objects, and outputs clusters that satisfy the minimum-variance criterion (i.e., the above cluster centers). Specifically:
(1) Arbitrarily select, from the data objects, as many objects as the chosen number of clusters as the initial cluster centers;
(2) Compute the distance from each object to the mean (center object) of each cluster, and reassign each object according to the minimum distance;
(3) Recompute the mean (center object) of each cluster that has changed;
(4) Compute the criterion function; when a condition is met, such as convergence of the function, the algorithm terminates and outputs the clusters that satisfy the minimum-variance criterion; otherwise, return to step (2).
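Steps (1) to (4) correspond to plain K-means, which can be sketched as below. This is an illustrative implementation, not the patent's: it clusters the distribution vectors directly under Euclidean distance, uses "assignments stopped changing" as the convergence condition of step (4), and keeps a center unchanged if its cluster becomes empty. In practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean (the 'center object') of a non-empty list of vectors."""
    return [sum(c) / len(points) for c in zip(*points)]


def kmeans(data: List[Vector], m: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Plain K-means; returns (cluster centers, cluster label of each point)."""
    rng = random.Random(seed)
    # (1) pick m distinct data objects as the initial centers
    centers = [list(p) for p in rng.sample(data, m)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # (2) assign every object to its nearest center
        new_labels = [min(range(m), key=lambda c: math.dist(p, centers[c])) for p in data]
        # (4) converged: unchanged assignments would reproduce the same centers in (3)
        if new_labels == labels:
            break
        labels = new_labels
        # (3) recompute each center as the mean of its members
        for c in range(m):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return centers, labels


data = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans(data, m=2)
print(sorted(centers))  # [[0.0, 0.5], [10.0, 10.5]]
```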
In the K-means algorithm above, the database of data objects is the set of computed pairwise distances between distribution vectors. The number of clusters can be determined in either of two ways. The first is the Elbow method: from the relationship between the clustering result and the number of clusters, judge which number of clusters gives the best result. The second is to set m according to specific requirements; for example, clustering shirt sizes naturally calls for three classes, L, M and S. The technical solution of the present invention preferably determines the number of clusters m with the Elbow method.
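The Elbow method can be sketched as a choice over the SSE curve. One common heuristic, assumed here rather than taken from the source, normalizes the (m, SSE) points and picks the m farthest from the straight line joining the first and last points. The SSE values would come from running K-means once for each candidate m; the values below are hypothetical.

```python
import math


def elbow_point(sse_by_m: dict[int, float]) -> int:
    """Cluster count farthest from the chord of the normalized SSE curve."""
    ms = sorted(sse_by_m)
    if len(ms) < 3:
        raise ValueError("need SSE values for at least three cluster counts")
    x0, x1 = ms[0], ms[-1]
    ys = [sse_by_m[m] for m in ms]
    y_lo, y_hi = min(ys), max(ys)
    if y_hi == y_lo:
        return ms[0]  # flat curve: extra clusters buy nothing

    def norm(m: int) -> tuple[float, float]:
        return (m - x0) / (x1 - x0), (sse_by_m[m] - y_lo) / (y_hi - y_lo)

    (ax, ay), (bx, by) = norm(x0), norm(x1)

    def gap(m: int) -> float:
        px, py = norm(m)
        # perpendicular distance from (px, py) to the line through A and B
        return abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax) / math.hypot(bx - ax, by - ay)

    return max(ms, key=gap)


print(elbow_point({1: 100.0, 2: 20.0, 3: 15.0, 4: 12.0, 5: 10.0}))  # 2
```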
More specifically, classifying the behavioral data in the historical behavior distribution set D with the clustering algorithm to generate cluster centres further includes updating an IP behavior clustering information library with the generated cluster centres; the library records each cluster centre.
With continued reference to Fig. 1, the automatic detection method based on IP classification of the present invention further includes:
Step S105: calculating the distance from the current behavior time distribution set C to each cluster centre and outputting the minimum distance value.
Specifically, calculating the distance from the current behavior time distribution set C to each cluster centre includes calculating the distance from each distribution vector in C to the corresponding cluster centres. More specifically, the cluster centres are stored in the IP behavior clustering information library; the distance from each element of C to every cluster centre in the library is calculated, and the minimum is taken as the output value. The output value could instead be an average, or be computed with some other function of the inputs; in the present technical solution, the output value is the distance from an element of C to the cluster centres.
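A minimal sketch of step S105, assuming C maps each chosen element (for example an IP segment) to its distribution vector and the cluster centres are plain lists read from the IP behavior clustering information library. The names are hypothetical, and the vectors are shortened to two dimensions for readability.

```python
import math


def nearest_centre(vector: list[float], centres: list[list[float]]) -> tuple[int, float]:
    """Return (index of the nearest cluster centre, minimum Euclidean distance)."""
    distances = [math.dist(vector, c) for c in centres]
    best = min(range(len(centres)), key=distances.__getitem__)
    return best, distances[best]


def score_current_set(C: dict[str, list[float]],
                      centres: list[list[float]]) -> dict[str, tuple[int, float]]:
    """Minimum distance from every element of C to the stored cluster centres."""
    return {key: nearest_centre(vec, centres) for key, vec in C.items()}


centres = [[0.0, 0.0], [10.0, 10.0]]
C = {"10.0.0.0/24": [1.0, 0.0], "10.0.1.0/24": [10.0, 13.0]}
print(score_current_set(C, centres))
# {'10.0.0.0/24': (0, 1.0), '10.0.1.0/24': (1, 3.0)}
```

Replacing the minimum with an average over `distances` would give the alternative output value the text mentions.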
With continued reference to Fig. 1, the automatic detection method based on IP classification of the present invention further includes:
Step S106: comparing the minimum distance value with a risk threshold to output a detection result.
In step S106, the risk threshold is obtained as follows: a probability distribution is built from the distances between the distribution vectors in the current behavior time distribution set C and their corresponding cluster centres, and the median of this distribution is taken as the risk threshold.
More specifically, for each data point in the current behavior time distribution set C, the present invention takes its distance (which may be the minimum distance) to the corresponding cluster centre, sorts all of these distances in ascending order to form a probability distribution (which may be a histogram), and takes the 50th percentile of that distribution as the risk threshold. If the minimum distance value is below the risk threshold, the quantile at which the minimum distance value falls is the corresponding risk level. From these risk levels one can judge how likely it is that the window behavior or network behavior of the current IP address information belongs to the corresponding cluster; on the basis of this likelihood, the reliability of the clustering result can be output as a confidence level, and the application type to which the IP address information belongs can be determined. Because the clusters can correspond to different application scenarios, the IP classification and identification scheme of the present invention can be used to detect IP address information and thereby obtain a risk assessment of the application type to which that IP address information belongs.
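Step S106 can be sketched as follows. The median threshold and the empirical quantile follow the text; returning `None` when the distance is not below the threshold is an assumption, since the text does not say which risk level such a distance receives. The function names and sample distances are hypothetical.

```python
import bisect
import statistics
from typing import Optional


def risk_threshold(distances: list[float]) -> float:
    """Median (50th percentile) of the distances from points in C to their cluster centres."""
    return statistics.median(distances)


def risk_level(min_distance: float, distances: list[float]) -> Optional[float]:
    """Empirical quantile of min_distance if it is below the threshold, else None."""
    if min_distance >= risk_threshold(distances):
        return None  # assumption: not treated as belonging to the cluster
    ordered = sorted(distances)
    # fraction of observed distances that are <= min_distance
    return bisect.bisect_right(ordered, min_distance) / len(ordered)


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
print(risk_threshold(observed))   # 3.0
print(risk_level(2.0, observed))  # 0.4
print(risk_level(4.0, observed))  # None
```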
Referring to Fig. 2, the automatic detection method based on IP classification of the present invention can also be implemented with the flow of Fig. 2; that is, on the basis of steps S100 to S106 above, it further includes:
Step S107: obtaining a risk level based on the detection result;
Step S108: determining an acceptable range of risk levels according to an external risk requirement and outputting the corresponding clustering result.
Steps S107 and S108 may either form part of the flow of the present technical solution or be implemented as an application flow on an external device.
It should be noted that a conventional clustering algorithm provides only a clustering result. The present invention combines the clustering algorithm with IP address classification, so that IP addresses can be detected in real time and the clustering result for each IP address can be determined and output. Since a clustering result is not 100% certain but only probable, the present invention places the clustering result under a confidence level: it provides not just the clustering result but also its reliability.
By clustering IP address information in this way, the application type to which an IP address belongs can be determined, for example whether it is an office network or a mobile network, together with the corresponding probability.
In actual use, for example in an e-commerce marketing scenario, the tolerance for errors in IP clustering results is relatively high, so a somewhat higher risk level can be selected and the corresponding clustering result obtained automatically;
In a bank loan scenario, by contrast, the error tolerance is low, so a lower risk level can be selected, which automatically yields a clustering result different from the one above.
It should further be noted that the IP address information of the present invention comprises several types of information obtained by processing the binary representation of the IP address; clustering on this information further increases the accuracy and reliability of IP address clustering.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (15)

1. An automatic detection method based on IP classification, characterized by comprising:
obtaining predefined data packets from a server side to obtain the elements of an IP parameter set W, the elements of W including at least IP address information;
converting the IP address information into geographic information on where the IP is used, and classifying the IP address information based on the geographic information to form clusters;
performing window behavior statistics on chosen elements of the IP parameter set W and on the clusters along a predetermined time dimension to output behavioral statistics results, thereby obtaining a current behavior time distribution set C;
assigning, per predetermined time unit, weights to the historical behavioral statistics under each time unit, and summing the statistics of identical elements or clusters to update a historical behavior distribution set D;
classifying the behavioral data in the historical behavior distribution set D with a clustering algorithm to generate cluster centres;
calculating the distance from the current behavior time distribution set C to each cluster centre and outputting a minimum distance value;
comparing the minimum distance value with a risk threshold to output a detection result.
2. The method according to claim 1, characterized in that the data packets include the device information, network information and account information associated with the IP behavior.
3. The method according to claim 1, characterized in that the elements of the IP parameter set W further include: the IP numerical value, the IP network segment, IP cutoff information and TCP protocol stack information;
the IP cutoff information is obtained as follows:
expressing the IP address as a 32-bit binary number;
taking the first n bits as the IP cutoff information, where n is a natural number from 24 to 32;
the TCP protocol stack information includes: Tcpts, Wscale and Tcp Source Port.
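A minimal sketch of the IP cutoff information of claim 3, using Python's standard `ipaddress` module. The helper names are hypothetical, and the /24 default for the network segment is an assumption, since the claim does not fix the segment length.

```python
import ipaddress


def ip_cutoff(ip: str, n: int) -> str:
    """First n bits of the 32-bit binary form of an IPv4 address (24 <= n <= 32)."""
    if not 24 <= n <= 32:
        raise ValueError("n must be a natural number from 24 to 32")
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    return bits[:n]


def ip_segment(ip: str, prefix: int = 24) -> str:
    """Network segment containing the address, e.g. 192.168.1.0/24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


print(ip_cutoff("192.168.1.77", 24))  # 110000001010100000000001
print(ip_segment("192.168.1.77"))     # 192.168.1.0/24
```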
4. The method according to claim 1, characterized in that converting the IP address information into geographic information on where the IP is used includes:
converting the IP address information into location information using an external IP geographic database;
identifying, by natural language recognition, at least several of the country, province, city and street items in the location information to form the fields of the geographic information;
classifying the IP address information based on the geographic information to form clusters includes:
setting classes based on the fields of the address information so as to classify the IP address information.
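Claim 4 can be sketched as grouping addresses by the leading fields of their location. This assumes the external IP geographic database and the natural-language field extraction have already produced one dict per address; the field names, the sample records (documentation-range IPs) and `classify_by_geo` are hypothetical.

```python
from collections import defaultdict

GEO_FIELDS = ("country", "province", "city", "street")


def classify_by_geo(locations: dict[str, dict[str, str]],
                    level: str = "city") -> dict[tuple[str, ...], list[str]]:
    """Group IP addresses by their geographic fields down to the given level."""
    depth = GEO_FIELDS.index(level) + 1
    classes: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for ip, loc in locations.items():
        key = tuple(loc.get(field, "") for field in GEO_FIELDS[:depth])
        classes[key].append(ip)
    return dict(classes)


locations = {
    "203.0.113.5": {"country": "China", "province": "Shanghai", "city": "Shanghai"},
    "203.0.113.9": {"country": "China", "province": "Shanghai", "city": "Shanghai"},
    "198.51.100.7": {"country": "China", "province": "Zhejiang", "city": "Hangzhou"},
}
print(classify_by_geo(locations, level="province"))
# {('China', 'Shanghai'): ['203.0.113.5', '203.0.113.9'], ('China', 'Zhejiang'): ['198.51.100.7']}
```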
5. The method according to claim 1, characterized in that the chosen elements include the IP network segment and the IP cutoff information in the IP parameter set W.
6. The method according to claim 1, characterized in that performing window behavior statistics on the chosen elements of the IP parameter set W and on the clusters along a predetermined time dimension to output behavioral statistics results includes:
monitoring window behavior;
counting, during the window behavior, the number of occurrences of the chosen elements and clusters within the predetermined time dimension.
7. The method according to claim 6, characterized in that the window behavior is based on a sliding window or a fixed window.
8. The method according to claim 6, characterized in that the predetermined time dimension is set either by setting a time period or by setting a start time and an end time.
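The window statistics of claims 6 to 8 can be sketched with timestamps. `hourly_distribution` uses fixed one-hour windows between a set start and end time and yields the 24-dimensional occurrence vector; `sliding_count` counts occurrences in a window that slides with the current time. Both names, the one-hour width and the sample events are assumptions.

```python
from datetime import datetime, timedelta


def hourly_distribution(timestamps: list[datetime], start: datetime, end: datetime) -> list[int]:
    """Fixed windows: occurrences per hour of day for events in [start, end)."""
    vector = [0] * 24
    for ts in timestamps:
        if start <= ts < end:
            vector[ts.hour] += 1
    return vector


def sliding_count(timestamps: list[datetime], now: datetime,
                  width: timedelta = timedelta(hours=1)) -> int:
    """Sliding window: occurrences in the interval (now - width, now]."""
    return sum(1 for ts in timestamps if now - width < ts <= now)


events = [datetime(2016, 12, 22, 9, 10), datetime(2016, 12, 22, 9, 50),
          datetime(2016, 12, 22, 21, 5)]
day = datetime(2016, 12, 22)
vec = hourly_distribution(events, day, day + timedelta(days=1))
print(vec[9], vec[21], sum(vec))                             # 2 1 3
print(sliding_count(events, datetime(2016, 12, 22, 10, 0)))  # 2
```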
9. The method according to claim 1, characterized in that the time unit is a day and the behavioral statistics result is the daily distribution.
10. The method according to claim 1, characterized in that the weight $k_j$ assigned to the historical behavioral statistics under the j-th time unit is determined by the following formula:
$$k_j = a^{j}\cdot\frac{a}{1-a}$$
where a is a predetermined constant greater than 0 and less than 1.
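The weighted update of D can be sketched as follows, reading the formula literally as k_j = a^j · a/(1 − a). Two assumptions are made: j = 1 is the most recent time unit, and each time unit contributes a dict mapping an element or cluster to its distribution vector. The function names are hypothetical.

```python
def weight(j: int, a: float) -> float:
    """k_j = a**j * a / (1 - a), read literally from claim 10 (0 < a < 1)."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    return a ** j * (a / (1 - a))


def update_history(daily: list[dict[str, list[float]]], a: float) -> dict[str, list[float]]:
    """Weighted sum of daily distribution vectors per element or cluster.

    daily[0] is assumed to be the most recent day (j = 1), daily[1] the day before (j = 2), ...
    """
    D: dict[str, list[float]] = {}
    for j, day in enumerate(daily, start=1):
        k = weight(j, a)
        for key, vec in day.items():
            acc = D.setdefault(key, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += k * x
    return D


daily = [{"A": [2.0, 0.0]}, {"A": [4.0, 4.0], "B": [8.0, 0.0]}]
print(update_history(daily, a=0.5))  # {'A': [2.0, 1.0], 'B': [2.0, 0.0]}
```

Under this literal reading the factor a/(1 − a) only rescales all weights equally; a smaller a makes older days fade faster.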
11. The method according to claim 1, characterized in that classifying the behavioral data in the historical behavior distribution set D with a clustering algorithm to generate cluster centres includes:
defining distribution vectors;
setting the distance between every pair of distribution vectors in the historical behavior distribution set D, the distance being the ordinary Euclidean distance;
clustering on the pairwise distances between distribution vectors with the K-means algorithm, and determining the optimal number of clusters m and the m cluster centres, denoted {k1, k2, ..., km}, with the Elbow method.
12. The method according to claim 1 or 11, characterized in that classifying the behavioral data in the historical behavior distribution set D with a clustering algorithm to generate cluster centres includes: updating an IP behavior clustering information library based on the generated cluster centres;
each said cluster centre is a cluster centre recorded in the IP behavior clustering information library.
13. The method according to claim 1, characterized in that calculating the distance from the current behavior time distribution set C to each cluster centre includes: calculating the distance from each distribution vector in C to the corresponding cluster centre;
the risk threshold is obtained as follows:
establishing a probability distribution from the distances between the distribution vectors in C and the corresponding cluster centres;
taking the median of the probability distribution as the risk threshold.
14. The method according to claim 11 or 13, characterized in that the distribution vector is the number of occurrences of the chosen element or cluster along the predetermined time dimension.
15. The method according to claim 1, characterized by further comprising:
obtaining a risk level based on the detection result;
determining an acceptable range of risk levels according to an external risk requirement and outputting the corresponding clustering result.
CN201611201889.5A 2016-12-22 2016-12-22 A kind of automatic testing method based on IP classification Pending CN108234435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611201889.5A CN108234435A (en) 2016-12-22 2016-12-22 A kind of automatic testing method based on IP classification

Publications (1)

Publication Number Publication Date
CN108234435A true CN108234435A (en) 2018-06-29

Family

ID=62657192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611201889.5A Pending CN108234435A (en) 2016-12-22 2016-12-22 A kind of automatic testing method based on IP classification

Country Status (1)

Country Link
CN (1) CN108234435A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598404A (en) * 2019-09-17 2019-12-20 腾讯科技(深圳)有限公司 Security risk monitoring method, monitoring device, server and storage medium
CN110677309A (en) * 2018-07-03 2020-01-10 百度在线网络技术(北京)有限公司 Crowd clustering method and system, terminal and computer readable storage medium
CN111325495A (en) * 2018-12-17 2020-06-23 顺丰科技有限公司 Abnormal part classification method and system
CN112822143A (en) * 2019-11-15 2021-05-18 网宿科技股份有限公司 Method, system and equipment for evaluating IP address

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103532797A (en) * 2013-11-06 2014-01-22 网之易信息技术(北京)有限公司 Abnormity monitoring method and device for user registration
CN104050289A (en) * 2014-06-30 2014-09-17 中国工商银行股份有限公司 Detection method and system for abnormal events
CN104156418A (en) * 2014-08-01 2014-11-19 北京系统工程研究所 Knowledge reuse based evolutionary clustering method
CN105553998A (en) * 2015-12-23 2016-05-04 中国电子科技集团公司第三十研究所 Network attack abnormality detection method
JP5957411B2 (en) * 2013-04-25 2016-07-27 日本電信電話株式会社 Address resolution system and method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677309A (en) * 2018-07-03 2020-01-10 百度在线网络技术(北京)有限公司 Crowd clustering method and system, terminal and computer readable storage medium
CN111325495A (en) * 2018-12-17 2020-06-23 顺丰科技有限公司 Abnormal part classification method and system
CN111325495B (en) * 2018-12-17 2023-12-01 顺丰科技有限公司 Abnormal part classification method and system
CN110598404A (en) * 2019-09-17 2019-12-20 腾讯科技(深圳)有限公司 Security risk monitoring method, monitoring device, server and storage medium
CN112822143A (en) * 2019-11-15 2021-05-18 网宿科技股份有限公司 Method, system and equipment for evaluating IP address
CN112822143B (en) * 2019-11-15 2022-05-27 网宿科技股份有限公司 Method, system and equipment for evaluating IP address

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180629