CN110008976A - A kind of network behavior classification method and device - Google Patents

Info

Publication number
CN110008976A
Authority
CN
China
Prior art keywords
data
characteristic
data set
network behavior
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811482192.9A
Other languages
Chinese (zh)
Inventor
杨宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201811482192.9A priority Critical patent/CN110008976A/en
Publication of CN110008976A publication Critical patent/CN110008976A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a network behavior classification method and device. A method for network behavior classification includes: obtaining a first data set, each data entry of which includes feature data representing a network behavior and corresponding label information; clustering the feature data in the first data set to obtain one or more clusters; removing from the first data set, according to the one or more clusters, the data entries to which outlying feature data belong, to obtain a second data set; and training a classifier model using the feature data and corresponding label information included in the data entries of the second data set. The present invention also provides a device and a system for network behavior classification.

Description

Network behavior classification method and device
Technical field
The present invention relates to computer technology, and in particular to a network behavior classification method and device.
Background
The Internet now reaches into every aspect of society, and the number of network users has grown rapidly as Internet access has spread. The Internet is becoming social infrastructure and an inseparable part of people's lives, so it carries more and more user behavior. However, user behavior on the Internet is not effectively supervised. Privacy and security, counterfeit products, online fraud, viruses and Trojans, payment security, and online harassment are all difficult to regulate.
For example, because the barrier to entry is low and supervision is difficult, false online advertisements, rumors, fraudulent information, illegal propaganda, and harmful information spread unchecked, which both infringes on the legitimate rights and interests of consumers and hinders the healthy and orderly development of cyberspace. In addition, hacker attacks of all kinds seriously affect the information security of individuals and the normal operation of enterprises. For example, a distributed denial-of-service (DDoS) attack uses client/server technology to generate seemingly legitimate service requests that occupy an excessive share of the target system's service resources, so that the target system cannot respond to normal users in a timely manner.
Therefore, identifying users' network behaviors and applying different regulatory measures to different behaviors has become an effective means of supervision. The most important property of network behavior identification is accuracy: if identification is inaccurate, the appropriate measure cannot be chosen for each network behavior. If the error rate is high, users will repeatedly be disturbed by mistake while going online or using network services, which seriously harms the user experience. On the other hand, because the volume of network information is huge, the prior art often cannot identify network behaviors, especially harmful ones, in time, and therefore cannot effectively block or intervene while the behavior is occurring.
Therefore, the field needs a technique that classifies network behavior effectively.
Summary of the invention
The present invention provides a network behavior classification method and device. With the present invention, efficient and/or real-time network behavior classification can be achieved.
According to one embodiment of the present invention, a method for network behavior classification is provided, comprising: obtaining a first data set, each data entry in the first data set including feature data representing a network behavior and corresponding label information; clustering the feature data in the first data set to obtain one or more clusters; removing from the first data set, according to the one or more clusters, the data entries to which outlying feature data belong, to obtain a second data set; and training a classifier model using the feature data and corresponding label information included in the data entries of the second data set.
In one aspect, the method further comprises: extracting one or more network behavior features from raw network behavior data; determining the correlation between the one or more network behavior features; and generating the first data set, the first data set including feature data of network behavior features that have low correlation with one another.
In one aspect, removing from the first data set the data entries to which outlying feature data belong comprises: removing the data entries to which feature data deviating from the cluster center in the one or more clusters belong; or, if the amount of feature data in a cluster is less than a threshold, discarding the data entries to which the feature data in that cluster belong.
In one aspect, the method further comprises: classifying collected user network behavior information using the trained classifier model.
In one aspect, training the classifier model comprises: determining whether the training of the classifier model has converged; and stopping the training of the classifier model when the training is determined to have converged.
In one aspect, the training is determined to have converged when one or more of the following occurs: the error of the classifier model reaches a threshold; the classification accuracy of the classifier model reaches a threshold; and the number of training iterations reaches a threshold.
In one aspect, the feature data includes one or more of the following: IP address, port, network protocol, outbound traffic, inbound traffic, duration, and website information.
In one aspect, the classes of the classifier model include one or more of the following: browsing web pages, watching videos, downloading files, publishing information, abnormal transactions, online fraud, and hacker attacks.
According to another embodiment of the present invention, a device for network behavior classification is provided, comprising: a data acquisition module for obtaining a first data set, each data entry in the first data set including feature data representing a network behavior and corresponding label information; a data preprocessing module for clustering the feature data in the first data set to obtain one or more clusters, and for removing from the first data set, according to the one or more clusters, the data entries to which outlying feature data belong, to obtain a second data set; and a classification module for training a classifier model using the feature data and corresponding label information included in the data entries of the second data set.
In one aspect, the data acquisition module is further configured to: extract one or more network behavior features from raw network behavior data; determine the correlation between the one or more network behavior features; and generate the first data set, the first data set including feature data of network behavior features that have low correlation with one another.
In one aspect, the data preprocessing module removing from the first data set the data entries to which outlying feature data belong comprises: removing the data entries to which feature data deviating from the cluster center in the one or more clusters belong; or, if the amount of feature data in a cluster is less than a threshold, discarding the data entries to which the feature data in that cluster belong.
In one aspect, the classification module is configured to: classify collected user network behavior information using the trained classifier model.
In one aspect, the classification module is configured to: determine whether the training of the classifier model has converged; and stop the training of the classifier model when the training is determined to have converged.
In one aspect, the training is determined to have converged when one or more of the following occurs: the error of the classifier model reaches a threshold; the classification accuracy of the classifier model reaches a threshold; and the number of training iterations reaches a threshold.
In one aspect, the feature data includes one or more of the following: IP address, port, network protocol, outbound traffic, inbound traffic, duration, and website information.
In one aspect, the classes of the classifier model include one or more of the following: browsing web pages, watching videos, downloading files, publishing information, abnormal transactions, online fraud, and hacker attacks.
According to another embodiment of the present invention, a system for network behavior classification is provided, comprising: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to execute the processor-executable instructions to implement the method described above.
According to one aspect of the present invention, applying feature screening and feature correlation detection to the initially collected network behavior feature data removes irrelevant and/or redundant feature information. This not only prevents redundant features from interfering with one another but also reduces the feature dimensionality of the data, laying the foundation for efficient subsequent data processing.
According to another aspect of the present invention, clustering based on unsupervised learning removes the interference of irrelevant or erroneous data with the subsequent classification algorithm, significantly reduces the amount of data the network behavior classification algorithm must process, and improves the accuracy of the subsequent network behavior classification algorithm.
According to another aspect of the present invention, a suitable convergence point is selected during classifier training, and the model result is returned once the convergence point is reached, which effectively reduces model overfitting and improves the accuracy of the classification model.
Embodiments of the present invention may implement any one or more of the above technical features, so that the network behavior classification method and device of the present invention can solve the prior-art problem that network behavior classification is inaccurate and/or not timely. For example, while a user is online, the user's network behavior can be identified promptly and accurately during the network activity, so that corresponding measures can be taken when needed.
Brief description of the drawings
Fig. 1 is a flowchart of a network behavior classification method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a network behavior feature extraction method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a data preprocessing method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a classifier training method according to an embodiment of the present invention; and
Fig. 5 is a block diagram of a network behavior classification device according to an embodiment of the present invention.
Detailed description
The present invention is further described below with reference to specific embodiments and the drawings, which should not be taken to limit the scope of protection of the present invention.
The present invention provides a network behavior classification method and device. On the one hand, the method and device can accurately identify various user network behaviors, so that the appropriate regulatory measure can be taken for each behavior. On the other hand, the method and device can identify various user network behaviors in real time, so that harmful network behavior can be effectively blocked or intervened in while it is occurring. The network behavior classification method and device according to the present invention can therefore address either the accuracy or the timeliness of network behavior classification, or both at once, and are widely applicable.
Fig. 1 is a flowchart of a network behavior classification method according to an embodiment of the present invention.
Step 102: Extract network behavior features. When carrying out network activity (that is, network behavior), a user generates a wide variety of related information (for example, attributes), such as information related to transactions, browsing web pages, watching videos, and so on. Once the relevant network behavior information has been collected, useful network behavior feature data can be extracted from it, such as user information, transaction amount, information about the device used, network traffic, and so on.
Step 104: Preprocess the extracted network behavior features to obtain a data set. Data preprocessing may include data cleaning, missing-value handling, data transformation, and the like, so that the extracted network behavior feature data meets the requirements of subsequent processing.
Step 106: Use a classifier to classify the data set obtained after preprocessing. The classifier can be used to distinguish the network behavior class represented by the network behavior features, for example whether a transaction is normal, whether fraud is present, or which class of activity the user is carrying out online (for example, watching videos, browsing web pages, sending messages).
Step 108: Evaluate the classifier model. If it passes the evaluation (for example, meets the evaluation metrics), the network behavior classification process ends; otherwise, return to step 104.
The operation of the classifier includes a classifier training stage and a classifier application stage. In the training stage, steps 102-108 above are applied to a historical data set to train one or more classifier models suited to the target task, the models are validated and evaluated offline, and the better classifier model is then chosen by the evaluation metrics. In the application stage, steps 102-108 can likewise be used to feed newly collected data into the trained classifier model, which outputs the classification result. The new data and classification results can also be used to further evaluate the classifier model (that is, online evaluation). A classifier according to the present invention may perform binary classification or multi-class classification.
Each step of the classifier training stage is described in detail below with reference to Figs. 2-4.
Fig. 2 is a schematic diagram of a network behavior feature extraction method according to an embodiment of the present invention.
Step 201: Collect the user's raw network behavior data. While online, a user generates various kinds of network behavior information, such as IP address, port, network protocol, outbound traffic, inbound traffic, duration, website information, and so on. This network behavior data can be collected to form the raw network behavior data.
Step 202: Extract one or more network behavior features from the raw network behavior data to form a feature data set. Continuing the example above, the representative feature information of the network behavior can be screened out. During screening, features that can represent the network behavior are retained and information irrelevant to network behavior classification is removed. For example, inbound and outbound traffic information may indicate what kind of network behavior is occurring (for example, browsing web pages, watching videos, downloading files, publishing information), whereas port information may have no direct relationship to the specific class of network behavior. Thus the inbound and outbound traffic information can be retained from the raw network behavior data and the port information removed to form the feature data set.
Step 203: Perform correlation analysis on the feature data set and remove redundant feature data. Similar feature attributes are highly correlated with one another and may be redundant for subsequent network behavior classification, so only one of them may be kept, or several correlated features may be merged. For example, a user's age and date of birth are highly correlated, so one may keep only one of the two or merge them into a new feature. As another example, network traffic per second and network traffic per 2 seconds are highly correlated, so one may keep only one of them or merge the two into a new feature (for example, by averaging). Removing redundant features yields a reduced feature data set that includes feature data of network behavior features having low correlation with one another.
Step 204: Determine whether the correlation between the retained network behavior features meets the requirement (for example, the correlation is below a threshold). If so, output the reduced feature data set; otherwise, return to step 202.
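The correlation check of steps 203 and 204 can be sketched as follows. The text does not name a correlation measure or a threshold, so this sketch assumes the Pearson coefficient and a cutoff of 0.9, and it greedily keeps a feature only if it is weakly correlated with every feature already kept. The feature names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treated as uncorrelated in this sketch
    return cov / sqrt(var_x * var_y)

def select_low_correlation(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns; the second is exactly twice the first.
features = {
    "bytes_per_1s": [10, 20, 30, 40, 50],
    "bytes_per_2s": [20, 40, 60, 80, 100],
    "duration_s": [5, 1, 4, 2, 3],
}
kept = select_low_correlation(features)
```

Here the per-2-second traffic column is dropped as redundant, mirroring the example in step 203. Merging correlated features (for example, by averaging) is the alternative the text mentions and is not shown.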
According to one aspect of the present invention, applying feature screening and feature correlation detection to the initially collected network behavior feature data removes irrelevant and/or redundant feature information and ensures that the extracted features have very low correlation with one another. This not only reduces the amount of data to be processed, prevents redundant features from interfering with one another, and improves the reliability of the data analysis results, but also reduces the feature dimensionality of the data, laying the foundation for efficient subsequent data processing.
By contrast, if the collected raw attributes are used directly as features for classification without feature screening and/or feature correlation detection, these attributes may contain redundant and invalid information, which makes the classification results inaccurate. According to the present invention, feature extraction and screening and correlation detection are applied to the raw network behavior data, so the resulting network behavior feature information represents the network behavior that occurred more effectively and thereby improves the accuracy of subsequent classification.
Fig. 3 is a schematic diagram of a data preprocessing method according to an embodiment of the present invention.
Step 301: Obtain a feature data set. Each data entry in the feature data set may include feature data representing a network behavior and corresponding label information. For example, the feature data set generated in step 202 or the reduced feature data set generated in steps 203 and 204 may be obtained. In one embodiment, for historical network behavior data, the feature data set can be labeled manually to determine which class each user network behavior belongs to (that is, the label information), for example browsing web pages, watching videos, downloading files, publishing information (for example, publishing duplicate information or false propaganda), abnormal transactions, online fraud, hacker attacks, and so on.
Step 302: Select and initialize a clustering algorithm. For example, an unsupervised clustering algorithm is used for data preprocessing. By way of non-limiting example, the clustering algorithm may be the K-means algorithm, the BIRCH algorithm, the DBSCAN algorithm, or the like.
Step 303: Cluster the feature data in the feature data set to obtain one or more clusters. The cluster centers and parameters can be determined according to the selected clustering algorithm. Clustering can be unsupervised learning, that is, it does not require class information (label information) for the data. Therefore, the clustering process may use only the feature data in the feature data set and not the label information.
Step 304: Determine whether the clustering has converged. If it has (that is, one or more clusters have been obtained successfully), the process advances to step 305; otherwise, return to step 303 and cluster again.
Step 305: Remove the data entries to which outlying feature data belong to obtain a new data set. For example, the distance from each item of feature data in a cluster to its cluster center can be obtained, and the data entries whose feature data deviate markedly from the cluster center can be removed. In another embodiment, the data entries belonging to small clusters can also be discarded; for example, if the amount of feature data in a cluster is less than a threshold, the data entries to which the feature data in that cluster belong can be discarded.
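Steps 302 to 305 can be sketched with a plain K-means (Lloyd's algorithm), one of the algorithms named in step 302. Several details here are assumptions, not taken from the text: the initial centers are passed in explicitly, "deviating markedly" is taken to mean farther than 3 times the median distance within the cluster, and the small-cluster threshold is 2. All names and sample points are hypothetical.

```python
from math import dist
from statistics import median

def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm; stops when the assignments no longer change."""
    centers = list(init_centers)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged (step 304)
        assign = new_assign
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assign

def reject_outliers(entries, init_centers, min_size=2, factor=3.0):
    """Drop entries in small clusters or far from their cluster center (step 305)."""
    points = [features for features, _label in entries]  # labels are not used
    centers, assign = kmeans(points, init_centers)
    kept = []
    for c in range(len(centers)):
        idx = [i for i, a in enumerate(assign) if a == c]
        if len(idx) < min_size:
            continue  # discard the whole small cluster
        distances = {i: dist(points[i], centers[c]) for i in idx}
        cutoff = factor * median(distances.values())
        kept.extend(i for i in idx if distances[i] <= cutoff)
    return [entries[i] for i in sorted(kept)]

# Hypothetical entries: ((inbound, outbound) traffic, label).
entries = [
    ((1, 1), "browse"), ((1, 2), "browse"), ((2, 1), "browse"), ((2, 2), "browse"),
    ((10, 10), "video"), ((10, 11), "video"), ((11, 10), "video"), ((11, 11), "video"),
    ((10, 30), "video"),     # far from the other "video" points
    ((50, 50), "download"),  # ends up alone in its own cluster
]
cleaned = reject_outliers(entries, init_centers=[(0, 0), (12, 12), (50, 50)])
```

In this toy data the point (10, 30) is removed by the distance rule and the point (50, 50) by the small-cluster rule, leaving the first eight entries with their labels intact.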
Outlying feature data may be dirty or invalid data, and its presence affects the accuracy of subsequent classification. After clustering, each item of feature data is assigned to its cluster, and dirty and invalid data are then removed, which largely avoids the adverse effects of dirty data.
If an unsupervised method is not used to cluster the data and remove part of the dirty and invalid data, classifier training is affected, and so is the accuracy of the resulting classifier. By contrast, in the present invention the feature data is clustered with an unsupervised clustering algorithm to remove the interference of irrelevant or invalid data with subsequent classification, which reduces the amount of data to be classified and improves processing efficiency. It also reduces the influence of dirty data on the classification algorithm and can improve the accuracy of user network behavior classification. By way of non-limiting example, when a user reads news, the traffic should be small, but pictures and videos on the web page may generate high traffic, so that the traffic of this user network behavior deviates from the normal level. With the clustering algorithm, such abnormally high-traffic behavior can be removed as dirty data, so that it does not affect the accuracy of later deciding from traffic whether a behavior is news reading.
Fig. 4 is a schematic diagram of a classifier training method according to an embodiment of the present invention. More specifically, Fig. 4 shows the process of training a classifier on a historical data set. For example, the data set obtained through the data preprocessing shown in Fig. 3 can be used as the historical data set for training the classifier.
Step 401: The historical data set can be divided into a training set and a test set. The historical data set may include multiple data entries (for example, 1000 entries, each corresponding to one network behavior of one user), and each entry includes feature data representing the network behavior and corresponding label information (for example, whether it is browsing web pages, reading news, watching videos, publishing false information, and so on). The label information of each entry can be obtained by manual labeling. By way of non-limiting example, 80% of the data entries in the historical data set can be used as the training set and 20% as the test set.
Step 402: Select a classifier model and initialize its parameters. A suitable classifier model can be selected or designed according to need and practice, such as a classifier based on a linear function or a distance function, or a classifier based on a decision tree or a neural network. The present invention is not limited in this respect.
Step 403: Train the classifier model using the training set. For example, the feature data and corresponding label information included in the data entries of the training set can be used to train the classifier model. Training a classifier model is an iterative loop of parameter learning and tuning, so that the final classifier model fits the training set data well.
Step 404: Determine whether the training of the classifier model has converged. If it has, advance to step 405; otherwise, return to step 403 and continue training. Convergence can be determined in several ways, for example by checking whether the training of the classifier model has reached a preset convergence threshold. By way of non-limiting example, the convergence threshold can be:
(1) Error: if the error of the classifier model is less than a threshold, the training of the classifier model can be considered to have converged. The error expresses the difference between the actual predicted output and the true output of a sample; for curve fitting, for example, the least-squares method can be used to determine the error.
(2) Accuracy: if the classification accuracy of the model reaches a threshold (for example, 90%), the training of the classifier model can be considered to have converged.
(3) Number of iterations: if the number of training iterations over the training set (for example, rounds of parameter tuning) reaches a threshold (for example, 50), the training of the classifier model can be considered to have converged.
One or more convergence thresholds can be selected, and their specific values set, according to need and practice; they are not limited to the specific examples given above. For example, a combination of error and number of iterations can be used: as soon as either criterion is met, the training of the classifier model is considered to have converged and the process advances to step 405. In practice, the choice of convergence thresholds and/or their values can also be adjusted dynamically or in real time.
Step 405: Use the test set to test the accuracy of the classifier model. If the accuracy meets the standard (for example, 80%), the classifier has been trained successfully and the classifier model is saved; if not, return to step 402 to reselect and retrain the classifier model. As the training process shows, the classifier has a relatively high accuracy requirement: if the accuracy is below the standard, the classifier model may not produce reasonable classification results in application, and a classifier model should be reselected for classification.
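The 80/20 split of step 401 and the accuracy gate of step 405 can be sketched as below. As a stand-in for the "classifier based on a distance function" mentioned in step 402, this sketch assumes a nearest-centroid classifier; the model choice, the random seed, the synthetic data, and all function names are illustrative assumptions rather than the method of the source.

```python
import random
from math import dist

def split_dataset(entries, train_percent=80, seed=0):
    """Shuffle a copy of the entries and split it into training and test sets."""
    shuffled = entries[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train):
    """Nearest-centroid model: one mean feature vector per label."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    return {label: tuple(sum(v) / len(rows) for v in zip(*rows))
            for label, rows in groups.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    return min(model, key=lambda label: dist(features, model[label]))

def accuracy(model, test):
    return sum(predict(model, f) == y for f, y in test) / len(test)

# Synthetic, well-separated data: low traffic = browsing, high traffic = video.
entries = [((i % 5, i % 3), "browse") for i in range(50)]
entries += [((20 + i % 5, 20 + i % 3), "video") for i in range(50)]

train, test = split_dataset(entries)
model = fit_centroids(train)
test_accuracy = accuracy(model, test)
accepted = test_accuracy >= 0.8  # the example standard from step 405
```

If `accepted` were false, the flow in Fig. 4 would return to step 402 and try a different model rather than keep the current one.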
In the prior art, a trained classifier model may generalize poorly because of overfitting, so that its actual classification accuracy is low. In classifier training according to the present invention, a suitable convergence threshold is selected and the classifier model result is returned as soon as the threshold is reached, rather than letting the classifier train indefinitely. This effectively reduces the overfitting problem in which training-set accuracy is high but test-set accuracy is low.
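A training loop that stops on any of the three convergence criteria (error, accuracy, number of iterations) might look like the sketch below. The model is an assumption chosen for brevity: logistic regression trained by batch gradient descent, with mean squared error between predicted probability and label as the error measure. The thresholds reuse the example values from the text (90% accuracy, 50 iterations); the error threshold, learning rate, and sample data are hypothetical.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict_all(w, b, samples):
    """Predicted probability of class 1 for every sample."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in samples]

def train(samples, labels, lr=0.5, max_error=0.05, min_accuracy=0.9, max_epochs=50):
    """Return (weights, bias, epochs_run, reason_for_stopping)."""
    w, b, n = [0.0] * len(samples[0]), 0.0, len(samples)
    for epoch in range(1, max_epochs + 1):
        # One batch gradient-descent step on the log loss.
        residuals = [p - y for p, y in zip(predict_all(w, b, samples), labels)]
        for j in range(len(w)):
            w[j] -= lr * sum(r * x[j] for r, x in zip(residuals, samples)) / n
        b -= lr * sum(residuals) / n
        # Check the convergence criteria after the update.
        probs = predict_all(w, b, samples)
        error = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
        accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, labels)) / n
        if error <= max_error:
            return w, b, epoch, "error"
        if accuracy >= min_accuracy:
            return w, b, epoch, "accuracy"
    return w, b, max_epochs, "epochs"  # iteration limit reached

# Hypothetical one-feature samples (scaled traffic) with binary labels.
samples = [(0.1,), (0.2,), (0.3,), (0.7,), (0.8,), (0.9,)]
labels = [0, 0, 0, 1, 1, 1]
weights, bias, epochs_run, reason = train(samples, labels)
```

Returning the reason alongside the model makes it easy to see which criterion fired, which is useful when the thresholds are being adjusted as the text suggests.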
The classifier model trained according to Fig. 4 can then be applied to newly collected network behavior data (that is, the application stage). In the application stage, user network behavior features are extracted from the new network behavior or event in step 102, the necessary data preprocessing is performed in step 104, and the preprocessed user network behavior feature data is fed into the classifier model in step 106, which outputs the user network behavior type. For example, when a user connects to the network, feeding the collected user network behavior information into the trained classifier model determines whether the user's network behavior is browsing web pages, watching videos, downloading files, publishing information, an abnormal transaction, online fraud, a hacker attack, and so on. When the classifier outputs a harmful class of network behavior, appropriate measures can be taken to block the user's network behavior. The new network behavior features and classification results (combined with other feedback, such as later confirmation or rejection of the classifier's decision) can also be used in step 108 to further evaluate the classifier model. If the classifier model's fit to the new data degrades, the model can be retrained.
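The application stage can be sketched as a small dispatch step: classify the collected behavior, then block it if the class is harmful. The set of harmful classes, the action names, and the stand-in classifier below are all hypothetical; in practice `classify` would be the trained classifier model.

```python
# Which classes count as harmful is an assumption made for this sketch.
HARMFUL_CLASSES = {"abnormal transaction", "network fraud", "hacker attack"}

def handle_behavior(features, classify):
    """Classify one network behavior and choose an action for it."""
    label = classify(features)
    action = "block" if label in HARMFUL_CLASSES else "allow"
    return label, action

def toy_classifier(features):
    """Stand-in for a trained model: flags very high request rates."""
    if features["requests_per_s"] > 1000:
        return "hacker attack"
    return "browsing web pages"

result_normal = handle_behavior({"requests_per_s": 3}, toy_classifier)
result_attack = handle_behavior({"requests_per_s": 5000}, toy_classifier)
```

Passing the classifier in as a parameter keeps the blocking policy separate from the model, so a retrained model can be swapped in without changing the policy code.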
Fig. 5 is the block diagram of network behavior sorter 500 according to an embodiment of the invention.Network behavior classification dress Setting 500 may include data acquisition module 501, data preprocessing module 502, categorization module 503.Data acquisition module 501 can be used Each data entry in acquisition (for example, receive or generate) the first data set, the first data set includes indicating network behavior Characteristic and corresponding label information.Data acquisition module 501 can also be generated according to the method described above by reference to Fig. 2 First data set.For example, data acquisition module 501 can extract one or more network behaviors spies from primitive network behavioral data Sign determines the correlation between the one or more network behavior feature, and generating includes the net each other with low correlation First data set of network behavioural characteristic.For example, to be examined by Feature Selection and feature correlation to the data attribute of primary acquisition It surveys, if correlation detection does not pass through, to continue to screen feature, until reaching the requirement of correlation detection.It in this way can be effective Reduce the excessively high bring information redundancy of correlation between characteristic attribute.
The data preprocessing module 502 can be used to cluster the feature data in the first data set to obtain one or more clusters and, according to the one or more clusters, to remove from the first data set the data entries to which outlying feature data belong, thereby obtaining a second data set. For example, the data preprocessing module 502 can cluster the first data set according to the method described above with reference to Fig. 3 and remove the data entries whose feature data deviate from the cluster center in the one or more clusters or, if the number of feature data items in a cluster is below a threshold, discard the data entries to which the feature data in that cluster belong. Clustering the feature data with an unsupervised clustering algorithm removes the interference of irrelevant or invalid data with the subsequent classification, which reduces the amount of data to be classified and improves processing efficiency. It also reduces the influence of dirty data on the classification algorithm, which can improve the accuracy of user network behavior classification.
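The two rejection rules can be sketched as follows. The text does not name a clustering algorithm, so this example assumes plain k-means with a deterministic farthest-point initialization, Euclidean distance, and an absolute distance cutoff; the function names, parameters, and sample entries are hypothetical. For brevity, the sketch applies both rules together, whereas the text presents them as alternatives.

```python
from math import dist


def farthest_point_init(points, k):
    """Pick k starting centers deterministically (assumed initialization)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iters=50):
    """Plain k-means; returns the centers and the cluster index of each point."""
    centers = farthest_point_init(points, k)
    assign = None
    for _ in range(max_iters):
        new_assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # assignments are stable, so the centers are final
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assign


def reject_outliers(entries, k, min_cluster_size, max_distance):
    """Drop entries in undersized clusters or too far from their cluster center.

    entries is a list of (feature_tuple, label) pairs: the first data set.
    The returned list plays the role of the second data set.
    """
    points = [features for features, _label in entries]
    centers, assign = kmeans(points, k)
    sizes = [assign.count(c) for c in range(k)]
    return [
        entry
        for entry, a in zip(entries, assign)
        if sizes[a] >= min_cluster_size and dist(entry[0], centers[a]) <= max_distance
    ]


first_data_set = [
    ((0, 0), "browse"), ((0, 1), "browse"), ((1, 0), "browse"), ((1, 1), "browse"),
    ((4, 4), "browse"),
    ((10, 10), "video"), ((10, 11), "video"), ((11, 10), "video"), ((11, 11), "video"),
    ((50, 50), "fraud"),
]
second_data_set = reject_outliers(first_data_set, k=3, min_cluster_size=2, max_distance=3.0)
```

In this toy data the entry at (4, 4) is removed for lying too far from its cluster center, and the entry at (50, 50) is removed because it forms a cluster of its own.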
The classification module 503 can use the second data set to train the classifier model. The classification module 503 can train the classifier model according to the method described above with reference to Fig. 4; for example, it can determine whether the training of the classifier model has reached a convergence threshold and stop training the classifier model when the convergence threshold is reached. By selecting a suitable convergence threshold and returning the classifier model that reaches it as the result, the problem caused by model overfitting, namely high accuracy on the training set but low accuracy on the test set, can be effectively reduced. In addition, the trained classifier model can be used to classify newly collected user network behavior information.
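The stopping logic can be illustrated with a small training loop that halts as soon as the error, the accuracy, or the iteration count reaches its threshold. The text does not say which classifier is trained, so this sketch assumes a binary logistic regression fitted by gradient descent; the learning rate, thresholds, and toy data are likewise assumptions. In practice the thresholds would more likely be checked on held-out data than on the training set.

```python
from math import exp, log


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def train_until_converged(xs, ys, lr=0.5, max_loss=0.05, min_accuracy=1.0, max_epochs=1000):
    """Gradient-descent logistic regression with three stopping thresholds."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    loss, accuracy = float("inf"), 0.0

    def predict(x):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    for epoch in range(1, max_epochs + 1):
        probs = [predict(x) for x in xs]
        # One gradient step on the mean log loss.
        for j in range(n_features):
            w[j] -= lr * sum((p - y) * x[j] for p, y, x in zip(probs, ys, xs)) / n
        b -= lr * sum(p - y for p, y in zip(probs, ys)) / n

        # Re-evaluate error and accuracy after the update.
        probs = [predict(x) for x in xs]
        loss = -sum(
            y * log(max(p, 1e-12)) + (1 - y) * log(max(1 - p, 1e-12))
            for p, y in zip(probs, ys)
        ) / n
        accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, ys)) / n

        if loss <= max_loss:
            return {"w": w, "b": b, "epochs": epoch, "accuracy": accuracy, "reason": "loss"}
        if accuracy >= min_accuracy:
            return {"w": w, "b": b, "epochs": epoch, "accuracy": accuracy, "reason": "accuracy"}
    return {"w": w, "b": b, "epochs": max_epochs, "accuracy": accuracy, "reason": "max_epochs"}


# Toy, linearly separable data with a single hypothetical feature.
result = train_until_converged([[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1])
```

The `reason` field records which threshold ended training, which makes it easy to see whether a run stopped because it converged or because it ran out of iterations.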
The invention proposes an efficient network behavior classification method that can reduce the performance overhead of the classifier model while maintaining accuracy. The network behavior classification method and device of the invention are particularly suitable for identifying abnormal network behavior in real time. Conventional network behavior classification methods classify according to the traffic of network users and have the following disadvantages:
1. The collected raw attributes are used directly as features for classification. These attributes may contain redundant and invalid information, which leads to a large data volume and inaccurate classification results;
2. The feature data set may contain abnormal data, and a data set obtained by manual labeling may also contain inaccurate label information. Such dirty and invalid data affect the classification results;
3. The trained classifier model may have very low accuracy in actual classification because of overfitting.
Through the combination of the various technical features used in the invention, various network behaviors can be identified accurately and in real time, and one or more of the above technical deficiencies can be avoided, so that harmful network behavior can be effectively blocked or intervened in while it is occurring.
The steps and modules of the network behavior classification method and device described above can be implemented in hardware, software, or a combination thereof. If implemented in hardware, the various illustrative steps, modules, and circuits described in connection with the invention can be implemented or executed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic component, a hardware component, or any combination thereof. A general-purpose processor may be a processor, microprocessor, controller, microcontroller, state machine, or the like. If implemented in software, the various illustrative steps and modules described in connection with the invention can be stored on or transmitted over a computer-readable medium as one or more instructions or code. The software modules that implement the various operations of the invention can reside in a storage medium such as RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, a CD-ROM, or cloud storage. The storage medium can be coupled to a processor so that the processor can read information from and write information to the storage medium and execute the corresponding program modules to carry out the steps of the invention. Moreover, software-based embodiments can be uploaded, downloaded, or accessed remotely through suitable means of communication. Such suitable means of communication include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber-optic cable), magnetic communication, electromagnetic communication (including RF, microwave, and infrared communication), electronic communication, or other such means of communication.
It should also be noted that these embodiments may be described as processes depicted as flowcharts, flow diagrams, structure diagrams, or block diagrams. Although a flowchart may describe the operations as a sequential process, many of these operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged.
The disclosed methods, devices, and systems should not be construed as limiting in any way. Rather, the invention covers all novel and non-obvious features and aspects of the various disclosed embodiments, individually and in their various combinations and sub-combinations with one another. The disclosed methods, devices, and systems are not limited to any specific aspect or feature or combination thereof, and no disclosed embodiment requires that any one or more specific advantages be present or that specific or all technical problems be solved.
The embodiments of the invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the invention, those of ordinary skill in the art can make many variations without departing from the purpose of the invention and the scope protected by the claims, and all of these fall within the scope of protection of the invention.

Claims (17)

1. A method for network behavior classification, characterized by comprising:
Obtaining a first data set, each data entry in the first data set including feature data representing a network behavior and corresponding label information;
Clustering the feature data in the first data set to obtain one or more clusters;
Removing, from the first data set and according to the one or more clusters, the data entries to which outlying feature data belong, to obtain a second data set; and
Training a classifier model using the feature data included in the data entries of the second data set and the corresponding label information.
2. The method of claim 1, characterized by further comprising:
Extracting one or more network behavior features from raw network behavior data;
Determining the correlation between the one or more network behavior features; and
Generating the first data set, the first data set including feature data of network behavior features having low correlation with one another.
3. The method of claim 1, characterized in that removing from the first data set the data entries to which outlying feature data belong comprises:
Removing the data entries whose feature data deviate from the cluster center in the one or more clusters; or
If the number of feature data items in a cluster is below a threshold, discarding the data entries to which the feature data in that cluster belong.
4. The method of claim 1, characterized by further comprising:
Classifying collected user network behavior information using the trained classifier model.
5. The method of claim 1, characterized in that training the classifier model comprises:
Determining whether the training of the classifier model has converged; and
Stopping training the classifier model when the training is determined to have converged.
6. The method of claim 5, characterized in that the training is determined to have converged when one or more of the following occurs:
The error of the classifier model reaches a threshold;
The classification accuracy of the classifier model reaches a threshold; and
The number of training iterations reaches a threshold.
7. The method of claim 1, characterized in that the feature data include one or more of the following:
IP address, port, network protocol, outbound traffic, inbound traffic, duration, website information.
8. The method of claim 1, characterized in that the categories of the classifier model include one or more of the following:
Browsing web pages, watching videos, downloading files, posting information, abnormal transactions, network fraud, hacker attacks.
9. A device for network behavior classification, characterized by comprising:
A data acquisition module for obtaining a first data set, each data entry in the first data set including feature data representing a network behavior and corresponding label information;
A data preprocessing module for clustering the feature data in the first data set to obtain one or more clusters and for removing, from the first data set and according to the one or more clusters, the data entries to which outlying feature data belong, to obtain a second data set; and
A classification module for training a classifier model using the feature data included in the data entries of the second data set and the corresponding label information.
10. The device of claim 9, characterized in that the data acquisition module is further configured to:
Extract one or more network behavior features from raw network behavior data;
Determine the correlation between the one or more network behavior features; and
Generate the first data set, the first data set including feature data of network behavior features having low correlation with one another.
11. The device of claim 9, characterized in that the data preprocessing module removing from the first data set the data entries to which outlying feature data belong comprises:
Removing the data entries whose feature data deviate from the cluster center in the one or more clusters; or
If the number of feature data items in a cluster is below a threshold, discarding the data entries to which the feature data in that cluster belong.
12. The device of claim 9, characterized in that the classification module is further configured to:
Classify collected user network behavior information using the trained classifier model.
13. The device of claim 9, characterized in that the classification module is further configured to:
Determine whether the training of the classifier model has converged; and
Stop training the classifier model when the training is determined to have converged.
14. The device of claim 13, characterized in that the training is determined to have converged when one or more of the following occurs:
The error of the classifier model reaches a threshold;
The classification accuracy of the classifier model reaches a threshold; and
The number of training iterations reaches a threshold.
15. The device of claim 9, characterized in that the feature data include one or more of the following:
IP address, port, network protocol, outbound traffic, inbound traffic, duration, website information.
16. The device of claim 9, characterized in that the categories of the classifier model include one or more of the following:
Browsing web pages, watching videos, downloading files, posting information, abnormal transactions, network fraud, hacker attacks.
17. A system for network behavior classification, characterized by comprising:
A processor;
A memory for storing processor-executable instructions,
Wherein the processor is configured to execute the processor-executable instructions to implement the method of any one of claims 1-8.
CN201811482192.9A 2018-12-05 2018-12-05 A kind of network behavior classification method and device Pending CN110008976A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811482192.9A CN110008976A (en) 2018-12-05 2018-12-05 A kind of network behavior classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811482192.9A CN110008976A (en) 2018-12-05 2018-12-05 A kind of network behavior classification method and device

Publications (1)

Publication Number Publication Date
CN110008976A true CN110008976A (en) 2019-07-12

Family

ID=67165065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811482192.9A Pending CN110008976A (en) 2018-12-05 2018-12-05 A kind of network behavior classification method and device

Country Status (1)

Country Link
CN (1) CN110008976A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991762A (en) * 2019-12-13 2020-04-10 新奥数能科技有限公司 Prediction method, prediction device, computer-readable storage medium and electronic equipment
CN111325260A (en) * 2020-02-14 2020-06-23 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and computer readable medium
CN112465533A (en) * 2019-09-09 2021-03-09 中国移动通信集团河北有限公司 Intelligent product selection method and device and computing equipment
CN112464978A (en) * 2021-01-15 2021-03-09 北京软慧科技有限公司 Method and device for identifying abnormal terminals of Internet of vehicles
CN113672675A (en) * 2021-08-09 2021-11-19 北京字跳网络技术有限公司 Data detection method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106707233A (en) * 2017-03-03 2017-05-24 广东工业大学 Multi-side positioning method and multi-side positioning device based on outlier detection
CN107194430A (en) * 2017-05-27 2017-09-22 北京三快在线科技有限公司 A kind of screening sample method and device, electronic equipment
CN107864168A (en) * 2016-09-22 2018-03-30 华为技术有限公司 A kind of method and system of network data flow classification
CN108512674A (en) * 2017-02-24 2018-09-07 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for output information
CN108540451A (en) * 2018-03-13 2018-09-14 北京理工大学 A method of classification and Detection being carried out to attack with machine learning techniques
CN108596045A (en) * 2018-04-02 2018-09-28 四川大学 A kind of group abnormality behavioral value method based on aerial monitor supervision platform

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107864168A (en) * 2016-09-22 2018-03-30 华为技术有限公司 A kind of method and system of network data flow classification
CN108512674A (en) * 2017-02-24 2018-09-07 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for output information
CN106707233A (en) * 2017-03-03 2017-05-24 广东工业大学 Multi-side positioning method and multi-side positioning device based on outlier detection
CN107194430A (en) * 2017-05-27 2017-09-22 北京三快在线科技有限公司 A kind of screening sample method and device, electronic equipment
CN108540451A (en) * 2018-03-13 2018-09-14 北京理工大学 A method of classification and Detection being carried out to attack with machine learning techniques
CN108596045A (en) * 2018-04-02 2018-09-28 四川大学 A kind of group abnormality behavioral value method based on aerial monitor supervision platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
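Hong Tianyi (洪天一): "Research and Implementation of Computer Audit Methods Based on Data Mining", China Master's Theses Full-text Database (Information Science and Technology) *
Zou Lingjun (邹凌君): "Research on Clustering and Classification Algorithms for Streaming Data", China Master's Theses Full-text Database (Information Science and Technology) *
Wei Haiyu (韦海宇) et al.: "Abnormal Network Traffic Classification Based on Improved Extremely Randomized Trees", Computer Engineering (计算机工程) *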

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465533A (en) * 2019-09-09 2021-03-09 中国移动通信集团河北有限公司 Intelligent product selection method and device and computing equipment
CN110991762A (en) * 2019-12-13 2020-04-10 新奥数能科技有限公司 Prediction method, prediction device, computer-readable storage medium and electronic equipment
CN111325260A (en) * 2020-02-14 2020-06-23 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and computer readable medium
CN111325260B (en) * 2020-02-14 2023-10-27 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and computer readable medium
CN112464978A (en) * 2021-01-15 2021-03-09 北京软慧科技有限公司 Method and device for identifying abnormal terminals of Internet of vehicles
CN112464978B (en) * 2021-01-15 2024-03-01 北京智联安行科技有限公司 Method and device for identifying abnormal terminals of Internet of vehicles
CN113672675A (en) * 2021-08-09 2021-11-19 北京字跳网络技术有限公司 Data detection method and device and electronic equipment
CN113672675B (en) * 2021-08-09 2023-12-15 北京字跳网络技术有限公司 Data detection method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN110008976A (en) A kind of network behavior classification method and device
CN106506454B (en) Cheat business recognition method and device
CN105678125B (en) A kind of user authen method, device
CN103812961B (en) Identify and specify the method and apparatus of classification IP address, defence method and system
CN108734184B (en) Method and device for analyzing sensitive image
CN108038778A (en) Clique's fraud recognition methods of the small micro- loan of internet finance and device
Sherly et al. BOAT adaptive credit card fraud detection system
CN112488716B (en) Abnormal event detection system
CN105574544A (en) Data processing method and device
CN110162970A (en) A kind of program processing method, device and relevant device
CN110998608A (en) Machine learning system for various computer applications
CN106897931A (en) A kind of recognition methods of abnormal transaction data and device
CN112329811A (en) Abnormal account identification method and device, computer equipment and storage medium
CN107368856A (en) Clustering method and device, the computer installation and readable storage medium storing program for executing of Malware
CN108268886A (en) For identifying the method and system of plug-in operation
CN110083507A (en) Key Performance Indicator classification method and device
CN114187036B (en) Internet advertisement intelligent recommendation management system based on behavior characteristic recognition
CN102118382A (en) System and method for detecting attack of collaborative recommender based on interest combination
CN112016769B (en) Method and device for managing relative person risk prediction and information recommendation
CN109740335A (en) The classification method and device of identifying code operation trace
CN113988190A (en) Customer intention analysis method, apparatus, device and storage medium
CN110033031B (en) Group detection method, device, computing equipment and machine-readable storage medium
Rahmani et al. Detecting fraudulent transactions in banking cards using scale‐free graphs
Wang et al. Bot-like Behavior Detection in Online Banking
CN112801783A (en) Entity identification method and device based on digital currency transaction characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190712