CN109063721A - Method and device for extracting behavioural feature data - Google Patents

Method and device for extracting behavioural feature data

Info

Publication number: CN109063721A
Application number: CN201810576742.7A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: characteristic, data, feature, weight, feature set
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Inventors: 雷璟, 温涛
Current Assignee: China Electronics Technology Group Corp CETC; Electronic Science Research Institute of CTEC (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: China Electronics Technology Group Corp CETC
Application filed by: China Electronics Technology Group Corp CETC

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for extracting behavioural feature data. The method includes: extracting feature data from acquired user behaviour data and constructing a feature set from the feature data; selecting the feature data in the feature set one by one as reference feature data, dividing the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain according to a preset distance threshold, and evaluating the weight of the reference feature data by mutual information; feeding the weight back into the weight vector corresponding to the reference feature data, and deleting the feature data with the smallest weight from the feature set; and deleting the feature data with the smallest weight step by step until the number of feature data items in the feature set and the weight vector are stable, then outputting the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold. This addresses the technical problem in the prior art that features are difficult to extract for anomaly detection and that large-scale anomalies are detected inefficiently.

Description

Method and device for extracting behavioural feature data
Technical field
The present invention relates to the field of big data processing technology, and in particular to a method and device for extracting behavioural feature data.
Background art
In recent years, research on network behaviour at home and abroad has generally analysed large volumes of service data and derived mathematical models that reflect certain real properties of the network. Traditional simple feature extraction and anomaly detection based on traffic statistics suffer from high false-negative and false-positive rates and are no longer adequate for an increasingly complex and dynamic network environment.
The service data of network behaviour is large in volume and difficult to process, and most existing anomaly detection techniques based on behaviour analysis are confined to a single behaviour layer and a single data source. Anomalies detected from a single behaviour layer or a single data source tend to be one-sided and do not let the user fully understand the phenomenon and nature of the anomaly. In anomaly detection at the protocol behaviour layer, most techniques focus only on transport-layer and network-layer features, whereas the operations of some application protocols are hard to observe at those layers; application-layer protocol anomaly detection needs further study.
Because the service data of network behaviour is large in volume and difficult to process, anomaly detection techniques are mainly applied to individual layers and do not consider all layers together. Existing anomaly detection techniques therefore have the technical problem that features are difficult to extract and that large-scale anomalies are detected inefficiently.
Summary of the invention
The present invention provides a method, a device, a computer-readable storage medium, and equipment for extracting behavioural feature data, to address the technical problem in the prior art that features are difficult to extract for anomaly detection and that large-scale anomalies are detected inefficiently.
According to one aspect of the present invention, a method for extracting behavioural feature data is provided, the method comprising:
extracting feature data from acquired user behaviour data, and constructing a feature set from the feature data;
selecting the feature data in the feature set one by one as reference feature data, dividing the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain according to a preset distance threshold, and evaluating the weight of the reference feature data by mutual information;
feeding the weight back into the weight vector corresponding to the reference feature data, and deleting the feature data with the smallest weight from the feature set;
deleting the feature data with the smallest weight step by step, and, when the number of feature data items in the feature set and the weight vector are stable, outputting the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Optionally, constructing the feature set from the feature data comprises:
screening the extracted feature data according to the degree to which each feature data item influences the overall entropy of the feature data, and constructing a candidate feature set from the screened feature data.
Optionally, the acquired user behaviour data includes traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data.
Optionally, before the feature set is constructed from the feature data, the method further comprises:
applying normalization and interval alignment to the extracted feature data.
Optionally, evaluating the weight of the reference feature data by mutual information comprises:
evaluating the feature correlation and feature redundancy of the reference feature data by mutual information;
obtaining the weight of the reference feature from the feature correlation and the feature redundancy.
According to a second aspect of the present invention, a device for extracting behavioural feature data is provided, the device comprising:
a feature set module, configured to extract feature data from acquired user behaviour data and construct a feature set from the feature data;
a weight computing module, configured to select the feature data in the feature set one by one as reference feature data, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain according to a preset distance threshold, and evaluate the weight of the reference feature data by mutual information;
a screening module, configured to feed the weight back into the weight vector corresponding to the reference feature data and delete the feature data with the smallest weight from the feature set;
a feature data output module, configured to delete the feature data with the smallest weight step by step and, when the number of feature data items in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Optionally, the feature set module includes:
a screening unit, configured to screen the extracted feature data according to the degree to which each feature data item influences the overall entropy of the feature data, and construct a candidate feature set from the screened feature data.
Optionally, the acquired user behaviour data includes traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data.
Optionally, the feature set module further includes:
a feature data integration unit, configured to apply normalization and interval alignment to the extracted feature data.
Optionally, the weight computing module includes:
a feature unit, configured to evaluate the feature correlation and feature redundancy of the reference feature data by mutual information;
a weight unit, configured to obtain the weight of the reference feature from the feature correlation and the feature redundancy.
In the method and device for extracting behavioural feature data according to the present invention, the feature data in the feature set is selected one by one as reference feature data; the weighted feature data in the feature set other than the reference feature data is divided into a neighborhood and a far domain according to a preset distance threshold; the weight of the reference feature data is evaluated by mutual information; the weight is fed back into the weight vector corresponding to the reference feature data, and the feature data with the smallest weight is deleted from the feature set; the feature data with the smallest weight is deleted step by step, and when the number of feature data items in the feature set and the weight vector are stable, the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold are output. By evaluating the weights of the acquired feature data and deleting the features whose weights are too small until the weights of the feature data in the feature set stabilize, and then outputting the stabilized feature set and the weight vector corresponding to the preset threshold, the invention addresses the technical problem in the prior art that features are difficult to extract for anomaly detection and that large-scale anomalies are detected inefficiently, and improves the efficiency of large-scale anomaly detection.
The above description is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the embodiments are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the embodiments of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flow chart of a method for extracting behavioural feature data provided by the first embodiment of the present invention;
Fig. 2 is a flow chart of a method for extracting behavioural feature data provided by the second embodiment of the present invention;
Fig. 3 is a flow chart of a method for extracting behavioural feature data provided by the third embodiment of the present invention;
Fig. 4 is a flow chart of a method for extracting behavioural feature data provided by the fourth embodiment of the present invention;
Fig. 5 is a functional module diagram of a device for extracting behavioural feature data provided by the fifth embodiment of the present invention;
Fig. 6 is a functional module diagram of a device for extracting behavioural feature data provided by the sixth embodiment of the present invention;
Fig. 7 is a functional module diagram of a device for extracting behavioural feature data provided by the seventh embodiment of the present invention;
Fig. 8 is a functional module diagram of a device for extracting behavioural feature data provided by the eighth embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, a flow chart of a method for extracting behavioural feature data provided by the first embodiment of the present invention is shown. In this embodiment, the method for extracting behavioural feature data includes the following steps:
Step S101: extract feature data from the acquired user behaviour data, and construct a feature set from the feature data.
The acquired user behaviour data includes feature data acquired during normal use by the user and feature data acquired during abnormal behaviour by the user; the feature set is constructed from the acquired feature data.
Optionally, the acquired user behaviour data includes traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data.
It will be understood that, to improve the accuracy of network anomaly detection, traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data are all acquired. The user behaviour layer data includes data such as user login time, searches for sensitive words, and web pages opened by the user; the protocol behaviour layer data includes data such as IP address, port number, IP header length, and function code; the traffic behaviour layer data includes data such as SYN packet ratio, IP information entropy, IP correlation, and TTL distribution. By extracting the behavioural characteristics of the three layers and detecting abnormal behaviour across multiple layers, the operating relationships among the behaviour layers are coordinated, which improves the comprehensiveness and accuracy of anomaly detection.
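As a rough illustration of how records from the three behaviour layers might be combined into one flat feature record before a feature set is built, consider the sketch below. The class and field names (`TrafficLayer`, `syn_ratio`, `login_hour`, and so on) are hypothetical and chosen only to mirror the kinds of data listed above; the source does not specify any data layout.

```python
from dataclasses import dataclass, asdict

# Hypothetical containers, one per behaviour layer (names are assumptions).
@dataclass
class TrafficLayer:
    syn_ratio: float       # share of SYN packets in the window
    ip_entropy: float      # information entropy of observed IP addresses

@dataclass
class ProtocolLayer:
    dst_port: int
    ip_header_len: int
    function_code: int

@dataclass
class UserLayer:
    login_hour: int
    sensitive_searches: int

def to_feature_record(traffic, protocol, user):
    """Concatenate the three layers into one flat, named feature record."""
    record = {}
    for prefix, layer in (("traffic", traffic), ("protocol", protocol), ("user", user)):
        for name, value in asdict(layer).items():
            record[f"{prefix}.{name}"] = float(value)
    return record

record = to_feature_record(
    TrafficLayer(syn_ratio=0.2, ip_entropy=3.1),
    ProtocolLayer(dst_port=502, ip_header_len=20, function_code=3),
    UserLayer(login_hour=23, sensitive_searches=1),
)
print(sorted(record))
```

Prefixing each field with its layer keeps the origin of every feature visible after the layers are merged, which may help when features are later ranked and deleted.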
Step S102: select the feature data in the feature set one by one as reference feature data; according to a preset distance threshold, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain; and evaluate the weight of the reference feature data by mutual information.
In a specific implementation, selecting the feature data in the feature set one by one as reference feature data and dividing the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain according to the preset threshold specifically includes the following.
Each feature data item in the acquired feature set is weighted, for example according to the degree to which it influences network behaviour. For example: a hacker's malicious attack on the network executes its operations through function codes, so the feature data corresponding to the function code is selected with weight x1; for network communication over the IP protocol, the protocol identifier is selected as feature data with weight x2, to guard against a malicious attack changing the protocol; the source and destination IP addresses can reveal the party behind a malicious attack, so the two IP addresses are selected as feature data with weights x3 and x4; an attacker exploiting the IP protocol may generate malformed messages, and the data length can reveal this change, so the length is selected as feature data with weight x5; an attack that accesses device data may reveal itself by accessing unknown data addresses, so the feature data corresponding to the data address is selected with weight x6. This yields the weight vector $w(t)_{t\to\infty} = \{w_1, w_2, \ldots, w_{om}\}$, where $om$ is the number of feature data items.
The neighborhood of $x_i$ is the set of all points whose weighted distance from $x_i$ under the weight vector $w(t)$ is less than the given threshold, that is, all $x_j$ with $d(x_i, x_j \mid w(t)) < r$,
where $d(x_i, x_j \mid w(t))$ is the weighted distance under $w(t)$ and $r$ is the preset distance threshold.
The far domain of $x_i$ is the set of all points whose weighted distance from $x_i$ under the weight vector $w(t)$ is greater than the given threshold, that is, all $x_j$ with $d(x_i, x_j \mid w(t)) > r$.
Further, any object $x_j$ in the network data stream other than $x_i$ can be assigned a domain label $c_j$ according to the domain in which it falls.
The training data stream can thus be divided into a neighborhood and a far domain with object $x_i$ as the reference. After the feature data stream in the feature set has been weighted and divided into the neighborhood and the far domain, the weight of the reference feature data is evaluated using mutual information, and the importance of the feature is then calculated.
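The neighborhood and far-domain split can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not give the form of the weighted distance, so a weighted Euclidean distance is assumed here, and the function names `weighted_distance` and `split_domains` are hypothetical.

```python
import math

def weighted_distance(xi, xj, w):
    """d(xi, xj | w): weighted Euclidean distance (the Euclidean form is an assumption)."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, xi, xj)))

def split_domains(points, i, w, r):
    """Split every point except points[i] into neighborhood (d < r) and far domain (d > r).

    Returns two lists of indices. A point at distance exactly r falls in neither
    list, because both definitions in the text use strict inequalities.
    """
    near, far = [], []
    for j, xj in enumerate(points):
        if j == i:
            continue
        d = weighted_distance(points[i], xj, w)
        if d < r:
            near.append(j)
        elif d > r:
            far.append(j)
    return near, far

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
near, far = split_domains(points, i=0, w=(1.0, 1.0), r=2.0)
print(near, far)
```

With unit weights, the second point lies at distance 1 from the reference and the third at distance 5, so they land in the neighborhood and the far domain respectively.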
Step S103: feed the weight back into the weight vector corresponding to the reference feature data, and delete the feature data with the smallest weight from the feature set.
In a specific implementation, because the volume of feature data is large, the feature data needs to be screened: the weight is fed back into the weight vector corresponding to the reference feature data, and the feature data with the smallest weight is deleted from the feature set.
Step S104: delete the feature data with the smallest weight step by step; when the number of feature data items in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
In this embodiment, the feature data in the feature set is selected one by one as reference feature data; the weighted feature data in the feature set other than the reference feature data is divided into a neighborhood and a far domain according to a preset distance threshold; the weight of the reference feature data is evaluated by mutual information; the weight is fed back into the weight vector corresponding to the reference feature data, and the feature data with the smallest weight is deleted from the feature set; the feature data with the smallest weight is deleted step by step, and when the number of feature data items in the feature set and the weight vector are stable, the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold are output. By evaluating the weights of the acquired feature data and deleting the features whose weights are too small until the weights of the feature data in the feature set stabilize, the embodiment addresses the technical problem in the prior art that features are difficult to extract for anomaly detection and that large-scale anomalies are detected inefficiently, and improves the efficiency of large-scale anomaly detection.
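The step-by-step deletion in steps S103 and S104 can be sketched as a backward-elimination loop. The stopping rule below is an assumption: the text only says that deletion continues until the feature count and weight vector are stable, so this sketch treats the set as stable once the smallest weight is no longer below a hypothetical `min_weight`. The scoring function is passed in and stands in for the mutual-information weight evaluation.

```python
def prune_features(features, score_fn, min_weight, max_iter=100):
    """Repeatedly delete the lowest-weight feature until none falls below min_weight.

    features:   non-empty list of feature names
    score_fn:   callable (feature, current_features) -> weight (stand-in for the
                mutual-information evaluation described in the text)
    min_weight: hypothetical stability threshold (an assumption of this sketch)
    """
    features = list(features)
    for _ in range(max_iter):
        weights = {f: score_fn(f, features) for f in features}
        worst = min(features, key=weights.get)
        if len(features) == 1 or weights[worst] >= min_weight:
            break  # nothing left to delete: the feature set is stable
        features.remove(worst)
    # Recompute so the returned weights always match the returned features.
    weights = {f: score_fn(f, features) for f in features}
    return features, weights

# Toy weights for illustration only; real weights would come from the evaluation step.
toy = {"func_code": 0.9, "src_ip": 0.6, "ttl": 0.1, "pkt_len": 0.05}
kept, final_weights = prune_features(list(toy), lambda f, fs: toy[f], min_weight=0.2)
print(kept, final_weights)
```

Passing the current feature list to `score_fn` leaves room for weights that depend on which other features remain, as a redundancy term would.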
Referring to Fig. 2, a flow chart of a method for extracting behavioural feature data provided by the second embodiment of the present invention is shown. In this embodiment, the method for extracting behavioural feature data includes the following steps:
Step S201: extract feature data from the acquired user behaviour data.
Step S202: screen the extracted feature data according to the degree to which each feature data item influences the overall entropy of the feature data, and construct a candidate feature set from the screened feature data.
In a specific implementation, the feature data of the feature set actually acquired has a very large feature dimension, and bringing all of the feature data into the feature set for computation would consume a great deal of time and space. Therefore, before weighted feature selection, the influence of each feature on the overall entropy of the feature set is evaluated, unimportant or irrelevant attributes are removed, and the candidate feature set $S_m$ is constructed:
The similarity between any two objects in the feature set is calculated to obtain the similarity matrix of the feature set, $S = (S_{ij})_{n \times n}$, and the information entropy between any two feature data items $x_i$ and $x_j$ is then defined.
The overall entropy of the data stream is the average of the information entropy of all feature data in this segment of the network data stream.
For each feature data item $s_k$ in the original feature set $S$, the increase in overall entropy after removing $s_k$ is calculated and used as the pre-evaluation function of the feature data:
$\mathrm{preM}(s_k) = E(S - s_k) - E(S)$, with $w(t) = [1, \ldots, 1]$
The value range of $\mathrm{preM}(s_k)$ is $[-1, 1]$. If $\mathrm{preM}(s_k) > 0$, removing this feature data increases the overall entropy of the feature set, which shows that the feature data is beneficial to the clustering process. If $\mathrm{preM}(s_k) < 0$, removing this feature data reduces the overall entropy of the feature set, which shows that it is irrelevant or noisy feature data that is detrimental to clustering; the closer $\mathrm{preM}(s_k)$ is to $-1$, the greater its negative effect on clustering. It follows that all feature data in the feature set can be ranked by the pre-evaluation function, irrelevant or noisy feature data can be rejected by setting a threshold, and the remaining feature data is used to construct the candidate feature set $S_m$.
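The entropy-based pre-evaluation can be sketched as below. The source text does not reproduce the similarity or pairwise-entropy formulas, so this sketch assumes an exponential similarity $S_{ij} = e^{-\alpha d_{ij}}$ over Euclidean distance and a base-2 binary entropy per pair, averaged over all pairs. These choices are assumptions, not the patent's formulas; they do keep every entropy in $[0, 1]$, which is consistent with the stated range $[-1, 1]$ for $\mathrm{preM}$. The names `pair_entropy`, `overall_entropy`, `pre_m`, and `alpha` are hypothetical.

```python
import math
from itertools import combinations

def pair_entropy(s):
    """Base-2 binary entropy of a similarity s in [0, 1]; the result lies in [0, 1]."""
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return -(s * math.log2(s) + (1 - s) * math.log2(1 - s))

def overall_entropy(rows, cols, alpha=1.0):
    """E(S): mean pairwise entropy of the objects in rows, using only the columns in cols."""
    total, n_pairs = 0.0, 0
    for a, b in combinations(rows, 2):
        d = math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))
        total += pair_entropy(math.exp(-alpha * d))  # assumed similarity S_ij
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

def pre_m(rows, k, alpha=1.0):
    """preM(s_k) = E(S - s_k) - E(S): entropy change when feature column k is removed."""
    cols = list(range(len(rows[0])))
    without_k = [c for c in cols if c != k]
    return overall_entropy(rows, without_k, alpha) - overall_entropy(rows, cols, alpha)

rows = [(0.1, 5.0), (0.2, 1.0), (0.9, 3.0)]  # three objects, two features
for k in range(2):
    print(k, pre_m(rows, k))
```

Features with a strongly negative score would be the first candidates for rejection when the threshold is applied.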
Step S203: select the feature data in the feature set one by one as reference feature data; according to a preset distance threshold, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain; and evaluate the weight of the reference feature data by mutual information.
Step S204: feed the weight back into the weight vector corresponding to the reference feature data, and delete the feature data with the smallest weight from the feature set.
Step S205: delete the feature data with the smallest weight step by step; when the number of feature data items in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Steps S203 to S205 have been described in detail in the first embodiment and are not repeated here.
In this embodiment, the extracted feature data is screened by calculating the degree to which each feature data item influences the overall entropy of the feature data, and a candidate feature set is constructed from the screened feature data, so that irrelevant or noisy features are rejected. This reduces the feature dimension of the feature data in the feature set actually acquired; constructing the feature set from the screened feature data reduces the time and space spent processing unimportant feature data.
Referring to Fig. 3, a flow chart of a method for extracting behavioural feature data provided by the third embodiment of the present invention is shown. In this embodiment, the method for extracting behavioural feature data includes the following steps:
Step S301: extract feature data from the acquired user behaviour data, apply normalization and interval alignment to the extracted feature data, and construct a feature set from the processed feature data.
Optionally, the acquired user behaviour data includes traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data.
In a specific implementation, because the acquired data includes traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data, the feature data is of several types. The feature data therefore needs to be normalized and interval-aligned before feature data is extracted from the acquired data. Normalization and interval alignment are described below:
Normalization: the normal behaviour of users on different servers differs within the same time period, and the normal behaviour of users on the same server also differs between time periods, which makes anomalies harder to identify. Consider, however, that although normal behaviour differs between servers and between time periods, whether a user on a given server is abnormal in a given period only needs to be judged against the normal behaviour on that server in that period. Absolute values are therefore converted to relative values by a normalization method. Specifically: for each day, the average of the features of all users on the same server is calculated, giving a time series of average feature values that serves as the normal-behaviour baseline of that server; for a user u, the feature vector of u on date t is divided by the baseline of the server where u resides for that day, and this ratio is the normalized value.
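The normalization step described above can be sketched as follows. The nested-dictionary layout and the function name `normalize_by_server_baseline` are hypothetical, and returning 0.0 when a baseline component is zero is an assumption added to avoid division by zero; the source does not cover that case.

```python
def normalize_by_server_baseline(user_features):
    """Divide each user's daily feature vector by the server's daily mean vector.

    user_features: {user: {day: [f1, f2, ...]}} for the users of ONE server
                   (this layout is an assumption of the sketch).
    Returns the same structure with relative (normalized) values.
    """
    days = sorted({day for per_day in user_features.values() for day in per_day})
    baseline = {}
    for day in days:
        vecs = [per_day[day] for per_day in user_features.values() if day in per_day]
        # Column-wise mean over all users active on this day.
        baseline[day] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return {
        user: {
            day: [v / b if b else 0.0 for v, b in zip(vec, baseline[day])]
            for day, vec in per_day.items()
        }
        for user, per_day in user_features.items()
    }

data = {"u1": {1: [2.0, 10.0]}, "u2": {1: [6.0, 30.0]}}
print(normalize_by_server_baseline(data))
```

In this toy input the day-1 baseline is [4.0, 20.0], so a value of 1.0 after normalization means the user matches the server average for that day.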
Interval alignment: anomaly detection of user behaviour includes horizontal and vertical comparison. Horizontal detection compares the user under test with the behaviour of other users; vertical detection compares the user's current behaviour with the user's historical behaviour. An abnormal user generally does not behave abnormally all the time, and the important sampling intervals for such a user are the abnormal-tendency and abnormal-development stages, whereas normal users have no such stages. If sampling intervals for normal users were selected in the same way as for abnormal users, the sampling interval of every normal user would be the last T days of the data set. The result would be that the features of normal users are all extracted from the same time stage, while the features of abnormal users come from scattered time stages. As noted above, such time differences interfere with classification and artificially enlarge the difference between normal and abnormal users.
To exclude this interference, interval alignment is proposed: for a normal user Ua, an abnormal user Ub is selected at random from the same server; the sampling interval of Ub is its abnormal-tendency and abnormal-development stage, and the sampling interval of Ua is aligned with that of Ub so that the same time window is chosen. Random alignment is carried out multiple times, and the feature value finally used for comparison is the average over the multiple random alignments.
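A minimal sketch of the repeated random alignment follows. Several details are assumptions: the per-window feature value is taken to be the mean of the normal user's daily values inside the window, windows are given as `(start, end)` day indices with `end` exclusive and must lie inside the series, and the names `aligned_feature` and `n_draws` are hypothetical.

```python
import random

def aligned_feature(normal_series, abnormal_windows, n_draws=10, rng=None):
    """Average a normal user's feature over windows borrowed from abnormal users.

    normal_series:    daily feature values of a normal user Ua
    abnormal_windows: (start, end) sampling windows of abnormal users on the
                      same server, end exclusive; each must be non-empty
    n_draws:          number of random alignments to average over
    """
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    draws = []
    for _ in range(n_draws):
        start, end = rng.choice(abnormal_windows)  # pick a random abnormal user Ub
        window = normal_series[start:end]          # align Ua to Ub's time window
        draws.append(sum(window) / len(window))    # assumed per-window feature value
    return sum(draws) / len(draws)

series = [1.0, 2.0, 3.0, 5.0, 8.0]
print(aligned_feature(series, [(0, 2), (2, 4), (3, 5)], n_draws=20))
```

Because the normal user's value is sampled over the same windows as the abnormal users, differences that come only from the calendar period are averaged out rather than attributed to the user.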
Step S302: select the feature data in the feature set one by one as reference feature data; according to a preset distance threshold, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain; and evaluate the weight of the reference feature data by mutual information.
Step S303: feed the weight back into the weight vector corresponding to the reference feature data, and delete the feature data with the smallest weight from the feature set.
Step S304: delete the feature data with the smallest weight step by step; when the number of feature data items in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Steps S302 to S304 have been described in detail in the first embodiment and are not repeated here.
In this embodiment, applying normalization and interval alignment to the extracted feature data processes the weight vector corresponding to the input user behaviour data and excludes the influence of interfering differences. This addresses the problem that, because the user behaviour data set is sparse, intelligent detection algorithms perform poorly and users with abnormal behaviour cannot be correctly filtered out from a large number of users. It further addresses the technical problem in the prior art that features are difficult to extract for anomaly detection and that large-scale anomalies are detected inefficiently, and improves the efficiency of large-scale anomaly detection.
Referring to Fig. 4, a flow chart of a method for extracting behavioural feature data provided by the fourth embodiment of the present invention is shown. In this embodiment, the method for extracting behavioural feature data includes the following steps:
Step S401: extract feature data from the acquired user behaviour data, apply normalization and interval alignment to the extracted feature data, and construct a feature set from the processed feature data.
Optionally, the acquired user behaviour data includes traffic behaviour layer data, protocol behaviour layer data, and user behaviour layer data.
Step S402: select the feature data in the feature set one by one as reference feature data; according to a preset distance threshold, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain; evaluate the feature correlation and feature redundancy of the reference feature data by mutual information; and obtain the weight of the reference feature from the feature correlation and the feature redundancy.
In a specific implementation, mutual information is an information measure in information theory that expresses the correlation between two event sets. It is defined as $I(X, Y) = H(X) + H(Y) - H(X, Y)$, where $H(X, Y)$ is the joint entropy, defined as:
$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y)$$
from which it can be derived that:
$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
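The identity $I(X, Y) = H(X) + H(Y) - H(X, Y)$ can be checked numerically with a short sketch. It estimates the entropies from paired discrete samples using empirical frequencies and base-2 logarithms; the function names are hypothetical, and continuous features would first need to be discretized, which the sketch does not cover.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), estimated from paired discrete samples."""
    joint = list(zip(xs, ys))  # each pair is one outcome of the joint variable
    return entropy(xs) + entropy(ys) - entropy(joint)

same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])     # fully dependent
indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])    # independent in this sample
print(same, indep)
```

The first pair shares all of its information, while in the second pair knowing one variable says nothing about the other.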
Based on the concept of mutual information, the feature correlation and feature redundancy of any feature data $s_k$ are defined as follows:
Feature correlation expresses the relevance of feature data $s_k$ to the domain label $c$; the greater the feature correlation, the better the feature. It can be expressed by the mutual information between the two:
$D(s_k, c) = I(s_k, c)$
where $c$ is the domain label defined above.
Feature redundancy expresses how much feature data $s_k$ duplicates the other features in the feature set $S_m$; the smaller the redundancy, the better the feature data. It can be expressed as $R(s_k)$, the average mutual information between $s_k$ and the other features in $S_m$,
where $s_u$ denotes any feature data in $S_m$ other than $s_k$.
Theoretically an outstanding feature needs to have two conditions: higher feature correlation and lower feature redundancy Degree.Accordingly, the characteristic evaluating factor is provided:
ρj=D (sj,c)-λR(sj)
Wherein, λ is a balance parameters, and the relative intensity of redundancy is cured for controlling feature data dependence.It is found that working as When λ value is 0, redundancy influences to disappear, and feature evaluation maximumlly concentrates in correlation;When λ value is 1, feature is commented Estimate and concentrates in redundancy.Due to requirement of the different feature sets under different clustered demands, for redundancy and correlation It is different, if redundancy is more demanding, it is relatively preferable to cluster efficiency, if correlation requirement is higher, clustering precision is often Preferably, λ facilitates the final Clustering Effect of team to introducing and efficiency is balanced and controls.
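The evaluation factor ρ_j = D(s_j, c) - λR(s_j) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: features and labels are assumed to be discrete, entropies are estimated from sample frequencies in bits, redundancy is taken as the mean mutual information with each other feature, and the names `entropy`, `mutual_information`, `evaluation_factors`, and the default `lam=0.5` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def evaluation_factors(features, labels, lam=0.5):
    """Return rho_j = D(s_j, c) - lam * R(s_j) for every feature.

    features: dict mapping a feature name to its list of discrete values.
    labels:   list of domain labels c, one per sample.
    lam:      balance parameter lambda (hypothetical default).
    """
    names = list(features)
    rho = {}
    for name in names:
        # Relevance: mutual information between the feature and the label.
        relevance = mutual_information(features[name], labels)
        others = [u for u in names if u != name]
        # Redundancy: mean mutual information with every other feature.
        redundancy = (
            sum(mutual_information(features[name], features[u]) for u in others)
            / len(others)
            if others
            else 0.0
        )
        rho[name] = relevance - lam * redundancy
    return rho
```

With `lam=0` the score reduces to relevance alone, so a feature that merely duplicates another is not penalized; raising `lam` lowers the score of such duplicated features.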
In the t-th iteration, a feature evaluation set consisting of the evaluation values of all features is obtained, where T is the number of iterations and w(t) is the feature weight vector used in the current iteration. Because every feature datum is assumed by default to influence the data partition equally in the initial state, w(1) = [1, …, 1, …, 1]. Each element of the set is the evaluation value of the j-th feature in the t-th iteration. The feature weight vector is then updated by an update function to give w(t+1),
where T_m is the preset maximum number of iterations and the update strength decays over time, which guarantees the convergence rate. When w(t+1) stabilizes or the iteration count reaches T_m, feature selection stops and returns the result.
It should be noted that every iteration randomly selects a feature data object x_i, and the randomness of this selection makes the effect of the neighborhood analysis vary. In general, the more data objects there are in the neighborhood of the selected x_i and the more compact their distribution, the more successful the selection and the greater the reference value of the neighborhood analysis. Conversely, the fewer the objects and the sparser their distribution, the smaller the reference value; the analysis may even lead feature selection in the wrong direction and reduce efficiency. It is therefore necessary to assess the reference value of the neighborhood analysis in every iteration.
Accordingly, a neighborhood-analysis reference factor p_l(x_i) is defined for the t-th iteration,
where m is the number of data objects in the neighborhood of x_i and δ is a distance normalization parameter used to normalize neighborhood distances; δ evidently differs from one data set to another. The larger m is and the smaller the average distance between the neighborhood objects and x_i, the closer p_l(x_i) is to 1; otherwise it is closer to 0.
The reference factor p_l(x_i) is used as the probability of updating the feature weight vector in the t-th iteration, so that the update probability is high when the selection of object x_i is relatively successful and low otherwise, which improves the convergence rate of the iterative algorithm. After the feature weight vector w(t) has stabilized, the features with low evaluation values are removed from S_m, and S_m is output together with the corresponding feature weight vector w_opt.
Step S403: feed the weight back into the weight vector corresponding to the reference feature data, and delete the feature data with the smallest weight from the feature set.
Step S404: delete the feature data with the smallest weight step by step; when the number of feature data in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Steps S401, S403, and S404 have been described in detail in the first embodiment and are not repeated here.
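Steps S403 and S404 can be read as a backward-elimination loop, sketched below in Python. This is only an illustration under assumptions: the callable `score` stands in for the weight evaluation of step S402, the stopping rule is interpreted as "the smallest weight is no longer below a preset threshold", and the names `prune_features`, `score`, and `min_weight` are hypothetical.

```python
def prune_features(features, score, min_weight):
    """Repeatedly drop the lowest-weight feature until none falls below min_weight.

    features:   list of feature names.
    score:      hypothetical callable mapping a feature list to {name: weight}.
    min_weight: hypothetical preset threshold on the weight.
    """
    kept = list(features)
    weights = score(kept)
    while len(kept) > 1:
        worst = min(kept, key=weights.get)
        if weights[worst] >= min_weight:
            break  # feature count and weight vector no longer change
        kept.remove(worst)
        weights = score(kept)  # re-evaluate the weights on the reduced set
    return kept, weights
```

Re-scoring after every deletion matters when the weights depend on redundancy, because removing one feature changes the redundancy of the features that remain.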
In this embodiment, the feature relevance and feature redundancy of the reference feature data are evaluated by mutual information, and the weight of the reference feature is obtained from them. This improves the convergence rate of the iterative algorithm, so that the weight vector stabilizes sooner. The feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold are then output, which solves the prior-art technical problems that feature extraction for anomaly detection is difficult and that large-scale anomaly detection is inefficient, and improves the efficiency of large-scale anomaly detection.
Referring to Fig. 5, a functional module diagram of a device 100 for extracting behavioral feature data provided by the fifth embodiment of the present invention is shown. The device 100 is applied to a computer device and includes a feature set module 110, a weight calculation module 120, a screening module 130, and a feature data output module 140. The device is mainly used to solve the prior-art technical problems that feature extraction for anomaly detection is difficult and that large-scale anomaly detection is inefficient.
The computer device includes, but is not limited to, mobile phones, smartphones, tablet computers, personal computers, personal digital assistants, media players, servers, and other electronic devices.
The feature set module 110 is configured to extract feature data from the acquired user behavior data and to construct a feature set from the feature data.
The acquired user behavior data include feature data acquired during normal use by the user and feature data acquired during abnormal behavior of the user; the feature set is constructed from the acquired feature data.
Optionally, the acquired user behavior data include traffic-behavior-layer data, protocol-behavior-layer data, and user-behavior-layer data.
It should be understood that, to improve the accuracy of network anomaly detection, traffic-behavior-layer data, protocol-behavior-layer data, and user-behavior-layer data are acquired. User-behavior-layer data include user login time, searches for sensitive words, web pages opened by the user, and similar data; protocol-behavior-layer data include IP address, port number, IP header length, function code, and similar data; traffic-behavior-layer data include SYN packet ratio, IP information entropy, IP correlation, TTL distribution, and similar data. By extracting behavioral features at these three levels with the relevant techniques and detecting abnormal behavior at multiple levels, the relationships among the behavior levels are coordinated and the comprehensiveness and accuracy of anomaly detection are improved.
The weight calculation module 120 is configured to select the feature data in the feature set one by one as reference feature data, to divide, according to a preset distance threshold, the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain, and to evaluate the weight of the reference feature data by mutual information.
In a specific implementation, selecting the feature data in the feature set one by one as reference feature data and dividing, according to the preset threshold, the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain specifically includes:
Each feature datum in the acquired feature set is weighted, for example according to its degree of influence on network behavior. For instance, a hacker's malicious attack on the network carries out its operations through function codes, so the feature data corresponding to the function code are selected and given weight x1. For network communication over the IP protocol, the protocol identifier is selected as feature data with weight x2, to guard against a malicious attack changing the protocol. The source and destination IP addresses can reveal the party behind a malicious attack, so the two IP addresses are selected as feature data with weights x3 and x4. An attacker exploiting the IP protocol may generate malformed messages, and the data length can reveal such changes, so the length is selected as feature data with weight x5. When an attack accesses device data, it may reveal its attack signature by accessing unknown data addresses, so the feature data corresponding to the data address are selected with weight x6. This yields the weight vector w(t)_{t→∞} = {w_1, w_2, …, w_om}, where om is the number of feature data.
The neighborhood of x_i is the set of all points whose weighted distance from x_i under the weight vector w(t) is less than the given threshold, that is, {x_j | d(x_i, x_j | w(t)) < r},
where d(x_i, x_j | w(t)) is the weighted distance under w(t) and r is the preset distance threshold.
The far domain of x_i is the set of all points whose weighted distance from x_i under the weight vector w(t) is greater than the given threshold, that is, {x_j | d(x_i, x_j | w(t)) > r}.
Further, every object x_j in the network data stream other than x_i can be assigned a domain label c_j.
The training data stream can thus be divided into a neighborhood and a far domain with object x_i as the reference. After the feature data stream in the feature set has been weighted and divided into the neighborhood and the far domain, the weight of the reference feature data is evaluated by mutual information, and the importance of the feature is then calculated.
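The division into neighborhood and far domain can be illustrated with a short Python sketch. The text does not fix the form of the weighted distance d(x_i, x_j | w(t)), so a weighted Euclidean distance is assumed here; a point lying exactly at distance r is placed in the far domain, which is also an assumption. The names `weighted_distance` and `split_neighborhood` are hypothetical.

```python
from math import sqrt


def weighted_distance(x, y, w):
    """Weighted Euclidean distance; the choice of metric is an assumption."""
    return sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))


def split_neighborhood(points, i, w, r):
    """Split every point except points[i] into a neighborhood and a far domain.

    Returns two lists of indices: points closer than r to points[i] under
    the weight vector w, and all remaining points.
    """
    near, far = [], []
    for j, xj in enumerate(points):
        if j == i:
            continue  # the reference object itself is excluded
        if weighted_distance(points[i], xj, w) < r:
            near.append(j)
        else:
            far.append(j)
    return near, far
```

Because the distance depends on w(t), the same object can move between the neighborhood and the far domain as the weight vector is updated; setting a weight to 0 removes that feature from the comparison altogether.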
The screening module 130 is configured to feed the weight back into the weight vector corresponding to the reference feature data and to delete the feature data with the smallest weight from the feature set.
In a specific implementation, because the volume of feature data is large, the feature data need to be screened: the weight is fed back into the weight vector corresponding to the reference feature data, and the feature data with the smallest weight are deleted from the feature set.
The feature data output module 140 is configured to delete the feature data with the smallest weight step by step and, when the number of feature data in the feature set and the weight vector are stable, to output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
In this embodiment, the feature data in the feature set are selected one by one as reference feature data; according to a preset distance threshold, the weighted feature data in the feature set other than the reference feature data are divided into a neighborhood and a far domain; the weight of the reference feature data is evaluated by mutual information and fed back into the corresponding weight vector; and the feature data with the smallest weight are deleted from the feature set step by step. When the number of feature data in the feature set and the weight vector are stable, the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold are output. By evaluating the weights of the acquired feature data and deleting the feature data whose weights are too small until the weights of the feature data in the feature set stabilize, the embodiment solves the prior-art technical problems that feature extraction for anomaly detection is difficult and that large-scale anomaly detection is inefficient, and improves the efficiency of large-scale anomaly detection.
Referring to Fig. 6, a functional module diagram of the device 100 for extracting behavioral feature data provided by the sixth embodiment of the present invention is shown. The device is applied to a computer device, which includes, but is not limited to, mobile phones, smartphones, tablet computers, personal computers, personal digital assistants, media players, servers, and other electronic devices. The device 100 includes a feature set module 110, a weight calculation module 120, a screening module 130, and a feature data output module 140. On the basis of the fifth embodiment, the feature set module 110 includes:
A screening unit 111, configured to screen the extracted feature data according to the influence of each feature datum on the overall entropy of the feature data, and to construct a candidate feature set from the screened feature data.
In a specific implementation, the feature dimension of the feature set actually acquired is very large, and including all feature data in the feature set for computation consumes a great deal of time and space. Therefore, before weighted feature selection, the influence of each feature on the overall entropy of the feature set is evaluated, unimportant or irrelevant attributes are removed, and the candidate feature set S_m is constructed:
The similarity between any two objects in the feature set is calculated to obtain the similarity matrix S = (S_ij)_{n×n} of the feature set, and the information entropy between any two feature data x_i and x_j is defined on this basis.
The overall entropy of the data stream is the average of the information entropy of all feature data in the segment of network data stream.
For each feature datum s_k in the original feature set S, the increase in overall entropy after removing s_k is calculated with the formula above and used as the pre-evaluation function of the feature datum:
preM(s_k) = E(S - s_k) - E(S), with w(t) = [1, …, 1]
where preM(s_k) takes values in [-1, 1]. If preM(s_k) > 0, removing the feature datum increases the overall entropy of the feature set, which shows that the feature is conducive to the clustering process. If preM(s_k) < 0, removing the feature decreases the overall entropy of the feature set, which shows that the feature is irrelevant or noise feature data detrimental to clustering; the closer preM(s_k) is to -1, the greater the negative effect of the feature datum on clustering. All feature data in the feature set can therefore be ranked by the pre-evaluation function, irrelevant or noise feature data can be rejected by setting a threshold, and the remaining feature data are used to construct the candidate feature set S_m.
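The pre-evaluation function preM(s_k) = E(S - s_k) - E(S) can be illustrated as follows. The similarity and entropy formulas are not reproduced in this text, so the sketch assumes a common entropy-based ranking scheme: similarity S_ij = exp(-α·d_ij) with Euclidean distance d_ij, pairwise entropy -(S_ij log2 S_ij + (1 - S_ij) log2(1 - S_ij)), and overall entropy equal to the mean over all pairs. This choice keeps preM within [-1, 1], consistent with the range stated above, but it is an assumption, as are the names `overall_entropy`, `pre_evaluate`, and the default `alpha=0.5`.

```python
from itertools import combinations
from math import exp, log2, sqrt


def overall_entropy(rows, alpha=0.5):
    """Mean pairwise entropy of a data set (rows are equal-length tuples)."""
    total, pairs = 0.0, 0
    for a, b in combinations(rows, 2):
        dist = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        s = exp(-alpha * dist)  # assumed similarity in (0, 1]
        if 0.0 < s < 1.0:  # the entropy term vanishes as s -> 0 or s -> 1
            total -= s * log2(s) + (1 - s) * log2(1 - s)
        pairs += 1
    return total / pairs if pairs else 0.0


def pre_evaluate(rows, k, alpha=0.5):
    """preM(s_k) = E(S - s_k) - E(S): entropy change after dropping column k."""
    reduced = [row[:k] + row[k + 1:] for row in rows]
    return overall_entropy(reduced, alpha) - overall_entropy(rows, alpha)
```

A column that is constant across all objects does not change any pairwise distance, so its pre-evaluation value is 0 under these assumptions.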
Referring to Fig. 7, a functional module diagram of the device 100 for extracting behavioral feature data provided by the seventh embodiment of the present invention is shown. The device is applied to a computer device, which includes, but is not limited to, mobile phones, smartphones, tablet computers, personal computers, personal digital assistants, media players, servers, and other electronic devices. The device 100 includes a feature set module 110, a weight calculation module 120, a screening module 130, and a feature data output module 140. On the basis of the sixth embodiment, the feature set module 110 further includes:
A feature data integration unit 112, configured to apply normalization and interval alignment to the extracted feature data.
Optionally, the acquired user behavior data include traffic-behavior-layer data, protocol-behavior-layer data, and user-behavior-layer data.
In a specific implementation, because the acquired data include traffic-behavior-layer data, protocol-behavior-layer data, and user-behavior-layer data, the feature data are of several types. The feature data therefore need to be normalized and interval-aligned before features are extracted from the acquired data. Normalization and interval alignment are explained below:
Normalization: the normal behavior of users of different servers differs within the same time period, and the normal behavior of users of the same server differs between time periods, which makes anomalies difficult to identify. However, although normal behavior differs between servers and between time periods, within one server and one time period, whether a user is abnormal only needs to be judged against the normal behavior of that server in that period. Absolute values are therefore converted to relative values, which gives a normalized feature processing method. Specifically, the average of every feature over all users of the same server is calculated for each day, which gives a time series of average features that serves as the normal-behavior baseline of the server. For a user u, its feature vector on date t is divided by the baseline of its server for that day, and this ratio is the normalized value.
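The normalization described above can be sketched in Python for a single feature on a single server. This is an illustrative sketch: the data layout (a dict from user to a list of daily values) and the function name `normalize_by_server_baseline` are hypothetical, and returning 0.0 on a day whose baseline is zero is an assumption that the text does not address.

```python
def normalize_by_server_baseline(daily_features):
    """Divide each user's daily feature value by the server-wide mean for that day.

    daily_features: {user: [value on day 0, value on day 1, ...]} for one server.
    """
    users = list(daily_features)
    days = len(daily_features[users[0]])
    # Baseline: mean of the feature over all users of the server, per day.
    baseline = [
        sum(daily_features[u][t] for u in users) / len(users) for t in range(days)
    ]
    return {
        u: [
            daily_features[u][t] / baseline[t] if baseline[t] else 0.0
            for t in range(days)
        ]
        for u in users
    }
```

A normalized value of 1.0 means the user behaved exactly like the server average on that day, so values from different servers and dates become directly comparable.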
Interval alignment: anomaly detection on user behavior includes horizontal and vertical comparison. Horizontal detection compares the user under test with the behavior of other users, and vertical detection compares the user's current behavior with its historical behavior. An abnormal user generally does not behave abnormally all the time; the important sampling interval for its behavior is the stage in which the anomaly emerges and develops, and normal users have no such stage. If the sampling interval of normal users were selected by the same method as for abnormal users, the sampling interval of every normal user would be the last T days of the data set. The result would be that the features of normal users are extracted from the same period, while the features of abnormal users come from scattered periods. As noted above, time differences interfere with classification and artificially increase the difference between normal and abnormal users.
To exclude this interference, interval alignment is proposed: for a normal user Ua, an abnormal user Ub is randomly selected from the same server. The sampling interval of Ub is its stage of anomaly emergence and development, and the sampling interval of Ua is aligned with that of Ub, so that the same time window is used. The random alignment is repeated several times, and the feature value finally used for comparison is the average over the random alignments.
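Interval alignment for one normal user can be sketched as below. The sketch assumes that each abnormal user on the same server contributes a sampling window given as a `(start, end)` pair of day indices with `end` exclusive, and that a feature is any function reducing a list of daily values to one number; the names `aligned_feature`, `abnormal_windows`, `rounds`, and `seed` are hypothetical.

```python
import random


def aligned_feature(normal_series, abnormal_windows, feature, rounds=10, seed=0):
    """Average a feature of a normal user over windows borrowed from abnormal users.

    normal_series:    daily values of the normal user Ua.
    abnormal_windows: (start, end) day ranges of abnormal users on the same server.
    feature:          callable that reduces a list of daily values to one number.
    """
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    samples = []
    for _ in range(rounds):
        start, end = rng.choice(abnormal_windows)  # random abnormal user Ub
        samples.append(feature(normal_series[start:end]))
    return sum(samples) / len(samples)
```

Averaging over several random alignments keeps the result from depending on which abnormal user happened to be drawn.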
Referring to Fig. 8, a functional module diagram of the device 100 for extracting behavioral feature data provided by the eighth embodiment of the present invention is shown. The device is applied to a computer device, which includes, but is not limited to, mobile phones, smartphones, tablet computers, personal computers, personal digital assistants, media players, servers, and other electronic devices. The device 100 includes a feature set module 110, a weight calculation module 120, a screening module 130, and a feature data output module 140. On the basis of the fifth embodiment, the weight calculation module 120 includes:
A feature unit 121, configured to evaluate the feature relevance and feature redundancy of the reference feature data by mutual information;
A weight unit 122, configured to obtain the weight of the reference feature from the feature relevance and the feature redundancy.
In a specific implementation, mutual information is an information measure used in information theory to express the correlation between two sets of events. It is defined as I(X, Y) = H(X) + H(Y) - H(X, Y), where H(X, Y) is the joint entropy, defined as H(X, Y) = -Σ_x Σ_y p(x, y) log p(x, y). From this it can be derived that I(X, Y) = Σ_x Σ_y p(x, y) log [p(x, y) / (p(x) p(y))].
Based on the concept of mutual information, the feature relevance and feature redundancy of any feature datum s_k are defined as follows:
Feature relevance expresses the association between feature datum s_k and the domain label c; the greater the feature relevance, the better the feature. It can be expressed by the mutual information between the two, calculated as:
D(s_k, c) = I(s_k, c)
where c is the domain label given by the domain-label formula.
Feature redundancy expresses how far feature datum s_k duplicates the other features in feature set S_m; the smaller the redundancy, the better the feature datum. It can be expressed by the average mutual information between s_k and the other features in S_m, that is, R(s_k) is the mean of I(s_k, s_u) over the features s_u,
where s_u is any feature datum in feature set S_m other than s_k.
In theory a good feature must satisfy two conditions: high feature relevance and low feature redundancy. Accordingly, the feature evaluation factor is given by:
ρ_j = D(s_j, c) - λR(s_j)
where λ is a balance parameter that controls the relative strength of feature relevance and redundancy. When λ is 0, the influence of redundancy vanishes and the evaluation concentrates entirely on relevance; when λ is 1, the evaluation places its greatest emphasis on redundancy. Different feature sets under different clustering requirements place different demands on redundancy and relevance: when the demand on redundancy is higher, clustering efficiency is relatively better, and when the demand on relevance is higher, clustering precision is usually better. Introducing λ helps to balance and control the final clustering quality and efficiency.
In the t-th iteration, a feature evaluation set consisting of the evaluation values of all features is obtained, where T is the number of iterations and w(t) is the feature weight vector used in the current iteration. Because every feature datum is assumed by default to influence the data partition equally in the initial state, w(1) = [1, …, 1, …, 1]. Each element of the set is the evaluation value of the j-th feature in the t-th iteration. The feature weight vector is then updated by an update function to give w(t+1),
where T_m is the preset maximum number of iterations and the update strength decays over time, which guarantees the convergence rate. When w(t+1) stabilizes or the iteration count reaches T_m, feature selection stops and returns the result.
It should be noted that every iteration randomly selects a feature data object x_i, and the randomness of this selection makes the effect of the neighborhood analysis vary. In general, the more data objects there are in the neighborhood of the selected x_i and the more compact their distribution, the more successful the selection and the greater the reference value of the neighborhood analysis. Conversely, the fewer the objects and the sparser their distribution, the smaller the reference value; the analysis may even lead feature selection in the wrong direction and reduce efficiency. It is therefore necessary to assess the reference value of the neighborhood analysis in every iteration.
Accordingly, a neighborhood-analysis reference factor p_l(x_i) is defined for the t-th iteration,
where m is the number of data objects in the neighborhood of x_i and δ is a distance normalization parameter used to normalize neighborhood distances; δ evidently differs from one data set to another. The larger m is and the smaller the average distance between the neighborhood objects and x_i, the closer p_l(x_i) is to 1; otherwise it is closer to 0.
The reference factor p_l(x_i) is used as the probability of updating the feature weight vector in the t-th iteration, so that the update probability is high when the selection of object x_i is relatively successful and low otherwise, which improves the convergence rate of the iterative algorithm. After the feature weight vector w(t) has stabilized, the features with low evaluation values are removed from S_m, and S_m is output together with the corresponding feature weight vector w_opt.
An embodiment of the present invention also provides a computer device, comprising a processor, a memory, and a communication bus; the communication bus is used to realize connection and communication between the processor and the memory;
The processor is configured to execute a program for extracting behavioral feature data stored in the memory, so as to implement the following steps:
Step S101: extract feature data from the acquired user behavior data, and construct a feature set from the feature data.
Step S102: select the feature data in the feature set one by one as reference feature data; according to a preset distance threshold, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain; and evaluate the weight of the reference feature data by mutual information.
Step S103: feed the weight back into the weight vector corresponding to the reference feature data, and delete the feature data with the smallest weight from the feature set.
Step S104: delete the feature data with the smallest weight step by step; when the number of feature data in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Optionally, the steps executed may be replaced by steps S201 to S205, steps S301 to S304, or steps S401 to S404.
Because the implementation of the method for extracting behavioral feature data has been described in detail in the first to fourth embodiments, it is not repeated in this embodiment.
The computer device in this embodiment includes, but is not limited to, mobile phones, smartphones, tablet computers, personal computers, personal digital assistants, media players, servers, and other electronic devices.
An embodiment of the present invention also provides a computer-readable storage medium storing a program for the method for extracting behavioral feature data; when the program is executed by at least one processor, it causes the at least one processor to execute the following steps:
Step S101: extract feature data from the acquired user behavior data, and construct a feature set from the feature data.
Step S102: select the feature data in the feature set one by one as reference feature data; according to a preset distance threshold, divide the weighted feature data in the feature set other than the reference feature data into a neighborhood and a far domain; and evaluate the weight of the reference feature data by mutual information.
Step S103: feed the weight back into the weight vector corresponding to the reference feature data, and delete the feature data with the smallest weight from the feature set.
Step S104: delete the feature data with the smallest weight step by step; when the number of feature data in the feature set and the weight vector are stable, output the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold.
Optionally, the steps executed may be replaced by steps S201 to S205, steps S301 to S304, or steps S401 to S404.
Because the implementation of the method for extracting behavioral feature data has been described in detail in the first to fourth embodiments, it is not repeated in this embodiment.
The computer-readable storage medium of this embodiment includes, but is not limited to, ROM, RAM, magnetic disks, and optical discs.
In summary, the embodiments of the present invention disclose a method and a device for extracting behavioral feature data. The feature data in the feature set are selected one by one as reference feature data; according to a preset distance threshold, the weighted feature data in the feature set other than the reference feature data are divided into a neighborhood and a far domain; the weight of the reference feature data is evaluated by mutual information and fed back into the corresponding weight vector; and the feature data with the smallest weight are deleted from the feature set step by step. When the number of feature data in the feature set and the weight vector are stable, the feature set corresponding to the stabilized feature data and the weight vector corresponding to the preset threshold are output. By evaluating the weights of the acquired feature data and deleting the feature data whose weights are too small until the weights of the feature data in the feature set stabilize, the embodiments solve the prior-art technical problems that feature extraction for anomaly detection is difficult and that large-scale anomaly detection is inefficient, and improve the efficiency of large-scale anomaly detection.
In the embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flow charts and block diagrams in the drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram and/or flow chart, and any combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
In short, the foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for extracting behavioral feature data, characterized in that the method comprises:
extracting feature data from acquired user behavior data, and constructing a feature set from the feature data;
selecting the feature data in the feature set one at a time as reference feature data, dividing the feature data in the feature set other than the reference feature data, with weighting, into a near neighborhood and a far neighborhood according to a preset distance threshold, and evaluating the weight of the reference feature data by mutual information;
feeding the weight back into the weight vector corresponding to the reference feature data, and deleting the feature data with the smallest weight from the feature set;
deleting the feature data with the smallest weight step by step and, when the number of feature data in the feature set and the weight vector become stable, outputting the feature set corresponding to the stabilized feature data and the weight vector corresponding to a preset threshold.
2. The method according to claim 1, characterized in that constructing the feature set from the feature data comprises:
screening the extracted feature data according to the degree of influence of each item of feature data on the overall entropy of the feature data, and constructing a candidate feature set from the screened feature data.
3. The method according to claim 1, characterized in that the acquired user behavior data comprises: traffic behavior layer data, protocol behavior layer data, and user behavior layer data.
4. The method according to claim 3, characterized in that, before constructing the feature set from the feature data, the method further comprises:
performing normalization and interval alignment on the extracted feature data.
5. The method according to claim 1, characterized in that evaluating the weight of the reference feature data by mutual information comprises:
evaluating the feature correlation and the feature redundancy of the reference feature data by mutual information;
obtaining the weight of the reference feature data from the feature correlation and the feature redundancy.
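The claims do not state how feature correlation and feature redundancy are combined into a weight. As a minimal sketch, assuming discretized feature values, a class-label column, and a weight defined as relevance minus mean redundancy (an mRMR-style choice that is an assumption, not the claimed formula), the evaluation recited in claim 5 might look as follows. The names `mutual_information` and `feature_weight` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def feature_weight(reference, others, labels):
    """Assumed weight: relevance to the labels minus mean redundancy with other features."""
    relevance = mutual_information(reference, labels)
    if not others:
        return relevance
    redundancy = sum(mutual_information(reference, o) for o in others) / len(others)
    return relevance - redundancy
```

Under these assumptions, a feature that shares a lot of information with the labels and little with the remaining features receives a high weight, while a feature that mostly duplicates other features is pushed toward the bottom of the ranking.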
6. A device for extracting behavioral feature data, characterized in that the device comprises:
a feature set module, configured to extract feature data from acquired user behavior data and to construct a feature set from the feature data;
a weight computing module, configured to select the feature data in the feature set one at a time as reference feature data, to divide the feature data in the feature set other than the reference feature data, with weighting, into a near neighborhood and a far neighborhood according to a preset distance threshold, and to evaluate the weight of the reference feature data by mutual information;
a screening module, configured to feed the weight back into the weight vector corresponding to the reference feature data and to delete the feature data with the smallest weight from the feature set;
a feature data output module, configured to delete the feature data with the smallest weight step by step and, when the number of feature data in the feature set and the weight vector become stable, to output the feature set corresponding to the stabilized feature data and the weight vector corresponding to a preset threshold.
7. The device according to claim 6, characterized in that the feature set module comprises:
a screening unit, configured to screen the extracted feature data according to the degree of influence of each item of feature data on the overall entropy of the feature data, and to construct a candidate feature set from the screened feature data.
8. The device according to claim 6, characterized in that the acquired user behavior data comprises: traffic behavior layer data, protocol behavior layer data, and user behavior layer data.
9. The device according to claim 8, characterized in that the feature set module further comprises:
a feature data integration unit, configured to perform normalization and interval alignment on the extracted feature data.
10. The device according to claim 1, characterized in that the weight computing module comprises:
a feature unit, configured to evaluate the feature correlation and the feature redundancy of the reference feature data by mutual information;
a weight unit, configured to obtain the weight of the reference feature data from the feature correlation and the feature redundancy.
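The select, score, and delete loop shared by claims 1 and 6 can be illustrated with a short sketch. It assumes that each feature is a numeric column, that the distance between two features is the Euclidean distance between their columns, and that the loop stops once `min_features` remain; the claims only require that the feature count and the weight vector become stable, so this stopping rule is an assumption. The `score` callback stands in for the mutual-information evaluation (any weighting of the two neighborhoods is left to it), and all function names are hypothetical.

```python
import math


def split_by_distance(reference, others, columns, threshold):
    """Partition the other feature names into near and far neighborhoods of the reference."""
    near, far = [], []
    for name in others:
        distance = math.dist(columns[reference], columns[name])
        (near if distance <= threshold else far).append(name)
    return near, far


def eliminate(columns, score, threshold, min_features):
    """Score every feature in turn, then drop the lowest-weighted one, and repeat.

    columns: dict mapping feature name -> list of numeric values
    score:   callable(reference, near, far, columns) -> float (caller-supplied stand-in)
    Returns the surviving feature names and their final weights.
    """
    features = list(columns)
    while True:
        weights = {}
        for ref in features:
            others = [f for f in features if f != ref]
            near, far = split_by_distance(ref, others, columns, threshold)
            weights[ref] = score(ref, near, far, columns)
        # Assumed stopping rule: stop when the target feature count is reached.
        if len(features) <= max(min_features, 1):
            return features, weights
        features.remove(min(features, key=weights.get))
```

Weights are recomputed after every deletion because removing one feature changes the near and far neighborhoods of the features that remain.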
CN201810576742.7A 2018-06-05 2018-06-05 A kind of method and device that behavioural characteristic data are extracted Pending CN109063721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810576742.7A CN109063721A (en) 2018-06-05 2018-06-05 A kind of method and device that behavioural characteristic data are extracted

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810576742.7A CN109063721A (en) 2018-06-05 2018-06-05 A kind of method and device that behavioural characteristic data are extracted

Publications (1)

Publication Number Publication Date
CN109063721A true CN109063721A (en) 2018-12-21

Family

ID=64820507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810576742.7A Pending CN109063721A (en) 2018-06-05 2018-06-05 A kind of method and device that behavioural characteristic data are extracted

Country Status (1)

Country Link
CN (1) CN109063721A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459990A (en) * 2020-03-31 2020-07-28 腾讯科技(深圳)有限公司 Object processing method, system, computer readable storage medium and computer device
CN111724278A (en) * 2020-06-11 2020-09-29 国网吉林省电力有限公司 Fine classification method and system for power multi-load users

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040225627A1 (en) * 1999-10-25 2004-11-11 Visa International Service Association, A Delaware Corporation Synthesis of anomalous data to create artificial feature sets and use of same in computer network intrusion detection systems
US20170134404A1 (en) * 2015-11-06 2017-05-11 Cisco Technology, Inc. Hierarchical feature extraction for malware classification in network traffic
CN106973047A (en) * 2017-03-16 2017-07-21 北京匡恩网络科技有限责任公司 A kind of anomalous traffic detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
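刘帅: "Research on multi-level abnormal behavior analysis and detection techniques for network data streams", 《China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology (Monthly)》 *
过岩巍 et al.: "Online game case study: user behavior analysis and churn prediction", 《Journal of Chinese Information Processing》 *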

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221