CN106713335A - Malicious software identification method and device - Google Patents

Info

Publication number
CN106713335A
Authority
CN
China
Prior art keywords
url
malware
characteristic dimension
software
family
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611265807.3A
Other languages
Chinese (zh)
Other versions
CN106713335B (en
Inventor
於大维
董浩
谢军
陆骋怀
尚进
蒋东毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hillstone Networks Co Ltd
Original Assignee
Hillstone Networks Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hillstone Networks Co Ltd
Priority to CN201611265807.3A
Publication of CN106713335A
Application granted
Publication of CN106713335B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/144 Detection or countermeasures against botnets

Abstract

The present invention discloses a malware identification method and device. The method comprises: collecting the network behaviors of a large number of malware samples in advance and extracting URL features from those behaviors to establish a predefined rule; then collecting the URL features of the network behavior of the designated software that is to undergo security detection; and comparing the two sets of URL features to determine whether the designated software is malware. With this technical scheme, malware can be identified rapidly and accurately, which solves the technical problem in the prior art that malware cannot be identified rapidly and accurately.

Description

Malware identification method and device
Technical field
The present invention relates to the single-chip microcomputer field, and in particular to a malware identification method and device.
Background
In the related art, malware refers to programs such as viruses, worms, and Trojan horses that perform malicious tasks on a computer system and take control of the infected host by subverting software processes. Malware comprises a variety of threats. Infected hosts are often controlled by a hacker's command-and-control server and form a botnet (BotNet). A botnet is a group of computers on the Internet under the centralized control of a hacker, and hackers often use it to launch large-scale network attacks such as distributed denial-of-service (DDoS) attacks and mass spam. The information stored on the controlled computers can also be obtained by the hacker. Botnets are therefore a serious threat, both to the security of network operations and to the protection of user data.
Once a computer has been compromised and a Trojan horse planted on it, the hacker can manipulate it at will and use it for any purpose, like a puppet. The core of a botnet is its command and control (C&C) mechanism. Between the compromised host and the hacker there is a communication channel that the host's user does not know about. Through this channel the hacker sends commands to the compromised host, uploads files, launches attacks, and so on. C&C communication takes various forms, and HTTP is the main communication protocol.
By analyzing the network behavior of a large number of malware samples, engineers can build effective models that recognize communication between controlled hosts and a botnet, and thereby find hosts infected by malware. Hosts found in this way can be isolated and cleaned promptly, reducing the losses caused by the threat. This is a challenging field, and several techniques have been applied to the problem.
In the related art, the problem is addressed mainly in the following two ways:
The first is to build a blacklist database of C&C connection URLs. URL (Uniform Resource Locator) filtering is used to control and report the network use of hosts, thereby detecting infected hosts. The technique builds a library of signature fields of known C&C connections and decides whether a connection is a C&C connection by matching the library against the parameters of the actual connection URL. Its advantages are simplicity, precision, and a low false positive rate. It has the following shortcomings, however. The signature library is static and is helpless against even slight variations in the connection URL parameters. If every signature field were collected, the library would become very large and inefficient. The library must be updated promptly to stay current. Although the signature fields come from the network behavior of malware samples, not every connection is malicious, and some connections closely resemble normal, harmless ones, so the signatures must be screened and selected to reduce false positives. Maintaining and updating the signature library is therefore laborious.
The second is to build a model from the network behavior of malware samples and from normal, harmless network behavior using supervised machine learning. A large number of positive and negative samples are collected and labeled, and a model is then built with a supervised method such as regression or random forest. The model can classify the URL parameters of a connection. Because it uses machine learning, the technique adapts to some degree and has some ability to recognize malware that it has not seen but that resembles known samples. It has the following problems, however: 1) An effective supervised model needs a large number of representative normal samples to learn from. Because normal samples are extremely numerous, diverse, and complex, comprehensive and representative sampling is difficult. Malicious samples are comparatively few, so a model built this way has a high false positive rate. 2) Supervised machine learning demands very accurate sample labels. Inaccurately labeled positive samples seriously degrade the accuracy of the model.
No effective solution has yet been proposed for the technical problem in the related art of lacking a convenient and effective way to identify malware.
Summary of the invention
Embodiments of the present invention provide a malware identification method and device, to at least solve the technical problem in the related art of lacking a convenient and effective way to identify malware.
According to one embodiment of the present invention, a malware identification method is provided, comprising: obtaining the uniform resource locator (URL) corresponding to the network behavior that designated software produces at run time; and determining, according to a preset rule and the feature dimension of the URL, whether the designated software is malware, wherein the preset rule is determined from the feature dimensions of the URLs corresponding to the network behaviors produced by multiple pieces of malware.
Optionally, the preset rule is: when the similarity between the URL feature dimension of the designated software and that of malware is greater than a predetermined threshold, the designated software is determined to be malware, wherein the URL is the URL corresponding to the network behavior produced by the software, and the URL feature dimension of the malware is the URL feature dimension shared by all malware in a specified malware family.
Optionally, the similarity of the URL feature dimensions among the members of the specified malware family is higher than a preset value, and the specified malware family is a preset set of malware, wherein the feature dimension of the URL of a software's network behavior is obtained as follows: obtaining the network behavior of each of multiple pieces of software and parsing out the URL in each network behavior; splitting the parameters of the URL into key-value pairs to obtain multiple parameter segments, and then mapping the values assigned to the parameter segments into an n-dimensional space to obtain the integer vector of the URL, wherein the integer vector of the URL is the feature dimension of the URL, each parameter segment is the encoding of a key-value pair, n is the number of dimensions of the feature space, and n is an integer.
Optionally, when the similarity between the URL feature dimension of the designated software and that of the malware in a cluster is greater than the predetermined threshold, the designated software is determined to be malware belonging to that cluster, wherein the cluster is a cluster within the specified malware family and is obtained as follows: the malware in the specified malware family is divided into multiple clusters by clustering the URL feature dimensions, and, where the multiple pieces of malware span multiple malware families, highly similar clusters from different malware families are merged into a specified cluster.
Optionally, after the malware in the specified malware family has been divided into multiple clusters by clustering the URL feature dimensions, the category of the network behavior of the malware in each cluster is determined, and the category of the cluster is determined accordingly, wherein the category includes one of the following: C&C connection, file download, ad click.
Optionally, the malware in the specified malware family is updated regularly.
According to another embodiment of the present invention, a malware identification device is also provided, comprising:
an acquisition module, configured to obtain the uniform resource locator (URL) corresponding to the network behavior that designated software produces at run time; and
a determining module, configured to determine, according to a preset rule and the feature dimension of the URL, whether the designated software is malware, wherein the preset rule is determined from the feature dimensions of the URLs corresponding to the network behaviors produced by multiple pieces of malware.
Optionally, the determining module is further configured to determine that the designated software is malware when the similarity between the URL feature dimension of the designated software and that of malware is greater than a predetermined threshold, wherein the URL is the URL corresponding to the network behavior produced by the software, and the URL feature dimension of the malware is the URL feature dimension shared by all malware in a specified malware family.
Optionally, the similarity of the URL feature dimensions among the members of the specified malware family is higher than a preset value, the specified malware family is a preset set of malware, and the acquisition module is further configured to obtain the feature dimension of the URL of a software's network behavior as follows: obtaining the network behavior of each of multiple pieces of software and parsing out the URL in each network behavior; splitting the parameters of the URL into key-value pairs to obtain multiple parameter segments, and then mapping the values assigned to the parameter segments into an n-dimensional space to obtain the integer vector of the URL, wherein the integer vector of the URL is the feature dimension of the URL, each parameter segment is the encoding of a key-value pair, n is the number of dimensions of the feature space, and n is an integer.
Optionally, the determining module is further configured to determine that the designated software is malware belonging to a cluster when the similarity between the URL feature dimension of the designated software and that of the malware in the cluster is greater than the predetermined threshold, wherein the cluster is a cluster within the specified malware family and is obtained as follows: the malware in the specified malware family is divided into multiple clusters by clustering the URL feature dimensions, and, where the multiple pieces of malware span multiple malware families, highly similar clusters from different malware families are merged into a specified cluster.
In the embodiments of the present invention, the network behaviors of a large number of malware samples are collected in advance and URL features are extracted from them to establish a rule. The URL features of the network behavior of the designated software to be checked are then collected and compared with the malware URL features to determine whether the designated software is malware. This technical scheme identifies malware rapidly and accurately, and thereby solves the technical problem in the related art of lacking a convenient and effective way to identify malware.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of a malware identification method according to an embodiment of the present invention;
Fig. 2 is a flow chart of building a malware detection model according to a preferred embodiment of the present invention;
Fig. 3 is a structural block diagram of a malware identification device according to a preferred embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a malware identification method is provided. It should be noted that the steps shown in the flow charts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flow charts, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flow chart of a malware identification method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S102: obtain the uniform resource locator (URL) corresponding to the network behavior that the designated software produces at run time;
Step S104: determine, according to a preset rule and the feature dimension of the URL, whether the designated software is malware, wherein the preset rule is determined from the feature dimensions of the URLs corresponding to the network behaviors produced by multiple pieces of malware.
Through the above steps, the network behaviors of a large number of malware samples are collected in advance and URL features are extracted from them to establish a rule. The URL features of the network behavior of the designated software to be checked are then collected and compared with the malware URL features to determine whether the designated software is malware. This technical scheme identifies malware rapidly and accurately, and thereby solves the technical problem in the related art of lacking a convenient and effective way to identify malware.
Optionally, the preset rule is: when the similarity between the URL feature dimension of the designated software and that of malware is greater than a predetermined threshold, the designated software is determined to be malware, wherein the URL is the URL corresponding to the network behavior produced by the software, and the URL feature dimension of the malware is the URL feature dimension shared by all malware in a specified malware family. It should be added that the feature dimension of a URL is an integer vector (explained further later in this description), so the similarity of two URL feature dimensions can be calculated with any technique in the related art for calculating the similarity between vectors. Whether two vectors are close is judged by a distance formula. The distance formula can be chosen according to the characteristics of the application, and the chosen formula must be used consistently in all calculation steps.
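The description leaves the choice of distance formula open. The following is a minimal sketch of the threshold test, assuming Euclidean distance; the function names and the `max_distance` parameter are hypothetical and do not come from the patent. "Similarity above a threshold" is expressed here as "distance at or below a bound".

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length integer feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_similar(a, b, max_distance):
    """Two URL feature vectors count as similar when they are close enough."""
    return euclidean_distance(a, b) <= max_distance
```

Any other vector distance could be substituted, provided the same one is used for clustering, model cleaning, and detection.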
Optionally, the similarity of the URL feature dimensions among the members of the specified malware family is higher than a preset value, and the specified malware family is a preset set of malware, wherein the feature dimension of the URL of a software's network behavior is obtained as follows: obtaining the network behavior of each of multiple pieces of software and parsing out the URL in each network behavior; splitting the parameters of the URL into key-value pairs to obtain multiple parameter segments, and then mapping the values assigned to the parameter segments into an n-dimensional space to obtain the integer vector of the URL, wherein the integer vector of the URL is the feature dimension of the URL, each parameter segment is the encoding of a key-value pair, n is the number of dimensions of the feature space, and n is an integer.
Optionally, when the similarity between the URL feature dimension of the designated software and that of the malware in a cluster is greater than the predetermined threshold, the designated software is determined to be malware belonging to that cluster, wherein the cluster is a cluster within the specified malware family and is obtained as follows: the malware in the specified malware family is divided into multiple clusters by clustering the URL feature dimensions, and, where the multiple pieces of malware span multiple malware families, highly similar clusters from different malware families are merged into a specified cluster.
Optionally, after the malware in the specified malware family has been divided into multiple clusters by clustering the URL feature dimensions, the category of the network behavior of the malware in each cluster is determined, and the category of the cluster is determined accordingly, wherein the category includes one of the following: C&C connection, file download, ad click. It should be added that the categories of clusters are not limited to these examples.
Optionally, the malware in the specified malware family is updated regularly. The sources of updates can include new malware samples, feedback from device detection results, and adaptation to the environment in which the device is deployed.
A preferred embodiment of the present invention is described in detail below.
The preferred embodiment of the present invention is based on the network behavior features of malware families and uses a novel method to extract feature dimensions from the parameters of network connection URLs. The extracted dimensions are processed with a clustering method from unsupervised machine learning to obtain accurate features and to effectively remove incorrect samples and noise. The resulting model has a very high accuracy rate and a low false positive rate.
The method works by making full use of properties inherent in the way malware evolves. One essential trait of malware is the reuse of code modules. Much malware is a malicious variant produced by modifying existing malware, and the source code of some malware is traded on underground black markets. As a result, the URL features of malicious connections are largely preserved across variants of the same malware family. While the commonality is preserved, there is also some degree of drift and mutation. An efficient and reliable detection method must extract the common similarity and remove noise that does not belong to the commonality.
Across different malware families, module reuse means that some network connection behaviors of one family can also closely resemble those of other families. Recognizing similar features across families and merging them into a single feature helps shrink the feature recognition model and improve detection efficiency.
The technical scheme of the preferred embodiment mainly addresses the following three technical problems:
1. Distinguishing malicious from non-malicious connections in malware samples. The collected network behavior of a large number of malware samples is the basis of the analysis and modeling. Not all of a sample's network behavior consists of malicious connections; much of it also occurs in normal software. Judging whether a connection is malicious is a technical difficulty.
2. Effective feature engineering. The evolution of malware produces a large number of variants, which makes extracting identifying features difficult. To solve this, the feature dimensions and the matching method need a degree of fuzziness, so that not only known samples but also possible unknown variants are recognized.
3. Machine learning modeling that does not depend on benign samples. The method needs relatively few benign samples; they are used only to clean the model and reduce the false positive rate.
Fig. 2 is a flow chart of building a malware detection model according to a preferred embodiment of the present invention. As shown in Fig. 2, it includes the following steps:
Step 1: the malware is collected and classified, then run in a sandbox, and its network behavior is collected.
Step 2: the network behavior of the various protocols is parsed, and the URL features of the HTTP protocol are extracted.
Step 3: the parameters of the URL are split into key-value pairs, and each key-value pair is encoded as an integer value within n (n is a fixed value). A URL can be converted into an integer vector in this way, and this vector is the feature dimension of the URL. The number of feature dimensions in this mathematical representation is fixed. This step performs both dimension extraction and dimensionality reduction.
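Step 3 can be sketched as a fixed-width hashing of the key-value pairs. This is only an illustration under stated assumptions: the patent does not give the encoding formula, so the CRC32-based bucket choice, the count-vector layout, and the name `encode_pairs` are all hypothetical.

```python
import zlib


def encode_pairs(pairs, n=100):
    """Map each key=value pair to a bucket in [0, n) and count hits per bucket.

    The result is an integer vector of fixed length n, whatever the number
    of parameters in the URL. CRC32 is an assumed stand-in for the mapping.
    """
    vector = [0] * n
    for key, value in pairs:
        token = f"{key}={value}".encode("utf-8")
        vector[zlib.crc32(token) % n] += 1
    return vector
```

Because the vector only counts bucket hits, the order of the parameters in the URL does not affect the result, and every URL yields a vector of the same length.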
Step 4: the feature dimensions of all URLs of a malware family are clustered together and divided into multiple groups of similar URLs. Sample URLs that do not satisfy the clustering condition can be removed as noise. Each group (cluster) is a characteristic signature of the malware family, and a family can have one or more such signatures. Unlike traditional string matching, the similarity used for clustering is calculated with a formula specific to the preferred embodiment of the present invention, which effectively reflects the actual meaning of the URL parameters.
By controlling the clustering condition, malicious URLs that genuinely share common features can be separated from spurious malicious URLs that have no statistical significance. In later updates, as the number of samples grows, some of these spurious malicious URLs may become significant, genuine malicious URLs because of the newly added samples.
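The clustering and noise removal of step 4 can be illustrated with a simple greedy threshold clustering. This is a sketch, not the patent's algorithm: the patent does not name a clustering method, and Euclidean distance (`math.dist`), `max_distance`, and `min_size` are assumptions introduced here as one possible "clustering condition".

```python
import math


def cluster_vectors(vectors, max_distance, min_size=2):
    """Greedy threshold clustering of feature vectors.

    A vector joins the first group whose leader (first member) is within
    max_distance; otherwise it starts a new group. Groups with fewer than
    min_size members are returned separately as noise.
    """
    groups = []
    for v in vectors:
        for group in groups:
            if math.dist(v, group[0]) <= max_distance:
                group.append(v)
                break
        else:
            groups.append([v])
    clusters = [g for g in groups if len(g) >= min_size]
    noise = [v for g in groups if len(g) < min_size for v in g]
    return clusters, noise
```

Keeping the noise vectors rather than discarding them allows them to be re-clustered in a later update, when new samples may give them statistical weight.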
Step 5: the clusters of multiple malware families are clustered together across families. Similar clusters from different families can be merged into one cluster. The common clusters across families reflect the modularization and code reuse that characterize malware.
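A minimal sketch of the cross-family merge in step 5 follows, assuming each cluster is represented by a centroid vector and that "similar" means within a hypothetical `max_distance` under Euclidean distance. The function name and the data layout are illustrative only.

```python
import math


def merge_across_families(family_centroids, max_distance):
    """Merge clusters from different families whose centroids are close.

    family_centroids maps a family name to a list of centroid vectors.
    Returns a list of {"centroid": vector, "families": set_of_names};
    the first centroid seen is kept as the merged representative.
    """
    merged = []
    for family, centroids in family_centroids.items():
        for c in centroids:
            for entry in merged:
                if math.dist(c, entry["centroid"]) <= max_distance:
                    entry["families"].add(family)
                    break
            else:
                merged.append({"centroid": list(c), "families": {family}})
    return merged
```

Recording which families contributed to a merged cluster keeps the family coverage of the model measurable after the merge.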
Step 6: to reduce false positives, the model is cleaned against a large volume of normal traffic connections. This step uses the same similarity calculation, and clusters similar to benign connections are removed from the model. This step also labels the clusters by category. For some malicious connections, analysis and research can establish the connection type, such as C&C connection, file download, or ad click. The URLs with established categories are matched against the clusters with the same similarity method, and their category labels are applied to the matching clusters. When a malicious connection later matches a cluster, the specific type of the malicious connection can be reported, giving the user more insight.
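The cleaning and labeling of step 6 can be sketched as two passes over the cluster centroids. The data layout, the `"unknown"` default label, and the use of Euclidean distance with a single `max_distance` are assumptions made for illustration.

```python
import math


def clean_and_label(centroids, benign_vectors, labeled_vectors, max_distance):
    """Drop centroids close to benign traffic, then tag the remaining ones.

    labeled_vectors is a list of (vector, label) pairs for connections whose
    type was established by analysis, e.g. "C&C", "file download", "ad click".
    A centroid takes the label of the first labeled vector within range.
    """
    model = []
    for c in centroids:
        if any(math.dist(c, b) <= max_distance for b in benign_vectors):
            continue  # too similar to normal traffic; would cause false positives
        label = "unknown"
        for vec, name in labeled_vectors:
            if math.dist(c, vec) <= max_distance:
                label = name
                break
        model.append({"centroid": c, "label": label})
    return model
```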
Step 7: the mathematical feature model of the clusters generated in step 6 can be delivered to deployed intelligent firewall devices. The firewall device uses the model to check the HTTP URLs of user traffic. A detected malicious connection generates a threat event that is shown to the user on the firewall device. The metadata of the inspection results can also be uploaded to the cloud for further analysis.
Step 8: model update. The method continually updates the malware samples and periodically repeats steps 1 to 6 to build a new model. The cloud can also perform big data analysis on the detection data uploaded by all devices and revise the model according to the feedback. The updated model is then delivered to the devices as an upgrade.
The model generated by the preferred embodiment is small, fast to compute, highly accurate, and covers a large share of malware. Preliminary data show that the applicant extracted and built a model of about 9,000 clusters from more than one million malware samples. The model detects 85% of known malicious samples and covers more than 90% of malware families, while its false positive rate on benign samples is negligibly low. More importantly, the method also has a high detection rate for unknown malware variants, so that threats can be stopped before they cause harm.
A specific implementation of the preferred embodiment of the present invention follows.
The preferred embodiment of the present invention is a novel method that performs big data analysis on the URLs of malware connections and extracts the common features of malicious connections to build a mathematical model. The resulting model is small, has low computational complexity, and detects efficiently and quickly.
To use the method, the user first collects the network behavior of a large number of malware samples. The usual practice is to run the malware in a sandbox environment while capturing network packets. Malware is usually classified into malware families, and a family contains different variants. The malicious connections are grouped by family, including the variants within a family, so that the features shared by different variants of the same family can be extracted. As an example, the family Trojan[Rootkit]/Win32.Small has the two malicious connections below. (In real data, a single family may have tens of thousands of connections.)
Family: Trojan[Rootkit]/Win32.Small
First malicious connection: http://domain.com/conn?User=joe&ver=2.0&key=123abc
Second malicious connection: http://cc.domain/connpath?Key=123456&user=jane&ver=3.5
From the first connection, three parameter pairs are extracted: User=joe, ver=2.0, key=123abc
From the second connection, three parameter pairs are extracted: Key=123456, user=jane, ver=3.5
(Each parameter has the form key=value.)
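The parameter extraction for the two example connections can be reproduced with Python's standard URL parser. The sketch assumes that a `?` separates the path from the parameters in both example URLs; the function name `extract_params` is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def extract_params(url):
    """Return the URL's query parameters as an ordered list of (key, value) pairs."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


# The two example connections, with an assumed "?" before the parameters.
first = extract_params("http://domain.com/conn?User=joe&ver=2.0&key=123abc")
second = extract_params("http://cc.domain/connpath?Key=123456&user=jane&ver=3.5")
```

Both connections carry the same three parameter names (up to letter case) in a different order and with different values, which is the kind of family-level commonality the later steps are meant to capture.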
Real data may yield millions of parameter key-value pairs, from which feature dimensions must be extracted. The extracted features should reflect how malicious connections vary. The parameter ver=2.0, for example, is converted into ver=2.0, ver=numerical (the type of 2.0), and string (the type of ver)=2.0. Each converted value is then mapped into a bounded space of fixed dimension n. Assuming n is 100, the resulting values are mapped to, for example, (5, 32, 91, 99). The specific mapping formula can be determined by the practical application.
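The conversion of one parameter into literal and type-generalized forms can be sketched as follows. The type test, the angle-bracket token format, and the CRC32 bucket mapping are assumptions, since the description leaves the mapping formula to the application. The text lists three derived forms but shows four mapped values; this sketch produces only the three listed forms.

```python
import re
import zlib


def value_type(text):
    """Coarse type of a token: 'numeric' for numbers such as 2.0, else 'string'."""
    return "numeric" if re.fullmatch(r"\d+(\.\d+)?", text) else "string"


def generalize(key, value):
    """Literal pair plus two generalized forms (value type, key type)."""
    return [
        f"{key}={value}",
        f"{key}=<{value_type(value)}>",
        f"<{value_type(key)}>={value}",
    ]


def to_buckets(tokens, n=100):
    """Map each token to an integer in [0, n) with an assumed CRC32 hash."""
    return [zlib.crc32(t.encode("utf-8")) % n for t in tokens]
```

The generalized forms let two variants that differ only in a version number, such as ver=2.0 and ver=3.5, still share the `ver=<numeric>` bucket.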
The vector of network address → n-dimensional space is connected, is herein the method extracted to characteristic dimension and reduce number of dimensions, be this One of key problem in technology point of invention preferred embodiment.Its function is that ten million parameters up to a hundred are converted in order to big data Machine learning.
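One possible reading of this conversion and mapping step is sketched below. The text leaves the mapping formula to the application, so the use of CRC32 modulo n is purely an assumption, as are the names `value_type`, `conversions`, `bucket`, and `url_vector`. The result is an integer count vector of fixed length n.

```python
import zlib
from typing import Iterable, List, Tuple

N_DIMS = 100  # n in the text; 100 follows the example value


def value_type(token: str) -> str:
    """Classify a token as 'numerical' if it parses as a number, else 'string'."""
    try:
        float(token)
        return "numerical"
    except ValueError:
        return "string"


def conversions(key: str, value: str) -> List[str]:
    """Expand one key=value pair into the converted forms named in the text."""
    return [
        f"{key}={value}",              # literal pair, e.g. ver=2.0
        f"{key}={value_type(value)}",  # key with the value's type, e.g. ver=numerical
        f"{value_type(key)}={value}",  # key's type with the value, e.g. string=2.0
    ]


def bucket(token: str, n: int = N_DIMS) -> int:
    """Map a token to a dimension index in [0, n). CRC32 is an assumed stable hash."""
    return zlib.crc32(token.encode("utf-8")) % n


def url_vector(params: Iterable[Tuple[str, str]], n: int = N_DIMS) -> List[int]:
    """Build the n-dimensional integer vector for one URL's parameter pairs."""
    vec = [0] * n
    for key, value in params:
        for token in conversions(key, value):
            vec[bucket(token, n)] += 1
    return vec
```

Because the vector length is fixed, the number of distinct parameter strings seen in the data does not affect the size of the model input.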
Suppose a malicious family has M connection URLs; we then obtain an M × n dimension matrix. Clustering is performed on this matrix, and vectors that are close to one another are merged into one cluster. In this way the M × n matrix is reduced to p clusters. Each cluster is represented by the n-dimensional vector of its center point. By adjusting the clustering parameters, p can be kept within a small and effective range.
Whether two vectors are close is judged by a distance formula. The distance formula can be chosen according to the specific characteristics of the application, but the chosen formula must be used consistently in all calculation steps.
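The clustering step can be illustrated with a simple greedy threshold scheme. The text does not name a clustering algorithm or a distance formula, so both the greedy procedure and the Euclidean distance used here are assumptions; `cluster` and `radius` are hypothetical names. A larger `radius` yields fewer clusters, which is one way to keep p small.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; any other formula may be substituted if used consistently."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(vectors: List[Vector], radius: float,
            distance: Callable[[Vector, Vector], float] = euclidean) -> List[List[float]]:
    """Greedy clustering: merge each vector into the nearest center within `radius`,
    otherwise start a new cluster. Returns the p cluster centers."""
    centers: List[List[float]] = []
    sizes: List[int] = []
    for v in vectors:
        best, best_d = None, None
        for i, c in enumerate(centers):
            d = distance(v, c)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= radius:
            k = sizes[best]
            # Update the running mean so the center stays the centroid of its members
            centers[best] = [(c * k + x) / (k + 1) for c, x in zip(centers[best], v)]
            sizes[best] = k + 1
        else:
            centers.append([float(x) for x in v])
            sizes.append(1)
    return centers
```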
The p n-dimensional clusters constitute the mathematical model for threat detection. For an unknown connection, such as http://somewhere/connection?user=mike&version=2.3, we can extract dimensions from its connection parameters and vectorize them with the same method. The distance between the vector to be detected and the model clusters is then computed, and a connection that meets the distance threshold requirement is judged to be malicious.
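The detection step reduces to a nearest-center lookup. The sketch below assumes that "meeting the distance threshold" means lying within `threshold` of some cluster center under Euclidean distance; the function names are hypothetical, and the input vector is assumed to have been produced by the same vectorization used to build the model.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; must match the formula used when building the model."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(vector: Vector, centers: List[Vector]) -> Optional[Tuple[int, float]]:
    """Return (index, distance) of the closest cluster center, or None if the model is empty."""
    if not centers:
        return None
    return min(((i, euclidean(vector, c)) for i, c in enumerate(centers)),
               key=lambda pair: pair[1])


def is_malicious(vector: Vector, centers: List[Vector], threshold: float) -> bool:
    """Judge a connection malicious if its vector is within `threshold` of any center."""
    hit = nearest_cluster(vector, centers)
    return hit is not None and hit[1] <= threshold
```

Each check costs p distance computations over n dimensions, which is consistent with the small model size and low computational cost described earlier.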
The detection model provided by the preferred embodiment of the present invention can be deployed on the egress firewall of an enterprise network to inspect outbound connections from hosts inside the enterprise. The preferred embodiment can detect intranet hosts downloading malware, plug-ins, and the like from malicious websites, and can also detect requests sent by infected hosts to a botnet control center. Detection results can prompt the enterprise IT department to analyze a suspicious host further, for example with an anti-virus scan. If infection is confirmed, further protective measures such as isolation can be taken. The firewall can also be configured with automatic policies that mitigate the harm from a suspicious host, such as restricting network connections or blocking outbound file transfers, to avoid the damage caused by Trojans and viruses.
The novel clustering technique of the preferred embodiment makes feedback on the model and updates to it simple and practical. The user can flag false positives of the model so that similar URLs are no longer reported. Cloud-based model updates can also quickly distribute user feedback to more devices.
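The text does not say how false-positive feedback is applied to the model. One hypothetical way to realize "similar URLs are no longer reported" is to remember the vectors of flagged connections and suppress any alert that falls near one of them. The class `FeedbackFilter`, its methods, and the `radius` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FeedbackFilter:
    """Suppress alerts for vectors close to ones a user flagged as false positives."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.false_positives: List[List[float]] = []

    def report_false_positive(self, vector: Vector) -> None:
        """Record the vector of a connection the user marked as benign."""
        self.false_positives.append([float(x) for x in vector])

    def should_report(self, vector: Vector) -> bool:
        """True unless the vector lies within `radius` of a recorded false positive."""
        return all(euclidean(vector, fp) > self.radius for fp in self.false_positives)
```

In such a design the recorded vectors are small and could be distributed to other devices together with the cluster centers.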
In the related art, advanced threats, being diverse, persistent, and hard to detect, pose a significant challenge to enterprise information security. The technical scheme in the preferred embodiment of the present invention can protect enterprises and governments: building on existing firewalls, it effectively detects compromised hosts infected by malware in as short a time as possible. This allows an enterprise to take effective measures before harm occurs and thereby reduce its losses.
The technical scheme in the preferred embodiment of the present invention achieves the following effects:
1. Correct and effective malicious URL connections are extracted from the network behavior of all malware samples, benign noise is removed, and a fast and efficient detection model is built. Because the detection model uses similarity matching, it adapts effectively to variation in malware network connections and helps detect unknown variants.
2. The detection method is fast and efficient. The preferred embodiment uses a new conversion formula that turns parameter strings, which in theory have unlimited possibilities, into a fixed number of mathematical dimensions. This effectively addresses the curse of dimensionality and makes fast, efficient detection possible.
Embodiment 2
Fig. 3 is a structural block diagram of a malware identification device in the preferred embodiment of the present invention. As shown in Fig. 3, the device includes:
an acquisition module 32, configured to obtain the uniform resource locator (URL) corresponding to the network behavior that designated software produces at run time;
a determining module 34, connected to the acquisition module 32 and configured to determine, according to a preset rule and the feature dimensions of the URL, whether the designated software is malware, where the preset rule is determined according to the feature dimensions of the URLs corresponding to the network behavior produced by multiple pieces of malware.
Optionally, the determining module 34 is further configured to determine that the designated software is malware when the similarity between the feature dimensions of the URL of the designated software and those of the URL of the malware is greater than a preset threshold, where a URL is the URL corresponding to the network behavior produced by software, and the feature dimensions of the URL of the malware are the URL feature dimensions shared by all malware in a specified malware family.
Optionally, the similarity of the URL feature dimensions among the members of the specified malicious family is higher than a preset value, and the specified malicious family is a preset set of malware. The acquisition module 32 is further configured to obtain the feature dimensions of the URL of a piece of software's network behavior as follows: obtain the network behavior of each of the multiple pieces of malware and parse out the URL in each network behavior; split the parameters of the URL by key-value pair to obtain multiple parameter segments; then assign values to the parameter segments of the key-value pairs and map them into an n-dimensional space to obtain the integer vector of the URL, where the integer vector of the URL is the feature dimensions of the URL, each parameter segment is an encoding of a key-value pair, n represents the number of dimensions of the feature space, and n is an integer.
Optionally, the determining module 34 is further configured to determine that the designated software is malware in a group cluster when the similarity between the feature dimensions of the URL of the designated software and those of the URLs of the malware in that group cluster is greater than a preset threshold, where the group cluster is a group cluster in the specified malware family and is obtained as follows: the malware in the specified malware family is divided into multiple group clusters by clustering the feature dimensions of the URLs, and, where the multiple pieces of malware span multiple malware families, group clusters with high similarity in different malware families are merged into a specified group cluster.
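The cross-family merge of group clusters mentioned for the determining module can be sketched as follows. The text only states that highly similar clusters from different families are merged; the Euclidean distance, the `merge_radius` parameter, the running-mean update, and the function name `merge_across_families` are assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_across_families(centers_by_family: Dict[str, List[Vector]],
                          merge_radius: float) -> List[dict]:
    """Merge cluster centers from different families that lie within `merge_radius`.

    Each result records the merged center, the contributing families, and how many
    original clusters were merged. Clusters of the same family are never merged here.
    """
    merged: List[dict] = []
    for family, centers in centers_by_family.items():
        for c in centers:
            target = next(
                (m for m in merged
                 if family not in m["families"]
                 and euclidean(m["center"], c) <= merge_radius),
                None,
            )
            if target is None:
                merged.append({"center": list(c), "families": {family}, "size": 1})
            else:
                k = target["size"]
                # Running mean keeps the merged center at the centroid of its parts
                target["center"] = [(a * k + b) / (k + 1)
                                    for a, b in zip(target["center"], c)]
                target["families"].add(family)
                target["size"] = k + 1
    return merged
```

Merging this way shrinks the overall model when several families share the same connection pattern, while the `families` set preserves which families a match may belong to.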
The numbering of the embodiments above is for description only and does not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection between units or modules through some interfaces, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical scheme of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical scheme, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A malware identification method, characterized by comprising:
obtaining the uniform resource locator (URL) corresponding to the network behavior that designated software produces at run time;
determining, according to a preset rule and the feature dimensions of the URL, whether the designated software is malware, wherein the preset rule is determined according to the feature dimensions of the URLs corresponding to the network behavior produced by multiple pieces of malware.
2. The method according to claim 1, characterized in that the preset rule is: when the similarity between the feature dimensions of the URL of the designated software and those of the URL of the malware is greater than a preset threshold, the designated software is determined to be malware, wherein a URL is the URL corresponding to the network behavior produced by software, and the feature dimensions of the URL of the malware are the URL feature dimensions shared by all malware in a specified malware family.
3. The method according to claim 2, characterized in that the similarity of the URL feature dimensions among the members of the specified malicious family is higher than a preset value, and the specified malicious family is a preset set of malware, wherein the feature dimensions of the URL of a piece of software's network behavior are obtained as follows:
obtaining the network behavior of each of multiple pieces of software, and parsing out the URL in each network behavior;
splitting the parameters of the URL by key-value pair to obtain multiple parameter segments, then assigning values to the parameter segments of the key-value pairs and mapping them into an n-dimensional space to obtain the integer vector of the URL, wherein the integer vector of the URL is the feature dimensions of the URL, each parameter segment is an encoding of a key-value pair, n represents the number of dimensions of the feature space, and n is an integer.
4. The method according to claim 2 or 3, characterized in that, when the similarity between the feature dimensions of the URL of the designated software and those of the URLs of the malware in a group cluster is greater than a preset threshold, the designated software is determined to be malware in the group cluster, wherein the group cluster is a group cluster in the specified malware family and is obtained as follows:
dividing the malware in the specified malware family into multiple group clusters by clustering the feature dimensions of the URLs, wherein, where the multiple pieces of malware span multiple malware families, group clusters with high similarity in different malware families are merged into a specified group cluster.
5. The method according to claim 4, characterized in that, after the malware in the specified malware family is divided into multiple group clusters by clustering the feature dimensions of the URLs, the category of the network behavior of the malware in each group cluster is determined, and the category of the group cluster is determined according to that category, wherein the category includes one of the following: C&C connection, file download, ad click.
6. The method according to claim 3, characterized in that the method further comprises: periodically updating the malware in the specified malware family.
7. A malware identification device, characterized in that the device comprises:
an acquisition module, configured to obtain the uniform resource locator (URL) corresponding to the network behavior that designated software produces at run time;
a determining module, configured to determine, according to a preset rule and the feature dimensions of the URL, whether the designated software is malware, wherein the preset rule is determined according to the feature dimensions of the URLs corresponding to the network behavior produced by multiple pieces of malware.
8. The device according to claim 7, characterized in that the determining module is further configured to determine that the designated software is malware when the similarity between the feature dimensions of the URL of the designated software and those of the URL of the malware is greater than a preset threshold, wherein a URL is the URL corresponding to the network behavior produced by software, and the feature dimensions of the URL of the malware are the URL feature dimensions shared by all malware in a specified malware family.
9. The device according to claim 8, characterized in that the similarity of the URL feature dimensions among the members of the specified malicious family is higher than a preset value, the specified malicious family is a preset set of malware, and the acquisition module is further configured to obtain the feature dimensions of the URL of a piece of software's network behavior as follows:
obtaining the network behavior of each of multiple pieces of software, and parsing out the URL in each network behavior; splitting the parameters of the URL by key-value pair to obtain multiple parameter segments, then assigning values to the parameter segments of the key-value pairs and mapping them into an n-dimensional space to obtain the integer vector of the URL, wherein the integer vector of the URL is the feature dimensions of the URL, each parameter segment is an encoding of a key-value pair, n represents the number of dimensions of the feature space, and n is an integer.
10. The device according to claim 8 or 9, characterized in that the determining module is further configured to determine that the designated software is malware in a group cluster when the similarity between the feature dimensions of the URL of the designated software and those of the URLs of the malware in the group cluster is greater than a preset threshold, wherein the group cluster is a group cluster in the specified malware family and is obtained as follows:
dividing the malware in the specified malware family into multiple group clusters by clustering the feature dimensions of the URLs, wherein, where the multiple pieces of malware span multiple malware families, group clusters with high similarity in different malware families are merged into a specified group cluster.
CN201611265807.3A 2016-12-30 2016-12-30 Malicious software identification method and device Active CN106713335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611265807.3A CN106713335B (en) 2016-12-30 2016-12-30 Malicious software identification method and device


Publications (2)

Publication Number Publication Date
CN106713335A true CN106713335A (en) 2017-05-24
CN106713335B CN106713335B (en) 2020-10-30

Family

ID=58905647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611265807.3A Active CN106713335B (en) 2016-12-30 2016-12-30 Malicious software identification method and device

Country Status (1)

Country Link
CN (1) CN106713335B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145902A (en) * 2007-08-17 2008-03-19 东南大学 Fishing webpage detection method based on image processing
US20090094175A1 (en) * 2007-10-05 2009-04-09 Google Inc. Intrusive software management
CN102340424A (en) * 2010-07-21 2012-02-01 中国移动通信集团山东有限公司 Bad message detection method and bad message detection device
CN102708186A (en) * 2012-05-11 2012-10-03 上海交通大学 Identification method of phishing sites
US9531736B1 (en) * 2012-12-24 2016-12-27 Narus, Inc. Detecting malicious HTTP redirections using user browsing activity trees
US20150128263A1 (en) * 2013-11-07 2015-05-07 Cyberpoint International, LLC Methods and systems for malware detection
CN104794051A (en) * 2014-01-21 2015-07-22 中国科学院声学研究所 Automatic Android platform malicious software detecting method
EP3097658A1 (en) * 2014-01-24 2016-11-30 McAfee, Inc. Automatic placeholder finder-filler
CN104239582A (en) * 2014-10-14 2014-12-24 北京奇虎科技有限公司 Method and device for identifying phishing webpage based on feature vector model
CN104331436A (en) * 2014-10-23 2015-02-04 西安交通大学 Rapid classification method of malicious codes based on family genetic codes
US20160352777A1 (en) * 2014-11-17 2016-12-01 Vade Retro Technology Inc. Methods and systems for phishing detection
CN104537303A (en) * 2014-12-30 2015-04-22 中国科学院深圳先进技术研究院 Distinguishing system and method for phishing website
CN104579773A (en) * 2014-12-31 2015-04-29 北京奇虎科技有限公司 Domain name system analysis method and device
CN105825129A (en) * 2015-01-04 2016-08-03 中国移动通信集团设计院有限公司 Converged communication malicious software identification method and system
CN105893848A (en) * 2016-04-27 2016-08-24 南京邮电大学 Precaution method for Android malicious application program based on code behavior similarity matching
CN106131016A (en) * 2016-07-13 2016-11-16 北京知道创宇信息技术有限公司 Maliciously URL detection interference method, system and device
CN106131071A (en) * 2016-08-26 2016-11-16 北京奇虎科技有限公司 A kind of Web method for detecting abnormality and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107222511A (en) * 2017-07-25 2017-09-29 深信服科技股份有限公司 Detection method and device, computer installation and the readable storage medium storing program for executing of Malware
CN107609400A (en) * 2017-09-28 2018-01-19 深信服科技股份有限公司 Computer virus classification method, system, equipment and computer-readable recording medium
CN110399722A (en) * 2019-02-20 2019-11-01 腾讯科技(深圳)有限公司 A kind of virus family generation method, device, server and storage medium
CN110399722B (en) * 2019-02-20 2024-03-26 腾讯科技(深圳)有限公司 Virus family generation method, device, server and storage medium
CN109951484A (en) * 2019-03-20 2019-06-28 四川长虹电器股份有限公司 The test method and system attacked for machine learning product
CN110765393A (en) * 2019-09-17 2020-02-07 微梦创科网络科技(中国)有限公司 Method and device for identifying harmful URL (uniform resource locator) based on vectorization and logistic regression
CN112580027A (en) * 2020-12-15 2021-03-30 北京天融信网络安全技术有限公司 Malicious sample determination method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN106713335B (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215163 No. 181 Jingrun Road, Suzhou High-tech Zone, Jiangsu Province

Applicant after: SHANSHI NETWORK COMMUNICATION TECHNOLOGY CO., LTD.

Address before: 215163 No. 181 Jingrun Road, Suzhou High-tech Zone, Jiangsu Province

Applicant before: HILLSTONE NETWORKS

GR01 Patent grant