CN109359439A - Software detecting method, device, equipment and storage medium - Google Patents

Publication number
CN109359439A
Authority
CN
China
Prior art keywords
feature
software
type feature
sample
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811257390.5A
Other languages
Chinese (zh)
Other versions
CN109359439B (en)
Inventor
庞瑞
张宏君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN201811257390.5A
Publication of CN109359439A
Application granted
Publication of CN109359439B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a software detection method, apparatus, device and storage medium. The method comprises: extracting the numeric features and non-numeric features contained in each sample of a software sample library; processing the non-numeric features with N selected non-cryptographic hash algorithms and converting the results into numeric features, where N is an integer greater than 1; constructing a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion; training a machine learning classifier with the feature matrix; and detecting target software with the machine learning classifier. The invention converts the complex string features extracted from malware samples into hash features that machine learning algorithms can process easily, which reduces the difficulty of model training, markedly improves training speed, reduces space overhead, and improves the accuracy of malware discrimination.

Description

Software detecting method, device, equipment and storage medium
Technical field
The present invention relates to the field of detection technology, and in particular to a software detection method, apparatus, device and storage medium.
Background
Malware mainly includes destructive computer viruses, worms, Trojan backdoors, exploit programs, adware and phishing code. Such malware can combine a variety of evasion techniques with security vulnerabilities to break through the monitoring of existing traditional protection systems and cause great damage to users' interests. The purpose of a malware detection system is to discover malware hidden among normal files in time, to take measures autonomously before it does damage wherever possible, and to notify the user promptly.
Current malware detection methods fall into two kinds: static file analysis and dynamic behavior analysis. Existing static detection techniques rely mainly on matching against manually generated signature libraries and rule bases; even the more advanced heuristic scanning techniques need a manually maintained expert knowledge base to assist their judgments. With the explosive growth of the Internet, however, thousands upon thousands of hosts and users face threats from mutated, polymorphic, packed and obfuscated malware. How to respond rapidly to virus variants and malware attacks, analyze huge volumes of widely varied malware automatically, raise the detection rate and lower the false alarm rate has become the main challenge for current malware detection.
Detection methods based on machine learning do not depend on signature libraries or expert knowledge bases. They use a trained model to identify malware quickly and automatically, and a further trained model can classify the malware, so they have good research and application prospects. Machine learning malware detection relies on two main steps. The first is to select a sufficient number of suitable samples and extract their features: the extracted numeric and non-numeric features are screened and cleaned to remove missing and erroneous items, the numeric features are standardized and normalized, and the non-numeric features are encoded, typically with one-hot encoding, into a numeric form that a computer can process; all the extracted features are then combined into a feature matrix. The second is to select a suitable machine learning model. Given the problems posed by today's huge volume of malware, traditional methods such as logistic regression, naive Bayes, support vector machines and decision trees are unsuitable for malware detection and identification because they train slowly, consume large amounts of resources and evaluate poorly.
Traditional malware feature extraction methods either one-hot encode the extracted string information or convert it to ASCII code values. This processing has the following drawbacks:
1. One-hot encoding is effective when both the number of strings in the string set and the strings themselves are fixed. The total amount of malware is unbounded, however, and new malware emerges endlessly, so estimating the string set of the whole population from the string set of the training samples introduces a large bias;
2. Converting strings to ASCII codes does turn string features into numeric features, but the string features extracted from different samples may differ in length, so the number of features after conversion is also inconsistent. Segmenting strings in ASCII form is difficult, and an algorithm still has to be designed to bring the feature matrix fed to the machine learning model to a consistent dimension, so complexity remains high;
3. It is hard to cope with the many techniques used to evade antivirus engines, such as the mass obfuscation produced by virus generators, string mutation and manually injected noise.
The feature extraction methods of existing machine-learning-based malware detection therefore cannot meet requirements. The technical problem to be solved by the invention is how to convert the complex string features extracted from malware samples into features that machine learning algorithms can process easily, so as to reduce the difficulty of model training and improve training speed.
Summary of the invention
In view of the above problems, embodiments of the present invention provide a software detection method, apparatus, device and storage medium.
According to one aspect of the embodiments of the present invention, a software detection method is provided, comprising:
extracting the numeric features and non-numeric features contained in each sample of a software sample library;
processing the non-numeric features with N selected non-cryptographic hash algorithms, and converting the processing results into numeric features, where N is an integer greater than 1;
constructing a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
training a machine learning classifier with the feature matrix;
detecting target software with the machine learning classifier.
According to another aspect of the embodiments of the present invention, a software detection apparatus is provided, comprising:
a feature extraction module, configured to extract the numeric features and non-numeric features contained in each sample of a software sample library;
a feature processing module, configured to process the non-numeric features with N selected non-cryptographic hash algorithms and to convert the processing results into numeric features, where N is an integer greater than 1;
a matrix construction module, configured to construct a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
a learning and training module, configured to train a machine learning classifier with the feature matrix;
a detection module, configured to detect target software with the machine learning classifier.
According to a third aspect of the embodiments of the present invention, a computing device is provided, comprising a memory, a processor and a communication bus, the communication bus providing the connection and communication between the processor and the memory;
the processor is configured to execute a software detection program stored in the memory so as to implement the following method steps:
extracting the numeric features and non-numeric features contained in each sample of a software sample library;
processing the non-numeric features with N selected non-cryptographic hash algorithms, and converting the processing results into numeric features, where N is an integer greater than 1;
constructing a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
training a machine learning classifier with the feature matrix;
detecting target software with the machine learning classifier.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the following method steps:
extracting the numeric features and non-numeric features contained in each sample of a software sample library;
processing the non-numeric features with N selected non-cryptographic hash algorithms, and converting the processing results into numeric features, where N is an integer greater than 1;
constructing a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
training a machine learning classifier with the feature matrix;
detecting target software with the machine learning classifier.
Compared with the prior art, the invention has the following beneficial effects:
The software detection scheme proposed by the embodiments of the present invention uses a software detection method based on mixed non-cryptographic hash features and a machine learning model. It converts the complex string features extracted from malware samples into hash features that machine learning algorithms can process easily, which reduces the difficulty of model training, markedly improves training speed, reduces space overhead, and improves the accuracy of malware discrimination.
The scheme detects well in application scenarios that lack a rich malware expert knowledge base or a reasonably complete virus signature library. It also resists the mutation, polymorphism and similar detection-evasion techniques commonly used by malware authors, and is strongly resistant to manually injected noise, packing and obfuscation. A machine learning classifier that uses this feature processing method therefore has good noise immunity and robustness.
The above is only an overview of the technical solution of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the embodiments are easier to understand, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the embodiments of the present invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:
Fig. 1 is a flowchart of a software detection method provided by the first embodiment of the invention;
Fig. 2 is a flowchart of a software detection method provided by the second embodiment of the invention;
Fig. 3 is a flowchart of a feature processing method based on mixed non-cryptographic hash algorithms provided by the third embodiment of the invention;
Fig. 4 is a flowchart of the mixed splicing and recombination method in the third embodiment of the invention;
Fig. 5 is a structural block diagram of a software detection apparatus provided by the fourth embodiment of the invention;
Fig. 6 is a structural block diagram of a software detection apparatus provided by the fifth embodiment of the invention;
Fig. 7 is a structural block diagram of a computing device provided by the sixth embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
The first embodiment of the invention provides a software detection method. Its purpose is to address the defects of existing malware detection methods by proposing a software detection method based on mixed non-cryptographic hash features. Specifically, as shown in Fig. 1, the method of this embodiment includes the following steps:
Step S101: extract the numeric features and non-numeric features contained in each sample of the software sample library;
In the embodiment of the present invention, software samples are obtained and the software sample library is built before this step is executed. Specifically, when a malware sample is obtained, it is labeled as a black sample and the type of the malware is determined; when a normal software sample is obtained, it is labeled as a white sample. The software in the sample library is then used for the subsequent feature extraction and machine learning.
In the embodiment of the present invention, by way of example, the sample programs in the software sample library are mainly PE (Portable Executable) files, or DLL (Dynamic Link Library) files with a similar file structure. Numeric features and non-numeric features can be extracted from such sample program files. The sample programs may of course be other types of file; the invention is not limited to the PE or DLL file types.
In a specific embodiment of the present invention, the numeric features include one or more of the following: code file header information, code section information, string statistics, overall sample statistics, the import address table function list, the export function list, byte count information and byte entropy statistics.
In the embodiment of the present invention, the non-numeric features are mainly string data. In a specific embodiment of the present invention, the non-numeric features include one or more of the following: recognizable string sequences in the software header information, all path string sequences, all uniform resource locator (URL) string sequences, all registry key string sequences, the machine type string in the software header information, all section name string sequences of the software, the entry point name string, and string sequences made up of Q or more consecutive recognizable characters in all sections of the software, where Q is a positive integer. In an exemplary embodiment Q is 5, but Q is not limited to this value.
It should be noted that those skilled in the art may add to or remove from the above features as required, and all such variations fall within the scope of protection of the invention.
Step S102: process the non-numeric features with N selected non-cryptographic hash algorithms, and convert the processing results into numeric features, where N is an integer greater than 1;
In the embodiment of the present invention, the principle for selecting the non-cryptographic hash algorithms is that the algorithms complement one another, which avoids the hash collisions and loss of feature information that relying on a single algorithm would cause.
In an exemplary embodiment of the present invention, three non-cryptographic hash algorithms are selected: the MurMurHash3 algorithm, the SimHash algorithm and the CRC32 algorithm. Those skilled in the art may of course add to or remove from this set. Which algorithms are used is not the focus of the embodiment; what the invention seeks to protect is the scheme of extracting mixed features with a mixture of non-cryptographic hash algorithms.
In a specific embodiment of the present invention, processing the non-numeric features with the N selected non-cryptographic hash algorithms and converting the processing results into numeric features specifically includes:
(1) grouping the non-numeric features according to a set grouping scheme;
(2) for each group of non-numeric features, hashing the group with each of the N non-cryptographic hash algorithms to obtain N hash values, and converting the N hash values into integers;
(3) splicing the integer features of all groups to obtain the converted numeric features.
Step S103: construct a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
In a specific embodiment of the present invention, this step is implemented as follows:
standardize each numeric feature;
normalize the standardized feature data;
construct the feature matrix from the normalized feature data.
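The three sub-steps above might look like the following sketch. The text does not say which standardization or normalization is used, so z-score standardization followed by min-max scaling is an assumption, as are the function names.

```python
from statistics import mean, pstdev

def standardize(column):
    """Z-score standardization of one feature column (assumed method)."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - mu) / sigma for x in column]

def min_max(column):
    """Min-max normalization of one feature column to [0, 1] (assumed method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def build_feature_matrix(rows):
    """rows: one list of numeric features per sample, all of equal length.
    Returns a matrix of the same shape, processed column by column."""
    columns = [min_max(standardize(list(col))) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]
```

For example, `build_feature_matrix([[1, 10, 7], [2, 20, 7], [3, 30, 7]])` maps the first two columns onto 0, 0.5 and 1 and the constant third column onto zeros.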
In an optional embodiment of the present invention, after the feature matrix is constructed, the method further includes performing dimensionality reduction on the feature matrix according to a set dimensionality reduction method. This removes feature columns whose correlation is clearly weak; the result is then input to the machine learning classifier for training.
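The text leaves the dimensionality reduction method open. One simple possibility, shown here purely as an assumption, is to drop every column whose absolute Pearson correlation with the label falls below a threshold; the threshold value and function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def select_columns(matrix, labels, threshold=0.1):
    """Keep only columns whose |correlation with the label| >= threshold.
    Returns the reduced matrix and the indices of the kept columns."""
    keep = [j for j, col in enumerate(zip(*matrix))
            if abs(pearson(col, labels)) >= threshold]
    reduced = [[row[j] for j in keep] for row in matrix]
    return reduced, keep
```

The returned index list has to be stored with the model so that the same columns can be selected from a target sample at detection time.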
Step S104: train a machine learning classifier with the feature matrix;
In a specific embodiment of the present invention, training the machine learning classifier with the feature matrix specifically includes:
training a first machine learning classifier with the feature matrix built from software samples labeled as malware or normal software, so as to classify software as malware or normal software;
training a second machine learning classifier with the feature matrix built from malware samples labeled with their different types, so as to classify the type of the malware.
That is, the first machine learning classifier is a binary classification model, and the second machine learning classifier is a multi-class classification model.
In an optional embodiment of the present invention, after the machine learning classifier has been trained, the method further includes:
testing the trained machine learning classifier with a test sample set, so as to adjust the model parameters of the machine learning classifier.
Step S105: detect target software with the machine learning classifier.
Specifically, in the embodiment of the present invention, features are extracted from the target software in the manner of step S102 and input to the machine learning classifier for detection. This process is a real-time online test, whereas steps S101 to S104 above can be carried out offline.
Specifically, in the embodiment of the present invention, the first machine learning classifier separates malware from normal software, and the second machine learning classifier classifies the type of the malware, thereby achieving detection of the target software.
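The two-stage decision can be sketched as below. The two classifiers are passed in as plain callables so that the sketch does not depend on any particular learning library; their names and signatures are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]

def detect(
    features: Features,
    is_malware: Callable[[Features], bool],  # first (binary) classifier
    family_of: Callable[[Features], str],    # second (multi-class) classifier
) -> Tuple[bool, Optional[str]]:
    """Run the binary classifier first; only samples it flags as malware
    are passed on to the multi-class classifier for type identification."""
    if not is_malware(features):
        return False, None
    return True, family_of(features)
```

Calling the second classifier only on flagged samples matches the order described above and spares normal files the cost of the multi-class model.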
In an optional embodiment of the present invention, when the target software is detected to be malware, an alarm is also raised in a set alarm mode.
In summary, the software detection scheme proposed by the embodiment of the present invention uses a software detection method based on mixed non-cryptographic hash features and a machine learning model. It converts the complex string features extracted from malware samples into hash features that machine learning algorithms can process easily, which reduces the difficulty of model training, markedly improves training speed, reduces space overhead, and improves the accuracy of malware discrimination.
The second embodiment of the invention provides a software detection method. Compared with the first embodiment, this embodiment explains the implementation of the invention in more detail with reference to a concrete application example. It should be noted that the many technical details disclosed in this embodiment serve to explain the invention and are not intended to limit it.
Specifically, as shown in Fig. 2, this embodiment provides a software detection method and, more specifically, a feature processing method based on mixed non-cryptographic hash algorithms, together with a malware detection approach based on that method and a machine learning algorithm. It comprises the following steps:
Step S100: collect training samples and build the software sample library;
Specifically, in this embodiment, malware samples for machine learning training are obtained, labeled as black samples and recorded as the integer 1; a corresponding number of normal program samples is collected at the same time, labeled as white samples and recorded as the integer 0.
To make sure that the collected black and white samples are labeled reliably, in an exemplary embodiment of the present invention the collected software samples are scanned one by one with the public antivirus engines on the virustotal website (about 60 to 70 engines in total; the number available depends on the type of file scanned). The criterion is that a sample flagged by 50 or more engines is classed as malware, and a sample flagged by no engine is classed as a normal file. This step collected 500,000 malware samples and 500,000 normal software samples, of which 400,000 malware and 400,000 normal samples form the training data set, and 100,000 malware and 100,000 normal samples form the test data set. The collected program samples are mainly PE files, or DLL files with a similar file structure. The many antivirus engines on virustotal can also be used to classify the malware: by voting, the type identified by the largest number of engines is taken as the type and family of the malware in the training data.
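The labeling rule and the family vote can be sketched as follows. The threshold of 50 and the zero-detection rule come from the text; treating samples with 1 to 49 detections as unlabeled (and excluding them) is an assumption, since the text does not say what happens to them. The function names are hypothetical, and the code takes scan results as plain inputs rather than calling any scanning service.

```python
from collections import Counter

MALWARE_MIN_DETECTIONS = 50  # threshold stated in the text

def label_sample(detections: int):
    """Return 1 (black sample), 0 (white sample), or None when the
    detection count is ambiguous (assumption: such samples are excluded)."""
    if detections >= MALWARE_MIN_DETECTIONS:
        return 1
    if detections == 0:
        return 0
    return None

def vote_family(engine_labels):
    """Majority vote over the family names reported by the engines.
    Empty labels (engines that reported nothing) are ignored."""
    counts = Counter(label for label in engine_labels if label)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For instance, `vote_family(["Trojan", "Trojan", "Worm", ""])` selects `"Trojan"`.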
Step S200: for the collected software training samples, extract the data information in each sample;
Specifically, in the embodiment of the present invention, the extracted information is divided into numeric information (including Boolean values, which are treated as 0 and 1) and non-numeric information (mainly string data). All the data information is checked, and any missing or misaligned data is corrected, to ensure that the data obtained is complete and free of errors.
In the embodiment of the present invention, the extracted numeric features specifically include: code file header information, code section information, string statistics, overall sample statistics, the import address table function list, the export function list, byte count information and byte entropy statistics. Examples of the specific feature types are shown in Table 1:
Table 1. Extracted numeric features
In this embodiment:
Malicious code file header information, comprising: virtual size of the file, whether it is in debug mode, whether it contains a signature, PE timestamp, other numeric information in the PE file header, and whether it contains a thread-local storage table;
Code section information, comprising: whether a resource section is present, number of sections, number of zero-size code sections, number of unnamed code sections, and number of sections containing "MEM_WRITE";
String statistics, comprising: number of recognizable strings, average string length, count of printable strings, and the sum of the information entropy of all characters;
Overall sample statistics, comprising: number of path identifiers "C:", total occurrences of http(s)://, number of occurrences of "HKEY", number of occurrences of "MZ", whether a relocation table is present, and number of symbols in the symbol table;
Import address table function list, comprising: number of functions in the import address table;
Export function list, comprising: number of exported functions;
Byte count information, comprising: the number of occurrences of each byte value from 0x00 to 0xFF in the whole file, and the total number of bytes in the file;
Byte entropy statistics, comprising: the information entropy distribution of the bytes.
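A few of the statistics listed above can be computed from the raw file bytes alone. The sketch below covers the byte counts, the Shannon entropy of the bytes and the marker occurrence counts; the header and section fields need a PE parser and are left out. Function names are hypothetical.

```python
import math

def byte_histogram(data: bytes):
    """Occurrences of each byte value 0x00..0xFF in the whole file."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in byte_histogram(data) if c)

def marker_counts(data: bytes):
    """Non-overlapping occurrence counts of the markers named in the text."""
    markers = (b"C:", b"http://", b"https://", b"HKEY", b"MZ")
    return {m: data.count(m) for m in markers}
```

The entropy of the whole file is a single number; an entropy distribution, as the text calls it, could be obtained by applying `shannon_entropy` to fixed-size windows of the file, though the window size is not specified in the text.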
In the embodiment of the present invention, the extracted non-numeric features include: recognizable string sequences in the software header information, all path string sequences, all uniform resource locator (URL) string sequences, all registry key string sequences, the machine type string in the software header information, all section name string sequences of the software, the entry point name string, and string sequences made up of Q or more consecutive recognizable characters in all sections of the software, where Q is a positive integer. In an exemplary embodiment Q is 5, but Q is not limited to this value. Examples of the specific feature types are shown in Table 2:
Table 2. Extracted non-numeric features
For the features and feature extraction methods listed in Tables 1 and 2 above, the following convention applies: if a numeric feature is empty, it is replaced with the integer 0; if a non-numeric feature is empty, it is replaced with the string "0".
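The extraction of strings of Q or more consecutive recognizable characters, together with the "0" placeholder convention, can be sketched as follows. Treating "recognizable" as printable ASCII (0x20 to 0x7E) and the simple patterns used to sort strings into URL, registry and path groups are assumptions; the names are hypothetical.

```python
import re

def printable_strings(data: bytes, q: int = 5):
    """All runs of q or more printable ASCII characters; ["0"] if none."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % q)
    found = [m.decode("ascii") for m in pattern.findall(data)]
    return found or ["0"]

def group_strings(strings):
    """Sort strings into URL, registry and path groups by simple patterns.
    Empty groups are replaced by the "0" placeholder, as the text specifies."""
    groups = {"url": [], "registry": [], "path": []}
    for s in strings:
        if re.search(r"https?://", s):
            groups["url"].append(s)
        elif "HKEY" in s:
            groups["registry"].append(s)
        elif re.search(r"[A-Za-z]:\\", s):
            groups["path"].append(s)
    return {name: items or ["0"] for name, items in groups.items()}
```

With Q = 5, a three-character run such as `abc` between two non-printable bytes is discarded, while a URL or a path of ordinary length is kept.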
Step S300: apply mixed hash feature processing, based on the three non-cryptographic hash algorithms MurMurHash3, SimHash and CRC32, to the above non-numeric features, converting the hard-to-handle string features into a numeric feature matrix of fixed length.
A hash algorithm maps an element to a specific range. Hash algorithms generally fall into two classes: cryptographic and non-cryptographic. The common MD5 algorithm is a cryptographic hash algorithm that maps a string of arbitrary length to a 128-bit (16-byte) hash value; it is widely used and has a very low collision rate. Cryptographic hash algorithms are nevertheless unsuitable for feature processing in machine learning models. The reason is that feature processing needs to retain as much as possible of what the original features have in common, so that these commonalities can be used to separate classes during later training. A cryptographic hash algorithm such as MD5 is extremely sensitive to changes in the original feature: flipping a single bit changes the MD5 hash value drastically and destroys the information contained in the original feature, which is very unfavorable for machine learning training. The embodiment of the present invention therefore processes these non-numeric features with non-cryptographic hash algorithms, which retain as much of the class information of the original features as possible and so serve as an effective feature processing method.
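The contrast described above can be made concrete with a small sketch. It compares MD5 (via `hashlib`) with a simplified 32-bit SimHash built on CRC32 token hashes; the SimHash variant and the helper names are assumptions for illustration, not the patent's implementation.

```python
import hashlib
import zlib

def md5_digest_int(tokens) -> int:
    """128-bit MD5 of the space-joined tokens, as an integer."""
    joined = " ".join(tokens).encode("utf-8")
    return int.from_bytes(hashlib.md5(joined).digest(), "big")

def simhash_32(tokens) -> int:
    """Simplified 32-bit SimHash: each token votes +1 or -1 on every bit,
    and a bit is set when its votes sum to a positive number."""
    votes = [0] * 32
    for tok in tokens:
        th = zlib.crc32(tok.encode("utf-8"))
        for i in range(32):
            votes[i] += 1 if (th >> i) & 1 else -1
    return sum(1 << i for i in range(32) if votes[i] > 0)

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")
```

One property follows directly from the voting scheme: repeating every token doubles each vote without changing its sign, so `simhash_32(tokens * 2)` equals `simhash_32(tokens)`, whereas the MD5 value of the doubled input is unrelated to the original. Inputs that share most of their tokens likewise tend to produce SimHash values with a small `hamming` distance, which is the kind of preserved commonality the paragraph above refers to.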
Step S400: train a machine learning classifier with the feature matrix obtained in step S300 to obtain a machine learning classification model.
Specifically, a binary classification model can be trained on the training data labeled as malware or normal files, to discriminate and identify malware; a multi-class classification model can be trained on the training data labeled with different categories, to determine which family and type a file judged to be malware belongs to. In the embodiment of the present invention, malware is divided into 10 major categories: adware (Adware), backdoor programs (Backdoor), Trojan horse programs (Trojan), destructive computer viruses (Virus), worms (Worm), ransomware (Ransom), hacking tools (HackTool), rogue software (Rogue), rootkits (Rootkit) and virus tools (Virus Tool).
The machine learning algorithm used in the embodiment of the present invention is LightGBM, a lightweight gradient boosting machine algorithm. LightGBM is a boosting method that improves on traditional gradient boosting decision tree algorithms, making them faster to compute, more widely applicable, more accurate, and less demanding of hardware. LightGBM adopts a histogram-based decision tree method, which greatly reduces memory consumption and computation cost. Compared with the traditional pre-sorted algorithm, the histogram-based algorithm consumes only 1/8 of the memory, and the time complexity of finding a split point in a decision tree is O(n). For data splitting, unlike the pre-sorted algorithm, all features share a single index table, so only that index table needs to be operated on. LightGBM can also greatly reduce communication cost when training is accelerated on a computer cluster, saving communication time between parallel machines and greatly speeding up training. The embodiment of the present invention, however, does not involve training on a parallel computer cluster.
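To illustrate the idea behind histogram-based split finding, the following toy sketch bins one feature, accumulates gradient sums per bin in a single pass, and then scans the bin boundaries for the best split. It is not LightGBM's implementation; the function name `best_histogram_split`, the equal-width binning, and the simplified gain formula are assumptions made for illustration.

```python
def best_histogram_split(values, gradients, n_bins=8):
    """Return (gain, threshold) for the best split over bin boundaries."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    # One pass over the samples: accumulate gradient sum and count per bin.
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)
        grad_sum[b] += g
        count[b] += 1
    total_g, total_n = sum(grad_sum), sum(count)
    best = (float("-inf"), None)
    left_g, left_n = 0.0, 0
    # Scan the bins, not the raw samples, for the best boundary.
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_g = total_g - left_g
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best[0]:
            best = (gain, lo + (b + 1) * width)
    return best

values = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
gradients = [-1.0] * 4 + [1.0] * 4
gain, threshold = best_histogram_split(values, gradients)
```

Once the histogram is built, the scan touches only `n_bins` entries per feature, which is the source of the memory and time savings over keeping every sample pre-sorted for every feature.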
Step S500: Test and evaluate the performance of the trained machine learning classifier on a test sample set, to judge whether the trained model can meet actual requirements.
Specifically, in the embodiment of the present invention, 100,000 malware samples and 100,000 normal software samples are used to test the detection rate and the false positive rate, and a classification test is performed on the 100,000 malware samples among them to check the accuracy of the machine learning classifier.
The specific implementation includes measuring the performance of the trained model on the test sample set, using performance indicators such as accuracy, recall rate, and ROC curve/AUC. In addition, hypothesis testing is used to estimate the generalization error from the test error, and thereby obtain the generalization performance of the model. That is, if model A is observed to outperform model B on the test set, the hypothesis test result indicates how likely it is that A's generalization performance is statistically better than B's. These evaluation methods determine whether the trained model meets the needs of actual use. If it does, the next step can be carried out; if it does not, the process returns to the training stage and improves model performance by adjusting training parameters, increasing the number of iterations, or choosing a different cost function, regularization term, or learning rate.
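The indicators named above can be computed in a few lines. The following is a minimal pure-Python sketch; the sample labels and scores are made-up illustrative data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Detection rate: true positives over all actual positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

def false_positive_rate(y_true, y_pred, positive=1):
    """Normal samples wrongly flagged, over all actual negatives."""
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return fp / (fp + tn) if fp + tn else 0.0

def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]                    # 1 = malware, 0 = normal
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]        # hypothetical classifier scores
y_pred = [int(s >= 0.5) for s in scores]

print(accuracy(y_true, y_pred), recall(y_true, y_pred),
      false_positive_rate(y_true, y_pred), auc(y_true, scores))
```

The pairwise AUC formulation is quadratic in the number of samples, so it suits small checks like this one rather than a 200,000-sample test set.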
Step S600: Package the tested model, so as to output a machine learning classification model suitable for processing by subsequent systems.
In an optional embodiment of the present invention, the machine learning classification model is packaged in an intuitive, human-readable JSON format, containing the model generation date, model type, feature names, feature value ranges, learning rate, number of subtrees, basic information on each subtree, feature importance ranking, and so on.
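A JSON package of this kind might look like the following sketch. Every key name and value here is a hypothetical placeholder chosen to mirror the fields listed above; the source does not specify a schema.

```python
import json

# Hypothetical package layout; key names and values are illustrative only.
model_package = {
    "generated_on": "1970-01-01",          # placeholder generation date
    "model_type": "binary_classifier",
    "feature_names": ["f0", "f1", "f2"],
    "feature_ranges": {"f0": [0.0, 1.0], "f1": [0.0, 1.0], "f2": [0.0, 1.0]},
    "learning_rate": 0.1,
    "n_subtrees": 2,
    "subtrees": [
        {"index": 0, "n_leaves": 3, "depth": 2},
        {"index": 1, "n_leaves": 4, "depth": 2},
    ],
    "feature_importance": ["f1", "f0", "f2"],  # most important first
}

text = json.dumps(model_package, indent=2)
restored = json.loads(text)
print(text)
```

Text serialization keeps the package easy to inspect and diff, at the cost of slower parsing when the number of subtrees is large.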
In another optional embodiment of the present invention, the machine learning classification model is packaged in a binary format with the same contents as above. Binary packaging greatly speeds up model loading and, for models with a very large number of decision trees, effectively reduces reading and parsing time.
Step S700: Using the generated machine learning classifier, receive the feature data of an externally input software file under test and judge whether it is malware. If so, use the family classification model to determine which type of malware it belongs to, and issue a malware warning in real time. The warning method can be selected by the user and includes, but is not limited to, logs, e-mail, pop-up windows, and short messages.
The third embodiment of the present invention proposes a feature processing method based on hybrid non-cryptographic hash algorithms, which describes in detail the implementation of step S300 in the second embodiment. Specifically, as shown in Fig. 3, the method includes the following steps:
Step S301: Extract the non-numeric feature data as shown in Table 2 above;
Step S302: Deduplicate and denoise these non-numeric features;
Specifically, because all numeric and non-numeric features have already been cleaned earlier, this step focuses on the string sequences: it checks for duplicate API and DLL strings and removes possibly incomplete API and DLL strings. In general, API function strings end with .exe and DLL strings end with .dll.
Step S303: Group the non-numeric features. For each group, apply the hash methods described in steps S3041, S3042, and S3043 to obtain hash values.
In an exemplary embodiment of the present invention, the PE-related non-numeric features extracted in rows 2 to 8 of Table 2 form one group and one string sequence; the code-segment-related non-numeric features extracted in rows 9 to 15 of Table 2 form a second group and string sequence; and the non-numeric features extracted from the import address table and the export function table in the last two rows, 16 and 17, form a third group and string sequence.
Step S3041: Hash the input non-numeric features using the MurmurHash3 algorithm.
MurmurHash is a non-cryptographic hash algorithm with fast hashing and a low collision rate. The hash value can be 32, 64, or 128 bits long. According to calculation, with a 128-bit hash value the collision probability under data volumes in the millions is almost 0. The embodiment of the present invention illustratively uses the MurmurHash3 algorithm with a 128-bit hash value.
Specifically, the MurmurHash3 algorithm uses a sliding window to take two consecutive bit blocks as a group, and applies large-integer multiplication, shift operations, XOR operations, first-order linear transformations, and cumulative summation to obtain the final 128-bit hash value.
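The mix of multiplication, rotation, and XOR can be seen in a compact form in the 32-bit variant of MurmurHash3. The sketch below implements that 32-bit variant in pure Python, using the constants of the public reference implementation; it is not the 128-bit variant the embodiment uses, and it is shown only because it is short enough to read in full.

```python
MASK32 = 0xFFFFFFFF

def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit) of a byte string."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4
    # Body: consume the input in 4-byte little-endian blocks.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization: mix in the length and force avalanche of the last bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h

digest = murmur3_32(b"MSVCP60.dll")
```

In practice an existing, tested library would normally be used for the 128-bit variant rather than a hand-written implementation.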
Step S3042: Hash the input non-numeric features using SimHash.
SimHash is a locality-sensitive hash that preserves the characteristic information of the original data well. Its hash values are highly comparable: the similarity between different hash values can be compared effectively using, for example, the Hamming distance. SimHash is generally used for deduplicating massive document collections; in the embodiment of the present invention it is used for feature processing of the extracted strings.
The feature processing method using SimHash proposed by the embodiment of the present invention is as follows:
(1) Convert the original string into an n-dimensional vector using the 2-gram method, where each dimension of the vector is 2 bytes.
For example, the string "MSVCP60.dll" is converted into "[MS, SV, VC, CP, P6, 60, 0., .d, dl, ll]";
(2) Assign a weight W_i (i = 0, 1, ..., n-1) to each dimension of the n-dimensional vector. If all weights are equal, they can be set to [formula missing from source];
(3) Hash each dimension of the n-dimensional vector. The hash method can be chosen freely and is not limited to cryptographic or non-cryptographic hash algorithms; the choice is determined mainly by the desired number of hash bits. The embodiment of the present invention uses MD5 as the hash method for this step, generating 128-bit hash values;
(4) Weight the hash value of each dimension bit by bit with the weight W_i: a bit that is 1 is recorded as W_i, and a bit that is 0 is recorded as -W_i. Then sum the n weighted hash values bit by bit to obtain a 128-dimensional vector in which every dimension is a floating-point value;
(5) In this 128-dimensional vector, a dimension whose value is greater than the threshold σ is recorded as 1, and a dimension whose value is less than or equal to σ is recorded as 0. The 128-dimensional floating-point vector is thereby converted into a 128-bit string, which is the final SimHash result.
The threshold σ is calculated as follows:
[formula missing from source]
where B_ij is the value (1 or 0) of the j-th bit of the hash of the i-th dimension of the vector in step 2, and W_i is as defined in step (2).
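Steps (1) to (5) can be sketched as follows. Because the weight and threshold formulas are not available in this text, the sketch assumes equal weights of 1.0 and a threshold σ of 0; it also assumes character-level 2-grams of ASCII strings and UTF-8 encoding before MD5. The function names are illustrative.

```python
import hashlib

def bigrams(s: str):
    """Step (1): split a string into overlapping 2-character grams."""
    return [s[i:i + 2] for i in range(len(s) - 1)]

def simhash128(s: str, weights=None, sigma: float = 0.0) -> int:
    """Steps (2)-(5): weighted bitwise vote over the MD5 hash of each 2-gram."""
    grams = bigrams(s)
    if weights is None:
        weights = [1.0] * len(grams)      # assumption: equal weights
    acc = [0.0] * 128
    for gram, w in zip(grams, weights):
        h = int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest(), "big")
        for j in range(128):
            bit = (h >> (127 - j)) & 1
            acc[j] += w if bit else -w    # step (4): +W_i for 1, -W_i for 0
    result = 0
    for j in range(128):
        # Step (5): values above sigma become 1, all others become 0.
        result = (result << 1) | (1 if acc[j] > sigma else 0)
    return result

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two SimHash values."""
    return bin(a ^ b).count("1")

h1 = simhash128("MSVCP60.dll")
h2 = simhash128("MSVCP70.dll")
print(hamming(h1, h2))
```

Strings that share most of their 2-grams contribute mostly identical votes, so their SimHash values tend to differ in few bits, unlike the MD5 of the whole string.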
Step S3043: Hash the input non-numeric features using CRC32.
CRC32 is a cyclic redundancy check algorithm, generally used to verify correctness during data frame transmission. In the present invention it is used to hash a string to a 32-bit length, which is then used for feature processing. The steps by which the embodiment of the present invention performs feature processing with CRC32 are:
(1) Choose the following generator polynomial:
C(x) = 1 + x + x^2 + x^4 + x^5 + x^7 + x^8 + x^{10} + x^{11} + x^{12} + x^{16} + x^{22} + x^{23} + x^{26} + x^{32}, whose hexadecimal representation is 0xEDB88320.
(2) Treat the binary form of the original string sequence as the dividend and the above generator polynomial as the divisor, and perform modulo-2 division; the resulting 32-bit remainder is the CRC32 hash code.
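The modulo-2 division can be written as a bitwise loop. The sketch below follows the common CRC-32 convention used by `zlib` (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); whether the embodiment applies the same initial and final XOR is an assumption.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 via bit-by-bit modulo-2 division with polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; XOR in the polynomial when that bit was 1.
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

check = crc32_bitwise(b"123456789")
print(hex(check), hex(zlib.crc32(b"123456789")))
```

The bitwise loop is for exposition; `zlib.crc32` computes the same value with a table-driven implementation and is the practical choice.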
Step S305: Splice and recombine the results obtained from the three hash algorithms above to form new feature vectors and a new feature matrix. In this embodiment, the hash value of each group is 128 + 128 + 32 bits long, and the actual storage format is the byte type.
Fig. 4 shows the hybrid splicing and recombination method proposed by the present invention:
For each group of hash values, the first 128 + 128 bits are split into bytes and each byte is converted to an integer, while the last 32 bits are converted as a whole into a long integer. Each group thus forms 33 integer features, and the three groups are spliced in order to form a feature vector of 99 features in total.
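The byte-wise splitting and splicing can be sketched as follows. The function name `group_to_features`, the big-endian interpretation of the CRC32 bytes, and the dummy hash values are assumptions for illustration.

```python
def group_to_features(murmur128: bytes, simhash128: bytes, crc32: bytes):
    """Turn one group's 128 + 128 + 32 hash bits into 33 integer features."""
    if len(murmur128) != 16 or len(simhash128) != 16 or len(crc32) != 4:
        raise ValueError("expected 16-, 16- and 4-byte hash values")
    # 16 + 16 byte-valued integers, then the CRC32 as one whole integer.
    return list(murmur128) + list(simhash128) + [int.from_bytes(crc32, "big")]

# Dummy hash values standing in for the three groups of string features.
groups = [
    (bytes(range(16)), bytes(16), (0xCBF43926).to_bytes(4, "big")),
    (bytes(range(16, 32)), bytes([255] * 16), bytes(4)),
    (bytes(16), bytes(range(16)), (1).to_bytes(4, "big")),
]

feature_vector = []
for murmur, sim, crc in groups:
    feature_vector.extend(group_to_features(murmur, sim, crc))
print(len(feature_vector))
```

Each byte-derived feature lies in 0 to 255 while the CRC32 feature spans the full 32-bit range, which is one reason the later standardization step matters.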
Step S306: Extract the numeric features described in Table 1, 641 integer features in total. Boolean features are treated as integers taking the value 0 or 1.
Step S307: Standardize all 740 extracted features (that is, 641 + 99) to eliminate the effect of differences in numeric range between features. The formula is:
x' = (x - E(x)) / σ
where E(x) is the mean of the feature and σ is its standard deviation.
The standardized feature data is then normalized, mapping each feature value in every row into the interval [0, 1].
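Standardization followed by normalization can be sketched for a single feature column as follows. The sketch assumes the z-score form of standardization, the population standard deviation, and min-max scaling applied per feature column; the source does not state the exact normalization formula.

```python
import math

def standardize(column):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    if std == 0:
        return [0.0] * len(column)      # constant feature carries no spread
    return [(x - mean) / std for x in column]

def min_max(column):
    """Map a column linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]

raw = [1.0, 2.0, 3.0, 4.0]
z = standardize(raw)
scaled = min_max(z)
print(z, scaled)
```

Applied to each of the 740 columns in turn, this puts the byte-valued hash features and the 32-bit CRC feature on the same scale.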
Step S308: The processed data forms a feature matrix of dimension M × 740, which is input to the machine learning classifier for training, where M is the number of samples.
Optionally, in the embodiment of the present invention, the processed data forms a feature matrix of dimension M × 740, and dimensionality reduction methods such as the Pearson correlation coefficient and the chi-square test are applied to the feature matrix to remove feature columns that show no strong correlation; the result is then input to the machine learning classifier for training.
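A Pearson-based column filter can be sketched as follows. The sketch assumes that correlation is measured between each feature column and the class label, and the cutoff `min_abs_corr` is a hypothetical parameter; the source does not specify either.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_columns(matrix, labels, min_abs_corr=0.1):
    """Return indices of columns whose |correlation| with the label passes the cutoff."""
    keep = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        if abs(pearson(column, labels)) >= min_abs_corr:
            keep.append(j)
    return keep

matrix = [[0, 5, 1], [0, 5, 0], [1, 5, 1], [1, 5, 0]]
labels = [0, 0, 1, 1]
kept = select_columns(matrix, labels)
print(kept)
```

In this toy matrix the first column tracks the label exactly, the second is constant, and the third is uncorrelated with the label, so only the first survives.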
Corresponding to the method of the first embodiment of the present invention, the fourth embodiment of the present invention provides a software detection device which, as shown in Fig. 5, specifically includes:
a feature extraction module 510, configured to extract the numeric features and non-numeric features contained in each sample in a software sample library;
a feature processing module 520, configured to process the non-numeric features using N selected non-cryptographic hash algorithms and convert the processing results into numeric features, where N is an integer greater than 1;
a matrix construction module 530, configured to construct a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
a learning training module 540, configured to train a machine learning classifier using the feature matrix;
a detection module 550, configured to detect target software using the machine learning classifier.
Optionally, in the embodiment of the present invention, the learning training module 540 is further configured to test the trained machine learning classifier using a test sample set, so as to adjust the model parameters of the machine learning classifier.
Optionally, in the embodiment of the present invention, the learning training module 540 is specifically configured to train a first machine learning classifier using the feature matrix constructed from software samples labeled as malware or normal software, so as to classify software as malware or normal software; and to train a second machine learning classifier using the feature matrix constructed from samples labeled with different types of malware, so as to classify the type of malware.
Optionally, in the embodiment of the present invention, the numeric features include one or more of the following: code header field information, code segment information, string statistics, overall sample statistics, the list of functions in the import address table, the export function list, byte count information, and byte entropy statistics.
Optionally, in the embodiment of the present invention, the non-numeric features include one or more of the following: recognizable string sequences in the software header information, all path string sequences, all uniform resource locator string sequences, all registry entry string sequences, the machine type string in the software header information, all section name string sequences of the software, the entry point name string, and string sequences formed by Q or more consecutive recognizable characters in all sections of the software, where Q is a positive integer.
Optionally, in the embodiment of the present invention, the feature processing module 520 is specifically configured to group the non-numeric features according to a set grouping scheme; for each group of non-numeric features, hash the group with each of the N non-cryptographic hash algorithms to obtain N hash values, and convert the N hash values into integers; and splice the integer features of all groups to obtain the converted numeric features.
Optionally, in the embodiment of the present invention, the matrix construction module 530 is specifically configured to standardize each numeric feature, normalize the standardized feature data, and construct the feature matrix from the normalized feature data.
Optionally, in the embodiment of the present invention, the matrix construction module 530 is further configured to, after the feature matrix is constructed, reduce the dimensionality of the feature matrix according to a set dimensionality reduction method.
Optionally, in the embodiment of the present invention, the N non-cryptographic hash algorithms include at least two of the following algorithms: the MurmurHash3 algorithm, the SimHash algorithm, and the CRC32 algorithm.
For the specific implementation of each of the above modules, see the first and second embodiments; the details are not repeated here.
In summary, the software detection scheme proposed by the embodiment of the present invention is based on hybrid non-cryptographic hash features and a machine learning model. It converts the complex string features extracted from malware samples into hash features that machine learning algorithms can process easily, which reduces the difficulty of model training, significantly increases training speed, reduces space overhead, and improves the accuracy of malware discrimination.
The fifth embodiment of the present invention provides a software detection device which, as shown in Fig. 6, specifically includes:
a feature extraction module 610, configured to extract the numeric features and non-numeric features contained in each sample in a software sample library;
a feature processing module 620, configured to process the non-numeric features using N selected non-cryptographic hash algorithms and convert the processing results into numeric features, where N is an integer greater than 1;
a matrix construction module 630, configured to construct a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
a learning training module 640, configured to train a machine learning classifier using the feature matrix; optionally, this module can be set up as an offline module which, after offline training, packages the model and transfers it to the detection module 670;
a file format discrimination module 650, configured to detect whether the input target software is in a software format supported by this device; if so, it triggers the feature extraction module 610, which receives the transmitted file of the target software, extracts the numeric and/or non-numeric features contained in the software, and inputs them to the file pre-scan module 660;
a file pre-scan module 660, configured to search for matching signatures according to an existing malware signature library and rule library, and to screen out malware.
Optionally, this module uses traditional signature matching and YARA rule matching. If malware is detected by signature or rule matching, an alarm is sent directly to the result recording and online alarm module 680. Otherwise, the feature processing module 620 is triggered: it processes the non-numeric features extracted from the target software using the N selected non-cryptographic hash algorithms and converts the processing results into numeric features; the matrix construction module 630 then constructs a feature matrix from the numeric features extracted from the target software and the numeric features obtained by conversion, and inputs the feature matrix to the detection module 670;
a detection module 670, configured to detect the target software using the machine learning classifier. Specifically, this module is set up as an online module that uses the generated detection and classification models, receives the feature data of externally input files under test, and judges whether each is malware; if the answer is 'yes', the family classification model determines which type of malware it belongs to.
a result recording and online alarm module 680, configured to monitor malware detection results online in real time and issue malware warnings in real time; the warning method can be selected by the user and includes, but is not limited to, logs, e-mail, pop-up windows, and short messages.
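The control flow across the pre-scan, detection, and alarm modules can be sketched as a two-stage function. Everything here is hypothetical: signature matching is reduced to a byte-substring search instead of a real signature or YARA engine, and `is_malware` and `family_of` stand in for the binary and family classification models.

```python
from typing import Callable, Dict, Optional

def detect(file_bytes: bytes,
           signatures: Dict[str, bytes],
           is_malware: Callable[[bytes], bool],
           family_of: Callable[[bytes], str]) -> Dict[str, Optional[object]]:
    """Stage 1: signature pre-scan. Stage 2: classifier, only if stage 1 misses."""
    for name, pattern in signatures.items():
        if pattern in file_bytes:
            return {"malicious": True, "stage": "prescan", "label": name}
    if is_malware(file_bytes):
        return {"malicious": True, "stage": "classifier",
                "label": family_of(file_bytes)}
    return {"malicious": False, "stage": "classifier", "label": None}

# Stub signature library and stub models, for illustration only.
signatures = {"Demo.Signature": b"\xde\xad\xbe\xef"}
stub_is_malware = lambda data: len(data) > 8
stub_family_of = lambda data: "Trojan"

r1 = detect(b"\x00\xde\xad\xbe\xef", signatures, stub_is_malware, stub_family_of)
r2 = detect(b"0123456789", signatures, stub_is_malware, stub_family_of)
r3 = detect(b"tiny", signatures, stub_is_malware, stub_family_of)
print(r1, r2, r3)
```

Placing the cheap signature check first means the hash feature processing and model inference run only for files the pre-scan cannot decide.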
In summary, the software detection scheme proposed by the embodiment of the present invention is based on hybrid non-cryptographic hash features and a machine learning model. It converts the complex string features extracted from malware samples into hash features that machine learning algorithms can process easily, which reduces the difficulty of model training, significantly increases training speed, reduces space overhead, and improves the accuracy of malware discrimination. In addition, the scheme described in the embodiment of the present invention uses the file pre-scan module to judge software in advance, and inputs the software to the classification model of the present invention only when no judgment can be made, which further improves discrimination efficiency. This embodiment also provides an alarm module, which further improves the user experience.
The sixth embodiment of the present invention provides a computing device which, as shown in Fig. 7, includes a memory 710, a processor 720, and a communication bus 730; the communication bus 730 is used to implement connection and communication between the processor 720 and the memory 710;
Specifically, in the embodiment of the present invention, the processor 720 may be a general-purpose processor such as a central processing unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiment of the present invention. The memory 710 is used to store instructions executable by the processor 720;
The memory 710 is used to store program code and transfer the program code to the processor 720. The memory 710 may include volatile memory (Volatile Memory), such as random access memory (Random Access Memory, RAM); the memory 710 may also include non-volatile memory (Non-Volatile Memory), such as read-only memory (Read-Only Memory, ROM), flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD), or a solid-state drive (Solid-State Drive, SSD); the memory 710 may also include a combination of the above kinds of memory.
Specifically, in the embodiment of the present invention, the processor 720 is configured to execute a software detection program among the application programs stored in the memory 710, so as to implement the following method steps:
Step 1: extract the numeric features and non-numeric features contained in each sample in a software sample library;
Step 2: process the non-numeric features using N selected non-cryptographic hash algorithms, and convert the processing results into numeric features, where N is an integer greater than 1;
Step 3: construct a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
Step 4: train a machine learning classifier using the feature matrix;
Step 5: detect target software using the machine learning classifier.
For the implementation of each step in this embodiment, see the first to third embodiments; the details are not repeated here.
The seventh embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the following method steps:
Step 1: extract the numeric features and non-numeric features contained in each sample in a software sample library;
Step 2: process the non-numeric features using N selected non-cryptographic hash algorithms, and convert the processing results into numeric features, where N is an integer greater than 1;
Step 3: construct a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
Step 4: train a machine learning classifier using the feature matrix;
Step 5: detect target software using the machine learning classifier.
For the implementation of each step in this embodiment, see the first to third embodiments; the details are not repeated here.
The computer storage medium may be RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
In the embodiments provided herein, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
In short, the above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (20)

1. A software detection method, characterized by comprising:
extracting the numeric features and non-numeric features contained in each sample in a software sample library;
processing the non-numeric features using N selected non-cryptographic hash algorithms, and converting the processing results into numeric features, where N is an integer greater than 1;
constructing a feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion;
training a machine learning classifier using the feature matrix;
detecting target software using the machine learning classifier.
2. The method according to claim 1, characterized in that, before the target software is detected using the machine learning classifier, the method further comprises:
testing the trained machine learning classifier using a test sample set, so as to adjust the model parameters of the machine learning classifier.
3. The method according to claim 1, characterized in that training the machine learning classifier using the feature matrix specifically comprises:
training a first machine learning classifier using the feature matrix constructed from software samples labeled as malware or normal software, so as to classify software as malware or normal software;
training a second machine learning classifier using the feature matrix constructed from samples labeled with different types of malware, so as to classify the type of malware.
4. The method according to claim 1, characterized in that the numeric features include one or more of the following: code header field information, code segment information, string statistics, overall sample statistics, the list of functions in the import address table, the export function list, byte count information, and byte entropy statistics.
5. The method according to claim 1, characterized in that the non-numeric features include one or more of the following: recognizable string sequences in the software header information, all path string sequences, all uniform resource locator string sequences, all registry entry string sequences, the machine type string in the software header information, all section name string sequences of the software, the entry point name string, and string sequences formed by Q or more consecutive recognizable characters in all sections of the software, where Q is a positive integer.
6. The method according to claim 1, characterized in that processing the non-numeric features using the N selected non-cryptographic hash algorithms and converting the processing results into numeric features specifically comprises:
grouping the non-numeric features according to a set grouping scheme;
for each group of non-numeric features, hashing the group with each of the N non-cryptographic hash algorithms to obtain N hash values, and converting the N hash values into integers;
splicing the integer features of all groups to obtain the converted numeric features.
7. The method according to claim 1, characterized in that constructing the feature matrix from the numeric features contained in each sample and the numeric features obtained by conversion specifically comprises:
standardizing each numeric feature;
normalizing the standardized feature data;
constructing the feature matrix from the normalized feature data.
8. The method according to claim 1, characterized in that, after the feature matrix is constructed, the method further comprises: reducing the dimensionality of the feature matrix according to a set dimensionality reduction method.
9. The method according to any one of claims 1 to 8, characterized in that the N non-cryptographic hash algorithms include at least two of the following algorithms: the MurmurHash3 algorithm, the SimHash algorithm, and the CRC32 algorithm.
10. a kind of software detection device characterized by comprising
Characteristic extracting module, for extracting the numeric type feature and nonumeric type feature that each sample in software sample library is included;
Feature processing block, for being handled using the selected non-encrypted hash algorithm of N kind the nonumeric type feature, and Processing result is converted into numeric type feature;The N is the integer greater than 1;
Matrix construction module, the numeric type for being obtained according to the numeric type feature and conversion that include in each sample are special Sign, construction feature matrix;
Learning training module, for utilizing the eigenmatrix training machine Study strategies and methods;
Detection module detects target software for utilizing the Machine learning classifiers.
11. device as claimed in claim 10, which is characterized in that the learning training module is also used to utilize test sample The Machine learning classifiers for collecting complete to training are tested, to adjust the model parameter of the Machine learning classifiers.
12. The device according to claim 10, characterized in that the learning and training module is specifically configured to: train a first machine learning classifier, using software samples labeled as malware or normal software and the feature matrix constructed from them, to classify software as malware or normal software; and train a second machine learning classifier, using malware samples labeled with different types and the feature matrix constructed from them, to classify the type of malware.
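The two classifiers of claim 12 suggest a two-stage decision at detection time. The sketch below shows only that control flow, assuming each trained classifier is available as a plain callable; the threshold rules and the labels "trojan" and "worm" are toy stand-ins invented for illustration, not models or categories taken from the patent.

```python
def classify_software(features, malware_detector, type_classifier):
    """Stage 1 decides malware vs. normal; stage 2 runs only for malware."""
    if malware_detector(features) != "malware":
        return ("normal", None)
    return ("malware", type_classifier(features))

# Toy stand-ins for the first and second trained classifiers (assumptions).
def toy_detector(features):
    return "malware" if features[0] > 0.5 else "normal"

def toy_type_classifier(features):
    return "trojan" if features[1] > 0.5 else "worm"

result_malicious = classify_software([0.9, 0.8], toy_detector, toy_type_classifier)
result_normal = classify_software([0.1, 0.8], toy_detector, toy_type_classifier)
```

Keeping the stages separate means the second classifier only ever sees samples already judged malicious, which matches the way its training set is described in the claim.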
13. The device according to claim 10, characterized in that the numeric features comprise one or more of the following features: code header file information, code section information, string statistics, overall sample statistics, the list of functions in the import address table, the exported function list, byte count statistics, and byte entropy statistics.
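Of the features listed in claim 13, the byte count and byte entropy statistics are straightforward to compute from the raw file bytes. A minimal sketch, assuming a 256-bin byte histogram and Shannon entropy measured in bits per byte; the patent does not fix these definitions, and the function names are hypothetical.

```python
import math

def byte_histogram(data: bytes):
    """Count how often each of the 256 possible byte values occurs."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    return counts

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    total = len(data)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in byte_histogram(data):
        if count:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

uniform_entropy = byte_entropy(bytes(range(256)))   # every byte value once
constant_entropy = byte_entropy(b"\x00" * 16)       # a single repeated value
```

High entropy is often associated with packed or encrypted content, which is one common reason for including such a statistic as a feature.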
14. The device according to claim 10, characterized in that the non-numeric features comprise one or more of the following features: recognizable string sequences, all path string sequences, all uniform resource locator (URL) string sequences in the software header information, all registry key string sequences, the machine type string in the software header information, the string sequence of all section names of the software, the entry point name string, and the string sequences formed by Q or more consecutive recognizable characters in all sections of the software; wherein Q is a positive integer.
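The last feature in claim 14, strings of Q or more consecutive recognizable characters, is similar to what the Unix `strings` utility extracts. A minimal sketch, assuming that "recognizable" means printable ASCII (0x20 to 0x7E) and taking Q = 4 as the default; both are assumptions, as are the function name and the sample bytes.

```python
import re

def printable_strings(data: bytes, q: int = 4):
    """Return every run of at least `q` consecutive printable ASCII bytes."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % q)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Assumed toy section content: two long runs and one run that is too short.
blob = b"\x00\x01http://a.b\x00ab\x00C:\\Windows\x00"
found = printable_strings(blob, q=4)
```

Path, URL, and registry key strings could then be separated from this list with further pattern matching, although the patent text in this section does not say how that separation is done.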
15. The device according to claim 10, characterized in that the feature processing module is specifically configured to: group the non-numeric features according to a set grouping scheme; for each group of non-numeric features, hash the group with each of the N non-cryptographic hash algorithms to obtain N hash values, and convert the N hash values into integers; and concatenate the integer features of all groups to obtain the converted numeric features.
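The grouping, hashing, and concatenation steps of claim 15 can be sketched with N = 2. The sketch assumes CRC32 from the Python standard library and a deliberately simplified 32-bit SimHash written here for illustration; MurMurHash3 is left out only because it has no standard-library implementation. The grouping by feature kind, the group names, and the sample strings are all made up.

```python
import zlib

def crc32_hash(tokens):
    """CRC32 of the group's strings joined with a NUL separator."""
    return zlib.crc32("\x00".join(tokens).encode("utf-8"))

def simhash32(tokens):
    """Simplified 32-bit SimHash: each token votes +1 or -1 on every bit."""
    weights = [0] * 32
    for token in tokens:
        h = zlib.crc32(token.encode("utf-8"))
        for bit in range(32):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(32) if weights[bit] > 0)

HASHERS = (crc32_hash, simhash32)  # N = 2 non-cryptographic hash functions

def hash_groups(groups):
    """groups: dict mapping a group name to its list of strings.

    Returns one flat list of integers, N per group, in sorted group order.
    """
    features = []
    for name in sorted(groups):
        for hasher in HASHERS:
            features.append(hasher(groups[name]))
    return features

# Assumed grouping scheme: one group per kind of non-numeric feature.
groups = {
    "paths": ["C:\\Windows\\system32"],
    "urls": ["http://example.com/a", "http://example.com/b"],
}
vector = hash_groups(groups)
```

Each group contributes a fixed number of integers regardless of how many strings it contains, so every sample yields a vector of the same length, which is what allows the result to be placed in a feature matrix alongside the original numeric features.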
16. The device according to claim 10, characterized in that the matrix construction module is specifically configured to: standardize each numeric feature; normalize the standardized feature data; and construct the feature matrix from the normalized feature data.
17. The device according to claim 10, characterized in that the matrix construction module is further configured to, after constructing the feature matrix, perform dimensionality reduction on the feature matrix according to a set dimensionality reduction method.
18. The device according to any one of claims 10 to 17, characterized in that the N non-cryptographic hash algorithms comprise at least two of the following algorithms: the MurMurHash3 algorithm, the SimHash algorithm, and the CRC32 algorithm.
19. A computing device, characterized in that the computing device comprises: a memory, a processor, and a communication bus; the communication bus is configured to implement connection and communication between the processor and the memory;
the processor is configured to execute a software detection program stored in the memory, so as to implement the steps of the software detection method according to any one of claims 1 to 9.
20. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the software detection method according to any one of claims 1 to 9.
CN201811257390.5A 2018-10-26 2018-10-26 software detection method, device, equipment and storage medium Active CN109359439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811257390.5A CN109359439B (en) 2018-10-26 2018-10-26 software detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109359439A true CN109359439A (en) 2019-02-19
CN109359439B CN109359439B (en) 2019-12-13

Family

ID=65346949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811257390.5A Active CN109359439B (en) 2018-10-26 2018-10-26 software detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109359439B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007117582A2 (en) * 2006-04-06 2007-10-18 Smobile Systems Inc. Malware detection system and method for mobile platforms
US20120260343A1 (en) * 2006-09-19 2012-10-11 Microsoft Corporation Automated malware signature generation
CN104376262A (en) * 2014-12-08 2015-02-25 中国科学院深圳先进技术研究院 Android malware detecting method based on Dalvik command and authority combination
CN106778266A (en) * 2016-11-24 2017-05-31 天津大学 Android malware dynamic detection method based on machine learning
CN108595955A (en) * 2018-04-25 2018-09-28 东北大学 Android mobile phone malicious application detection system and method
CN108614970A (en) * 2018-04-03 2018-10-02 腾讯科技(深圳)有限公司 Virus detection method, model training method, device and equipment

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992969A (en) * 2019-03-25 2019-07-09 腾讯科技(深圳)有限公司 Malicious file detection method and device and detection platform
CN109992969B (en) * 2019-03-25 2023-03-21 腾讯科技(深圳)有限公司 Malicious file detection method and device and detection platform
CN110210224A (en) * 2019-05-21 2019-09-06 暨南大学 Intelligent big data mobile software similarity detection method based on description entropy
CN110210224B (en) * 2019-05-21 2023-01-31 暨南大学 Intelligent big data mobile software similarity detection method based on description entropy
CN112100453A (en) * 2019-06-18 2020-12-18 深信服科技股份有限公司 Method, system, equipment and computer storage medium for character string distribution statistics
CN111143670A (en) * 2019-12-09 2020-05-12 中国平安财产保险股份有限公司 Information determination method and related product
CN111144459B (en) * 2019-12-16 2022-12-16 重庆邮电大学 Unbalanced-class network traffic classification method and device and computer equipment
CN111144459A (en) * 2019-12-16 2020-05-12 重庆邮电大学 Class-unbalanced network traffic classification method and device and computer equipment
CN111079164A (en) * 2019-12-18 2020-04-28 深圳前海微众银行股份有限公司 Feature correlation calculation method, device, equipment and computer-readable storage medium
CN111352834A (en) * 2020-02-25 2020-06-30 江苏大学 Self-adaptive random test method based on locality sensitive hashing
CN111581640A (en) * 2020-04-02 2020-08-25 北京兰云科技有限公司 Malicious software detection method, device and equipment and storage medium
CN112380537A (en) * 2020-11-30 2021-02-19 北京天融信网络安全技术有限公司 Method, device, storage medium and electronic equipment for detecting malicious software
CN112883375A (en) * 2021-02-03 2021-06-01 深信服科技股份有限公司 Malicious file identification method, device, equipment and storage medium
CN113254935A (en) * 2021-07-02 2021-08-13 北京微步在线科技有限公司 Malicious file identification method and device and storage medium
CN113569241A (en) * 2021-07-28 2021-10-29 新华三技术有限公司 Virus detection method and device
CN114115730A (en) * 2021-11-02 2022-03-01 北京银盾泰安网络科技有限公司 Application container storage engine platform
CN114115730B (en) * 2021-11-02 2023-06-13 北京银盾泰安网络科技有限公司 Application container storage engine platform
CN115221857A (en) * 2022-09-21 2022-10-21 中国电子信息产业集团有限公司 Data similarity detection method and device containing numerical value types
CN115221857B (en) * 2022-09-21 2023-01-13 中国电子信息产业集团有限公司 Data similarity detection method and device containing numerical value types

Also Published As

Publication number Publication date
CN109359439B (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN109359439A (en) Software detecting method, device, equipment and storage medium
CN109784056B (en) Malicious software detection method based on deep learning
CN110704840A (en) Convolutional neural network CNN-based malicious software detection method
CN110135157B (en) Malicious software homology analysis method and system, electronic device and storage medium
CN110263538B (en) Malicious code detection method based on system behavior sequence
CN107609399A (en) Malicious code variant detection method based on NIN neural network
CN109829306A (en) Malware classification method with optimized feature extraction
CN110363003B (en) Android virus static detection method based on deep learning
CN111915437A (en) RNN-based anti-money laundering model training method, device, equipment and medium
Chaganti et al. Image-based malware representation approach with EfficientNet convolutional neural networks for effective malware classification
CN111753290A (en) Software type detection method and related equipment
CN110909348A (en) Internal threat detection method and device
Jin et al. A malware detection approach using malware images and autoencoders
CN111400713B (en) Malicious software population classification method based on operation code adjacency graph characteristics
Rahman et al. Interpreting Machine and Deep Learning Models for PDF Malware Detection using XAI and SHAP Framework
Nahhas et al. Android Malware Detection Using ResNet-50 Stacking.
CN112000954B (en) Malicious software detection method based on feature sequence mining and simplification
CN115545091A (en) Integrated learner-based malicious program API (application program interface) calling sequence detection method
Waghmare et al. A review on malware detection methods
CN115842645A (en) UMAP-RF-based network attack traffic detection method and device and readable storage medium
Dai et al. Anticoncept drift method for malware detector based on generative adversarial network
CN113821840A (en) Bagging-based hardware Trojan detection method, medium and computer
CN113609290A (en) Address recognition method and device and storage medium
CN111581640A (en) Malicious software detection method, device and equipment and storage medium
Jiang et al. A pyramid stripe pooling-based convolutional neural network for malware detection and classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant