CN109871686A - Malicious program identification method and device based on icon representation and software behavior consistency analysis


Info

Publication number
CN109871686A
Authority
CN
China
Prior art keywords
icon
software
classification
sample
tested
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910123265.3A
Other languages
Chinese (zh)
Inventor
舒辉
杨萍
康绯
熊小兵
光焱
桂智杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force


Abstract

The invention belongs to the technical field of network security, and in particular relates to a malicious program identification method and device based on icon representation and software behavior consistency analysis. The method comprises: collecting normal software data of known categories, extracting the icon resource data and import table API data of the known normal software, constructing CNN deep learning models, training on the icon information and the import table API information separately to establish an icon classification model and a software classification model, and obtaining a conventional software information library from the icon category and software behavior category information; parsing the structure of a sample under test, extracting its icon resource data and import table API function data, and testing them with the CNN deep learning models to obtain the icon category and software behavior category information of the sample under test; and, based on the test results, judging whether the icon category and the software behavior category of the sample under test are consistent. The invention achieves automatic, batch, rapid detection of malicious programs and effectively identifies malicious program code disguised by means such as icons similar to those of legitimate software.

Description

Malicious program identification method and device based on icon representation and software behavior consistency analysis
Technical field
The invention belongs to the technical field of network security, and in particular relates to a malicious program identification method and device based on icon representation and software behavior consistency analysis.
Background
Traditional malicious code analysis methods fall into two main groups: static analysis and dynamic analysis. Static analysis disassembles or decompiles a program without executing it and then analyzes the result; its main methods are static source code analysis, static disassembly analysis, and decompilation analysis. Dynamic analysis uses program debugging tools to trace malicious code, observe its execution, dissect its working mechanism, and verify the results of static analysis; its main methods are system call behavior analysis and heuristic scanning. However, traditional detection methods based on code and behavior features generally need many cumbersome steps and a great deal of time to achieve good results. According to statistics, a considerable proportion of malicious code is of the deceptive type: it typically disguises itself with an icon similar to that of popular software such as WORD and thereby lures the user into clicking it. Once clicked, such malicious code carries out operations such as stealing secrets or extortion, putting the user's information assets at risk. In recent years, therefore, a new line of work based on icon similarity analysis has been proposed in the malicious code detection field. Its novelty is that it starts from the icon and exploits the fact that malicious code disguises itself with icons similar to those of normal software, which greatly improves detection efficiency and accuracy. Research on malicious code detection based on icon similarity analysis is therefore of practical importance for malicious code detection work.
Using the method for machine learning, information is extracted from icon to improve the precision of detection Malicious Code Detection, including Two steps: 1) extracting icon characteristics and use collect statistics (Summary Statics), histograms of oriented gradients (HOG, Histogram of Oriented Gradient) and a convolution autocoder;2) according to the icon characteristics of extraction to figure Mark is clustered.When experiment shows to analyze in prediction model using icon, mean accuracy increases 10%, the disadvantage is that requiring Artificial to extract feature, icon classification speed and precision be not high, and does not provide a kind of vaild act analysis method.Based on application The malicious code of mobile terminal detection method and system of program icon comprise the concrete steps that and first correspond to be carried out with the installation kit of program Analysis, the icon of the application program is extracted, then the extraction system api function from the application code file, will The icon of the application program is corresponding with application icon function rule base, so that function rule corresponding with this icon is retrieved, By the api function of the application call compared with the corresponding function rule of the icon pair, if unanimously, normally to apply Program;It otherwise is the application program of malice;But this technology does not have practicability, malicious code is many kinds of, and application program API information can not reflect software function completely, therefore, in a practical situation, can there are problems that it is serious wrong report or fail to report.From It is existing based on icon analysis in order to adapt to the automatic detection demand of extensive diversified malice sample from the point of view of actual effect The means and method of Malicious Code Detection have the disadvantage that shortage versatility, can only be to using icon similar to normal 
software Sample detected, and the sample for using other icons can not be detected;Lack practicability, when to extensive sample into When row detection, efficiency is very low.
Summary of the invention
To this end, the present invention provides a malicious program identification method and device based on icon representation and software behavior consistency analysis, which greatly reduces the amount of computation and achieves automatic, batch, rapid detection of malicious programs, with high efficiency, generality, and wide applicability.
According to the design scheme provided by the present invention, a malicious program identification method based on icon representation and software behavior consistency analysis comprises the following:
A) collecting normal software data of known categories, extracting the icon resource data and import table API data of the known normal software, constructing CNN deep learning models, training on the icon information and the import table API information separately, and obtaining a conventional software library from the icon category and software behavior category information;
B) parsing the structure of the sample under test, extracting its icon resource data and import table API function data, and passing them to the trained conventional software library for testing, to obtain the icon category and software behavior category information of the sample under test;
C) based on the test results, judging whether the icon category and the software behavior category information of the sample under test are consistent: if consistent, the sample is judged to be normal software; if inconsistent, it is judged to be malicious software, and a malicious program detection report is generated and output.
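The judgment in step C can be sketched as follows. This is only an illustration under stated assumptions: the `Verdict` class, its field names, the report wording, and the category labels are hypothetical and are not taken from the patent, which does not specify a report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical record of one detection result (names are illustrative)."""
    sample: str
    icon_category: str      # category predicted from the icon
    behavior_category: str  # category predicted from the import table APIs

    @property
    def is_malicious(self) -> bool:
        # Step C: a mismatch between the two predicted categories
        # marks the sample as malicious; a match marks it as normal.
        return self.icon_category != self.behavior_category

    def report(self) -> str:
        status = "malicious" if self.is_malicious else "normal"
        return (f"{self.sample}: icon={self.icon_category}, "
                f"behavior={self.behavior_category}, verdict={status}")


print(Verdict("sample.exe", "office", "video").report())
```

The two category strings would come from the icon classification model and the software behavior classification model respectively; here they are supplied by hand.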
As above, the CNN deep learning model comprises an icon classification model, which obtains the software program category from the icon resource data, and a software behavior classification model, which obtains the software program's behavior category from the import table API data.
Preferably, the icon classification model is a convolutional neural network comprising an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer, in which the convolutional and pooling layers alternate: the convolutional layers extract icon features by convolution, the pooling layers reduce the feature dimensionality, and the fully connected layer outputs the icon classification result.
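To make the roles of the convolutional and pooling layers concrete, the following is a minimal pure-Python sketch of one convolution followed by one max-pooling step on a tiny grayscale "icon". The function names, the 5x5 input, and the edge-detecting kernel are assumptions chosen for illustration; a real icon classifier would use learned kernels and a deep learning framework.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; reduces each dimension by a factor of `size`."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


# A 5x5 image with a vertical edge between columns 1 and 2.
icon = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hand-written 2x2 kernel that responds to left-to-right intensity increases.
edge_kernel = [[-1, 1], [-1, 1]]

features = conv2d(icon, edge_kernel)   # 4x4 feature map
pooled = max_pool(features)            # 2x2 map after dimensionality reduction
print(pooled)
```

The convolution highlights where the edge lies, and pooling keeps the strongest response in each 2x2 window, which is the feature extraction and dimensionality reduction described above.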
Preferably, in the software behavior classification model, the import table API function data is saved in text file format, and the software behavior classification problem is converted into a text classification problem.
Further, in converting the software behavior classification problem into a text classification problem: each extracted set of import table API function data is treated as one sample, in which each line represents one behavior and is treated as a whole; all import table API function data texts are traversed and all behaviors that occur are deduplicated to obtain a dictionary; each word in the dictionary is labeled with a consecutive number, giving a mapping from dynamic behavior to label id; the text is then converted into a two-dimensional matrix: each word in the dictionary is represented by a vector, and for each sample the behaviors in the text are converted into the corresponding id sequence according to the dictionary, and the sample is converted into a two-dimensional matrix according to that id sequence and the vector of each id in the dictionary; the samples are used to train a convolutional neural network, in which the input two-dimensional matrix is convolved with multiple convolution kernels, after which max pooling is applied to each convolution result, taking the maximum value of the column vector; the maximum values corresponding to all convolution kernel results are concatenated to form a fully connected layer, and a softmax classifier performs multi-class classification.
In labeling each word in the dictionary with consecutive numbers to obtain the mapping from dynamic behavior to label id, an unknown dynamic behavior is also added; it is later used to match unknown behaviors that are not in the dictionary.
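The dictionary construction and id mapping described above can be sketched as follows. The function names are hypothetical, and reserving id 0 for the "Unknown" behavior is an assumption of this sketch; the source only says that an unknown behavior is added. The API names are ordinary Windows API names used purely as sample data.

```python
UNKNOWN = "Unknown"


def build_dictionary(samples):
    """Map each distinct behavior (one text line) to a consecutive integer id.

    `samples` is a list of samples, each a list of lines from one
    import table API text file. Id 0 is reserved for UNKNOWN (assumption).
    """
    dictionary = {UNKNOWN: 0}
    for lines in samples:
        for line in lines:
            behavior = line.strip()
            if behavior and behavior not in dictionary:   # deduplicate
                dictionary[behavior] = len(dictionary)
    return dictionary


def encode(lines, dictionary):
    """Convert one sample's behaviors to an id sequence; unseen behaviors map to UNKNOWN."""
    return [dictionary.get(line.strip(), dictionary[UNKNOWN])
            for line in lines if line.strip()]


training = [["CreateFileW", "ReadFile"], ["ReadFile", "WriteFile"]]
dictionary = build_dictionary(training)
print(dictionary)
print(encode(["WriteFile", "RegSetValueExW"], dictionary))
```

Each id would then index a row of a trainable embedding table, which is how a sample becomes the two-dimensional matrix fed to the convolutional network.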
As above, during testing against the conventional software library, if the icon category and software behavior category information of the sample under test cannot be obtained, the sample under test is judged to be a new category and is added to the conventional software information library.
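One way to picture this fallback is sketched below. The source does not say how "cannot be obtained" is decided; treating a low classifier confidence as that condition, the threshold value, the function name `detect`, and the use of a plain list as the information library are all assumptions made for illustration.

```python
def detect(sample, icon_pred, behavior_pred, library, threshold=0.6):
    """Return 'normal', 'malicious', or 'new category'.

    icon_pred and behavior_pred are (label, confidence) pairs from the two
    classifiers. Interpreting low confidence as "category cannot be
    obtained" is an assumption of this sketch, as is the threshold.
    """
    if min(icon_pred[1], behavior_pred[1]) < threshold:
        library.append(sample)   # record the sample as a new category
        return "new category"
    return "normal" if icon_pred[0] == behavior_pred[0] else "malicious"


library = []
print(detect("a.exe", ("office", 0.9), ("office", 0.8), library))
print(detect("b.exe", ("office", 0.9), ("music", 0.8), library))
print(detect("c.exe", ("office", 0.3), ("music", 0.8), library))
```

Samples collected in `library` would be candidates for extending the training data, in line with the update of the information library described above.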
A malicious program identification device based on icon representation and software behavior consistency analysis comprises a collection module, a test module, and a determination module, wherein:
the collection module is configured to collect normal software data of known categories, extract the icon resource data and import table API data of the known normal software, construct CNN deep learning models, train on the icon information and the import table API information separately, and obtain a conventional software library from the icon category and software behavior category information;
the test module is configured to parse the structure of the sample under test, extract its icon resource data and import table API function data, and pass them to the trained conventional software library for testing, to obtain the icon category and software behavior category information of the sample under test;
the determination module is configured to judge, based on the test results, whether the icon category and the software behavior category information of the sample under test are consistent: if consistent, the sample is judged to be normal software; if inconsistent, it is judged to be malicious software, and a malicious program detection report is generated and output.
Beneficial effects of the present invention:
The present invention takes into account the strengths and weaknesses of both traditional malicious code detection methods and current icon-analysis-based malicious code detection methods. It uses a conventional software test model obtained by machine learning to derive the software category expressed by the icon information and the software behavior category, and identifies malicious programs by analyzing the consistency between icon representation and software behavior. This effectively addresses the low efficiency and high cost of traditional malicious code detection methods, and also addresses the inability of current icon-analysis-based methods to effectively detect malicious programs that do not use icons similar to those of normal software. With large numbers of samples, it achieves automatic, batch, rapid detection of malicious programs with high efficiency, and can effectively identify malicious program code on the network that disguises itself by means such as icons similar to those of legitimate software, further safeguarding the information assets of network users; it is of significance for the development of network security technology.
Brief description of the drawings:
Fig. 1 is the first flow chart of the malicious program identification method in an embodiment;
Fig. 2 is the second flow chart of the malicious program identification method in an embodiment;
Fig. 3 is a schematic diagram of the malicious program identification device in an embodiment;
Fig. 4 is a working principle diagram of the malicious program identification device in an embodiment.
Detailed description of the embodiments:
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and the technical solutions.
To meet the need for automated detection of large-scale, diverse malicious samples, an embodiment of the present invention, shown in Fig. 1, provides a malicious program identification method based on icon representation and software behavior consistency analysis, comprising the following:
S101) collecting normal software data of known categories, extracting the icon resource data and import table API data of the known normal software, constructing CNN deep learning models, training on the icon information and the import table API information separately, and obtaining a conventional software library from the icon category and software behavior category information;
S102) parsing the structure of the sample under test, extracting its icon resource data and import table API function data, and passing them to the trained conventional software library for testing, to obtain the icon category and software behavior category information of the sample under test;
S103) based on the test results, judging whether the icon category and the software behavior category information of the sample under test are consistent: if consistent, the sample is judged to be normal software; if inconsistent, it is judged to be malicious software, and a malicious program detection report is generated and output.
Referring to Fig. 2, in another embodiment of the present invention, the CNN deep learning model comprises an icon classification model, which obtains the software program category from the icon resource data, and a software behavior classification model, which obtains the software program's behavior category from the import table API data. Both models are established. The icon classification model analyzes the icon information of the software to obtain the software category that the icon expresses; the software behavior classification model, given the icon classification model's result, analyzes whether the behavior information of the software is consistent with the behavior of that category of software in the software behavior classification model.
In establishing the icon classification model: deep learning models such as CNNs have achieved good results in image classification, so using them to classify icons is likewise feasible. In another embodiment of the present invention, the icon classification model is a convolutional neural network comprising an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer, in which the convolutional and pooling layers alternate: the convolutional layers extract icon features by convolution, the pooling layers reduce the feature dimensionality, and the fully connected layer outputs the icon classification result.
In establishing the software behavior classification model, in another embodiment of the present invention, the import table API function data is saved in text file format and the software behavior classification problem is converted into a text classification problem, so that CNN methods from natural language processing can be applied. Starting from the icon and using deep learning, an icon classification model and a software classification model are established. Using the icon information of malicious code effectively addresses the low efficiency and high cost of traditional malicious code detection methods; it also addresses the shortcoming of current icon-analysis-based detection methods, namely that malicious programs that do not use icons similar to those of normal software cannot be effectively detected; and, with large numbers of samples, it achieves automatic, batch, rapid detection of malicious programs.
In converting the software behavior classification problem into a text classification problem, in another embodiment of the present invention, each extracted set of import table API function data is treated as one sample, in which each line represents one behavior and is treated as a whole; all import table API function data texts are traversed and all behaviors that occur are deduplicated to obtain a dictionary; each word in the dictionary is labeled with a consecutive number, giving a mapping from dynamic behavior to label id; the text is then converted into a two-dimensional matrix: each word in the dictionary is represented by a vector, and for each sample the behaviors in the text are converted into the corresponding id sequence according to the dictionary, and the sample is converted into a two-dimensional matrix according to that id sequence and the vector of each id in the dictionary; the samples are used to train a convolutional neural network, in which the input two-dimensional matrix is convolved with multiple convolution kernels, after which max pooling is applied to each convolution result, taking the maximum value of the column vector; the maximum values corresponding to all convolution kernel results are concatenated to form a fully connected layer, and a softmax classifier performs multi-class classification.
In labeling each word in the dictionary with consecutive numbers to obtain the mapping from dynamic behavior to label id, an unknown dynamic behavior is also added; it is later used to match unknown behaviors that are not in the dictionary.
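The forward pass of the text classification network described above (vector lookup, convolution with several kernels, max pooling over each result, concatenation, softmax) can be sketched in pure Python. All numeric values below are hand-picked placeholders rather than trained parameters, the function names are hypothetical, and training is omitted entirely.

```python
import math


def embed(ids, table):
    """Turn an id sequence into a 2-D matrix: one embedding row per behavior."""
    return [table[i] for i in ids]


def conv_over_rows(matrix, kernel):
    """Slide a kernel spanning the full embedding width down the rows.

    Requires len(matrix) >= len(kernel); returns one value per position.
    """
    height, width = len(kernel), len(kernel[0])
    return [sum(matrix[r + i][j] * kernel[i][j]
                for i in range(height) for j in range(width))
            for r in range(len(matrix) - height + 1)]


def softmax(scores):
    top = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(ids, table, kernels, weights):
    matrix = embed(ids, table)
    # Max pooling keeps the largest response of each kernel.
    pooled = [max(conv_over_rows(matrix, k)) for k in kernels]
    # The concatenated maxima feed a fully connected layer, one row per class.
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    return softmax(logits)


table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]        # id -> vector (placeholder)
kernels = [[[1.0, 0.0], [0.0, 1.0]],                # two kernels of height 2
           [[0.0, 1.0], [1.0, 0.0]]]
weights = [[1.0, 0.0], [0.0, 1.0]]                  # two output classes
probabilities = forward([1, 2], table, kernels, weights)
print(probabilities)
```

In a trained model the embedding table, kernels, and weights would all be learned jointly, as the description notes when it says the word vectors are updated during training.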
Referring to Fig. 2, the workflow is as follows.
1) The sample information extraction module performs PE structure parsing on the incoming sample under test and extracts its icon resource information and the API function information in its import table. The icon is saved in icon format and the import table API information is saved in text format, as input to the subsequent maliciousness detection.
2) A large amount of normal software is collected, such as office software, video software, and music software; the icons of applications such as WORD are very widely used in malicious code and are highly deceptive to ordinary users. For the normal software, the icon resource information and import table API information are first extracted in the same way as in the sample information extraction module, and then the icon classification model and the software behavior classification model are established. The icon classification model uses a CNN consisting of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer, with the convolutional and pooling layers alternating. The software classification model treats each extracted import table API text as one sample, in which each line represents one behavior and is treated as a whole; all import table API texts are traversed to collect every behavior that occurs (deduplicated), which forms the dictionary. Each word in the dictionary is labeled with a consecutive number, giving a mapping from dynamic behavior to label id. In addition to the behaviors that occur, an "Unknown" dynamic behavior is added, to match unknown behaviors not in the dictionary later. The text is then converted into a two-dimensional matrix. First, each word in the dictionary is represented by a vector; the vectors are initialized randomly and the word vectors are then updated continually during training. For each sample, the behaviors in the text are converted into the corresponding id sequence according to the dictionary, and the sample is then converted into a two-dimensional matrix according to this id sequence and the vector of each id in the dictionary. Finally, the samples are used to train a CNN: the input sample matrix is convolved with multiple convolution kernels, max-pooling is applied to each convolution result to take the maximum value of the column vector, the maximum values for all convolution kernel results are concatenated to form the fully connected layer, and softmax performs the multi-class classification.
3) The maliciousness detection module feeds the sample under test into the conventional software model for testing. The icon classification model determines the software's category and the software classification model determines the software's category; if the two agree, the icon representation and the software behavior are consistent and the sample is normal software; otherwise it is malicious software. During testing against the conventional software library, in another embodiment of the present invention, if the icon category and software behavior category information of the sample under test cannot be obtained, the sample under test is judged to be a new category and is added to the conventional software information library.
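As a rough illustration of the first stage of PE structure parsing, the sketch below reads the headers of a PE image and returns the location of its import directory, following the published PE header layout. The function name is hypothetical, and the sketch stops at locating the directory: resolving the returned RVA to a file offset through the section table and walking the import descriptors to list API names is left out, and in practice a dedicated PE parsing library would normally handle all of this.

```python
import struct


def locate_import_directory(data: bytes):
    """Return (rva, size) of the import directory of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew field
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    optional_header = pe_offset + 4 + 20    # skip signature and 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, optional_header)
    if magic == 0x10B:                      # PE32
        directories = optional_header + 96
    elif magic == 0x20B:                    # PE32+
        directories = optional_header + 112
    else:
        raise ValueError("unknown optional header magic")
    # Data directory entry 1 is the import table; each entry is (RVA, size).
    rva, size = struct.unpack_from("<II", data, directories + 8)
    return rva, size


# Build a tiny synthetic header (not a runnable executable) to exercise the parser.
fake = bytearray(0x200)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", fake, 0x80 + 24, 0x10B)
struct.pack_into("<II", fake, 0x80 + 24 + 96 + 8, 0x2000, 0x50)
print(locate_import_directory(bytes(fake)))
```

A production parser would also check the `NumberOfRvaAndSizes` field before reading the directory entry and would validate all offsets against the file length.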
Based on the above method, the present invention also provides a malicious program identification device based on icon representation and software behavior consistency analysis which, as shown in Fig. 3, comprises a collection module 101, a test module 102, and a determination module 103, wherein:
the collection module 101 is configured to collect normal software data of known categories, extract the icon resource data and import table API data of the known normal software, construct CNN deep learning models, train on the icon information and the import table API information separately, and obtain a conventional software library from the icon category and software behavior category information;
the test module 102 is configured to parse the structure of the sample under test, extract its icon resource data and import table API function data, and pass them to the trained conventional software library for testing, to obtain the icon category and software behavior category information of the sample under test;
the determination module 103 is configured to judge, based on the test results, whether the icon category and the software behavior category information of the sample under test are consistent: if consistent, the sample is judged to be normal software; if inconsistent, it is judged to be malicious software, and a malicious program detection report is generated and output.
Referring to Fig. 4, sample information extraction performs PE structure parsing on the input sample under test and extracts its icon resource information and the API function information in its import table, as input to the subsequent maliciousness detection. The conventional software model is built by collecting the executable programs of a large amount of normal software, extracting their icon information and import table API information, constructing CNN deep learning models, and training on the icon information and the import table API information separately to establish the icon classification model and the software classification model, which serve as the reference for maliciousness detection. Maliciousness detection feeds the sample under test into the conventional software model for testing: the icon classification model determines the software's category and the software classification model determines the software's category; if the two agree, the icon representation and the software behavior are consistent and the sample is normal software; otherwise it is malicious software. If the icon classification model or the software classification model cannot reach a determination, that is, maliciousness cannot be detected, the input sample is judged to be a new category and is added to the conventional information library. This addresses the low efficiency and high cost of traditional malicious code detection methods, as well as the inability of icon-analysis-based methods to effectively detect malicious programs that do not use icons similar to those of normal software. Combined with the conventional information library and its real-time updating, it achieves automatic, batch, rapid detection of malicious programs with large numbers of samples and effectively identifies malicious program code on the network disguised by icons similar to those of legitimate software. It is easy to implement and efficient, safeguards the information assets of network users, and is of practical importance for network security detection.
Unless specifically stated otherwise, the relative steps, numerical expressions, and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
Based on the above method, an embodiment of the present invention also provides a server, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above method.
Based on the above method, an embodiment of the present invention also provides a computer-readable medium on which a computer program is stored, wherein the program implements the above method when executed by a processor.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, where the device embodiment does not mention something, refer to the corresponding content of the foregoing method embodiment.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and device described above may be found in the corresponding processes of the foregoing method embodiment and are not repeated here.
In all examples shown and described herein, any specific value should be construed as merely illustrative and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the drawings, so once an item has been defined in one drawing it need not be further defined or explained in subsequent drawings.
The flow chart and block diagram in the drawings show the system of multiple embodiments according to the present invention, method and computer journeys The architecture, function and operation in the cards of sequence product.In this regard, each box in flowchart or block diagram can generation A part of one module, section or code of table, a part of the module, section or code include one or more use The executable instruction of the logic function as defined in realizing.It should also be noted that in some implementations as replacements, being marked in box The function of note can also occur in a different order than that indicated in the drawings.For example, two continuous boxes can actually base Originally it is performed in parallel, they can also be executed in the opposite order sometimes, and this depends on the function involved.It is also noted that It is the combination of each box in block diagram and or flow chart and the box in block diagram and or flow chart, can uses and execute rule The dedicated hardware based system of fixed function or movement is realized, or can use the group of specialized hardware and computer instruction It closes to realize.
In several embodiments provided herein, it should be understood that disclosed systems, devices and methods, it can be with It realizes by another way.The apparatus embodiments described above are merely exemplary, for example, the division of the unit, Only a kind of logical function partition, there may be another division manner in actual implementation, in another example, multiple units or components can To combine or be desirably integrated into another system, or some features can be ignored or not executed.Another point, it is shown or beg for The mutual coupling, direct-coupling or communication connection of opinion can be through some communication interfaces, device or unit it is indirect Coupling or communication connection can be electrical property, mechanical or other forms.
The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.
It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.
If the functions are implemented as software functional units and sold or used as a standalone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, intended to illustrate its technical solution rather than to limit it, and the scope of protection of the present invention is not limited to them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (10)

1. A malicious program identification method based on consistency analysis of icon representation and software behavior, characterized by comprising:
A) collecting data on normal software of known classification, extracting the icon resource data and import table API data of the known normal software, constructing a CNN deep learning model, training separately on the icon information and the import table API information, and obtaining a conventional software library according to the icon classification and software behavior classification information;
B) parsing the structure of a sample to be tested, extracting its icon resource data and import table API function data, passing them to the trained conventional software library for testing, and obtaining the icon classification and software behavior classification information of the sample to be tested;
C) determining, from the test results, whether the icon classification of the sample to be tested is consistent with its software behavior classification information; if consistent, the sample is determined to be normal software; if inconsistent, it is determined to be malware, and a malicious program detection report is generated and output.
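The decision in step C can be sketched as follows. This is a minimal illustration only, assuming the two classifiers from steps A and B have already produced a category label for each sample; the `Verdict` class, the `detection_report` helper, the sample names, and the category labels are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    sample: str
    icon_class: str       # category predicted from the icon
    behavior_class: str   # category predicted from the import table APIs

    @property
    def malicious(self) -> bool:
        # Step C: a mismatch between what the icon suggests and what the
        # import table suggests is treated as malicious.
        return self.icon_class != self.behavior_class


def detection_report(verdicts):
    """Return one report line per sample judged malicious."""
    return [
        f"{v.sample}: icon={v.icon_class}, behavior={v.behavior_class} -> MALICIOUS"
        for v in verdicts
        if v.malicious
    ]


# Hypothetical classifier outputs for two samples.
verdicts = [
    Verdict("setup.exe", "installer", "installer"),
    Verdict("invoice.exe", "document_viewer", "downloader"),
]
report = detection_report(verdicts)
```

Here only the second sample, whose icon suggests a document viewer while its imports suggest a downloader, would appear in the report.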
2. The malicious program identification method based on consistency analysis of icon representation and software behavior according to claim 1, characterized in that the CNN deep learning model comprises an icon classification model for obtaining the software program category from the icon resource data, and a software behavior classification model for obtaining the software program behavior category from the import table API data.
3. The malicious program identification method based on consistency analysis of icon representation and software behavior according to claim 2, characterized in that the icon classification model is a convolutional neural network model comprising an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer, wherein the convolutional layers and pooling layers alternate, the convolutional layers extract icon features by convolution, the pooling layers reduce the feature dimensionality, and the fully connected layer outputs the icon classification result.
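The layer ordering in claim 3 can be illustrated with a small forward pass in plain Python. This is a sketch under stated assumptions rather than the patented model: it uses a single-channel 16x16 input, one 3x3 kernel per convolutional layer, 2x2 max pooling, a ReLU activation, three output classes, and random untrained weights, none of which are specified in the source.

```python
import math
import random


def conv2d_relu(img, kernel):
    """Valid 2-D convolution (cross-correlation form) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[max(0.0, sum(img[r + i][c + j] * kernel[i][j]
                          for i in range(kh) for j in range(kw)))
             for c in range(out_w)] for r in range(out_h)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling, used here for feature dimension reduction."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def icon_forward(img, k1, k2, fc):
    x = max_pool(conv2d_relu(img, k1))   # 16x16 -> 14x14 -> 7x7
    x = max_pool(conv2d_relu(x, k2))     # 7x7 -> 5x5 -> 2x2
    flat = [v for row in x for v in row]                # flatten to 4 features
    scores = [sum(w * v for w, v in zip(ws, flat)) for ws in fc]  # fully connected
    return softmax(scores)                              # output layer


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# A random grayscale "icon" and random (untrained) weights for 3 assumed classes.
icon = [[rng.random() for _ in range(16)] for _ in range(16)]
probs = icon_forward(icon, rand_matrix(3, 3), rand_matrix(3, 3), rand_matrix(3, 4))
```

A practical model would use multiple channels and kernels per layer and learn the weights by training; the sketch shows only how convolution and pooling alternate before the fully connected output.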
4. The malicious program identification method based on consistency analysis of icon representation and software behavior according to claim 2, characterized in that, in the software behavior classification model, the import table API function data are saved in text file format, converting the software behavior classification problem into a text classification problem.
5. The malicious program identification method based on consistency analysis of icon representation and software behavior according to claim 4, characterized in that, in converting the software behavior classification problem into a text classification problem, each extracted set of import table API function data is taken as a sample, in which each line represents one behavior and is treated as a whole; all import table API function data texts are traversed and all behaviors that occur are deduplicated to obtain a dictionary; each word in the dictionary is labeled with consecutive numbers to obtain a mapping from dynamic behaviors to label ids; the text is converted into a two-dimensional matrix, in that each word in the dictionary is represented by a vector and, for each sample, the behaviors in the text are converted into the corresponding id sequence according to the dictionary, and the sample is converted into a two-dimensional matrix according to the id sequence and the vector of each id in the dictionary; the samples are trained with a convolutional neural network, in that the input two-dimensional matrix is convolved with multiple convolution kernels respectively and, after convolution, max pooling takes the maximum value of the column vector of each convolution result; and the maximum values corresponding to all convolution kernel results are concatenated to form a fully connected layer, with a softmax classifier performing multi-class classification.
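The steps of claim 5 can be sketched end to end in plain Python. This is an illustrative forward pass only, with random untrained weights; the embedding size, the kernel heights, the two output classes, and the sample API names are assumptions chosen for the example and are not taken from the patent.

```python
import math
import random

rng = random.Random(0)
EMBED_DIM = 4  # assumed length of the vector representing each dictionary word

# Two hypothetical samples: one imported API function (behavior) per line.
samples = [
    "CreateFileW\nReadFile\nCloseHandle",
    "InternetOpenA\nInternetReadFile\nCreateFileW\nWriteFile",
]


def behaviors(text):
    """Each non-empty line is treated as one whole behavior."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_vocab(texts):
    """Deduplicate all behaviors and label them with consecutive ids."""
    vocab = {}
    for text in texts:
        for b in behaviors(text):
            vocab.setdefault(b, len(vocab))
    return vocab


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def conv_max(matrix, kernel):
    """Slide a kernel spanning full rows down the matrix, then max-pool the column."""
    h = len(kernel)
    column = [sum(matrix[r + i][d] * kernel[i][d]
                  for i in range(h) for d in range(EMBED_DIM))
              for r in range(len(matrix) - h + 1)]
    return max(column)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vocab = build_vocab(samples)
embeddings = rand_matrix(len(vocab), EMBED_DIM)            # one vector per id
kernels = [rand_matrix(h, EMBED_DIM) for h in (2, 2, 3)]   # multiple kernels
fc = rand_matrix(2, len(kernels))                          # 2 assumed classes


def classify(text):
    ids = [vocab[b] for b in behaviors(text)]              # text -> id sequence
    matrix = [embeddings[i] for i in ids]                  # id sequence -> 2-D matrix
    features = [conv_max(matrix, k) for k in kernels]      # conv + max pooling
    scores = [sum(w * f for w, f in zip(row, features)) for row in fc]
    return softmax(scores)                                 # multi-class output


probs = classify(samples[1])
```

Because `CreateFileW` appears in both samples, deduplication leaves six dictionary entries with ids 0 to 5.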
6. The malicious program identification method based on consistency analysis of icon representation and software behavior according to claim 5, characterized in that, in the process of labeling each word in the dictionary with consecutive numbers to obtain the mapping from dynamic behaviors to label ids, an unknown dynamic behavior is also added, which is subsequently used to match unknown behaviors that are not in the dictionary.
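One common way to realize the unknown behavior entry of claim 6 is a reserved dictionary token. The sketch below assumes the token name `<UNK>` and assigns it id 0; both choices, and the API names, are illustrative and not stated in the source.

```python
UNKNOWN = "<UNK>"  # assumed name for the reserved "unknown behavior" entry


def build_vocab_with_unknown(behaviors):
    """Assign consecutive ids, reserving id 0 for unknown behaviors."""
    vocab = {UNKNOWN: 0}
    for b in behaviors:
        vocab.setdefault(b, len(vocab))
    return vocab


def encode(behaviors, vocab):
    """Map behaviors to ids; anything outside the dictionary maps to UNKNOWN."""
    return [vocab.get(b, vocab[UNKNOWN]) for b in behaviors]


vocab = build_vocab_with_unknown(["CreateFileW", "ReadFile", "CreateFileW"])
ids = encode(["ReadFile", "NtUnseenCall", "CreateFileW"], vocab)
```

With this mapping, a sample under test that imports a function never seen during training can still be converted into an id sequence instead of failing the lookup.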
7. The malicious program identification method based on consistency analysis of icon representation and software behavior according to claim 1, characterized in that, in B), during testing against the conventional software library, if the icon classification and software behavior classification information of the sample to be tested cannot be obtained, the sample to be tested is determined to be a new category and is added to the software program routine information library.
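The claim does not say how the method decides that a classification "cannot be obtained". The sketch below assumes one possible criterion, a confidence threshold on the classifier output, and models the library as a dictionary from category name to sample identifiers; the threshold value, the naming scheme for new categories, and all identifiers are hypothetical.

```python
def classify_or_register(probs, labels, library, sample_id, threshold=0.5):
    """Return a known label, or register the sample under a new category.

    Assumption: a classification counts as "not obtained" when the highest
    class probability falls below `threshold`.
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best]
    new_label = f"new_category_{len(library)}"
    library[new_label] = [sample_id]   # add the sample to the library
    return new_label


labels = ["browser", "installer", "media_player"]
library = {label: [] for label in labels}

known = classify_or_register([0.1, 0.8, 0.1], labels, library, "a.exe")
novel = classify_or_register([0.34, 0.33, 0.33], labels, library, "b.exe")
```

In this example the first sample is assigned to an existing category, while the second, which no class claims with enough confidence, creates a new library entry.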
8. A malicious program identification device based on consistency analysis of icon representation and software behavior, characterized by comprising a collection module, a test module, and a determination module, wherein:
the collection module is configured to collect data on normal software of known classification, extract the icon resource data and import table API data of the known normal software, construct a CNN deep learning model, train separately on the icon information and the import table API information, and obtain a conventional software library according to the icon classification and software behavior classification information;
the test module is configured to parse the structure of a sample to be tested, extract its icon resource data and import table API function data, pass them to the trained conventional software library for testing, and obtain the icon classification and software behavior classification information of the sample to be tested;
the determination module is configured to determine, from the test results, whether the icon classification of the sample to be tested is consistent with its software behavior classification information; if consistent, the sample is determined to be normal software; if inconsistent, it is determined to be malware, and a malicious program detection report is generated and output.
9. the rogue program identification device according to claim 8 based on icon representation and software action consistency analysis, It is characterized in that, CNN deep learning model includes to classify for the icon according to icon resource data acquiring software programs categories Model and according to import the programs categories behavior of Table A PI data acquiring software software action disaggregated model.
10. the rogue program identification device according to claim 8 based on icon representation and software action consistency analysis, It is characterized in that, also including update module, the classification of sample to be tested icon and software can not be obtained for being directed in conventional software library The sample to be tested for the situation occur is added in software program routine information library by the situation of behavior classification information.
CN201910123265.3A 2019-01-31 2019-02-18 Rogue program recognition methods and device based on icon representation and software action consistency analysis Pending CN109871686A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019100997272 2019-01-31
CN201910099727 2019-01-31

Publications (1)

Publication Number Publication Date
CN109871686A true CN109871686A (en) 2019-06-11

Family

ID=66918928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910123265.3A Pending CN109871686A (en) 2019-01-31 2019-02-18 Rogue program recognition methods and device based on icon representation and software action consistency analysis

Country Status (1)

Country Link
CN (1) CN109871686A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111290785A (en) * 2020-03-06 2020-06-16 北京百度网讯科技有限公司 Method and device for evaluating deep learning framework system compatibility, electronic equipment and storage medium
CN112257757A (en) * 2020-09-27 2021-01-22 北京锐服信科技有限公司 Malicious sample detection method and system based on deep learning
CN112364309A (en) * 2021-01-13 2021-02-12 北京云真信科技有限公司 Information processing method, electronic device, and computer-readable storage medium
CN112487432A (en) * 2020-12-10 2021-03-12 杭州安恒信息技术股份有限公司 Method, system and equipment for malicious file detection based on icon matching
CN112860932A (en) * 2021-02-19 2021-05-28 电子科技大学 Image retrieval method, device, equipment and storage medium for resisting malicious sample attack
CN113076539A (en) * 2021-04-13 2021-07-06 郑州信息科技职业学院 Big data-based computer security protection system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761481A (en) * 2014-01-23 2014-04-30 北京奇虎科技有限公司 Method and device for automatically processing malicious code sample
CN103853979A (en) * 2010-12-31 2014-06-11 北京奇虎科技有限公司 Program identification method and device based on machine learning
CN103902906A (en) * 2013-12-25 2014-07-02 武汉安天信息技术有限责任公司 Mobile terminal malicious code detecting method and system based on application icon
CN105938485A (en) * 2016-04-14 2016-09-14 北京工业大学 Image description method based on convolution cyclic hybrid model
US20180063169A1 (en) * 2016-09-01 2018-03-01 Cylance Inc. Container file analysis using machine learning model
CN108898015A (en) * 2018-06-26 2018-11-27 暨南大学 Application layer dynamic intruding detection system and detection method based on artificial intelligence
CN109165688A (en) * 2018-08-28 2019-01-08 暨南大学 A kind of Android Malware family classification device construction method and its classification method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
卓新建 等: "《计算机病毒原理与防治》", 30 April 2004, 北京邮电大学出版社 *
孟曦: "基于深度学习的恶意代码分类与聚类技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
李德毅: "《人工智能导论》", 31 August 2018 *

Similar Documents

Publication Publication Date Title
CN109871686A (en) Rogue program recognition methods and device based on icon representation and software action consistency analysis
Warnecke et al. Evaluating explanation methods for deep learning in security
Kolosnjaji et al. Empowering convolutional networks for malware classification and analysis
CN106709345B (en) Method, system and equipment for deducing malicious code rules based on deep learning method
CN109784056B (en) Malicious software detection method based on deep learning
CN110020422B (en) Feature word determining method and device and server
CN110837550A (en) Knowledge graph-based question and answer method and device, electronic equipment and storage medium
Xiao et al. Image-based malware classification using section distribution information
CN108491228A (en) A kind of binary vulnerability Code Clones detection method and system
CN109446328A (en) A kind of text recognition method, device and its storage medium
CN111866004B (en) Security assessment method, apparatus, computer system, and medium
CN109344258A (en) A kind of intelligent self-adaptive sensitive data identifying system and method
CN109063478A (en) Method for detecting virus, device, equipment and the medium of transplantable executable file
CN116361801B (en) Malicious software detection method and system based on semantic information of application program interface
CN109829302A (en) Android malicious application family classification method, apparatus and electronic equipment
Chen et al. Applying convolutional neural network for malware detection
CN103530312A (en) User identification method and system using multifaceted footprints
Zhang et al. Malicious code detection based on code semantic features
CN114386511B (en) Malicious software family classification method based on multidimensional feature fusion and model integration
CN111400713A (en) Malicious software family classification method based on operation code adjacency graph characteristics
Fonseca et al. Model-agnostic approaches to handling noisy labels when training sound event classifiers
CN116663019B (en) Source code vulnerability detection method, device and system
CN108985052A (en) A kind of rogue program recognition methods, device and storage medium
EP4227855A1 (en) Graph explainable artificial intelligence correlation
CN114817925B (en) Android malicious software detection method and system based on multi-modal graph features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190611