CN103902911A - Rogue program detection method based on program structural features - Google Patents

Rogue program detection method based on program structural features

Info

Publication number
CN103902911A
Authority
CN
China
Prior art keywords
program
goes
function
feature
family
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410152717.8A
Other languages
Chinese (zh)
Other versions
CN103902911B (en)
Inventor
曾庆凯
魏向宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201410152717.8A
Publication of CN103902911A
Application granted
Publication of CN103902911B
Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A malicious program detection method based on program structural features comprises (1) building a malicious program feature library, (2) detecting metamorphic variants of malicious programs, (3) extracting function structure information, (4) extracting the number of function parameters, (5) extracting the features of malicious programs, and (6) extracting the family features of malicious program families. When the feature library is built, program structure analysis and extraction are first carried out on each training sample program in each malicious program family to obtain the program features of the samples, and the extracted program features of the samples in a family are then fused. When metamorphic variants are detected, program structure analysis is first carried out on the program under test to obtain its program features; these are then matched for similarity against the family features in the malicious program feature library to decide whether the program under test is a metamorphic variant of a known malicious program.

Description

A malicious program detection method based on program structural features
Technical field
The present invention relates to the identification and detection of malicious programs, and in particular to a method that identifies and detects metamorphic variants of malicious programs based on program structural features.
Background
Malicious programs interfere with the normal operation of computer systems, and malicious program detection is an effective measure for protecting system security. Malicious programs can use various obfuscation techniques to evade detection. Obfuscation makes detection harder and adds to the workload of malicious program analysts. Metamorphism is a commonly used obfuscation technique: through register renaming, reordering of independent instructions, junk instruction insertion, equivalent instruction substitution, and instruction reordering, it changes the syntactic structure of malicious code and produces metamorphic variants that interfere with detection software. It is therefore necessary to extract program features that are not disturbed by metamorphic techniques and to use them to identify metamorphic variants of known malicious programs, so that unknown programs can be identified and detected and the workload of analysts reduced.
The invention provides a method for detecting metamorphic variants of malicious programs based on program structural features. The structural features of a program are relatively stable and can withstand the obfuscation introduced by most metamorphic techniques. In addition, program features are matched with simple string matching rather than the traditional graph isomorphism matching on program structure graphs, which improves the efficiency of malicious program detection.
Summary of the invention
To identify metamorphic variants of malicious programs, the object of the invention is to provide a method that detects and identifies malicious programs by static analysis based on program structural features, thereby improving detection efficiency and reducing the workload of malicious program analysts.
The invention provides a malicious program detection method based on program structural features, which mainly comprises building a malicious program feature library and detecting metamorphic variants of malicious programs.
The technical scheme of the invention is a malicious program detection method based on program structural features, characterized by: 1) building the malicious program feature library; 2) detecting metamorphic variants of malicious programs; 3) extracting function structure information; 4) extracting the number of function parameters; 5) extracting malicious program features; 6) extracting the family features of malicious program families.
When the malicious program feature library is built, program structure analysis and extraction are first performed on each sample program of each malicious program family used for training to obtain its program features. The program features of the sample programs in the family are then fused to obtain the family feature of that malicious program family, which is stored in the malicious program feature library. Program structure analysis is based on the IDA disassembler: IDA disassembles the program under analysis to obtain its disassembly code, from which the program entry function is located. Starting from the program entry function, the structural information of the program is extracted. Program structure information comprises the start and end of function bodies, the start and end of basic blocks, and the start and end of loop bodies. The program structure information obtained by the extraction process is stored in the test_feature.lib file and is used by the program feature extraction process to extract the program features of the program under test.
When metamorphic variants are detected, program structure analysis is first performed on the program under test to obtain its program features. These program features are then matched for similarity against the family feature of each malicious program family in the malicious program feature library to decide whether the program is a metamorphic variant of a known malicious program.
The core of the invention consists of six parts: (1) building the malicious program feature library; (2) detecting metamorphic variants of malicious programs; (3) extracting function structure information; (4) extracting the number of function parameters; (5) extracting program features; (6) extracting the family features of malicious program families.
The beneficial effect of the invention is that it provides a method that detects and identifies malicious programs by static analysis based on program structural features, thereby improving detection efficiency, reducing the workload of malicious program analysts, and achieving the identification of metamorphic variants of malicious programs.
Brief description of the drawings
The invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the malicious program detection process based on program structural features;
Fig. 2 is a flowchart of building the malicious program feature library;
Fig. 3 is a flowchart of program structure information extraction;
Fig. 4 is a flowchart of function parameter count extraction;
Fig. 5 is a flowchart of program feature extraction;
Fig. 6 is a flowchart of function parameter count correction;
Fig. 7 is a flowchart of redundant structure symbol removal;
Fig. 8 is a flowchart of family feature extraction for a malicious program family;
Fig. 9 is a flowchart of obtaining the longest common subsequence;
Fig. 10 is a flowchart of metamorphic variant detection.
Detailed description
Fig. 1 shows the overall structure and workflow of the method. When the malicious program feature library is built, program structure analysis is first performed on each sample program of each malicious program family used for training to obtain its program features. The program features of the sample programs in the family are then fused to obtain the family feature of that family, which is stored in the malicious program feature library for reference during detection. When metamorphic variants are detected, program structure analysis is first performed on the program under test to obtain its program features. These are then matched for similarity against the family feature of each malicious program family in the library to decide whether the program is a metamorphic variant of a known malicious program.
Program structure analysis is based on the IDA disassembler. IDA disassembles the program under analysis to obtain its disassembly code, from which the program entry function is located. Starting from the program entry function, the structural information of the program is extracted. Program structure information comprises the start and end of function bodies, the start and end of basic blocks, and the start and end of loop bodies. In addition, for each function call instruction, the parameter count of the called function is extracted, and the structural information of every local function called for the first time is extracted recursively; together these form the structural feature of the program. A program structural feature is represented as a sequence composed of the structural information describing the program structure and the function parameter counts. The structural information symbols are FUN_START and FUN_END for the start and end of a function body, BLOCK_START and BLOCK_END for the start and end of a basic block, and LOOP_START and LOOP_END for the start and end of a loop body; their values are "{", "}", "(", ")", "[" and "]" respectively.
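The symbol encoding described above can be illustrated with a minimal Python sketch. The symbol values come from the description; the function name `encode_feature`, the event list, and the choice to write each parameter count as decimal digits are assumptions made only for illustration, since the description does not state how counts are serialized.

```python
# Symbol values as listed in the description.
SYMBOLS = {
    "FUN_START": "{", "FUN_END": "}",
    "BLOCK_START": "(", "BLOCK_END": ")",
    "LOOP_START": "[", "LOOP_END": "]",
}


def encode_feature(events):
    """Join structural events and parameter counts into one feature string.

    `events` mixes symbol names (str) and parameter counts (int).
    Writing counts as decimal digits is an assumption of this sketch.
    """
    parts = []
    for event in events:
        if isinstance(event, int):
            parts.append(str(event))
        else:
            parts.append(SYMBOLS[event])
    return "".join(parts)


# Hypothetical function: one call with 2 parameters, a basic block
# containing a 1-parameter call, and a loop containing a 3-parameter call.
example = ["FUN_START", 2, "BLOCK_START", 1, "BLOCK_END",
           "LOOP_START", 3, "LOOP_END", "FUN_END"]
feature = encode_feature(example)
print(feature)  # {2(1)[3]}
```

Representing the feature as a flat string is what allows the later stages to use plain string operations instead of graph matching.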
The malicious program feature library is stored in the malware_feature.lib file. This file is generated in the library-building stage and used in the detection stage; each family feature is stored as one line of the file.
Fig. 2 shows the flow of building the malicious program feature library. The known malicious program samples used for training are classified by malicious program family and stored in separate folders. The building process obtains the samples in each family's folder one by one, extracts the program structure information of each sample, and generates its program feature. The program features of all sample programs in the family are then fused to extract the family feature, which is added to the malicious program feature library. The program structure information obtained by the extraction process is stored in the sample_feature.lib file for use by the program feature extraction process. The program features of all sample programs of a family are stored in the family_feature.lib file, one program feature per line, for use by the family feature extraction process. The detailed flow is as follows.
Step 20 is the initial action.
Step 21 obtains a folder of known malicious family programs.
Step 22 checks whether a folder was obtained; if so, go to step 23, otherwise go to step 2c.
Step 23 obtains a malicious program sample; the program name is denoted sample-prg.
Step 24 checks whether a sample was obtained; if so, go to step 25, otherwise go to step 2a.
Step 25 uses the library function ShellExecute() to call IDA to disassemble the sample sample-prg and obtain its disassembly code.
Step 26 calls the IDC function GetEntryPoint() to obtain the program entry function of the disassembly code.
Step 27 extracts the program structure information. Starting from the program entry function, the program structure information is extracted and saved in the sample_feature.lib file; the detailed flow is shown in Fig. 3.
Step 28 extracts the program feature. The program structure information in the sample_feature.lib file undergoes structure symbol adjustment and function parameter count correction to yield the program feature; the detailed flow is shown in Fig. 5.
Step 29 saves the extracted program feature as one line in the family_feature.lib file and goes to step 23.
Step 2a extracts the family feature of the malicious program family. All program features in family_feature.lib are fused to obtain the fused family feature; the detailed flow is shown in Fig. 8.
Step 2b stores the family feature in the malicious program feature library and goes to step 21.
Step 2c is the end state.
Fig. 3 shows the flow of program structure information extraction. Starting from the program entry function, this process extracts the program structure information, comprising the start and end of function bodies, basic blocks and loop bodies. The target addresses of forward jump instructions and backward jump instructions must be recorded in order to determine the end of basic blocks and the start of loop bodies. For each function call instruction, the parameter count of the called function is extracted and the structural information of each local function called for the first time is extracted recursively; the function parameter information of the called function must also be recorded. The forward-jump target address set and the backward-jump target address set hold the target addresses of the forward jump instructions and the backward jump instructions in the program, respectively. The function parameter feature is the set of function parameter information of the called functions; function parameter information comprises the function name, the number of calls to the function, the distinct parameter counts, and the number of calls made with each count. The program structure information obtained by this process is stored in the feature.lib file for use by the program feature extraction process. For a local function called for the first time, the function structure information extraction process is invoked recursively and follows the same flow. The detailed flow is as follows.
Step 30 is the initial action.
Step 31 obtains the address range of the function body.
Step 32 outputs FUN_START to the file.
Step 33 obtains the instruction at the start address of the function body.
Step 34 checks whether the instruction address is less than the end address of the function body; if so, go to step 35, otherwise go to step 3C.
Step 35 obtains the number of times the instruction address occurs in the forward-jump target addresses.
Step 36 outputs that many BLOCK_END symbols to the file.
Step 37 checks whether the instruction opcode is call; if not, go to step 38, otherwise go to step 3a.
Step 38 checks whether the instruction opcode is jmp; if so, go to step 39, otherwise go to step 3m.
Step 39 checks whether the target address is the start address of the function containing the instruction; if so, go to step 3a, otherwise go to step 3m.
Step 3a extracts the function parameter count. It obtains the function name fun_name and the parameter count para_num; the detailed flow is shown in Fig. 4.
Step 3b looks up the function name fun_name in the function parameter feature.
Step 3c checks whether it was found; if not, go to step 3d, otherwise go to step 3h.
Step 3d adds this function's parameter information to the function parameter feature.
Step 3e searches for the string sub_ in the function name fun_name.
Step 3f checks whether it was found; if so, go to step 3g, otherwise go to step 3B.
Step 3g extracts the program structure information of the called function and goes to step 3B.
Step 3h increments the call count of this function.
Step 3i looks up the parameter count para_num in the function parameter information.
Step 3j checks whether it was found; if so, go to step 3k, otherwise go to step 3l.
Step 3k increments the call count for this parameter count and goes to step 3B.
Step 3l adds the parameter count and its call count to the function parameter information and goes to step 3B.
Step 3m checks whether the instruction opcode contains the character j; if so, go to step 3o, otherwise go to step 3n.
Step 3n checks whether the instruction opcode is loop; if so, go to step 3o, otherwise go to step 3B.
Step 3o checks whether the target address lies within the function body; if so, go to step 3p, otherwise go to step 3B.
Step 3p checks whether the target address is greater than the instruction address; if so, go to step 3q, otherwise go to step 3s.
Step 3q stores the target address in the forward-jump target addresses.
Step 3r outputs BLOCK_START to the file and goes to step 3w.
Step 3s searches the loop body sequentially for a function call instruction.
Step 3t checks whether one was found; if so, go to step 3u, otherwise go to step 3w.
Step 3u stores the address of the function call instruction in the backward-jump target addresses.
Step 3v outputs LOOP_END to the file.
Step 3w checks whether the instruction opcode is jmp; if so, go to step 3x, otherwise go to step 3B.
Step 3x obtains the instruction following this instruction.
Step 3y searches for this instruction address in the forward-jump target addresses.
Step 3z checks whether it was found; if so, go to step 34, otherwise go to step 3A.
Step 3A checks whether the instruction address is less than the end address of the function body; if so, go to step 3x, otherwise go to step 34.
Step 3B obtains the instruction following this instruction and goes to step 34.
Step 3C outputs FUN_END to the file.
Step 3D is the end state.
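The core of this flow, classifying in-function jumps as forward or backward and closing a basic block when a recorded forward-jump target is reached, can be sketched in Python as follows. This is a deliberately reduced illustration, not the full flow: the `Insn` record and `structure_events` function are hypothetical, instructions are supplied as plain data rather than read from IDA, and call handling, the skipping of instructions after jmp (steps 3w to 3A), and the requirement that a loop body contain a call (steps 3s to 3u) are all omitted.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    """Hypothetical instruction record used only by this sketch."""
    addr: int
    mnemonic: str
    target: Optional[int] = None  # jump destination, if any


def structure_events(insns, start, end):
    """Emit structural symbols for one function body occupying [start, end)."""
    forward_targets = Counter()  # pending forward-jump destinations
    events = ["FUN_START"]
    for insn in insns:
        # Steps 35-36: close one block per forward jump landing here.
        events.extend(["BLOCK_END"] * forward_targets.pop(insn.addr, 0))
        # Steps 3m-3n: the opcode contains 'j' or is 'loop'.
        is_jump = "j" in insn.mnemonic or insn.mnemonic == "loop"
        # Step 3o: only jumps that stay inside the function body count.
        if is_jump and insn.target is not None and start <= insn.target < end:
            if insn.target > insn.addr:      # steps 3p-3r: forward jump
                forward_targets[insn.target] += 1
                events.append("BLOCK_START")
            else:                            # backward jump closes a loop
                events.append("LOOP_END")
    events.append("FUN_END")
    return events


body = [
    Insn(0x10, "cmp"),
    Insn(0x12, "jz", 0x18),    # forward jump opens a block
    Insn(0x14, "mov"),
    Insn(0x18, "dec"),         # forward-jump target closes the block
    Insn(0x1A, "jnz", 0x10),   # backward jump marks a loop end
    Insn(0x1C, "retn"),
]
events = structure_events(body, 0x10, 0x1D)
```

In the described flow the matching LOOP_START is not emitted here; it is added later, during program feature extraction, in front of the recorded call inside the loop body.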
Fig. 4 shows the flow of function parameter count extraction. This flow obtains the function parameter count and outputs the function information to the file. Function information comprises the function call address, the function name and the parameter count. The detailed flow is as follows.
Step 40 is the initial action.
Step 41 obtains the function name.
Step 42 searches for the '@' character in the function name.
Step 43 checks whether it was found; if so, go to step 44, otherwise go to step 48.
Step 44 obtains the character after '@'.
Step 45 checks whether this character is a digit; if so, go to step 46, otherwise go to step 48.
Step 46 obtains the digit string after the '@' symbol.
Step 47 sets the parameter count to the digit string and goes to step 4v.
Step 48 obtains the stack structure of the function.
Step 49 obtains the parameter size from the stack space.
Step 4a checks whether the parameter size is less than the set threshold (here 25); if so, go to step 4v, otherwise go to step 4b.
Step 4b obtains the end address of the function body of the called function.
Step 4c obtains the instruction at this address.
Step 4d obtains the instruction opcode.
Step 4e checks whether the instruction opcode is retn; if so, go to step 4f, otherwise go to step 4j.
Step 4f obtains the operand of the instruction.
Step 4g checks whether an operand was obtained; if so, go to step 4h, otherwise go to step 4i.
Step 4h sets the parameter size to the instruction operand and goes to step 4v.
Step 4i sets the parameter size to 0 and goes to step 4v.
Step 4j obtains the instruction following this instruction.
Step 4k obtains the instruction opcode.
Step 4l checks whether the instruction opcode is add; if so, go to step 4m, otherwise go to step 4q.
Step 4m obtains the destination operand.
Step 4n checks whether the destination operand is esp; if so, go to step 4o, otherwise go to step 4q.
Step 4o obtains the source operand of the instruction.
Step 4p sets the parameter size to the source operand and goes to step 4v.
Step 4q initializes the parameter count to 0.
Step 4r obtains the instruction preceding the current instruction.
Step 4s obtains the instruction opcode.
Step 4t checks whether the instruction opcode is push; if so, go to step 4u, otherwise go to step 4w.
Step 4u increments the parameter count and goes to step 4r.
Step 4v sets the parameter count to the parameter size divided by 4.
Step 4w outputs the function information to the file.
Step 4x is the end state.
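Two of the heuristics in this flow, reading the size after '@' in a decorated function name and counting the push instructions that precede a call, can be sketched in Python as below. The helper names and sample inputs are hypothetical; the sketch treats the digits after '@' as a byte size that is divided by 4, which is how step 4v is read here, and it leaves out the stack-frame, retn and add esp paths.

```python
import re

ARG_SIZE = 4  # bytes per stack argument, matching the division by 4 in step 4v


def param_count_from_name(name):
    """Return the count implied by an '@<digits>' suffix, or None if absent.

    Assumption of this sketch: the digits give the total argument size in bytes.
    """
    match = re.search(r"@(\d+)", name)
    if match is None:
        return None
    return int(match.group(1)) // ARG_SIZE


def param_count_from_pushes(mnemonics_before_call):
    """Count consecutive push instructions immediately before a call (steps 4q-4u)."""
    count = 0
    for mnemonic in reversed(mnemonics_before_call):
        if mnemonic != "push":
            break
        count += 1
    return count


print(param_count_from_name("_MessageBoxA@16"))                  # 4
print(param_count_from_name("sub_401000"))                       # None
print(param_count_from_pushes(["mov", "push", "push", "push"]))  # 3
```

The name-based path is tried first in the described flow because it needs no instruction scanning; the push count is the last fallback.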
Fig. 5 shows the flow of program feature extraction. The feature.lib file produced by the program structure information extraction process undergoes structure symbol adjustment and parameter count correction to yield the program feature. The detailed flow is as follows.
Step 50 is the initial action.
Step 51 corrects the function parameter counts. The function parameter feature obtained by the program structure information extraction process is corrected to obtain the final parameter count of each function that does not have a single parameter count; the detailed flow is shown in Fig. 6.
Step 52 removes redundant structure information. The program structure information in the file is processed to remove redundant structural information symbols; the detailed flow is shown in Fig. 7.
Step 53 obtains a function information entry from the file.
Step 54 checks whether one was obtained; if so, go to step 55, otherwise go to step 5c.
Step 55 obtains the number of times the function call address occurs in the backward-jump target addresses.
Step 56 checks whether the occurrence count is greater than 0; if so, go to step 57, otherwise go to step 58.
Step 57 adds the corresponding number of LOOP_START symbols before this function information.
Step 58 looks up this function in the function parameter feature.
Step 59 checks whether it was found; if so, go to step 5a, otherwise go to step 5b.
Step 5a changes the parameter count of this function information in the file to the final parameter count.
Step 5b removes the function call address and function name of this function information in the file, keeping only the function parameter count, and goes to step 53.
Step 5c is the end state.
Fig. 6 shows the flow of function parameter count correction. It processes the function parameter feature, removes the function parameter information of functions that have a single parameter count, and adjusts the final call count. The detailed flow is as follows.
Step 60 is the initial action.
Step 61 obtains a function parameter information entry.
Step 62 checks whether one was obtained; if so, go to step 63, otherwise go to step 6g.
Step 63 obtains a parameter count.
Step 64 obtains the corresponding call count.
Step 65 checks whether the call count equals the number of calls to the function; if so, go to step 66, otherwise go to step 67.
Step 66 deletes this function parameter information and goes to step 61.
Step 67 initializes the final parameter count to this parameter count.
Step 68 initializes the final call count to this call count.
Step 69 obtains the next parameter count.
Step 6a obtains the corresponding call count.
Step 6b checks whether they were obtained; if so, go to step 6c, otherwise go to step 6f.
Step 6c checks whether the call count is greater than the final call count; if so, go to step 6d, otherwise go to step 69.
Step 6d sets the final parameter count to this parameter count.
Step 6e sets the final call count to this call count and goes to step 69.
Step 6f modifies the function parameter information so that it contains only the function name and the final parameter count, and goes to step 61.
Step 6g is the end state.
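Read as a whole, the correction keeps only functions that were observed with more than one parameter count and assigns each of them the count seen in the most calls. A minimal Python sketch under that reading follows; the dictionary layout and the name `final_param_counts` are assumptions for illustration, not the data structures of the described implementation.

```python
def final_param_counts(param_feature):
    """Pick the most frequently observed parameter count per function.

    `param_feature` maps a function name to {parameter_count: call_count}.
    Functions seen with a single count are dropped (steps 65-66), because
    their recorded count needs no correction.
    """
    final = {}
    for name, calls_by_count in param_feature.items():
        if len(calls_by_count) == 1:
            continue
        # max() keeps the first maximal entry, which matches the strict
        # "greater than" comparison in step 6c.
        final[name] = max(calls_by_count, key=calls_by_count.get)
    return final


observed = {
    "sub_401000": {2: 5, 3: 1},   # mostly called with 2 parameters
    "printf": {1: 2, 2: 2},       # tie: the first count seen wins
    "sub_402000": {1: 4},         # single count: removed
}
corrected = final_param_counts(observed)
print(corrected)  # {'sub_401000': 2, 'printf': 1}
```

Choosing the majority count makes the feature robust to the occasional call site where push counting or stack analysis yields a different number.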
Fig. 7 shows the flow of redundant structure symbol removal.
Step 70 is the initial action.
Step 71 obtains a structure identifier string from the file.
Step 72 checks whether one was obtained; if so, go to step 73, otherwise go to step 7o.
Step 73 looks for FUN_START in this structure identifier string.
Step 74 checks whether it was found; if so, go to step 75, otherwise go to step 78.
Step 75 looks for FUN_END in the structure identifier string.
Step 76 checks whether it was found; if so, go to step 77, otherwise go to step 78.
Step 77 removes FUN_START, FUN_END and the structure identifiers between them from the structure identifier string.
Step 78 looks for FUN_END in the structure identifier string.
Step 79 checks whether it was found; if so, go to step 7a, otherwise go to step 7h.
Step 7a extracts the sub-identifier string consisting of FUN_END and everything before it.
Step 7b looks for BLOCK_START in the sub-identifier string.
Step 7c checks whether it was found; if so, go to step 7d, otherwise go to step 7g.
Step 7d looks for the BLOCK_END after BLOCK_START.
Step 7e checks whether it was found; if so, go to step 7f, otherwise go to step 7g.
Step 7f removes BLOCK_START and BLOCK_END from the sub-identifier string and goes to step 7b.
Step 7g outputs the remaining sub-identifier string and goes to step 78.
Step 7h obtains the remaining sub-identifier string.
Step 7i looks for BLOCK_START in the remaining sub-identifier string.
Step 7j checks whether it was found; if so, go to step 7k, otherwise go to step 7n.
Step 7k looks for the BLOCK_END after BLOCK_START.
Step 7l checks whether it was found; if so, go to step 7m, otherwise go to step 7n.
Step 7m removes BLOCK_START and BLOCK_END from the sub-identifier string and goes to step 7i.
Step 7n outputs the sub-identifier string and goes to step 71.
Step 7o is the end state.
Fig. 8 shows the flow of family feature extraction for a malicious program family. All program features in the family_feature.lib file are fused to obtain the family feature. The detailed flow is as follows.
Step 80 is the initial action.
Step 81 obtains a program feature, that is, one line of the file.
Step 82 initializes the family feature to this program feature.
Step 83 obtains the next program feature from the file.
Step 84 checks whether one was obtained; if so, go to step 85, otherwise go to step 87.
Step 85 obtains the longest common subsequence of the family feature and the program feature; the detailed flow is shown in Fig. 9.
Step 86 sets the family feature to the longest common subsequence and goes to step 83.
Step 87 is the end state.
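The fusion above is a left fold of the longest common subsequence over the program features of one family. The Python sketch below illustrates it; it uses a standard dynamic programming LCS in place of the recursive procedure of Fig. 9, and the function names and sample feature strings are made up for the example.

```python
from functools import reduce


def lcs(s, t):
    """Return one longest common subsequence of s and t (dynamic programming)."""
    rows, cols = len(s) + 1, len(t) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if s[i - 1] == t[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover the subsequence itself.
    out = []
    i, j = len(s), len(t)
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            out.append(s[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def family_feature(program_features):
    """Fuse a non-empty list of program features (steps 81-86)."""
    return reduce(lcs, program_features)


samples = ["{2(1)[3]}", "{2(1)}", "{2(0)(1)}"]  # hypothetical family samples
fused = family_feature(samples)
print(fused)  # {2(1)}
```

Because each fold step can only keep or shorten the running result, the family feature is never longer than the shortest sample feature.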
Fig. 9 shows the flow of obtaining the longest common subsequence.
Step 90 is the initial action.
Step 91 stores the family feature in string S.
Step 92 stores the program feature in string T.
Step 93 checks whether S or T is the empty string; if so, go to step 94, otherwise go to step 95.
Step 94 sets the longest common subsequence to the empty string and goes to step 9f.
Step 95 splits string S into string S0 and its last character.
Step 96 splits string T into string T0 and its last character.
Step 97 checks whether the last character of S equals the last character of T; if not, go to step 98, otherwise go to step 9d.
Step 98 recursively obtains the longest common subsequence ST0 of strings S and T0.
Step 99 recursively obtains the longest common subsequence TS0 of strings S0 and T.
Step 9a checks whether ST0 is longer than TS0; if so, go to step 9b, otherwise go to step 9c.
Step 9b takes string ST0 as the longest common subsequence and goes to step 9f.
Step 9c takes string TS0 as the longest common subsequence and goes to step 9f.
Step 9d recursively obtains the longest common subsequence of strings S0 and T0.
Step 9e appends the last character to it to form the longest common subsequence.
Step 9f is the end state.
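The recursion in steps 93 to 9e maps almost line for line onto a short Python function. The sketch below follows those steps; the `lru_cache` memoization is an addition of this sketch (the described flow is plain recursion, which repeats subproblems), and the function name is illustrative. Very long features could exceed Python's default recursion limit, in which case an iterative table-based version would be needed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # memoization added by this sketch, not in the flow
def lcs_recursive(s, t):
    """Longest common subsequence of s and t, following steps 93-9e."""
    if not s or not t:                # steps 93-94: empty input
        return ""
    s0, s_last = s[:-1], s[-1]        # step 95
    t0, t_last = t[:-1], t[-1]        # step 96
    if s_last == t_last:              # steps 97, 9d, 9e
        return lcs_recursive(s0, t0) + s_last
    st0 = lcs_recursive(s, t0)        # step 98
    ts0 = lcs_recursive(s0, t)        # step 99
    return st0 if len(st0) > len(ts0) else ts0  # steps 9a-9c


print(lcs_recursive("{2(1)[3]}", "{2(1)}"))  # {2(1)}
```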
Fig. 10 shows the flow of metamorphic variant detection. The program structure information obtained by the program structure information extraction process is stored in the test_feature.lib file and is used by the program feature extraction process to extract the program feature of the program under test. The detailed flow is as follows.
Step a0 is the initial action.
Step a1 uses the library function ShellExecute() to call IDA to disassemble the program under test (its program name is denoted test-prg) and obtain its disassembly code.
Step a2 calls the IDC function GetEntryPoint() to obtain the program entry function of the disassembly code.
Step a3 extracts the program structure information of the program under test. Starting from the program entry function, the program structure information is extracted and saved in the test_feature.lib file; the detailed flow is shown in Fig. 3.
Step a4 extracts the program feature of the program under test. The program structure information in the test_feature.lib file undergoes structure symbol adjustment and function parameter count correction to yield the program feature of the program under test; the detailed flow is shown in Fig. 5.
Step a5 calls the library function strlen() on the program feature of the program under test to obtain its length.
Step a6 obtains a family feature from the malicious program feature library.
Step a7 checks whether one was obtained; if so, go to step a8, otherwise go to step ag.
Step a8 calls the library function strlen() on the family feature to obtain its length.
Step a9 obtains the longest common subsequence of the program feature of the program under test and the family feature; the detailed flow is shown in Fig. 9.
Step aa calls the library function strlen() to obtain the length of the longest common subsequence.
Step ab checks whether the program feature length is greater than the family feature length; if so, go to step ac, otherwise go to step ad.
Step ac sets the maximum similarity to the longest common subsequence length divided by the family feature length and goes to step ae.
Step ad sets the maximum similarity to the longest common subsequence length divided by the program feature length.
Step ae checks whether the maximum similarity is less than the threshold (here 0.82); if so, go to step a6, otherwise go to step af.
Step af judges the program under test to be a malicious program and goes to step ah.
Step ag judges the program under test not to be a malicious program and goes to step ah.
Step ah is the end state.
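Steps ab to ad amount to dividing the longest common subsequence length by the shorter of the two feature lengths, and step ae compares the result with 0.82. A small Python sketch of that matching rule follows. The function names and sample strings are illustrative, the length-only LCS routine stands in for the procedure of Fig. 9, and the guard for empty features is an addition of this sketch.

```python
THRESHOLD = 0.82  # similarity threshold given in step ae


def lcs_length(s, t):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(t) + 1)
    for ch in s:
        curr = [0]
        for j, other in enumerate(t, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(program_feature, family_feature):
    """LCS length divided by the shorter feature length (steps ab-ad)."""
    shorter = min(len(program_feature), len(family_feature))
    if shorter == 0:  # guard added by this sketch
        return 0.0
    return lcs_length(program_feature, family_feature) / shorter


def is_variant(program_feature, library):
    """True if any family feature reaches the threshold (steps a6-ag)."""
    return any(similarity(program_feature, family) >= THRESHOLD
               for family in library)


library = ["{9}", "{2(1)}"]                # hypothetical family features
print(is_variant("{2(1)[3]}", library))    # True
print(is_variant("[[[", library))          # False
```

Normalizing by the shorter length means that a program containing a family feature as a subsequence scores 1.0 however much extra structure surrounds it.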

Claims (10)

1. A malicious program detection method based on program structure features, characterized by the following steps: 1) establishment of a malicious program feature library; 2) detection of malicious program variants; 3) extraction of function structure information; 4) extraction of function parameter counts; 5) extraction of malicious program features; 6) extraction of the family feature of a malicious program family;
When the malicious program feature library is established, program structure analysis and extraction are first performed on each sample program of each malicious program family used for training, to obtain its program feature; the program features of the sample programs in the malicious program family are then merged to obtain the family feature of that family, which is stored in the malicious program feature library; program structure analysis is performed with the IDA disassembler: IDA disassembles the program to be analyzed to obtain its disassembly code, from which the program entry function is located; starting from the program entry function, the structure information of the program is extracted; the program structure information comprises the start and end of function bodies, the start and end of basic blocks, and the start and end of loop bodies; the program structure information obtained by the program structure information extraction process is stored in the file test_feature.lib, from which the program feature extraction process extracts the program feature of the program under test;
When malicious program variants are detected, program structure analysis is first performed on the program under test to obtain its program feature; the program feature of the program under test is then matched for similarity against the family feature of each malicious program family in the malicious program feature library, to determine whether the program is a variant of a known malicious program;
The malicious program feature library is stored in the file malware_feature.lib, which is generated during the feature library establishment stage and used during the malicious program detection stage; each family feature is stored as one line of the file.
2. The malicious program detection method based on program structure features according to claim 1, characterized by the flow of the malicious program feature library establishment process: the known malicious program samples used for training are stored in separate folders, classified by malicious program family; during feature library establishment, the malicious program samples in the folder of each malicious program family are obtained one by one, their program structure information is extracted, and their program features are generated; the program features of all sample programs in the family are then merged to extract the family feature of the malicious program family, which is added to the malicious program feature library; the program structure information obtained by the program structure information extraction process is stored in the file sample_feature.lib for use by the program feature extraction process; the program features of all sample programs of a malicious program family are stored in the file family_feature.lib for use by the family feature extraction process, each program feature being stored as one line of this file;
The detailed flow is as follows: step 20 is initialization; step 21 obtains the folder of a known malicious program family; step 22 checks whether one was obtained; if so, go to step 23, otherwise go to step 2c; step 23 obtains a malicious program sample, whose program name is denoted sample-prg; step 24 checks whether one was obtained; if so, go to step 25, otherwise go to step 2a; step 25 uses the library function ShellExecute() to invoke IDA to disassemble the sample sample-prg, obtaining its disassembly code; step 26 calls the IDC function GetEntryPoint() to obtain the program entry function of the disassembly code; step 27 extracts the program structure information: starting from the program entry function, the program structure information is extracted and saved in the file sample_feature.lib; step 28 extracts the program feature: structure-symbol adjustment and function parameter count correction are applied to the program structure information in sample_feature.lib to obtain the program feature; step 29 saves the extracted program feature as one line of the file family_feature.lib and goes to step 23; step 2a extracts the family feature of the malicious program family: all program features in family_feature.lib are merged to obtain the merged family feature; step 2b stores the family feature in the malicious program feature library and goes to step 21; step 2c is the end state.
3. The malicious program detection method based on program structure features according to claim 1, characterized by the flow of the program structure information extraction process: the program structure information to be extracted comprises the start and end of function bodies, basic blocks and loop bodies; the target addresses of forward jump instructions and backward jump instructions must be recorded in order to obtain the end of basic blocks and the start of loop bodies; for a function call instruction, the parameter count of the function is extracted, and the structure information of a local function called for the first time is extracted recursively; the function parameter information of called functions must be recorded; the forward-jump target addresses and the backward-jump target addresses are, respectively, the sets of target addresses of the forward jump instructions and the backward jump instructions in the program; the function parameter feature is the set of function parameter information of the called functions, where function parameter information comprises the function name, the number of calls of the function, the distinct parameter counts, and the number of calls for each parameter count; the program structure information obtained by this process is stored in the file feature.lib for use by the program feature extraction process; for a local function called for the first time, the function structure information extraction process is invoked recursively, following the same flow as the program structure information extraction process;
The detailed flow is as follows: step 30 is initialization; step 31 obtains the address range of the function body; step 32 outputs FUN_START to the file; step 33 obtains the instruction at the first address of the function body; step 34 checks whether the instruction address is less than the end address of the function body; if so, go to step 35, otherwise go to step 3C; step 35 obtains the number of occurrences of the instruction address in the forward-jump target addresses; step 36 outputs that number of BLOCK_END symbols to the file; step 37 checks whether the instruction opcode is call; if not, go to step 38, otherwise go to step 3a; step 38 checks whether the instruction opcode is jmp; if so, go to step 39, otherwise go to step 3m; step 39 checks whether the target address is the first address of the function containing the instruction; if so, go to step 3a, otherwise go to step 3m; step 3a extracts the function parameter count: the function name fun_name and the parameter count para_num are obtained; the detailed processing flow is shown in Figure 4; step 3b looks up the function name fun_name in the function parameter feature; step 3c checks whether it was found; if not, go to step 3d, otherwise go to step 3h; step 3d adds this function parameter information to the function parameter feature; step 3e searches for the string sub_ in the function name fun_name; step 3f checks whether it was found; if so, go to step 3g, otherwise go to step 3B; step 3g extracts program structure information and goes to step 3B: the function structure information extraction process is invoked recursively on the local function called for the first time; the detailed processing flow is shown in Figure 3; step 3h increments the call count of this function; step 3i looks up the parameter count para_num in the function parameter information; step 3j checks whether it was found; if so, go to step 3k, otherwise go to step 3l; step 3k increments the call count of this parameter count and goes to step 3B; step 3l adds the parameter count and its call count to the function parameter information and goes to step 3B; step 3m checks whether the instruction opcode contains the character j; if so, go to step 3o, otherwise go to step 3n; step 3n checks whether the instruction opcode is loop; if so, go to step 3o, otherwise go to step 3B; step 3o checks whether the target address lies within the function body; if so, go to step 3p, otherwise go to step 3B; step 3p checks whether the target address is greater than the instruction address; if so, go to step 3q, otherwise go to step 3s; step 3q stores the target address in the forward-jump target addresses; step 3r outputs BLOCK_START to the file and goes to step 3w; step 3s searches the loop body sequentially for a function call instruction; step 3t checks whether one was found; if so, go to step 3u, otherwise go to step 3w; step 3u stores the address of the function call instruction in the backward-jump target addresses; step 3v outputs LOOP_END to the file; step 3w checks whether the instruction opcode is jmp; if so, go to step 3x, otherwise go to step 3B; step 3x obtains the instruction following this instruction; step 3y looks up this instruction address in the forward-jump target addresses; step 3z checks whether it was found; if so, go to step 34, otherwise go to step 3A; step 3A checks whether the instruction address is less than the end address of the function body; if so, go to step 3x, otherwise go to step 34; step 3B obtains the instruction following this instruction and goes to step 34; step 3C outputs FUN_END to the file; step 3D is the end state.
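The jump-handling part of the flow in claim 3 can be illustrated with a simplified sketch. It is an assumption-laden toy, not the patented procedure: instructions are modelled as hypothetical `(address, opcode, target)` tuples rather than IDA output, the last instruction address is taken as the inclusive end of the function body, and all call handling (steps 3a to 3l: parameter extraction and recursion into local functions) is left out. What remains is the emission of BLOCK_START, BLOCK_END and LOOP_END and the skipping of code after an unconditional jmp.

```python
def structure_tokens(instructions):
    """Emit structure symbols for one function body.

    `instructions` is a list of (address, opcode, target_or_None) tuples
    sorted by address (a hypothetical stand-in for disassembler output).
    Returns (tokens, backward_targets).
    """
    if not instructions:
        return ["FUN_START", "FUN_END"], []
    start, end = instructions[0][0], instructions[-1][0]
    forward_targets = []   # targets of forward jumps (steps 3q, 35)
    backward_targets = []  # addresses of calls inside loop bodies (step 3u)
    tokens = ["FUN_START"]
    i = 0
    while i < len(instructions):
        addr, op, target = instructions[i]
        # Steps 35-36: one BLOCK_END per forward jump that lands here.
        tokens += ["BLOCK_END"] * forward_targets.count(addr)
        if op == "jmp" and target == start:
            # Step 39 treats a jmp to the function's own first address like
            # a call; call handling is omitted in this sketch.
            i += 1
            continue
        is_jump = op != "call" and ("j" in op or op == "loop")
        if is_jump and target is not None and start <= target <= end:
            if target > addr:
                forward_targets.append(target)       # step 3q
                tokens.append("BLOCK_START")         # step 3r
            else:
                # Steps 3s-3v: a backward jump counts as a loop only if the
                # loop body contains a call instruction.
                calls = [a for a, o, _ in instructions
                         if target <= a <= addr and o == "call"]
                if calls:
                    backward_targets.append(calls[0])
                    tokens.append("LOOP_END")
            if op == "jmp":
                # Steps 3x-3A: skip instructions until one is a known
                # forward-jump target or the function body ends.
                i += 1
                while (i < len(instructions)
                       and instructions[i][0] not in forward_targets):
                    i += 1
                continue
        i += 1
    tokens.append("FUN_END")
    return tokens, backward_targets
```

The recorded backward targets are what the later program feature extraction step uses to place LOOP_START before the corresponding function information.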
4. The malicious program detection method based on program structure features according to claim 1, characterized by the flow of the function parameter count extraction process: this flow obtains the parameter count of a function and outputs the function information to the file; function information comprises the function call address, the function name and the parameter count;
The detailed flow is as follows: step 40 is initialization; step 41 obtains the function name; step 42 searches for the character '@' in the function name; step 43 checks whether it was found; if so, go to step 44, otherwise go to step 48; step 44 obtains the character following '@'; step 45 checks whether this character is a digit; if so, go to step 46, otherwise go to step 48; step 46 obtains the digit string following the '@' symbol; step 47 sets the parameter count to the digit string and goes to step 4v; step 48 obtains the stack structure of the function; step 49 obtains the parameter size from the stack space; step 4a checks whether the parameter size is less than the set threshold (25 here); if so, go to step 4v, otherwise go to step 4b; step 4b obtains the end address of the function body of the called function; step 4c obtains the instruction at this address; step 4d obtains the instruction opcode; step 4e checks whether the instruction opcode is retn; if so, go to step 4f, otherwise go to step 4j; step 4f obtains the operand of the instruction; step 4g checks whether one was obtained; if so, go to step 4h, otherwise go to step 4i; step 4h sets the parameter size to the instruction operand and goes to step 4v; step 4i sets the parameter size to 0 and goes to step 4v; step 4j obtains the instruction following this instruction; step 4k obtains the instruction opcode; step 4l checks whether the instruction opcode is add; if so, go to step 4m, otherwise go to step 4q; step 4m obtains the destination operand; step 4n checks whether the destination operand is esp; if so, go to step 4o, otherwise go to step 4q; step 4o obtains the source operand of the instruction; step 4p sets the parameter size to the source operand and goes to step 4v; step 4q initializes the parameter count to 0; step 4r obtains the instruction preceding the instruction; step 4s obtains the instruction opcode; step 4t checks whether the instruction opcode is push; if so, go to step 4u, otherwise go to step 4w; step 4u increments the parameter count and goes to step 4r; step 4v sets the parameter count to the parameter size divided by 4; step 4w outputs the function information to the file; step 4x is the end state.
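Steps 42 to 47 together with step 4v of claim 4 can be sketched as below. This is an illustrative reading, not the patent's code: the function name `params_from_name` is hypothetical, and the digit string after '@' is assumed to be the argument size in bytes (as in stdcall-style decorated names such as `_MessageBoxA@16`), which step 4v then divides by 4. The fallbacks based on the stack structure, `retn`, `add esp` and `push` counting are not covered.

```python
def params_from_name(name, word_size=4):
    """Return the parameter count implied by a decorated name, or None.

    Assumes the digits after '@' give the argument size in bytes and that
    every parameter occupies `word_size` bytes (step 4v divides by 4).
    """
    pos = name.find("@")                      # step 42
    if pos == -1 or pos + 1 >= len(name):     # step 43: no '@', or nothing after it
        return None
    if name[pos + 1] not in "0123456789":     # step 45: next character must be a digit
        return None
    digits = ""
    for ch in name[pos + 1:]:                 # step 46: collect the digit string
        if ch not in "0123456789":
            break
        digits += ch
    return int(digits) // word_size           # step 4v
```

Under these assumptions `params_from_name("_MessageBoxA@16")` yields 4, while a name such as `sub_401000` yields `None`, signalling that one of the other branches of the flow would be needed.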
5. The malicious program detection method based on program structure features according to claim 1, characterized by the flow of the program feature extraction process: structure-symbol adjustment and parameter count correction are applied to the feature.lib file produced by the program structure information extraction process, to obtain the program feature; the detailed flow is as follows: step 50 is initialization; step 51 corrects the function parameter counts: the function parameter feature obtained by the program structure information extraction process is corrected to obtain the final parameter count of each function that does not have a single parameter count; the detailed processing flow is shown in Figure 6; step 52 removes redundant structure information: the program structure information in the file is processed to remove redundant structure symbols; the detailed processing flow is shown in Figure 7; step 53 obtains function information from the file; step 54 checks whether it was obtained; if so, go to step 55, otherwise go to step 5c; step 55 obtains the number of times the function call address occurs in the backward-jump target addresses; step 56 checks whether the number of occurrences is greater than 0; if so, go to step 57, otherwise go to step 58; step 57 adds that number of LOOP_START symbols before this function information; step 58 looks up this function in the function parameter feature; step 59 checks whether it was found; if so, go to step 5a, otherwise go to step 5b; step 5a changes the parameter count of this function information in the file to the final parameter count; step 5b removes the function call address and function name of this function information from the file, retaining only the parameter count, and goes to step 53; step 5c is the end state.
6. The malicious program detection method based on program structure features according to claim 1, characterized by the extraction flow of the function parameter count: the function parameter feature is processed, function parameter information having a single parameter count is removed, and the final call count is adjusted;
Step 60 is initialization; step 61 obtains function parameter information; step 62 checks whether it was obtained; if so, go to step 63, otherwise go to step 6g; step 63 obtains a parameter count; step 64 obtains the corresponding call count; step 65 checks whether this call count equals the call count of the function; if so, go to step 66, otherwise go to step 67; step 66 deletes this function parameter information and goes to step 61; step 67 initializes the final parameter count to this parameter count; step 68 initializes the final call count to this call count; step 69 obtains a parameter count; step 6a obtains the corresponding call count; step 6b checks whether they were obtained; if so, go to step 6c, otherwise go to step 6f; step 6c checks whether the call count is greater than the final call count; if so, go to step 6d, otherwise go to step 69; step 6d sets the final parameter count to this parameter count; step 6e sets the final call count to this call count and goes to step 69; step 6f modifies the function parameter information so that it contains only the function name and the final parameter count, and goes to step 61; step 6g is the end state.
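The selection rule in steps 61 to 6f can be sketched as follows, under the assumption that the function parameter feature is held as a dictionary mapping each function name to a dictionary of `{parameter count: call count}` (a hypothetical in-memory layout; the source stores it differently). Functions observed with a single parameter count are dropped, and for the rest the parameter count with the most calls wins, with the earliest entry kept on ties because step 6c uses a strict comparison.

```python
def final_param_counts(param_feature):
    """Map each ambiguous function to its most frequently seen parameter count.

    `param_feature`: {function_name: {param_count: call_count}} (assumed layout).
    Functions with only one observed parameter count are omitted, mirroring
    step 66, where their information is deleted.
    """
    result = {}
    for name, counts in param_feature.items():
        if len(counts) <= 1:
            # Equivalent to step 65: the single count accounts for every call.
            continue
        final_count, final_calls = None, -1
        for param_count, calls in counts.items():
            if calls > final_calls:           # step 6c: strictly greater
                final_count, final_calls = param_count, calls
        result[name] = final_count            # step 6f
    return result
```

For instance, a function seen twice with 1 parameter and seven times with 3 parameters ends up with a final parameter count of 3.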
7. The malicious program detection method based on program structure features according to claim 5, characterized by the flow of the redundant structure symbol removal process: step 70 is initialization; step 71 obtains a structure identifier string from the file; step 72 checks whether one was obtained; if so, go to step 73, otherwise go to step 7o; step 73 looks for FUN_START in this structure identifier string; step 74 checks whether it was found; if so, go to step 75, otherwise go to step 78; step 75 looks for FUN_END in the structure identifier string; step 76 checks whether it was found; if so, go to step 77, otherwise go to step 78; step 77 removes FUN_START, FUN_END and the structure identifiers between them from the structure identifier string; step 78 looks for FUN_END in the structure identifier string; step 79 checks whether it was found; if so, go to step 7a, otherwise go to step 7h; step 7a extracts FUN_END together with the sub-identifier string preceding it; step 7b looks for BLOCK_START in the sub-identifier string; step 7c checks whether it was found; if so, go to step 7d, otherwise go to step 7g; step 7d looks for the BLOCK_END following BLOCK_START; step 7e checks whether it was found; if so, go to step 7f, otherwise go to step 7g; step 7f removes the BLOCK_START and BLOCK_END from the sub-identifier string and goes to step 7b; step 7g outputs the remaining sub-identifier string and goes to step 78; step 7h obtains the remaining sub-identifier string; step 7i looks for BLOCK_START in the remaining sub-identifier string; step 7j checks whether it was found; if so, go to step 7k, otherwise go to step 7n; step 7k looks for the BLOCK_END following BLOCK_START; step 7l checks whether it was found; if so, go to step 7m, otherwise go to step 7n; step 7m removes the BLOCK_START and BLOCK_END from the sub-identifier string and goes to step 7i; step 7n outputs the sub-identifier string and goes to step 71; step 7o is the end state.
8. The malicious program detection method based on program structure features according to claim 1, characterized by the flow of the family feature extraction process for a malicious program family, in which all program features in the file family_features.lib are merged to obtain the family feature of the malicious program family: step 80 is initialization; step 81 obtains a program feature, i.e. one line of the file; step 82 initializes the family feature to this program feature; step 83 obtains a program feature from the file; step 84 checks whether one was obtained; if so, go to step 85, otherwise go to step 87; step 85 obtains the longest common subsequence of the family feature and the program feature; the detailed processing flow is shown in Figure 9; step 86 sets the family feature to the longest common subsequence and goes to step 83; step 87 is the end state.
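The merge in claim 8 is a left fold of the longest common subsequence over the lines of the feature file. The sketch below illustrates that fold; it is not the patent's code. The names `lcs` and `family_feature` are hypothetical, program features are treated as plain strings whose characters are the symbols, and the LCS is computed with an iterative dynamic-programming table instead of the recursive flow of Figure 9 (a substitution made here to avoid deep recursion on long features).

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two strings via a DP table."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover one longest subsequence.
    out, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def family_feature(program_features):
    """Steps 81-86: start from the first feature, then fold LCS over the rest."""
    features = list(program_features)
    if not features:
        return ""  # assumption: an empty family yields an empty feature
    return reduce(lcs, features)
```

With the toy features `"FB2bLE"`, `"FB3bLE"` and `"FBbLxE"`, the fold keeps only the symbols common to all three, in order.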
9. The malicious program detection method based on program structure features according to claim 8, characterized by the flow of the longest common subsequence acquisition process: step 90 is initialization; step 91 stores the family feature in string S; step 92 stores the program feature in string T; step 93 checks whether S or T is the empty string; if so, go to step 94, otherwise go to step 95; step 94 sets the longest common subsequence to the empty string and goes to step 9f; step 95 splits string S into string S0 and its last character; step 96 splits string T into string T0 and its last character; step 97 checks whether the last character of S equals the last character of T; if not, go to step 98, otherwise go to step 9d; step 98 recursively obtains the longest common subsequence ST0 of strings S and T0; step 99 recursively obtains the longest common subsequence TS0 of strings S0 and T; step 9a checks whether ST0 is longer than TS0; if so, go to step 9b, otherwise go to step 9c; step 9b takes string ST0 as the longest common subsequence and goes to step 9f; step 9c takes string TS0 as the longest common subsequence and goes to step 9f; step 9d recursively obtains the longest common subsequence of strings S0 and T0; step 9e appends the last character to it to form the longest common subsequence; step 9f is the end state.
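The recursion of claim 9 maps almost line for line onto a short function. The sketch below follows steps 93 to 9e; the only addition is memoization through `functools.lru_cache`, which is an assumption made here to keep the plain recursion from taking exponential time, and the function name `lcs_recursive` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lcs_recursive(s, t):
    """Longest common subsequence of strings s and t, following claim 9."""
    if not s or not t:                     # steps 93-94: empty input
        return ""
    s0, s_last = s[:-1], s[-1]             # step 95
    t0, t_last = t[:-1], t[-1]             # step 96
    if s_last == t_last:                   # step 97
        return lcs_recursive(s0, t0) + s_last   # steps 9d-9e
    st0 = lcs_recursive(s, t0)             # step 98
    ts0 = lcs_recursive(s0, t)             # step 99
    return st0 if len(st0) > len(ts0) else ts0  # steps 9a-9c
```

Because each call strips one character, very long features would exceed Python's default recursion limit; an iterative table-based formulation avoids that at the cost of departing from the flow as written.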
10. The malicious program detection method based on program structure features according to claim 1, characterized by the flow of the malicious program variant detection process: the program structure information obtained by the program structure information extraction process is stored in the file test_feature.lib, from which the program feature extraction process extracts the program feature of the program under test; step a0 is initialization; step a1 uses the library function ShellExecute() to invoke IDA to disassemble the program under test (program name denoted test-prg), obtaining its disassembly code; step a2 calls the IDC function GetEntryPoint() to obtain the program entry function of the disassembly code; step a3 extracts the program structure information of the program under test: starting from the program entry function, the program structure information is extracted and saved in the file test_feature.lib; the detailed processing flow is shown in Figure 3; step a4 extracts the program feature of the program under test: structure-symbol adjustment and function parameter count correction are applied to the program structure information in test_feature.lib to obtain the program feature; the detailed processing flow is shown in Figure 5; step a5 calls the library function strlen() on the program feature of the program under test to obtain the program feature length; step a6 obtains a family feature from the malicious program feature library; step a7 checks whether one was obtained; if so, go to step a8, otherwise go to step ag; step a8 calls the library function strlen() on the family feature to obtain the family feature length; step a9 obtains the longest common subsequence of the program feature of the program under test and the family feature; the detailed flow is shown in Figure 9; step aa calls the library function strlen() to obtain the length of the longest common subsequence; step ab checks whether the program feature length is greater than the family feature length; if so, go to step ac, otherwise go to step ad; step ac sets the maximum similarity to the longest common subsequence length divided by the family feature length and goes to step ae; step ad sets the maximum similarity to the longest common subsequence length divided by the program feature length; step ae checks whether the maximum similarity is less than the threshold (0.82 here); if so, go to step a6, otherwise go to step af; step af determines that the program under test is a malicious program and goes to step ah; step ag determines that the program under test is not a malicious program and goes to step ah; step ah is the end state.
CN201410152717.8A 2014-04-16 2014-04-16 Malicious program detection method based on program structure features Expired - Fee Related CN103902911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410152717.8A CN103902911B (en) 2014-04-16 2014-04-16 A kind of malware detection methods based on program structure feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410152717.8A CN103902911B (en) 2014-04-16 2014-04-16 A kind of malware detection methods based on program structure feature

Publications (2)

Publication Number Publication Date
CN103902911A true CN103902911A (en) 2014-07-02
CN103902911B CN103902911B (en) 2016-09-14

Family

ID=50994224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410152717.8A Expired - Fee Related CN103902911B (en) 2014-04-16 2014-04-16 Malicious program detection method based on program structure features

Country Status (1)

Country Link
CN (1) CN103902911B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599681A (en) * 2016-12-22 2017-04-26 北京邮电大学 Malicious program characteristic extraction method and system
CN106664201A (en) * 2014-08-28 2017-05-10 三菱电机株式会社 Process analysis device, process analysis method, and process analysis program
CN106909839A (en) * 2015-12-22 2017-06-30 北京奇虎科技有限公司 A kind of method and device for extracting sample code feature
CN108334776A (en) * 2017-01-19 2018-07-27 中国移动通信有限公司研究院 A kind of detection method and device of Metamorphic malware
CN108985063A (en) * 2018-07-13 2018-12-11 南方电网科学研究院有限责任公司 A kind of malicious code obscures detection method, system, computer equipment, medium
CN110135150A (en) * 2019-05-10 2019-08-16 上海红神信息技术有限公司 A kind of program operation control method and system
CN110866251A (en) * 2018-12-14 2020-03-06 哈尔滨安天科技集团股份有限公司 Extraction method and device of encrypted character string, electronic equipment and storage medium
CN111651768A (en) * 2020-08-05 2020-09-11 中国人民解放军国防科技大学 Method and device for identifying link library function name of computer binary program
CN112434293A (en) * 2020-11-13 2021-03-02 北京鸿腾智能科技有限公司 File feature extraction method, equipment, storage medium and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650445B (en) * 2016-12-16 2019-05-28 华东师范大学 A kind of rogue program recognition methods

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324888B (en) * 2012-03-19 2016-04-27 哈尔滨安天科技股份有限公司 Based on virus characteristic extraction method and the system of family's sample
CN103268445B (en) * 2012-12-27 2016-01-13 武汉安天信息技术有限责任公司 A kind of android malicious code detecting method based on OpCode and system
CN103440459B (en) * 2013-09-25 2016-04-06 西安交通大学 A kind of Android malicious code detecting method based on function call

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106664201A (en) * 2014-08-28 2017-05-10 三菱电机株式会社 Process analysis device, process analysis method, and process analysis program
CN106909839A (en) * 2015-12-22 2017-06-30 北京奇虎科技有限公司 A kind of method and device for extracting sample code feature
CN106909839B (en) * 2015-12-22 2020-04-17 北京奇虎科技有限公司 Method and device for extracting sample code features
CN106599681A (en) * 2016-12-22 2017-04-26 北京邮电大学 Malicious program characteristic extraction method and system
CN108334776A (en) * 2017-01-19 2018-07-27 中国移动通信有限公司研究院 A kind of detection method and device of Metamorphic malware
CN108985063A (en) * 2018-07-13 2018-12-11 南方电网科学研究院有限责任公司 A kind of malicious code obscures detection method, system, computer equipment, medium
CN110866251A (en) * 2018-12-14 2020-03-06 哈尔滨安天科技集团股份有限公司 Extraction method and device of encrypted character string, electronic equipment and storage medium
CN110135150A (en) * 2019-05-10 2019-08-16 上海红神信息技术有限公司 A kind of program operation control method and system
CN111651768A (en) * 2020-08-05 2020-09-11 中国人民解放军国防科技大学 Method and device for identifying link library function name of computer binary program
CN112434293A (en) * 2020-11-13 2021-03-02 北京鸿腾智能科技有限公司 File feature extraction method, equipment, storage medium and device

Also Published As

Publication number Publication date
CN103902911B (en) 2016-09-14

Similar Documents

Publication Publication Date Title
CN103902911A (en) Rogue program detection method based on program structural features
CN110348214B (en) Method and system for detecting malicious codes
CN103914657B (en) A kind of malware detection methods based on Function feature
CN109462575B (en) Webshell detection method and device
CN104933364B (en) A kind of malicious code based on the behavior of calling automates homologous determination method and system
CN109992969B (en) Malicious file detection method and device and detection platform
CN108932430A (en) A kind of malware detection method based on software gene technology
CN1235108C (en) Computer viruses detection and identification system and method
CN106485146B (en) A kind of information processing method and server
CN109190371A (en) A kind of the Android malware detection method and technology of Behavior-based control figure
CN101441687B (en) Method and apparatus for extracting virus characteristic of virus document
CN112597495B (en) Malicious code detection method, system, equipment and storage medium
CN105930447B (en) A method of tree-like nested data is converted into panel data table
CN103839006A (en) Program identification method and device based on machine learning
CN109886021A (en) A kind of malicious code detecting method based on API overall situation term vector and layered circulation neural network
RU2728497C1 (en) Method and system for determining belonging of software by its machine code
CN106682506A (en) Virus program detecting method and terminal
CN109871686A (en) Rogue program recognition methods and device based on icon representation and software action consistency analysis
CN102542190B (en) Program identifying method and device based on machine learning
CN111181980B (en) Network security-oriented regular expression matching method and device
CN103942495A (en) Program identification method and device on basis of machine learning
CN105119910A (en) Template-based online social network rubbish information real-time detecting method
CN102298681A (en) Software identification method based on data stream sliced sheet
WO2018110997A1 (en) Method and apparatus for generating network intrusion detection rule
CN106326746B (en) A kind of rogue program behavioural characteristic base construction method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160914

Termination date: 20170416
