CN104517053A - Software recognition method and device - Google Patents

Info

Publication number
CN104517053A
Authority
CN
China
Prior art keywords
character string
functional blocks
built
function
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310454828.XA
Other languages
Chinese (zh)
Inventor
王鑫
姚辉
刘桂峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Seal Fun Technology Co., Ltd.
Original Assignee
Shell Internet Beijing Security Technology Co Ltd
Beijing Kingsoft Internet Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shell Internet Beijing Security Technology Co Ltd, Beijing Kingsoft Internet Science and Technology Co Ltd filed Critical Shell Internet Beijing Security Technology Co Ltd
Priority to CN201310454828.XA priority Critical patent/CN104517053A/en
Publication of CN104517053A publication Critical patent/CN104517053A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the invention disclose a software recognition method and device. The method comprises: disassembling the executable file of the software to be recognized; removing the code belonging to library functions from the disassembled executable file; extracting character strings to be recognized from the remaining code; determining, among the character strings to be recognized, a first quantity of strings that match strings in a first character string library and a second quantity of strings that match strings in a second character string library, wherein the first character string library stores strings extracted from the code of malicious software that remains after the library function code is removed, and the second character string library stores strings extracted from the code of normal software that remains after the library function code is removed; and determining the recognition result of the software to be recognized according to the ratio of the first quantity to the second quantity. The scheme can improve the accuracy of character string based software recognition.

Description

Software identification method and device
Technical field
The present invention relates to the field of software identification, and in particular to a software identification method and device.
Background art
Because malware interferes with the operation of electronic devices, methods for distinguishing normal software from malware have long received attention. Since character strings are meaningful material in software code, software can be identified on the basis of its strings. Here, a character string is a sequence of characters made up of elements such as digits, letters, underscores, and Chinese characters.
In the prior art, string-based software identification generally works as follows. The executable file of the software to be identified is disassembled, and strings are extracted from the full code of the disassembled executable file as the strings to be identified. The strings to be identified are then matched against a pre-built malicious string library, which stores strings extracted from the full code of malware, and against a pre-built normal string library, which stores strings extracted from the full code of normal software. The identification result of the software is determined from the ratio of the number of strings to be identified that belong to the malicious string library to the number that belong to the normal string library.
However, malware authors usually combine library function code with function code they write themselves, and what makes the software malicious is the self-written function code. In the existing string-based identification process, the strings taken from the full code therefore include non-malicious strings, and these non-malicious strings inevitably affect the match ratio, which ultimately lowers the accuracy of software identification.
How to improve the accuracy of string-based software identification is therefore a problem that urgently needs to be solved.
Summary of the invention
In view of the above problem, the embodiments of the present invention disclose a software identification method and device to improve the accuracy of string-based software identification. The technical solution is as follows:
In a first aspect, an embodiment of the present invention provides a software identification method, comprising:
Disassembling the executable file of the software to be identified;
Removing the code belonging to library functions from the disassembled executable file;
Extracting strings to be identified from the remaining code;
Determining, among the strings to be identified, a first quantity of strings that match strings stored in a first string library and a second quantity of strings that match strings stored in a second string library; wherein the first string library stores strings extracted from the code of malware that remains after the code belonging to library functions is removed, and the second string library stores strings extracted from the code of normal software that remains after the code belonging to library functions is removed;
Determining the identification result of the software to be identified according to the ratio of the first quantity to the second quantity.
Preferably, extracting strings to be identified from the remaining code comprises:
Determining the global addresses in the remaining code;
Extracting the strings to be identified from the determined global addresses.
Preferably, removing the code belonging to library functions from the disassembled executable file comprises:
Building the function blocks of the disassembled executable file and storing them in a third function block set;
Determining the function blocks in the third function block set that belong to library functions, and removing the determined function blocks;
Correspondingly, extracting strings to be identified from the remaining code comprises:
Extracting the strings to be identified from the function blocks that remain in the third function block set after the function blocks belonging to library functions are removed.
Preferably, the function blocks in the third function block set that belong to library functions are determined by segmented code matching.
Preferably, the process of building the first string library and the second string library comprises:
Disassembling the first executable files corresponding to a third quantity of malware samples, and disassembling the second executable files corresponding to a fourth quantity of normal software samples;
Building the function blocks of the disassembled first executable files and storing them in a first function block set, and building the function blocks of the disassembled second executable files and storing them in a second function block set;
Determining the function blocks in the first function block set and the second function block set that belong to library functions, and removing the determined function blocks;
Extracting strings from the function blocks remaining in the first function block set and building the first string library;
Extracting strings from the function blocks remaining in the second function block set and building the second string library.
Preferably, after the first string library and the second string library are built, the method further comprises:
Determining the strings that exist in both the first string library and the second string library;
Removing the determined common strings from the first string library and the second string library.
Preferably, the software identification method further comprises:
Displaying the identification result of the software to be identified.
Preferably, the identification result comprises:
Normal software, malware, likely normal software, or likely malware.
In a second aspect, an embodiment of the present invention provides a software identification device, comprising:
A disassembly module, configured to disassemble the executable file of the software to be identified;
A library function code removal module, configured to remove the code belonging to library functions from the disassembled executable file;
A string extraction module, configured to extract strings to be identified from the remaining code;
A quantity determination module, configured to determine, among the strings to be identified, a first quantity of strings that match strings stored in a first string library and a second quantity of strings that match strings stored in a second string library;
A result determination module, configured to determine the identification result of the software to be identified according to the ratio of the first quantity to the second quantity;
A string library construction module, configured to build the first string library and the second string library, wherein the first string library stores strings extracted from the code of malware that remains after the code belonging to library functions is removed, and the second string library stores strings extracted from the code of normal software that remains after the code belonging to library functions is removed.
Preferably, the string extraction module comprises:
A global address determination unit, configured to determine the global addresses in the remaining code;
A first string extraction unit, configured to extract the strings to be identified from the determined global addresses.
Preferably, the library function code removal module comprises:
A function block construction unit, configured to build the function blocks of the disassembled executable file and store them in a third function block set;
A library function block determination unit, configured to determine the function blocks in the third function block set that belong to library functions;
A first library function block removal unit, configured to remove the determined function blocks belonging to library functions;
The string extraction module comprises:
A second string extraction unit, configured to extract the strings to be identified from the function blocks that remain in the third function block set after the function blocks belonging to library functions are removed.
Preferably, the library function block determination unit is configured to determine, by segmented code matching, the function blocks in the third function block set that belong to library functions.
Preferably, the string library construction module comprises:
A disassembly unit, configured to disassemble the first executable files corresponding to a third quantity of malware samples, and to disassemble the second executable files corresponding to a fourth quantity of normal software samples;
A second function block construction unit, configured to build the function blocks of the disassembled first executable files and store them in a first function block set, and to build the function blocks of the disassembled second executable files and store them in a second function block set;
A second library function block removal unit, configured to determine the function blocks in the first function block set and the second function block set that belong to library functions, and to remove the determined function blocks;
A first string library construction unit, configured to extract strings from the function blocks remaining in the first function block set and build the first string library;
A second string library construction unit, configured to extract strings from the function blocks remaining in the second function block set and build the second string library.
Preferably, the string library construction module further comprises:
A common string determination unit, configured to determine the strings that exist in both the first string library and the second string library;
A common string deletion unit, configured to remove the determined common strings from the first string library and the second string library.
Preferably, the software identification device further comprises:
An identification result display module, configured to display the identification result of the software to be identified.
Compared with the prior art, this scheme builds a first string library and a second string library in advance, wherein the first string library stores strings extracted from the code of malware that remains after the code belonging to library functions is removed, and the second string library stores strings extracted from the code of normal software that remains after the code belonging to library functions is removed. During identification, strings are extracted as the strings to be identified from the code of the disassembled executable file that remains after the library function code is removed; the strings to be identified are matched against the strings stored in the first string library and in the second string library, and the identification result is determined from the match ratio. Because library function code is removed both when the string libraries are built and during identification, the influence of non-malicious strings on the match ratio is reduced, which improves the accuracy of string-based software identification.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a first flowchart of a software identification method provided by an embodiment of the present invention;
Fig. 2 is a second flowchart of a software identification method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a software identification device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
To improve the accuracy of string-based software identification, the embodiments of the present invention provide a software identification method and device.
The software identification method provided by the embodiments of the present invention is introduced first.
Note that the software identification method provided by the embodiments of the present invention is applicable to electronic devices. In practice, the electronic device may be a mobile phone, a tablet computer, a notebook computer, and so on.
As shown in Fig. 1, a software identification method may comprise:
S101, disassembling the executable file of the software to be identified;
When software needs to be identified, the executable file of the software to be identified is obtained and then disassembled so that the subsequent processing can be carried out.
Those skilled in the art will appreciate that programs are written in high-level languages such as C or Pascal and then compiled into files that the operating system can execute directly, namely executable files; disassembly refers to converting such an executable file back into assembly language or another higher-level language.
Note further that an executable file is a binary file; in practice it may be a file in exe, sys, or com format, among others.
S102, removing the code belonging to library functions from the disassembled executable file;
After the executable file is disassembled, the code belonging to library functions can be removed from it, so that the remaining code of the software to be identified contains no library function code.
Because a function library stores a number of function blocks, the code belonging to library functions can be removed by building function blocks. Specifically, removing the code belonging to library functions from the disassembled executable file may comprise:
Building the function blocks of the disassembled executable file and storing them in a third function block set;
Determining the function blocks in the third function block set that belong to library functions, and removing the determined function blocks.
Determining the function blocks in the third function block set that belong to library functions may comprise: matching the current function block to be identified in the third function block set against each function block in the function library. When the current block matches some library function block completely, it belongs to a library function, the judgment of the current block is complete, and the next function block to be identified is processed as the current block.
Further, to speed up processing, the function blocks belonging to library functions can be determined by segmented code matching, as follows. The preset first code segment of the current function block is matched against the first code segment of each function block in the function library. If no library function block matches, the current function block does not belong to a library function and its judgment is complete. If the first segment of the current function block matches the first segment of some library function block, their second segments are compared; if these do not match, the current function block is determined not to belong to a library function and matching ends, otherwise matching continues with the following segments. Matching ends at the first mismatch, with the current function block determined not to belong to a library function; if all segments match, the current function block belongs to a library function.
Note that a code match here means that the code is identical. The above way of removing the code belonging to library functions from the disassembled executable file is only an example and does not limit the embodiments of the present invention.
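The segmented matching described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes each function block is represented as a list of instruction strings, and the names `SEGMENT_SIZE`, `split_segments`, `is_library_block`, and `remove_library_blocks` are hypothetical. The patent does not specify how segments are delimited, so a fixed segment length is assumed.

```python
from typing import List, Sequence, Tuple

SEGMENT_SIZE = 4  # hypothetical segment length, counted in instructions


def split_segments(code: Sequence[str], size: int = SEGMENT_SIZE) -> List[Tuple[str, ...]]:
    """Split a function block's instruction list into fixed-size segments."""
    return [tuple(code[i:i + size]) for i in range(0, len(code), size)]


def is_library_block(block: Sequence[str],
                     library_blocks: Sequence[Sequence[str]]) -> bool:
    """Return True if `block` is identical to some library function block.

    Candidates are narrowed one segment at a time, so a block whose first
    segment matches no library block is rejected without further comparison.
    """
    target = split_segments(block)
    candidates = [split_segments(lib) for lib in library_blocks]
    # Only a library block with the same number of segments can be identical.
    candidates = [c for c in candidates if len(c) == len(target)]
    for index, segment in enumerate(target):
        candidates = [c for c in candidates if c[index] == segment]
        if not candidates:
            return False  # first mismatch ends the matching
    return bool(candidates)


def remove_library_blocks(blocks: Sequence[Sequence[str]],
                          library_blocks: Sequence[Sequence[str]]) -> List[Sequence[str]]:
    """Keep only the function blocks that do not belong to library functions."""
    return [b for b in blocks if not is_library_block(b, library_blocks)]
```

The early exit is the point of the design: most user-written blocks differ from every library block within the first segment, so full comparisons are rare.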
S103, extracting strings to be identified from the remaining code;
Extracting strings to be identified from the remaining code may comprise: determining the global addresses in the remaining code, and extracting the strings to be identified from the determined global addresses.
Note that removing the code belonging to library functions from the disassembled executable file may proceed by building the function blocks of the disassembled executable file, storing them in the third function block set, determining the function blocks in that set that belong to library functions, and removing them. Correspondingly, extracting strings to be identified from the remaining code may comprise extracting them from the function blocks that remain in the third function block set after the function blocks belonging to library functions are removed. It will be understood that strings are extracted from the remaining function blocks by determining the global addresses in those function blocks and extracting the strings to be identified from the determined global addresses.
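One possible reading of extraction from global addresses is sketched below. The patent does not define the data layout, so several assumptions are made: function blocks are lists of instruction strings, global addresses appear as hexadecimal operands, the data section is available as a byte string loaded at `data_base`, and strings are NUL-terminated and decodable as UTF-8. All names, including `MIN_LENGTH`, are hypothetical.

```python
import re
from typing import List, Sequence

ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]+")
MIN_LENGTH = 4  # hypothetical minimum length for a string to be kept


def read_c_string(data: bytes, offset: int) -> str:
    """Read a NUL-terminated string starting at `offset` in the data section."""
    end = data.find(b"\x00", offset)
    if end == -1:
        end = len(data)
    return data[offset:end].decode("utf-8", errors="replace")


def extract_strings(blocks: Sequence[Sequence[str]],
                    data: bytes, data_base: int) -> List[str]:
    """Collect the strings referenced through global addresses in `blocks`."""
    found: List[str] = []
    for block in blocks:
        for instruction in block:
            for literal in ADDRESS_PATTERN.findall(instruction):
                offset = int(literal, 16) - data_base
                # Addresses outside the data section (e.g. call targets) are skipped.
                if 0 <= offset < len(data):
                    text = read_c_string(data, offset)
                    if len(text) >= MIN_LENGTH and text not in found:
                        found.append(text)
    return found
```

A real disassembler would expose operand types and section boundaries directly, which would replace the regular expression used here.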
S104, determining, among the strings to be identified, a first quantity of strings that match strings stored in the first string library and a second quantity of strings that match strings stored in the second string library;
The first string library stores strings extracted from the code of malware that remains after the code belonging to library functions is removed, and the second string library stores strings extracted from the code of normal software that remains after the code belonging to library functions is removed.
After the strings to be identified for the software are determined, each string can be checked to see whether it matches a string stored in the first string library, matches a string stored in the second string library, or matches neither. The first quantity of strings belonging to the first string library and the second quantity of strings belonging to the second string library are then determined, and the subsequent processing is carried out.
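The counting in S104 can be sketched as below, assuming both string libraries are held as in-memory sets and that a match means exact equality; the function name `count_matches` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def count_matches(strings: Iterable[str],
                  malicious_library: Set[str],
                  normal_library: Set[str]) -> Tuple[int, int]:
    """Return (first quantity, second quantity) for the strings to be identified."""
    first = 0   # strings found in the first (malware) string library
    second = 0  # strings found in the second (normal software) string library
    for s in strings:
        if s in malicious_library:
            first += 1
        if s in normal_library:
            second += 1
    return first, second
```

Strings that match neither library contribute to neither count, as the text above allows.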
Specifically, as shown in Fig. 2, the process of building the first string library and the second string library may comprise:
S201, disassembling the first executable files corresponding to a third quantity of malware samples;
S202, disassembling the second executable files corresponding to a fourth quantity of normal software samples;
In practice, the third quantity and the fourth quantity may be the same or different; the larger they are, the more reliable the resulting first and second string libraries.
S203, building the function blocks of the disassembled first executable files and storing them in a first function block set;
S204, building the function blocks of the disassembled second executable files and storing them in a second function block set;
S205, determining the function blocks in the first function block set and the second function block set that belong to library functions, and removing the determined function blocks;
The function blocks belonging to library functions in the first and second function block sets can be determined in the same way as described above for the third function block set, so the details are not repeated here.
S206, extracting strings from the function blocks remaining in the first function block set and building the first string library;
S207, extracting strings from the function blocks remaining in the second function block set and building the second string library.
Strings can be extracted from the function blocks remaining in the first and second function block sets in the same way as the strings to be identified are extracted from the remaining function blocks above, so the details are not repeated here.
Further, some strings exist in both the first string library and the second string library. Such strings contribute little to identification but may affect the match ratio and ultimately the accuracy of the identification result. To reduce their influence on the match ratio, after the first string library and the second string library are built, the method further comprises:
Determining the strings that exist in both the first string library and the second string library;
Removing the determined common strings from the first string library and the second string library.
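The library construction and common-string removal can be sketched with set operations. This assumes the strings of each sample have already been extracted from its non-library function blocks; `build_libraries` and its parameters are hypothetical names.

```python
from typing import Iterable, Set, Tuple


def build_libraries(malicious_samples: Iterable[Iterable[str]],
                    normal_samples: Iterable[Iterable[str]]) -> Tuple[Set[str], Set[str]]:
    """Build the first and second string libraries and drop the strings they share.

    Each sample is the collection of strings extracted from one executable
    after its library function blocks were removed.
    """
    first: Set[str] = set()
    for sample in malicious_samples:
        first.update(sample)
    second: Set[str] = set()
    for sample in normal_samples:
        second.update(sample)
    common = first & second  # strings present in both libraries
    return first - common, second - common
```

For example, a string such as a generic error message that appears in both malware and normal samples ends up in neither library and so cannot shift the match ratio.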
Note that the above process of building the first string library and the second string library is only an example and does not limit the embodiments of the present invention.
S105, determining the identification result of the software to be identified according to the ratio of the first quantity to the second quantity.
After the first quantity and the second quantity are determined, the identification result of the software to be identified can be determined from their ratio.
The identification result may be normal software, malware, likely normal software, or likely malware. It will be understood that, in different application scenarios, the identification result may include only normal software and malware, or only likely normal software and likely malware; both are reasonable.
Each identification result can correspond to a ratio interval. For example, when the determined ratio falls in the interval corresponding to normal software, the identification result is normal software; when it falls in the interval corresponding to malware, the identification result is malware; when it falls in the interval corresponding to likely normal software, the identification result is likely normal software; and when it falls in the interval corresponding to likely malware, the identification result is likely malware. The ratio interval corresponding to each identification result may differ between application scenarios.
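The mapping from ratio to identification result can be sketched as below. The patent gives no concrete intervals, so the thresholds 3.0, 1.0, and 1/3 are purely illustrative assumptions, as is the handling of the cases where the second quantity is zero or where both quantities are zero; `classify` is a hypothetical name.

```python
import math
from typing import Optional


def classify(first: int, second: int) -> Optional[str]:
    """Map the ratio first/second to an identification result.

    Returns None when no string matched either library, a case the
    source text does not address.
    """
    if first == 0 and second == 0:
        return None
    # Assumption: with no normal-library matches the ratio is treated as infinite.
    ratio = math.inf if second == 0 else first / second
    if ratio >= 3.0:        # illustrative interval for malware
        return "malware"
    if ratio >= 1.0:        # illustrative interval for likely malware
        return "likely malware"
    if ratio > 1.0 / 3.0:   # illustrative interval for likely normal software
        return "likely normal software"
    return "normal software"
```

In a deployment the intervals would presumably be tuned per application scenario, as the text notes.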
Further, after determining the recognition result that this software to be identified is corresponding, this software identification method can also comprise: the recognition result showing this software to be identified.
Further, after the recognition result determining this software to be identified, follow-up process can be carried out further, such as: when determine this recognition result be Malware or partially Malware time, the link of deleting this software to be identified can be provided for user, or, provide warning message to user, the problem may brought after being mounted to warn this software to be identified, is not limited thereto certainly; And when determine this result to be identified be normal software or partially normal software time, corresponding information can be shown to user, to inform that user can this software to be identified of relieved use, certainly be not limited thereto.
Compared with prior art, in this programme, be built with the first character string storehouse and the second character string storehouse in advance, wherein, this first character string stock contains the character string extracted from the residue code beyond the code belonging to built-in function corresponding to Malware, and this second character string stock contains the character string extracted residue code beyond the code belonging to built-in function corresponding to normal software; In software identifying, belong to the residue code of the executable file after dis-assembling process of the code of built-in function from removal and extract character string as character string to be identified, and the character string that character string character string to be identified stored with the first character string storehouse respectively and the second character string storehouse store is mated, and according to matching ratio determination recognition result.Visible, in this programme, build in character string storehouse and in identifying, eliminate the code of built-in function, thus reduce the impact of character string on matching ratio of non-malicious, improve the accuracy utilizing character string identification software.
It should be noted that "first" in "first functional block set", "second" in "second functional block set" and "third" in "third functional block set" serve only to distinguish different functional block sets and have no limiting meaning. Likewise, "first" in "first quantity", "second" in "second quantity", "third" in "third quantity" and "fourth" in "fourth quantity" serve only to distinguish different quantities and have no limiting meaning.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a software recognition device which, as shown in Figure 3, may comprise:
Disassembly module 310, for disassembling the executable file of the software to be identified;
Built-in function code removal module 320, for removing the code belonging to built-in functions from the disassembled executable file;
Character string extraction module 330, for extracting character strings to be identified from the remaining code;
Quantity determination module 340, for determining, among the character strings to be identified, a first quantity of character strings that match character strings stored in the first character string storehouse, and a second quantity of character strings that match character strings stored in the second character string storehouse;
Result determination module 350, for determining the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity;
Character string storehouse building module 360, for building the first character string storehouse and the second character string storehouse, wherein the first character string storehouse stores character strings extracted from the remaining code of malware other than the code belonging to built-in functions, and the second character string storehouse stores character strings extracted from the remaining code of normal software other than the code belonging to built-in functions.
Compared with the prior art, in this scheme a first character string storehouse and a second character string storehouse are built in advance. The first character string storehouse stores character strings extracted from the remaining code of malware other than the code belonging to built-in functions; the second character string storehouse stores character strings extracted from the remaining code of normal software other than the code belonging to built-in functions. During software identification, the code belonging to built-in functions is removed from the disassembled executable file, character strings are extracted from the remaining code as character strings to be identified, these are matched against the character strings stored in the first character string storehouse and in the second character string storehouse respectively, and the recognition result is determined according to the matching ratio. Because the code of built-in functions is removed both when the character string storehouses are built and during identification, the influence of non-malicious character strings on the matching ratio is reduced, which improves the accuracy of identifying software by character strings.
Wherein, the character string extraction module 330 may comprise:
Global address determining unit, for determining the global addresses in the remaining code;
First character string extraction unit, for extracting character strings to be identified from the determined global addresses.
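Extraction from global addresses can be sketched as reading a NUL-terminated printable string at each address referenced by the remaining code. The sketch assumes the file image is available as `bytes` loaded at a known base address and that the global addresses have already been collected from the disassembly; `read_c_string`, `extract_strings`, the ASCII-only rule and the minimum length of 4 are hypothetical choices made for illustration.

```python
def read_c_string(image: bytes, base: int, address: int, min_len: int = 4):
    """Read a printable ASCII string at `address`, or return None."""
    offset = address - base
    if not 0 <= offset < len(image):
        return None  # address lies outside the loaded image
    end = image.find(b"\x00", offset)
    if end == -1:
        end = len(image)
    raw = image[offset:end]
    # Reject short or non-printable data, which is unlikely to be a string.
    if len(raw) < min_len or not all(0x20 <= b < 0x7F for b in raw):
        return None
    return raw.decode("ascii")


def extract_strings(image: bytes, base: int, global_addresses):
    """Collect the distinct strings found at the given global addresses."""
    found = []
    for address in global_addresses:
        text = read_c_string(image, base, address)
        if text is not None and text not in found:
            found.append(text)
    return found
```

A real extractor would also need to handle wide-character strings and section boundaries, which this sketch leaves out.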
Wherein, the built-in function code removal module 320 may comprise:
Functional block construction unit, for building the functional blocks of the disassembled executable file and storing them in a third functional block set;
Built-in function block determining unit, for determining the functional blocks in the third functional block set that belong to built-in functions;
First built-in function block removal unit, for removing the functional blocks determined to belong to built-in functions;
The character string extraction module may comprise:
Second character string extraction unit, for extracting character strings to be identified from the functional blocks remaining in the third functional block set after the functional blocks belonging to built-in functions are removed.
Wherein, the built-in function block determining unit is used for determining, by means of code fragment matching, the functional blocks in the third functional block set that belong to built-in functions.
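One way to picture code fragment matching is to compare the leading bytes of each functional block against known byte patterns of built-in functions. The details of the matching are not given in this section, so the prefix comparison below is an assumption, and `is_builtin`, `remove_builtin_blocks` and the sample signature bytes are hypothetical.

```python
def is_builtin(block_code: bytes, signatures) -> bool:
    """True if the block starts with any known built-in function fragment."""
    return any(block_code.startswith(sig) for sig in signatures)


def remove_builtin_blocks(blocks: dict, signatures) -> dict:
    """Return the functional blocks that are not matched as built-in functions."""
    return {
        name: code
        for name, code in blocks.items()
        if not is_builtin(code, signatures)
    }


# Usage: the third functional block set keyed by a block label.
third_set = {
    "sub_401000": b"\x55\x8b\xec\x83",
    "sub_402000": b"\x8b\x4c\x24\x04\xaa",
}
remaining = remove_builtin_blocks(third_set, [b"\x8b\x4c\x24\x04"])
```

Only the blocks left in `remaining` would then be passed to character string extraction.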
Wherein, the character string storehouse building module 360 may comprise:
Disassembly unit, for disassembling the first executable files corresponding to a third quantity of malware respectively, and disassembling the second executable files corresponding to a fourth quantity of normal software respectively;
Second functional block construction unit, for building the functional blocks of the disassembled first executable files and storing them in a first functional block set, and building the functional blocks of the disassembled second executable files and storing them in a second functional block set;
Second built-in function block removal unit, for determining the functional blocks in the first functional block set and the second functional block set that belong to built-in functions, and removing the functional blocks so determined;
First character string storehouse construction unit, for extracting character strings from the functional blocks remaining in the first functional block set and building the first character string storehouse;
Second character string storehouse construction unit, for extracting character strings from the functional blocks remaining in the second functional block set and building the second character string storehouse.
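The storehouse construction can be sketched as follows, assuming each sample has already been disassembled and split into functional blocks. The `Block` representation (raw code plus the strings it references), the `build_storehouse` name and the `is_builtin` predicate are hypothetical; the same function would be run once over the malware samples and once over the normal samples.

```python
from typing import Callable, Iterable

# Assumed representation: (raw code of the block, strings the block references).
Block = tuple[bytes, list[str]]


def build_storehouse(
    samples: Iterable[list[Block]],
    is_builtin: Callable[[bytes], bool],
) -> set[str]:
    """Collect strings from all blocks that are not built-in functions."""
    storehouse: set[str] = set()
    for blocks in samples:
        for code, strings in blocks:
            if is_builtin(code):
                continue  # skip blocks that belong to built-in functions
            storehouse.update(strings)
    return storehouse
```

For example, `build_storehouse(malware_samples, is_builtin)` would yield the first character string storehouse and `build_storehouse(normal_samples, is_builtin)` the second, where both sample lists are inputs the caller supplies.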
Wherein, the character string storehouse building module 360 may also comprise:
Common character string determining unit, for determining the character strings that exist in both the first character string storehouse and the second character string storehouse;
Common character string deletion unit, for removing the determined common character strings from the first character string storehouse and the second character string storehouse.
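If the two storehouses are held as sets, removing the common character strings reduces to a set intersection followed by two set differences. The function name `remove_common` is a hypothetical label for this sketch.

```python
def remove_common(first_storehouse: set, second_storehouse: set):
    """Drop strings present in both storehouses; return the two cleaned sets."""
    common = first_storehouse & second_storehouse
    return first_storehouse - common, second_storehouse - common
```

After this step, a string to be identified can match at most one of the two storehouses, so no single string contributes to both the first and the second quantity.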
Further, the software recognition device may also comprise:
Recognition result display module, for displaying the recognition result of the software to be identified.
As the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements comprises not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device comprising that element.
Those of ordinary skill in the art will appreciate that all or part of the steps in the above method embodiment can be carried out by hardware under the instruction of a program; the program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit its protection scope. Any amendment, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (15)

1. A software identification method, characterized by comprising:
Disassembling the executable file of software to be identified;
Removing the code belonging to built-in functions from the disassembled executable file;
Extracting character strings to be identified from the remaining code;
Determining, among the character strings to be identified, a first quantity of character strings that match character strings stored in a first character string storehouse, and a second quantity of character strings that match character strings stored in a second character string storehouse; wherein the first character string storehouse stores character strings extracted from the remaining code of malware other than the code belonging to built-in functions, and the second character string storehouse stores character strings extracted from the remaining code of normal software other than the code belonging to built-in functions;
Determining the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity.
2. The method according to claim 1, characterized in that extracting character strings to be identified from the remaining code comprises:
Determining the global addresses in the remaining code;
Extracting character strings to be identified from the determined global addresses.
3. The method according to claim 1, characterized in that removing the code belonging to built-in functions from the disassembled executable file comprises:
Building the functional blocks of the disassembled executable file and storing them in a third functional block set;
Determining the functional blocks in the third functional block set that belong to built-in functions, and removing the functional blocks so determined;
Accordingly, extracting character strings to be identified from the remaining code comprises:
Extracting character strings to be identified from the functional blocks remaining in the third functional block set after the functional blocks belonging to built-in functions are removed.
4. The method according to claim 3, characterized in that the functional blocks in the third functional block set that belong to built-in functions are determined by means of code fragment matching.
5. The method according to any one of claims 1-4, characterized in that the process of building the first character string storehouse and the second character string storehouse comprises:
Disassembling the first executable files corresponding to a third quantity of malware respectively, and disassembling the second executable files corresponding to a fourth quantity of normal software respectively;
Building the functional blocks of the disassembled first executable files and storing them in a first functional block set, and building the functional blocks of the disassembled second executable files and storing them in a second functional block set;
Determining the functional blocks in the first functional block set and the second functional block set that belong to built-in functions, and removing the functional blocks so determined;
Extracting character strings from the functional blocks remaining in the first functional block set and building the first character string storehouse;
Extracting character strings from the functional blocks remaining in the second functional block set and building the second character string storehouse.
6. The method according to claim 5, characterized in that, after the first character string storehouse and the second character string storehouse are built, the method further comprises:
Determining the character strings that exist in both the first character string storehouse and the second character string storehouse;
Removing the determined common character strings from the first character string storehouse and the second character string storehouse.
7. The method according to any one of claims 1-4, characterized by further comprising:
Displaying the recognition result of the software to be identified.
8. The method according to any one of claims 1-4, characterized in that the recognition result comprises:
Normal software, malware, partially normal software, or partially malicious software.
9. A software recognition device, characterized by comprising:
Disassembly module, for disassembling the executable file of software to be identified;
Built-in function code removal module, for removing the code belonging to built-in functions from the disassembled executable file;
Character string extraction module, for extracting character strings to be identified from the remaining code;
Quantity determination module, for determining, among the character strings to be identified, a first quantity of character strings that match character strings stored in a first character string storehouse, and a second quantity of character strings that match character strings stored in a second character string storehouse;
Result determination module, for determining the recognition result of the software to be identified according to the ratio of the first quantity to the second quantity;
Character string storehouse building module, for building the first character string storehouse and the second character string storehouse, wherein the first character string storehouse stores character strings extracted from the remaining code of malware other than the code belonging to built-in functions, and the second character string storehouse stores character strings extracted from the remaining code of normal software other than the code belonging to built-in functions.
10. The device according to claim 9, characterized in that the character string extraction module comprises:
Global address determining unit, for determining the global addresses in the remaining code;
First character string extraction unit, for extracting character strings to be identified from the determined global addresses.
11. The device according to claim 9, characterized in that the built-in function code removal module comprises:
Functional block construction unit, for building the functional blocks of the disassembled executable file and storing them in a third functional block set;
Built-in function block determining unit, for determining the functional blocks in the third functional block set that belong to built-in functions;
First built-in function block removal unit, for removing the functional blocks determined to belong to built-in functions;
The character string extraction module comprises:
Second character string extraction unit, for extracting character strings to be identified from the functional blocks remaining in the third functional block set after the functional blocks belonging to built-in functions are removed.
12. The device according to claim 11, characterized in that the built-in function block determining unit is used for determining, by means of code fragment matching, the functional blocks in the third functional block set that belong to built-in functions.
13. The device according to any one of claims 9-11, characterized in that the character string storehouse building module comprises:
Disassembly unit, for disassembling the first executable files corresponding to a third quantity of malware respectively, and disassembling the second executable files corresponding to a fourth quantity of normal software respectively;
Second functional block construction unit, for building the functional blocks of the disassembled first executable files and storing them in a first functional block set, and building the functional blocks of the disassembled second executable files and storing them in a second functional block set;
Second built-in function block removal unit, for determining the functional blocks in the first functional block set and the second functional block set that belong to built-in functions, and removing the functional blocks so determined;
First character string storehouse construction unit, for extracting character strings from the functional blocks remaining in the first functional block set and building the first character string storehouse;
Second character string storehouse construction unit, for extracting character strings from the functional blocks remaining in the second functional block set and building the second character string storehouse.
14. The device according to claim 13, characterized in that the character string storehouse building module further comprises:
Common character string determining unit, for determining the character strings that exist in both the first character string storehouse and the second character string storehouse;
Common character string deletion unit, for removing the determined common character strings from the first character string storehouse and the second character string storehouse.
15. The device according to any one of claims 9-11, characterized by further comprising: a recognition result display module, for displaying the recognition result of the software to be identified.
CN201310454828.XA 2013-09-29 2013-09-29 Software recognition method and device Pending CN104517053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310454828.XA CN104517053A (en) 2013-09-29 2013-09-29 Software recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310454828.XA CN104517053A (en) 2013-09-29 2013-09-29 Software recognition method and device

Publications (1)

Publication Number Publication Date
CN104517053A true CN104517053A (en) 2015-04-15

Family

ID=52792340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310454828.XA Pending CN104517053A (en) 2013-09-29 2013-09-29 Software recognition method and device

Country Status (1)

Country Link
CN (1) CN104517053A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399729A (en) * 2019-04-11 2019-11-01 国家计算机网络与信息安全管理中心 A kind of binary software analysis method based on module diagnostic weight
WO2020047782A1 (en) * 2018-09-05 2020-03-12 西门子股份公司 Malicious code scanning method and system, computer device, storage medium and program
CN111368510A (en) * 2020-03-08 2020-07-03 苏州浪潮智能科技有限公司 Method and device for automatically generating redfish character string name
CN113282917A (en) * 2021-06-25 2021-08-20 深圳市联软科技股份有限公司 Security process identification method and system based on machine instruction structure

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924866A (en) * 2006-09-28 2007-03-07 北京理工大学 Static feature based web page malicious scenarios detection method
CN101763481A (en) * 2010-01-15 2010-06-30 北京工业大学 Unknown malicious code detecting method based on LZW compression algorithm
EP2725510A1 (en) * 2011-08-09 2014-04-30 Huawei Technologies Co., Ltd Method, system and relevant device for detecting malicious codes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
石云峰: "恶意代码检测方法及其在安全评估中的应用", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020047782A1 (en) * 2018-09-05 2020-03-12 西门子股份公司 Malicious code scanning method and system, computer device, storage medium and program
CN110399729A (en) * 2019-04-11 2019-11-01 国家计算机网络与信息安全管理中心 A kind of binary software analysis method based on module diagnostic weight
CN110399729B (en) * 2019-04-11 2021-04-27 国家计算机网络与信息安全管理中心 Binary software analysis method based on component characteristic weight
CN111368510A (en) * 2020-03-08 2020-07-03 苏州浪潮智能科技有限公司 Method and device for automatically generating redfish character string name
CN113282917A (en) * 2021-06-25 2021-08-20 深圳市联软科技股份有限公司 Security process identification method and system based on machine instruction structure

Similar Documents

Publication Publication Date Title
CN109189888B (en) Electronic device, infringement analysis method, and storage medium
CN110933104B (en) Malicious command detection method, device, equipment and medium
CN103679030B (en) Malicious code analysis and detection method based on dynamic semantic features
CN107346284B (en) Application program detection method and detection device
CN107688742B (en) Large-scale rapid mobile application APP detection and analysis method
CN104517053A (en) Software recognition method and device
CN110750789B (en) De-obfuscation method, de-obfuscation device, computer apparatus, and storage medium
CN103810428A (en) Method and device for detecting macro virus
WO2023116561A1 (en) Entity extraction method and apparatus, and electronic device and storage medium
CN104751053A (en) Static behavior analysis method of mobile smart terminal software
CN113360580A (en) Abnormal event detection method, device, equipment and medium based on knowledge graph
CN104679495A (en) Method and device for recognizing software
CN104199704A (en) Application program installation package clearing method and device
US9349002B1 (en) Android application classification using common functions
CN110598996A (en) Risk processing method and device, electronic equipment and storage medium
CN112148602A (en) Source code security analysis method based on history optimization feature intelligent learning
CN106709350B (en) Virus detection method and device
EP3087527B1 (en) System and method of detecting malicious multimedia files
CN110837635A (en) Method, device, equipment and storage medium for equipment verification
CN104200164B (en) Loader virus searching and killing method, device and terminal
CN104636661A (en) Method and system for analyzing Android application program
CN111752958A (en) Intelligent associated label method, device, computer equipment and storage medium
CN113901457A (en) Method, system, equipment and readable storage medium for identifying malicious software
CN114637988A (en) Binary-oriented function level software randomization method
CN105138918A (en) Recognition method and device for secure file

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 A-0070 2, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: BEIJING LIEBAO NETWORK TECHNOLOGY CO., LTD.

Applicant after: Beijing cheetah Mobile Technology Co., Ltd.

Address before: 100041 room 3, 3 West well road, Badachu hi tech park, Shijingshan District, Beijing, 1592A

Applicant before: Beijing Kingsoft Internet Science and Technology Co., Ltd.

Applicant before: SHELL INTERNET (BEIJING) SECURITY TECHNOLOGY CO., LTD.

TA01 Transfer of patent application right

Effective date of registration: 20181212

Address after: Room 105-53967, No. 6 Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province

Applicant after: Zhuhai Seal Fun Technology Co., Ltd.

Address before: 100041 A-0070 2, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING LIEBAO NETWORK TECHNOLOGY CO., LTD.

Applicant before: Beijing cheetah Mobile Technology Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20150415
