CN106909841A - A kind of method and device for judging viral code - Google Patents


Publication number
CN106909841A
Authority
CN
China
Prior art keywords
decompiling
information structure
function
threshold value
virtual machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510971165.8A
Other languages
Chinese (zh)
Inventor
杨康
陈卓
唐海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510971165.8A
Publication of CN106909841A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

This application discloses a method and device for judging viral code. The method includes: decompiling the virtual machine execution file of an application program to obtain a decompiled function information structure; parsing the decompiled function information structure and extracting the function instruction sequence from it; determining the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code; and, if the determined edit distance is less than a preset threshold, determining that the virtual machine execution file of the application program contains viral code. With this scheme, it is possible to accurately judge whether an application program on an intelligent terminal is one that evades antivirus detection by modifying the character strings referenced by viral code, thereby ensuring the safety of the intelligent terminal.

Description

A kind of method and device for judging viral code
Technical field
The present application relates to the technical field of intelligent terminal security, and in particular to a method and device for judging viral code.
Background
With the development of science and technology, intelligent terminals have more and more functions. For example, mobile phones have evolved from traditional GSM and TDMA digital handsets into smartphones that can process multimedia resources and provide a variety of information services such as web browsing, teleconferencing, and e-commerce. However, increasingly varied malicious code attacks on mobile phones and increasingly serious personal data security problems have followed, and more and more smartphone users suffer from mobile phone viruses.
At present, antivirus technology for intelligent terminals mainly detects viruses based on the character strings of the virtual machine execution file: the extracted string features are matched against the features in a virus database. However, some viruses (such as Trojan horses) can easily evade detection by modifying the character strings referenced by the viral code, so the safety of the intelligent terminal cannot be ensured.
Summary of the invention
The embodiments of the present application provide a method and device for judging viral code that overcome the above problem or at least partially solve it.
The embodiments of the present application adopt the following technical solutions:
A method for judging viral code, including:
decompiling the virtual machine execution file of an application program to obtain a decompiled function information structure;
parsing the decompiled function information structure and extracting the function instruction sequence from the decompiled function information structure;
determining the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code;
judging whether the determined edit distance is less than a preset threshold and, if the edit distance is less than the preset threshold, determining that the virtual machine execution file of the application program contains viral code.
Preferably, before judging whether the determined edit distance is less than the preset threshold, the method further includes:
determining the total number of characters of the extracted function instruction sequence;
determining the product of the total number of characters of the function instruction sequence and a preset value as the preset threshold, where the preset value is between 0 and 1.
Preferably, decompiling the virtual machine execution file of the application program to obtain the decompiled function information structure specifically includes:
parsing the virtual machine execution file according to the virtual machine execution file format to obtain the function information structure of each class;
determining, according to the fields in the function information structure, the position and size of each function of the virtual machine execution file, to obtain the decompiled function information structure.
A method for judging viral code, including:
decompiling the virtual machine execution file of an application program to obtain a decompiled function information structure;
parsing the decompiled function information structure and extracting the mnemonic sequence from the decompiled function information structure;
determining the edit distance between the extracted mnemonic sequence and the mnemonic sequence of a preset viral code;
judging whether the determined edit distance is less than a preset threshold and, if the edit distance is less than the preset threshold, determining that the virtual machine execution file of the application program contains viral code.
Preferably, before judging whether the determined edit distance is less than the preset threshold, the method further includes:
determining the total number of characters of the extracted mnemonic sequence;
determining the product of the total number of characters of the mnemonic sequence and a preset value as the preset threshold, where the preset value is between 0 and 1.
A device for judging viral code, the device including:
a decompiling unit, configured to decompile the virtual machine execution file of an application program to obtain a decompiled function information structure;
an extraction unit, configured to parse the decompiled function information structure and extract the function instruction sequence from the decompiled function information structure;
an edit distance determining unit, configured to determine the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code;
a viral code determining unit, configured to judge whether the determined edit distance is less than a preset threshold and, when the determined edit distance is less than the preset threshold, to determine that the virtual machine execution file of the application program contains viral code.
Preferably, the device further includes:
a character count determining unit, configured to determine the total number of characters of the extracted function instruction sequence before it is judged whether the determined edit distance is less than the preset threshold;
a preset threshold determining unit, configured to determine the product of the total number of characters of the function instruction sequence and a preset value as the preset threshold, where the preset value is between 0 and 1.
Preferably, the decompiling unit includes:
an information structure obtaining unit, configured to parse the virtual machine execution file according to the virtual machine execution file format to obtain the function information structure of each class;
a function information structure obtaining unit, configured to determine, according to the fields in the function information structure, the position and size of each function of the virtual machine execution file, to obtain the decompiled function information structure.
A device for judging viral code, the device including:
a decompiling unit, configured to decompile the virtual machine execution file of an application program to obtain a decompiled function information structure;
an extraction unit, configured to parse the decompiled function information structure and extract the mnemonic sequence from the decompiled function information structure;
an edit distance determining unit, configured to determine the edit distance between the extracted mnemonic sequence and the mnemonic sequence of a preset viral code;
a viral code determining unit, configured to judge whether the determined edit distance is less than a preset threshold and, when the edit distance is less than the preset threshold, to determine that the virtual machine execution file of the application program contains viral code.
Preferably, the device further includes:
a character count determining unit, configured to determine the total number of characters of the extracted mnemonic sequence before it is judged whether the determined edit distance is less than the preset threshold;
a preset threshold determining unit, configured to determine the product of the total number of characters of the mnemonic sequence and a preset value as the preset threshold, where the preset value is between 0 and 1.
At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects:
By analyzing and decompiling the virtual machine execution file of an application program installed on an intelligent terminal, the function instruction sequence (or mnemonic sequence) in the decompiled function information structure corresponding to the application program can be obtained. An edit distance algorithm is then used to determine the edit distance between the extracted function instruction sequence (or mnemonic sequence) and the function instruction sequence (or mnemonic sequence) of a preset viral code. Finally, when the determined edit distance is less than a preset threshold, the virtual machine execution file of the application program is determined to contain viral code. In this way it can be accurately judged whether an application program on the intelligent terminal is one that evades antivirus detection by modifying the character strings referenced by viral code, thereby ensuring the safety of the intelligent terminal.
Brief description of the drawings
The drawings described here are provided for further understanding of the present application and form a part of it. The exemplary embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for judging viral code provided by an embodiment of the present application;
Fig. 2 is an example of the function information structure obtained by decompiling a dex file in an embodiment of the present application;
Fig. 3 is a flowchart of the method for judging viral code provided by another embodiment of the present application;
Fig. 4 is a module diagram of the device for judging viral code provided by an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application, without creative work, fall within the scope of protection of this application.
The technical solution is described here using the Android operating system used by mobile terminals as an example, but it is not limited to the Android operating system.
Taking the Android operating system as an example, it generally includes at least an application layer (app layer) and a system framework layer (framework layer); other layers that may be distinguished by function are not covered in this application. Generally, the app layer is the interface for user interaction, for example an interface for application maintenance, or one that recognizes the different kinds of content a user clicks and displays the corresponding context menu. The framework layer is mainly used to forward the user requests obtained by the app layer (such as starting an application, clicking a link, or clicking to save a picture) to lower layers, or to distribute content processed by lower layers to the upper layer, via messages or intermediate proxy classes, so that it can be shown to the user.
Dalvik is the Java virtual machine for the Android platform. Dalvik is optimized so that multiple virtual machine instances can run simultaneously in limited memory, and each Dalvik application runs as an independent Linux process; independent processes prevent all programs from being closed when one virtual machine crashes. The Dalvik virtual machine supports running Java applications that have been converted to the dex (Dalvik Executable) format, a compact format designed specifically for Dalvik and suited to systems with limited memory and processor speed.
It can be seen that, in the Android system, a dex file is a virtual machine execution file that can be loaded and run directly in the Dalvik virtual machine (Dalvik VM). With ADT (Android Development Tools), Java source code can be converted into dex files through a complex compilation process. The dex file is the result of optimization for embedded systems; the instruction codes of the Dalvik virtual machine are not standard Java virtual machine instruction codes, but use an instruction set of their own. In a dex file many class names and constant strings are shared, which makes the file smaller and more efficient to run. It is worth mentioning that in the Android system the virtual machine execution file is the dex file, while in other operating systems the virtual machine execution file may be another type of file; this application does not limit this.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
Fig. 1 is a flowchart of the method for judging viral code provided by an embodiment of the present application, including:
S101: Decompile the virtual machine execution file of an application program to obtain a decompiled function information structure.
The application program may be one installed on a mobile terminal, and the virtual machine execution file is, for example, a dex file. As mentioned above, the Android operating system includes an application layer (app layer) and a system framework layer (framework layer); this application focuses on research into and improvement of the app layer. However, those skilled in the art understand that when Android starts, the Dalvik VM monitors all programs (APK files) and frameworks and creates a dependency tree for them. Through this dependency tree, the Dalvik VM optimizes the code of each program and stores it in the Dalvik cache (dalvik-cache), so that all programs use the optimized code at run time. When a program (or framework library) changes, the Dalvik VM re-optimizes the code and stores it in the cache again. cache/dalvik-cache holds the dex files generated from programs on system, while data/dalvik-cache holds the dex files generated from data/app. In other words, this application focuses on analyzing and processing the dex files generated from data/app.
The dex file can be obtained by parsing the APK (Android Package, the Android installation package). An APK file is in fact a compressed package in zip format whose suffix has been changed to apk; after it is decompressed with UnZip, the dex file can be obtained.
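As a minimal sketch of this step, the snippet below treats an APK as an ordinary zip archive and reads out its dex entries with Python's standard `zipfile` module. The function name `extract_dex_files` and the assumption that dex entries are named `classes.dex`, `classes2.dex`, and so on at the top level of the archive are illustrative choices, not something the application specifies.

```python
import io
import re
import zipfile

# Assumption: dex entries sit at the archive root and are named
# classes.dex, classes2.dex, ... (the usual APK layout).
DEX_ENTRY = re.compile(r"classes\d*\.dex")


def extract_dex_files(apk_bytes: bytes) -> dict:
    """Return {entry name: raw bytes} for each top-level dex entry in an APK."""
    dex_files = {}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if DEX_ENTRY.fullmatch(name):
                dex_files[name] = apk.read(name)
    return dex_files
```

Working on in-memory bytes keeps the sketch independent of any file path; a caller holding an APK on disk would read the file first and pass its contents in.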
There are several ways to decompile (or disassemble) a dex file. Two are given here as examples; those skilled in the art can derive other ways on this basis, and these also fall within the scope of protection of this application:
First way: parse the dex file according to the dex file format to obtain the function information structure of each class; then, according to the fields in the function information structure, determine the position and size of each function in the dex file to obtain the decompiled function information structure. Specifically, parsing the function information structure yields the bytecode array field, which indicates the position of the function in the dex file, and the list length field, which indicates the size of the function in the dex file; together these determine the position and size of the function.
Second way: use a dex decompiling tool to decompile the dex file into virtual machine bytecode.
As mentioned above, the Dalvik virtual machine runs Dalvik bytecode, which exists in the form of a dex (Dalvik Executable) executable file; the Dalvik virtual machine executes code by interpreting the dex file. Several tools currently exist that can disassemble dex files into Dalvik assembly code. Such dex decompiling tools include but are not limited to: baksmali, Dedexer1.26, Dexdump, dexinspecto03-12-12r, IDA Pro, androguard, dex2jar, 010Editor, etc.
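The first way above can be sketched as follows, under the assumption that the "bytecode array field" and "list length field" correspond to the `insns` array and `insns_size` count of a `code_item` in the public dex format (this mapping is an interpretation, not stated in the application). The function `read_code_item` is a hypothetical helper; locating the `code_item` offset through the class and method tables is omitted.

```python
import struct

# code_item header in the public dex format (little-endian):
# registers_size, ins_size, outs_size, tries_size (u2 each),
# debug_info_off (u4), insns_size (u4, counted in 16-bit code units).
CODE_ITEM_HEADER = struct.Struct("<HHHHII")


def read_code_item(dex: bytes, offset: int):
    """Return (position, size_in_bytes, bytecode) for the code_item at offset."""
    *_, insns_size = CODE_ITEM_HEADER.unpack_from(dex, offset)
    start = offset + CODE_ITEM_HEADER.size   # the insns array follows the header
    size = insns_size * 2                    # 16-bit code units to bytes
    if start + size > len(dex):
        raise ValueError("code_item extends past the end of the file")
    return start, size, dex[start:start + size]
```

The returned position and size are the two values the first way relies on; the bytecode slice is what a later step would walk to collect opcodes.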
It can be seen that decompiling the dex file yields all of the decompiled function information structures. A function information structure contains the executable code of the function; in the embodiments of the present application it consists of a virtual machine instruction sequence and a virtual machine mnemonic sequence. In the following example, the function information structure consists of the instruction sequence of the Dalvik VM and the mnemonic sequence of the Dalvik VM.
For example, Fig. 2 shows an example of the function information structure obtained by decompiling a dex file in an embodiment of the present application. As can be seen, the dex file is decompiled into the instruction sequence of the Dalvik VM and the mnemonic sequence of the Dalvik VM.
S102: Parse the decompiled function information structure and extract the function instruction sequence from the decompiled function information structure.
As in the example of Fig. 2 above, in the machine code field of the decompiled function information structure, the first two digits of each line form the instruction sequence (the circled part on the left of the example), and the part corresponding to the instruction sequence is the mnemonic (on the right of the example; only partly circled, not all selected). Mnemonics mainly make it easier for people to communicate and to write code. In the example, decompiling the dex file yields the following function instruction sequence: “12 54 38 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c 6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e”.
The mnemonic sequence is:
“const/4 iget-object if-eqz invoke-static move-result-object invoke-virtual move-result-object invoke-virtual move-result if-eqz iget-object iget-object invoke-virtual move-result-object invoke-virtual iget-object invoke-virtual move-result-object invoke-virtual move-result-object if-eqz invoke-interface move-result if-nez const/4 if-eqz iget-object invoke-virtual iget-object invoke-static return-void move goto iget-object const/16 invoke-virtual”.
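The extraction described in S102 can be sketched as below. The input layout is an assumption made for illustration: each decompiled line is represented as a pair of (machine code in hex, assembly text), which may differ from the layout in Fig. 2. The function name `extract_sequences` and the operands in the usage example are hypothetical.

```python
def extract_sequences(rows):
    """Build the instruction sequence and mnemonic sequence of one function.

    rows: iterable of (machine_code_hex, assembly_text) pairs, one per
    decompiled instruction (an assumed representation of the listing).
    """
    opcodes = []
    mnemonics = []
    for machine_code, assembly in rows:
        # The first two hex digits of the machine code are the opcode.
        opcodes.append(machine_code.strip()[:2].lower())
        # The first token of the assembly text is the mnemonic.
        mnemonics.append(assembly.split()[0])
    return " ".join(opcodes), " ".join(mnemonics)


# Usage with three made-up instructions:
rows = [
    ("1200", "const/4 v0, 0"),
    ("5410 0300", "iget-object v0, v1, Lfoo;->bar:Ljava/lang/Object;"),
    ("0e00", "return-void"),
]
instruction_sequence, mnemonic_sequence = extract_sequences(rows)
```

Because operands such as registers and string references are dropped, two functions that differ only in the strings they reference produce the same sequences, which is the property the method relies on.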
S103: Determine the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code.
Edit distance (Edit Distance), also known as Levenshtein distance, is the minimum number of edit operations needed to change one string into another. For example, to compute the edit distance between cafe and coffee, cafe is transformed into coffee as follows: cafe → caffe → coffe → coffee, so the edit distance is 3. In general, the smaller the edit distance between two function instruction sequences, the more similar they are, and the more likely it is that the code of the application program being judged is a variant of some viral code in the virus database.
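For reference, a minimal sketch of the standard dynamic-programming computation of Levenshtein distance follows. The application does not specify an implementation; the function name `edit_distance` and the two-row formulation are illustrative choices.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the working row short
    previous = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("cafe", "coffee"))  # 3, matching the example above
```

The running time is proportional to the product of the two lengths, which is acceptable for per-function sequences of a few dozen opcodes.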
S104: Judge whether the determined edit distance is less than a preset threshold; if the edit distance is less than the preset threshold, determine that the virtual machine execution file of the application program contains viral code.
Viral code (virus code) refers to program code that spreads through storage media or networks and, without authorization, destroys the integrity of the operating system or steals undisclosed secret information in the system. Taking mobile phones as an example, mobile phone malicious code refers to malicious code targeting handheld devices such as mobile phones and PDAs. Mobile phone malicious code can be roughly divided into replicating malicious code and non-replicating malicious code. Replicating malicious code mainly includes viruses (Virus) and worms (Worm), while non-replicating malicious code mainly includes Trojan horse backdoor programs (Trojan Horse), rogue software (Rogue Software), malicious mobile code (Malicious Mobile Code), Rootkit programs, and so on.
For example, the function instruction sequence extracted in step S102 is: “12 54 38 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c 6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e”.
The instruction sequence of a viral code in the preset virus database is: “12 38 54 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c 6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e”.
Computing the edit distance between these two instruction sequences gives an edit distance of 4. Assuming the preset threshold is 5, the comparison shows that the edit distance between the two sequences is less than the preset threshold, so the code of the program can be determined to be a variant of that viral code in the virus database, that is, viral code. In general, the preset threshold can be set in advance based on empirical values.
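The worked example above can be reproduced with the following sketch, which compares the two sequences character by character. The helper `matches_preset_viral_code` and the idea of passing a list of preset sequences are assumptions for illustration; the application only states that the distance is compared with a threshold.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def matches_preset_viral_code(extracted, preset_sequences, threshold):
    """True if any preset sequence is within (strictly less than) the threshold."""
    return any(edit_distance(extracted, preset) < threshold
               for preset in preset_sequences)


EXTRACTED = ("12 54 38 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c "
             "6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e")
PRESET = ("12 38 54 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c "
          "6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e")

print(edit_distance(EXTRACTED, PRESET))                    # 4
print(matches_preset_viral_code(EXTRACTED, [PRESET], 5))   # True
```

The two sequences differ only in the order of the opcodes 54 and 38, which costs four single-character substitutions.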
The preset threshold can be determined in several ways, for example by setting it manually based on experience, or by determining it according to some calculation rule. In the embodiments of the present application, to improve the accuracy of identifying viral code, before judging whether the virtual machine execution file of the application program contains viral code (step S104), the method may further include:
Determining the total number of characters of the extracted function instruction sequence.
Determining the product of the total number of characters of the function instruction sequence and a preset value α (0 < α < 1) as the preset threshold.
For example, the total number of characters of the instruction sequence obtained above, “12 38 54 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c 6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e”, can be determined to be 72. The preset value can then be set to 0.05 (between 0 and 1), so the preset threshold is finally determined to be 72 × 0.05 ≈ 4. The preset value may also be an empirical value. Through these steps, code whose function instruction sequence reaches a similarity of more than 95% can be determined to be a variant of the viral code.
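A small sketch of this threshold rule follows. Two details are assumptions rather than statements from the application: the character total is taken to exclude the separating spaces (which gives 72 for the 36 two-digit opcodes), and the product is rounded to the nearest integer (which gives 4 from 3.6). The name `preset_threshold` is illustrative.

```python
def preset_threshold(sequence: str, alpha: float) -> int:
    """Threshold = (number of non-space characters) * alpha, rounded."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    char_total = sum(1 for ch in sequence if not ch.isspace())
    return round(char_total * alpha)


SEQUENCE = ("12 38 54 71 0c 6e 0c 6e 0a 38 54 54 6e 0c 6e 54 6e 0c "
            "6e 0c 38 72 0a 39 12 38 54 6e 54 71 0e 01 28 54 13 6e")
print(preset_threshold(SEQUENCE, 0.05))  # 4
```

Note that under a strict less-than comparison an edit distance of 4 would not fall below a threshold of 4; the earlier worked example assumed a threshold of 5.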
It should be noted that this application does not limit which malicious code protection scheme is used to detect malicious code. For example, the sample feature detection (feature value scanning) described above, virtual machine based detection, or heuristic detection may be used; similar samples may also be clustered. For clustering, a large number of code samples can be classified according to edit distance (similarity): when the edit distance between the function instruction sequences of two code samples is less than the preset threshold, the two samples are placed in the same category, so that a large number of code samples are clustered automatically. It is also worth mentioning that this application does not restrict the matching algorithm; for example, the fuzzy matching algorithm described above or a similarity matching algorithm may be used.
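The clustering idea can be sketched with a simple greedy pass, in which each sample joins the first cluster whose first member lies within the threshold and otherwise starts a new cluster. This particular strategy and the name `cluster_samples` are assumptions; the application only states that samples whose edit distance is below the threshold are grouped together.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def cluster_samples(sequences, threshold):
    """Greedy clustering: compare each sample with each cluster's first member."""
    clusters = []
    for sequence in sequences:
        for cluster in clusters:
            if edit_distance(sequence, cluster[0]) < threshold:
                cluster.append(sequence)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([sequence])
    return clusters


samples = ["12 54 38", "12 38 54", "6e 0c 0a 39 72"]
clusters = cluster_samples(samples, threshold=5)
```

A greedy pass depends on input order and on the choice of representative; it is shown only because it is short, and other clustering schemes would fit the description equally well.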
Fig. 3 is a flowchart of the method for judging viral code provided by another embodiment of the present application, including:
S201: Decompile the virtual machine execution file of an application program to obtain a decompiled function information structure. The application program may be one installed on a mobile terminal.
S202: Parse the decompiled function information structure and extract the mnemonic sequence from the decompiled function information structure;
S203: Determine the edit distance between the extracted mnemonic sequence and the mnemonic sequence of a preset viral code;
S204: Judge whether the determined edit distance is less than a preset threshold; if the edit distance is less than the preset threshold, determine that the virtual machine execution file of the application program contains viral code.
This embodiment is similar to the previous one and serves as an alternative. The difference is that this embodiment extracts the mnemonic sequence from the function information structure, uses the mnemonic sequence to determine the edit distance between the application code to be identified and the viral code, and finally determines from the edit distance (similarity) whether the virtual machine execution file of the application program contains viral code. For how to determine the preset threshold, refer to the previous embodiment; the details are not repeated here.
It can be seen that, in the methods provided by the above embodiments, by analyzing and decompiling the virtual machine execution file of an application program installed on an intelligent terminal, the function instruction sequence (or mnemonic sequence) in the decompiled function information structure corresponding to the application program can be obtained. An edit distance algorithm is used to determine the edit distance between the extracted function instruction sequence (or mnemonic sequence) and the function instruction sequence (or mnemonic sequence) of a preset viral code. Finally, when the determined edit distance is less than the preset threshold, the virtual machine execution file of the application program is determined to contain viral code. In this way it can be accurately judged whether an application program on the intelligent terminal is one that evades antivirus detection by modifying the character strings referenced by viral code (in the dex file), thereby ensuring the safety of the intelligent terminal.
Fig. 4 is a module diagram of the device for judging viral code provided by an embodiment of the present application. The function of each unit in the device is similar to that of the corresponding step in the above method, so for details of the device refer to the method embodiments above. The device includes:
a decompiling unit 401, configured to decompile the virtual machine execution file of an application program to obtain a decompiled function information structure;
an extraction unit 402, configured to parse the decompiled function information structure and extract the function instruction sequence from the decompiled function information structure;
an edit distance determining unit 403, configured to determine the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code;
a viral code determining unit 404, configured to judge whether the determined edit distance is less than a preset threshold and, when the edit distance is less than the preset threshold, to determine that the virtual machine execution file of the application program contains viral code.
With this device, it can be accurately judged whether an application program on the intelligent terminal is one that evades antivirus detection by modifying the character strings referenced by viral code (in the dex file), thereby ensuring the safety of the intelligent terminal.
The preset threshold may be determined in various ways, for example set manually based on experience, or computed according to a certain calculation rule. In this embodiment of the present application, to improve the accuracy of identifying viral code, the device further includes:
Character count determining unit, configured to determine the total number of characters of the extracted function instruction sequence before it is determined that the virtual machine execution file of the application contains viral code;
Preset threshold determining unit, configured to take the product of the total number of characters of the function instruction sequence and a preset value as the preset threshold, where the preset value is between 0 and 1.
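The threshold rule above amounts to a single multiplication. The sketch below assumes the preset value lies strictly between 0 and 1 (the text does not say whether the endpoints are included); the function name `preset_threshold` and the parameter name `factor` are hypothetical.

```python
def preset_threshold(instruction_sequence: str, factor: float) -> float:
    """Threshold = total character count of the extracted sequence * preset value."""
    # Assumption: the preset value is strictly between 0 and 1.
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    return len(instruction_sequence) * factor
```

Scaling the threshold with the sequence length means a long function may differ from the known viral sequence in more positions than a short one and still be flagged.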
In this embodiment of the present application, the decompiling unit 401 includes:
Information structure obtaining unit, configured to parse the virtual machine execution file according to the virtual machine execution file format to obtain the function information structure of each class;
Function information structure obtaining unit, configured to determine, according to the fields in the function information structure, the position and size of each function of the virtual machine execution file, and to obtain the decompiled function information structure.
In this embodiment of the present application, the function information structure obtaining unit is configured to:
Parse the function information structure to obtain a bytecode array field indicating the position of a function in the virtual machine execution file and a list length field indicating the size of that function;
Determine the position and size of the function of the virtual machine execution file according to the bytecode array field and the list length field.
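As a rough illustration of locating a function from these two fields, the sketch below reads a DEX `code_item`. It assumes, without confirmation from the text, that the "bytecode array field" and "list length field" correspond to the `insns` and `insns_size` fields of the DEX format; the function name `locate_bytecode` is hypothetical, and `code_off` is assumed to have been read already from the method's entry in the class data.

```python
import struct
from typing import Tuple

# DEX code_item header, little-endian: registers_size, ins_size, outs_size,
# tries_size (4 x uint16), then debug_info_off, insns_size (2 x uint32).
CODE_ITEM_HEADER = struct.Struct("<HHHHII")


def locate_bytecode(dex: bytes, code_off: int) -> Tuple[int, int]:
    """Return (byte offset, byte length) of the bytecode array of one method."""
    *_, insns_size = CODE_ITEM_HEADER.unpack_from(dex, code_off)
    insns_off = code_off + CODE_ITEM_HEADER.size  # insns follows the 16-byte header
    return insns_off, insns_size * 2  # insns_size counts 16-bit code units
```

The returned offset and length delimit the raw bytecode that a disassembler would then turn into the function instruction sequence.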
In another embodiment of the present application, the above device for judging viral code includes:
Decompiling unit 401, configured to decompile the virtual machine execution file of an application to obtain a decompiled function information structure; the application may be an application installed on a mobile terminal.
Extraction unit 402, configured to parse the decompiled function information structure and extract the mnemonic sequence from it;
Edit distance determining unit 403, configured to determine the edit distance between the extracted mnemonic sequence and the function instruction sequence of a preset viral code;
Viral code determining unit 404, configured to judge whether the determined edit distance is less than a preset threshold and, when the edit distance is less than the preset threshold, to determine that the virtual machine execution file of the application contains viral code.
With the above device, it can be accurately judged whether an application on the intelligent terminal is a program that evades antivirus detection by modifying the strings referenced by viral code (in the dex file), thereby ensuring the security of the intelligent terminal.
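The following sketch suggests why comparing mnemonic sequences is insensitive to changed string references. It assumes the decompiled function is available as text lines in a smali-like form where the mnemonic is the first whitespace-delimited token; that format, the sample instructions, and the name `mnemonic_sequence` are illustrative assumptions rather than details from the text.

```python
from typing import List


def mnemonic_sequence(disassembly: List[str]) -> str:
    """Keep only the opcode mnemonic of each instruction line, dropping operands."""
    return " ".join(line.split(None, 1)[0] for line in disassembly if line.strip())


# Hypothetical known sample and a variant whose referenced string was changed.
original = [
    'const-string v0, "http://a.example/payload"',
    "invoke-static {v0}, Lcom/x/Net;->send(Ljava/lang/String;)V",
    "return-void",
]
modified = [
    'const-string v0, "http://b.example/other"',
    "invoke-static {v0}, Lcom/x/Net;->send(Ljava/lang/String;)V",
    "return-void",
]
```

Here both variants reduce to the same mnemonic sequence, so their edit distance is zero even though the referenced strings differ.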
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A method for judging viral code, characterized by comprising:
Decompiling the virtual machine execution file of an application to obtain a decompiled function information structure;
Parsing the decompiled function information structure and extracting the function instruction sequence from the decompiled function information structure;
Determining the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code;
Judging whether the determined edit distance is less than a preset threshold, and if the edit distance is less than the preset threshold, determining that the virtual machine execution file of the application contains viral code.
2. The method of claim 1, characterized in that, before judging whether the determined edit distance is less than the preset threshold, the method further comprises:
Determining the total number of characters of the extracted function instruction sequence;
Taking the product of the total number of characters of the function instruction sequence and a preset value as the preset threshold, wherein the preset value is between 0 and 1.
3. The method of claim 1, characterized in that decompiling the virtual machine execution file of the application to obtain the decompiled function information structure specifically comprises:
Parsing the virtual machine execution file according to the virtual machine execution file format of the application to obtain the function information structure of each class;
Determining, according to the fields in the function information structure, the position and size of each function of the virtual machine execution file, and obtaining the decompiled function information structure.
4. A method for judging viral code, characterized by comprising:
Decompiling the virtual machine execution file of an application to obtain a decompiled function information structure;
Parsing the decompiled function information structure and extracting the mnemonic sequence from the decompiled function information structure;
Determining the edit distance between the extracted mnemonic sequence and the mnemonic sequence of a preset viral code;
Judging whether the determined edit distance is less than a preset threshold, and if the edit distance is less than the preset threshold, determining that the virtual machine execution file of the application contains viral code.
5. The method of claim 4, characterized in that, before judging whether the determined edit distance is less than the preset threshold, the method further comprises:
Determining the total number of characters of the extracted mnemonic sequence;
Taking the product of the total number of characters of the mnemonic sequence and a preset value as the preset threshold, wherein the preset value is between 0 and 1.
6. A device for judging viral code, characterized in that the device comprises:
A decompiling unit, configured to decompile the virtual machine execution file of an application to obtain a decompiled function information structure;
An extraction unit, configured to parse the decompiled function information structure and extract the function instruction sequence from the decompiled function information structure;
An edit distance determining unit, configured to determine the edit distance between the extracted function instruction sequence and the function instruction sequence of a preset viral code;
A viral code determining unit, configured to judge whether the determined edit distance is less than a preset threshold and, when the edit distance is less than the preset threshold, to determine that the virtual machine execution file of the application contains viral code.
7. The device of claim 6, characterized in that the device further comprises:
A character count determining unit, configured to determine the total number of characters of the extracted function instruction sequence before judging whether the determined edit distance is less than the preset threshold;
A preset threshold determining unit, configured to take the product of the total number of characters of the function instruction sequence and a preset value as the preset threshold, wherein the preset value is between 0 and 1.
8. The device of claim 6, characterized in that the decompiling unit comprises:
An information structure obtaining unit, configured to parse the virtual machine execution file according to the virtual machine execution file format of the application to obtain the function information structure of each class;
A function information structure obtaining unit, configured to determine, according to the fields in the function information structure, the position and size of each function of the virtual machine execution file, and to obtain the decompiled function information structure.
9. A device for judging viral code, characterized in that the device comprises:
A decompiling unit, configured to decompile the virtual machine execution file of an application to obtain a decompiled function information structure;
An extraction unit, configured to parse the decompiled function information structure and extract the mnemonic sequence from the decompiled function information structure;
An edit distance determining unit, configured to determine the edit distance between the extracted mnemonic sequence and the function instruction sequence of a preset viral code;
A viral code determining unit, configured to judge whether the determined edit distance is less than a preset threshold and, when the edit distance is less than the preset threshold, to determine that the virtual machine execution file of the application contains viral code.
10. The device of claim 9, characterized in that the device further comprises:
A character count determining unit, configured to determine the total number of characters of the extracted mnemonic sequence before judging whether the determined edit distance is less than the preset threshold;
A preset threshold determining unit, configured to take the product of the total number of characters of the mnemonic sequence and a preset value as the preset threshold, wherein the preset value is between 0 and 1.
CN201510971165.8A 2015-12-22 2015-12-22 A kind of method and device for judging viral code Pending CN106909841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510971165.8A CN106909841A (en) 2015-12-22 2015-12-22 A kind of method and device for judging viral code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510971165.8A CN106909841A (en) 2015-12-22 2015-12-22 A kind of method and device for judging viral code

Publications (1)

Publication Number Publication Date
CN106909841A true CN106909841A (en) 2017-06-30

Family

ID=59200979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510971165.8A Pending CN106909841A (en) 2015-12-22 2015-12-22 A kind of method and device for judging viral code

Country Status (1)

Country Link
CN (1) CN106909841A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761475A (en) * 2013-12-30 2014-04-30 北京奇虎科技有限公司 Method and device for detecting malicious code in intelligent terminal
CN103902910A (en) * 2013-12-30 2014-07-02 北京奇虎科技有限公司 Method and device for detecting malicious codes in intelligent terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵作鹏: "《面向煤矿应急管理的数据处理关键技术研究》", 30 November 2013 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107547547A (en) * 2017-09-05 2018-01-05 成都知道创宇信息技术有限公司 A kind of TCP CC recognition methods based on editing distance
CN107547547B (en) * 2017-09-05 2020-06-02 成都知道创宇信息技术有限公司 TCP CC identification method based on edit distance
CN108446559A (en) * 2018-02-13 2018-08-24 北京兰云科技有限公司 A kind of recognition methods of APT tissue and device
CN108491718A (en) * 2018-02-13 2018-09-04 北京兰云科技有限公司 A kind of method and device for realizing information classification
CN108491718B (en) * 2018-02-13 2022-03-04 北京兰云科技有限公司 Method and device for realizing information classification
CN108446559B (en) * 2018-02-13 2022-03-29 北京兰云科技有限公司 APT organization identification method and device
CN108804920A (en) * 2018-05-24 2018-11-13 河南省躬行信息科技有限公司 A method of based on striding course behavior monitoring malicious code homology analysis
CN108804920B (en) * 2018-05-24 2021-09-28 河南省躬行信息科技有限公司 Method for monitoring malicious code homology analysis based on cross-process behavior
CN110225007A (en) * 2019-05-27 2019-09-10 国家计算机网络与信息安全管理中心 The clustering method of webshell data on flows and controller and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170630