CN108734215A - Software classification method and device - Google Patents
- Publication number
- CN108734215A CN201810489257.6A
- Authority
- CN
- China
- Prior art keywords
- software
- code
- sorted
- family
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/53—Decompilation; Disassembly
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/74—Reverse engineering; Extracting design information from source code
Abstract
The application provides a software classification method and device. The method includes: obtaining the software code of multiple pieces of software to be classified; splitting the software code of each piece of software into multiple code genetic fragments according to the positions of the code that calls system APIs, thereby obtaining a software gene pool that contains the code genetic fragments of all the software to be classified; performing cluster analysis on the software in the software gene pool according to the code genetic fragments, so as to divide the software into multiple software families; and adding a corresponding family label to each software family. In this way, software is classified according to its own code, so the classification information is more accurate and cannot be forged.
Description
Technical field
The application relates to the technical field of software security, and in particular to a software classification method and device.
Background
With the continuous development of information technology, more and more software is used on all kinds of electronic devices, and the content providers behind that software are increasingly varied. Software management and software security are therefore receiving growing attention. Classifying software effectively makes it easier to manage and makes the identification of, and protection against, malware more targeted. For example, if the software provided by the same malware content provider is grouped into one class, that class can be identified and filtered in a targeted way. In the prior art, however, software is usually classified only by criteria external to the software code itself, such as the software type or content provider claimed in the information attached to the software. Such claimed information can be forged, which makes the classification inaccurate.
Summary of the invention
To overcome the above deficiencies of the prior art, one object of the application is to provide a software classification method, the method including:
obtaining the software code of multiple pieces of software to be classified;
splitting the software code of each piece of software to be classified into multiple code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code genetic fragments of the multiple pieces of software to be classified;
performing cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code genetic fragments, and dividing the software to be classified into multiple software families; and
adding a corresponding family label to each software family.
Optionally, the step of splitting the software code of each piece of software to be classified into multiple code genetic fragments according to the positions of the code that calls system APIs includes:
for each piece of software to be classified, obtaining the positions of the code that calls system APIs in its software code, and taking the part between two adjacent system API calls as one code genetic fragment, so that the code of each piece of software to be classified is split into multiple code genetic fragments.
Optionally, obtaining the software code of multiple pieces of software to be classified may include:
obtaining multiple pieces of software to be classified and, for each of them, disassembling the software with the IDA disassembler to obtain software code in asm format corresponding to that software.
Optionally, the step of performing cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code genetic fragments, and dividing the software to be classified into multiple software families, includes:
performing cluster analysis on the multiple pieces of software to be classified with the Affinity Propagation clustering algorithm according to the code genetic fragments, and dividing the multiple pieces of software to be classified into multiple software families.
Another object of the application is to provide a software classification method, the method including:
obtaining the software code of target software;
splitting the software code of the target software into multiple target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software;
performing cluster analysis on the target software and the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code genetic fragments of at least one piece of software, and each software family has a corresponding family label; and
obtaining, according to the result of the cluster analysis, the family label of the software family to which the target software corresponds.
Optionally, the step of splitting the software code of the target software into multiple target code genetic fragments according to the positions of the code that calls system APIs includes:
obtaining the positions of the code that calls system APIs in the software code of the target software, and taking the part between two adjacent system API calls as one code genetic fragment, so that the software code of the target software is split into multiple code genetic fragments.
Another object of the application is to provide a software classification device, the device including:
an acquisition module, configured to obtain the software code of multiple pieces of software to be classified;
a gene extraction module, configured to split the software code of each piece of software to be classified into multiple code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code genetic fragments of the multiple pieces of software to be classified;
a cluster module, configured to perform cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code genetic fragments, and to divide the software to be classified into multiple software families; and
an identification module, configured to add a corresponding family label to each software family.
Optionally, the gene extraction module is specifically configured to, for each piece of software to be classified, obtain the positions of the code that calls system APIs in its code, and take the part between two adjacent system API calls as one code genetic fragment, so that the code of each piece of software to be classified is split into multiple code genetic fragments.
Another object of the application is to provide a software classification device, the device including:
an acquisition module, configured to obtain the software code of target software;
a gene extraction module, configured to split the software code of the target software into multiple target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software;
a cluster module, configured to perform cluster analysis on the target software and the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code genetic fragments of at least one piece of software, and each software family has a corresponding family label; and
an identification module, configured to obtain, according to the result of the cluster analysis, the family label of the software family to which the target software corresponds.
Optionally, the gene extraction module is specifically configured to obtain the positions of the code that calls system APIs in the software code of the target software, and to take the part between two adjacent system API calls as one code genetic fragment, so that the software code of the target software is split into multiple code genetic fragments.
Compared with the prior art, the application has the following advantages:
The software classification method and device provided by the application split software into code genetic fragments according to the system APIs called by the software code, and classify the software by performing cluster analysis on those code genetic fragments. In this way, software is classified according to its own code, so the classification information is more accurate and cannot be forged.
Brief description of the drawings
To explain the technical solutions of the embodiments of the application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from them without creative effort.
Fig. 1 is a block diagram of the electronic device provided by an embodiment of the application;
Fig. 2 is a flowchart of the software classification method provided by the first embodiment of the application;
Fig. 3 is a functional block diagram of the software classification device provided by the first embodiment of the application;
Fig. 4 is a flowchart of the software classification method provided by the second embodiment of the application;
Fig. 5 is a functional block diagram of the software classification device provided by the second embodiment of the application.
Reference numerals: 100 - electronic device; 110 (210) - software classification device; 111 - first acquisition module; 112 - first gene extraction module; 113 - first cluster module; 114 - first identification module; 211 - second acquisition module; 212 - second gene extraction module; 213 - second cluster module; 214 - second identification module; 120 - memory; 130 - processor.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the application. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be defined and explained again in subsequent drawings.
In the description of the application, it should also be noted that, unless otherwise expressly specified and limited, the terms "arranged", "installed", "connected" and "coupled" are to be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the application according to the specific situation.
Referring to Fig. 1, Fig. 1 is a block diagram of an electronic device 100 provided by this embodiment. The electronic device 100 includes a software classification device 110 (210), a memory 120, a processor 130 and a communication unit 140.
The memory 120, the processor 130 and the communication unit 140 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements can be electrically connected to one another through one or more communication buses or signal lines. The software classification device 110 (210) includes at least one software function module that can be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the electronic device 100. The processor 130 is configured to execute the executable modules stored in the memory 120, such as the software function modules and computer programs included in the software classification device 110 (210).
The memory 120 may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), and so on. The memory 120 is used to store a program, and the processor 130 executes the program after receiving an execution instruction.
First embodiment
Referring to Fig. 2, Fig. 2 is a flowchart of a software classification method applied to the electronic device 100 shown in Fig. 1. The steps of the method are described in detail below.
Step S110: obtain the software code of multiple pieces of software to be classified.
In this embodiment, the software to be classified may be a PE file or an ELF file. After obtaining the multiple pieces of software to be classified, the electronic device 100 can use the IDA disassembler to disassemble each of them into to-be-processed code in asm format. IDA is an interactive disassembly tool that can convert software into assembly language.
Through step S110, the multiple pieces of software to be classified are disassembled into a uniform asm format, which facilitates the cluster analysis in the subsequent steps.
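As an illustration of the input handling in step S110, the sketch below tells PE and ELF files apart by their leading magic bytes before they are handed to a disassembler. The function name `detect_format` is hypothetical and not part of the application; the check relies only on the well-known `MZ` and `\x7fELF` signatures and does not validate the rest of the file.

```python
def detect_format(data: bytes) -> str:
    """Return "PE", "ELF" or "unknown" based on the file's leading bytes.

    Hypothetical helper: a shallow signature check, not a full parser.
    """
    if data.startswith(b"\x7fELF"):   # ELF identification bytes
        return "ELF"
    if data.startswith(b"MZ"):        # DOS header that precedes the PE header
        return "PE"
    return "unknown"
```

A caller would typically read only the first few bytes of each sample and skip, or route separately, anything reported as "unknown".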
Step S120: split the software code of each piece of software to be classified into multiple code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code genetic fragments of the multiple pieces of software to be classified.
In this embodiment, the software code of each piece of software to be classified needs to be split. The splitting principle is that, under normal operation (no external interruption and no internal crash) and regardless of the input, each resulting code fragment is either executed completely on its own or not executed at all. In other words, each fragment can be treated as one small whole, that is, the fragment has gene atomicity.
The inventors found through research that many APIs are called while software runs. If the called API is the software's own API, the subsequent actions that depend on the API's return value can be executed inside the software. If the called API is a system API, the software has to wait for the return value from the external operating system before it can continue with the subsequent steps. That is, the part between two adjacent system API calls, which itself contains no system API call, can usually be executed in its entirety.
Therefore, in this embodiment, the software code of each piece of software to be classified is split into multiple code genetic fragments according to the positions of the code that calls system APIs. Specifically, the positions of the code that calls system APIs in the to-be-processed code are obtained, and the part between two adjacent system API calls is taken as one code genetic fragment, so that the software code of each piece of software to be classified is split into multiple code genetic fragments.
After splitting, the code genetic fragments of the multiple pieces of software to be classified constitute a software gene pool.
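The splitting rule of step S120 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `split_into_fragments`, the regular expression for x86 `call` instructions, and the caller-supplied set of system API names are all hypothetical. The sketch also keeps the code before the first system API call and after the last one as fragments, which the application does not specify.

```python
import re
from typing import Iterable, List, Set

# Matches an x86 call instruction and captures the name of its target.
# An optional segment prefix such as "ds:" is skipped (assumed asm style).
CALL_RE = re.compile(r"^\s*call\s+(?:ds:|cs:)?([A-Za-z_@?$][\w@?$.]*)", re.IGNORECASE)


def split_into_fragments(asm_lines: Iterable[str], system_apis: Set[str]) -> List[List[str]]:
    """Split disassembled code at every call to a known system API.

    Each returned fragment is the run of instructions between two adjacent
    system API calls; the call instructions themselves are boundaries and
    are not included. Calls to the software's own functions stay inside
    the fragment.
    """
    fragments: List[List[str]] = []
    current: List[str] = []
    for line in asm_lines:
        match = CALL_RE.match(line)
        if match and match.group(1) in system_apis:
            if current:                # close the fragment that ends here
                fragments.append(current)
            current = []
        else:
            stripped = line.strip()
            if stripped:               # ignore blank lines
                current.append(stripped)
    if current:                        # trailing code after the last system call
        fragments.append(current)
    return fragments
```

In practice each fragment would probably be normalized (for example by masking addresses and register names) and hashed before it is stored in the gene pool, so that equivalent fragments from different samples compare equal; that normalization is outside the scope of this sketch.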
Step S130: perform cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code genetic fragments, and divide the software to be classified into multiple software families.
In this embodiment, the Affinity Propagation clustering algorithm is used to perform cluster analysis on the multiple pieces of software to be classified according to the code genetic fragments, and the multiple pieces of software to be classified are divided into multiple software families.
Through the clustering algorithm, the electronic device 100 automatically performs cluster analysis on the software in the software gene pool and divides the multiple pieces of software to be classified into multiple software families according to the similarity between their code genetic fragments.
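Affinity Propagation works on pairwise similarities between samples, and the application does not state how the similarity between two pieces of software is computed from their code genetic fragments. The sketch below assumes one plausible choice: each piece of software is represented as a set of fragment fingerprints, and similarity is the Jaccard index of two such sets. The names `jaccard` and `similarity_matrix` are hypothetical.

```python
from typing import AbstractSet, Dict, List, Tuple


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Jaccard index of two fragment sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(
    gene_pool: Dict[str, AbstractSet[str]]
) -> Tuple[List[str], List[List[float]]]:
    """Build a symmetric similarity matrix over all software in the gene pool.

    gene_pool maps a software name to the set of its fragment fingerprints.
    Returns the names in sorted order together with the matrix, so that
    row i and column i both refer to names[i].
    """
    names = sorted(gene_pool)
    matrix = [[jaccard(gene_pool[x], gene_pool[y]) for y in names] for x in names]
    return names, matrix
```

A matrix of this form can be passed to an Affinity Propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation` with `affinity="precomputed"`; the cluster index returned for each row then identifies the software family of the corresponding sample.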
Step S140: add a corresponding family label to each software family.
After the classification in step S130, the electronic device 100 can add a corresponding family label to each software family in response to a user operation. The software gene pool formed in this way includes multiple software families; each software family includes the code genetic fragments of at least one piece of software, and each software family has a corresponding family label. In subsequent use, unknown software that needs to be identified can be clustered together with the software in the software gene pool to determine which software family it belongs to.
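The labelled gene pool that results from step S140 can be pictured as a mapping from family label to member software. The sketch below is only an illustration: the function name `build_families`, the fallback label format `family-<id>` and the assumption that user-supplied labels are unique are not taken from the application.

```python
from collections import defaultdict
from typing import Dict, List


def build_families(cluster_of: Dict[str, int], labels: Dict[int, str]) -> Dict[str, List[str]]:
    """Group software by cluster id and attach a family label to each group.

    cluster_of maps a software name to the cluster id assigned in step S130.
    labels maps a cluster id to the label chosen by the user; clusters
    without a user label get a placeholder label (assumed format).
    """
    members: Dict[int, List[str]] = defaultdict(list)
    for name, cluster_id in sorted(cluster_of.items()):
        members[cluster_id].append(name)
    return {
        labels.get(cluster_id, f"family-{cluster_id}"): names
        for cluster_id, names in members.items()
    }
```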
Correspondingly, referring to Fig. 3, this embodiment also provides a software classification device 110, which includes a first acquisition module 111, a first gene extraction module 112, a first cluster module 113 and a first identification module 114.
The first acquisition module 111 is configured to obtain the software code of multiple pieces of software to be classified.
In this embodiment, the first acquisition module 111 may be used to perform step S110 shown in Fig. 2; for details of the first acquisition module 111, refer to the description of step S110.
The first gene extraction module 112 is configured to split the software code of each piece of software to be classified into multiple code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code genetic fragments of the multiple pieces of software to be classified.
In this embodiment, the first gene extraction module 112 may be used to perform step S120 shown in Fig. 2; for details of the first gene extraction module 112, refer to the description of step S120.
The first cluster module 113 is configured to perform cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code genetic fragments, and to divide the software to be classified into multiple software families.
In this embodiment, the first cluster module 113 may be used to perform step S130 shown in Fig. 2; for details of the first cluster module 113, refer to the description of step S130.
The first identification module 114 is configured to add a corresponding family label to each software family.
In this embodiment, the first identification module 114 may be used to perform step S140 shown in Fig. 2; for details of the first identification module 114, refer to the description of step S140.
Optionally, the first gene extraction module 112 is specifically configured to, for each piece of software to be classified, obtain the positions of the code that calls system APIs in its code, and take the part between two adjacent system API calls as one code genetic fragment, so that the code of each piece of software to be classified is split into multiple code genetic fragments.
Second embodiment
Referring to Fig. 4, Fig. 4 is a flowchart of a software classification method provided by this embodiment and applied to the electronic device 100 shown in Fig. 1. The steps of the method are described in detail below.
Step S210: obtain the software code of target software.
Step S220: split the software code of the target software into multiple target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software.
The processing of the target software in steps S210 and S220 is similar to the processing of a single piece of software to be classified in steps S110 and S120 of the first embodiment; refer to the description of steps S110 and S120 in the first embodiment.
For example, in step S220, the electronic device 100 can obtain the positions of the code that calls system APIs in the software code of the target software, and take the part between two adjacent system API calls as one code genetic fragment, so that the software code of the target software is split into multiple code genetic fragments.
Step S230: perform cluster analysis on the target software and the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code genetic fragments of at least one piece of software, and each software family has a corresponding family label.
The software gene pool used in this embodiment may be the software gene pool provided by the first embodiment.
In this embodiment, after the target code genetic fragments of the target software are extracted through steps S210 and S220, they are clustered together with the code genetic fragments in the software gene pool. This shows which software in the software gene pool has code genetic fragments that are inherently similar to those of the target software.
Step S240: obtain, according to the result of the cluster analysis, the family label of the software family to which the target software corresponds.
In this embodiment, after the electronic device 100 determines through cluster analysis which software family the target software belongs to, it can output the family label of that software family.
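The lookup in steps S230 and S240 can be approximated with a much simpler procedure than a full cluster analysis. The sketch below is a stand-in, not the method of the application: it assigns the target to the family whose members have the highest mean Jaccard similarity to the target's fragment set. The function names, the set-of-fingerprints representation and the Jaccard measure are all assumptions made for illustration.

```python
from typing import AbstractSet, Dict, List, Optional, Tuple


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Jaccard index of two fragment sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_target(
    target_fragments: AbstractSet[str],
    families: Dict[str, List[AbstractSet[str]]],
) -> Tuple[Optional[str], float]:
    """Return the best-matching family label and its mean similarity.

    families maps a family label to the fragment sets of its members.
    If the target shares no fragment with any family, the label is None.
    """
    best_label: Optional[str] = None
    best_score = 0.0
    for label, members in families.items():
        if not members:
            continue
        score = sum(jaccard(target_fragments, m) for m in members) / len(members)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Returning `None` for a target with no overlap leaves room for the caller to treat the sample as a candidate for a new family instead of forcing it into an existing one.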
Correspondingly, referring to Fig. 5, this embodiment also provides a software classification device 210, which includes a second acquisition module 211, a second gene extraction module 212, a second cluster module 213 and a second identification module 214.
The second acquisition module 211 is configured to obtain the software code of target software.
In this embodiment, the second acquisition module 211 may be used to perform step S210 shown in Fig. 4; for details of the second acquisition module 211, refer to the description of step S210.
The second gene extraction module 212 is configured to split the software code of the target software into multiple target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software.
In this embodiment, the second gene extraction module 212 may be used to perform step S220 shown in Fig. 4; for details of the second gene extraction module 212, refer to the description of step S220.
The second cluster module 213 is configured to perform cluster analysis on the target software and the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code genetic fragments of at least one piece of software, and each software family has a corresponding family label.
In this embodiment, the second cluster module 213 may be used to perform step S230 shown in Fig. 4; for details of the second cluster module 213, refer to the description of step S230.
The second identification module 214 is configured to obtain, according to the result of the cluster analysis, the family label of the software family to which the target software corresponds.
In this embodiment, the second identification module 214 may be used to perform step S240 shown in Fig. 4; for details of the second identification module 214, refer to the description of step S240.
Optionally, the second gene extraction module 212 is specifically configured to obtain the positions of the code that calls system APIs in the software code of the target software, and to take the part between two adjacent system API calls as one code genetic fragment, so that the software code of the target software is split into multiple code genetic fragments.
In conclusion, the software classification method and device provided by the application split software into code genetic fragments according to the system APIs called by the software code, and classify the software by performing cluster analysis on those code genetic fragments. In this way, software is classified according to its own code, so the classification information is more accurate and cannot be forged.
It should be understood that the device and method disclosed in the embodiments of the application can also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the application. In this regard, each box in a flowchart or block diagram can represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two consecutive boxes can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and any combination of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the function modules in the embodiments of the application can be integrated to form one independent part, each module can exist separately, or two or more modules can be integrated to form one independent part.
If the functions are implemented in the form of software function modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The above is only a specific implementation of the application, and the scope of protection of the application is not limited to it. Any change or replacement that a person skilled in the art can easily conceive of within the technical scope disclosed by the application shall be covered by the scope of protection of the application. Therefore, the scope of protection of the application shall be subject to the scope of protection of the claims.
Claims (10)
1. A software classification method, characterized in that the method comprises:
obtaining software code of a plurality of pieces of software to be classified;
splitting the software code of each piece of software to be classified into a plurality of code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool comprising the code genetic fragments of the plurality of pieces of software to be classified;
performing cluster analysis on the plurality of pieces of software to be classified in the software gene pool according to the code genetic fragments, to divide the software to be classified into a plurality of software families;
adding a corresponding family label to each software family.
2. The method according to claim 1, characterized in that the step of splitting the software code of each piece of software to be classified into a plurality of code genetic fragments according to the positions of the code that calls system APIs in that software code comprises:
for each piece of software to be classified, obtaining the positions of the code that calls system APIs in the code of that software, and taking the part between each two adjacent pieces of code that call system APIs as one code genetic fragment, so as to split the code of that software into a plurality of code genetic fragments.
3. The method according to claim 1, characterized in that the step of performing cluster analysis on the plurality of pieces of software to be classified in the software gene pool according to the code genetic fragments, to divide the software to be classified into a plurality of software families, comprises:
performing cluster analysis on the plurality of pieces of software to be classified according to the code genetic fragments by means of the Affinity Propagation clustering algorithm, to divide the plurality of pieces of software to be classified into a plurality of software families.
4. The method according to claim 1, characterized in that obtaining the software code of the plurality of pieces of software to be classified comprises:
obtaining a plurality of pieces of software to be classified and, for each piece of software to be classified, decompiling that software with the IDA disassembly tool to obtain software code in asm format corresponding to that software.
5. A software classification method, characterized in that the method comprises:
obtaining software code of a target software;
splitting the software code of the target software into a plurality of target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software;
performing cluster analysis on the target software together with the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool comprises a plurality of software families, each software family comprises the code genetic fragments of at least one piece of software, and each software family has a corresponding family label;
obtaining, according to the cluster analysis result, the family label of the software family to which the target software corresponds.
6. The method according to claim 5, characterized in that the step of splitting the software code of the target software into a plurality of target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software comprises:
obtaining the positions of the code that calls system APIs in the software code of the target software, and taking the part between each two adjacent pieces of code that call system APIs as one code genetic fragment, so as to split the software code of the target software into a plurality of code genetic fragments.
7. A software classification device, characterized in that the device comprises:
an acquisition module, configured to obtain software code of a plurality of pieces of software to be classified;
a gene extraction module, configured to split the software code of each piece of software to be classified into a plurality of code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool comprising the code genetic fragments of the plurality of pieces of software to be classified;
a cluster module, configured to perform cluster analysis on the plurality of pieces of software to be classified in the software gene pool according to the code genetic fragments, to divide the software to be classified into a plurality of software families;
a mark module, configured to add a corresponding family label to each software family.
8. The device according to claim 7, characterized in that the gene extraction module is specifically configured to, for each piece of software to be classified, obtain the positions of the code that calls system APIs in the code of that software, and take the part between each two adjacent pieces of code that call system APIs as one code genetic fragment, so as to split the code of that software into a plurality of code genetic fragments.
9. A software classification device, characterized in that the device comprises:
an acquisition module, configured to obtain software code of a target software;
a gene extraction module, configured to split the software code of the target software into a plurality of target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software;
a cluster module, configured to perform cluster analysis on the target software together with the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool comprises a plurality of software families, each software family comprises the code genetic fragments of at least one piece of software, and each software family has a corresponding family label;
a mark module, configured to obtain, according to the cluster analysis result, the family label of the software family to which the target software corresponds.
10. The device according to claim 9, characterized in that the gene extraction module is specifically configured to obtain the positions of the code that calls system APIs in the software code of the target software, and take the part between each two adjacent pieces of code that call system APIs as one code genetic fragment, so as to split the software code of the target software into a plurality of code genetic fragments.
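For illustration only, and not as part of the claims, the fragment-splitting step of claims 2 and 6 and the family lookup of claim 5 might be sketched as follows. Several points are assumptions rather than details of the application: a system API call is assumed to appear in the asm listing as `call ds:<name>`, the function names `split_into_gene_fragments`, `jaccard`, and `closest_family` are hypothetical, and a simple Jaccard nearest-family lookup stands in for the cluster analysis (it is not the Affinity Propagation algorithm named in claim 3).

```python
import re

# Assumption: an imported (system) API call looks like "call ds:CreateFileA".
# Real disassembly listings vary; adjust the pattern to the listing at hand.
API_CALL = re.compile(r"^\s*call\s+ds:\w+", re.IGNORECASE)


def split_into_gene_fragments(asm_lines):
    """Return the instruction runs lying between two adjacent API calls."""
    fragments = []
    current = []
    seen_call = False
    for line in asm_lines:
        if API_CALL.match(line):
            # Close a fragment only if an earlier API call opened it.
            if seen_call and current:
                fragments.append(tuple(current))
            seen_call = True
            current = []
        elif line.strip():
            current.append(line.strip())
    # Code after the last API call is not bounded by two calls, so it is dropped.
    return fragments


def jaccard(a, b):
    """Jaccard similarity of two fragment collections, compared as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_family(target_fragments, gene_pool):
    """gene_pool maps a family label to a list of fragment lists (one per software)."""
    best_label, best_score = None, -1.0
    for label, members in gene_pool.items():
        score = max((jaccard(target_fragments, m) for m in members), default=0.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical listing and gene pool, made up for this sketch.
example_asm = [
    "push ebp",
    "call ds:CreateFileA",
    "mov eax, [ebp+8]",
    "push eax",
    "call ds:WriteFile",
    "xor eax, eax",
    "call ds:CloseHandle",
    "ret",
]
example_pool = {
    "family_a": [[("xor eax, eax",)]],
    "family_b": [[("nop",)]],
}
target = split_into_gene_fragments(example_asm)
print(target)
print(closest_family(target, example_pool))
```

Comparing fragments as exact tuples of instruction text is the simplest possible choice; a practical system would likely normalize operands or addresses before comparison, which this sketch does not attempt.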
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810489257.6A CN108734215A (en) | 2018-05-21 | 2018-05-21 | Software classification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810489257.6A CN108734215A (en) | 2018-05-21 | 2018-05-21 | Software classification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108734215A (en) | 2018-11-02 |
Family
ID=63937745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810489257.6A Pending CN108734215A (en) | 2018-05-21 | 2018-05-21 | Software classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734215A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508546A (en) * | 2018-11-12 | 2019-03-22 | 杭州安恒信息技术股份有限公司 | A kind of software homology analysis method and device based on software gene |
CN111290775A (en) * | 2020-04-02 | 2020-06-16 | 麒麟软件有限公司 | Automatic classification method and system for software package types of Linux system |
CN113536308A (en) * | 2021-06-11 | 2021-10-22 | 中国人民解放军战略支援部队信息工程大学 | Binary code tracing method for multi-granularity information fusion under software gene view angle |
CN114254316A (en) * | 2021-11-29 | 2022-03-29 | 上海戎磐网络科技有限公司 | Software identification method and device based on software gene and storage medium |
WO2022121146A1 (en) * | 2020-12-07 | 2022-06-16 | 中山大学 | Method and apparatus for determining importance of code segment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080005796A1 (en) * | 2006-06-30 | 2008-01-03 | Ben Godwood | Method and system for classification of software using characteristics and combinations of such characteristics |
US20110154495A1 (en) * | 2009-12-21 | 2011-06-23 | Stranne Odd Wandenor | Malware identification and scanning |
CN103902906A (en) * | 2013-12-25 | 2014-07-02 | 武汉安天信息技术有限责任公司 | Mobile terminal malicious code detecting method and system based on application icon |
CN104866765A (en) * | 2015-06-03 | 2015-08-26 | 康绯 | Behavior characteristic similarity-based malicious code homology analysis method |
Non-Patent Citations (1)
Title |
---|
HANJIN et al.: "《2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery》", 30 December 2017 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508546A (en) * | 2018-11-12 | 2019-03-22 | 杭州安恒信息技术股份有限公司 | A kind of software homology analysis method and device based on software gene |
CN111290775A (en) * | 2020-04-02 | 2020-06-16 | 麒麟软件有限公司 | Automatic classification method and system for software package types of Linux system |
WO2022121146A1 (en) * | 2020-12-07 | 2022-06-16 | 中山大学 | Method and apparatus for determining importance of code segment |
CN113536308A (en) * | 2021-06-11 | 2021-10-22 | 中国人民解放军战略支援部队信息工程大学 | Binary code tracing method for multi-granularity information fusion under software gene view angle |
CN113536308B (en) * | 2021-06-11 | 2023-01-06 | 中国人民解放军战略支援部队信息工程大学 | Binary code tracing method for multi-granularity information fusion under software gene view angle |
CN114254316A (en) * | 2021-11-29 | 2022-03-29 | 上海戎磐网络科技有限公司 | Software identification method and device based on software gene and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108734215A (en) | Software classification method and device | |
CN109271512B (en) | Emotion analysis method, device and storage medium for public opinion comment information | |
CN108734012A (en) | Malware recognition methods, device and electronic equipment | |
CN105868166B (en) | Regular expression generation method and system | |
CN105787366A (en) | Android software visualization safety analysis method based on module relations | |
CN106897072A (en) | Traffic engineered call method, device and electronic equipment | |
CN104580093A (en) | Processing method, device and system for notification messages of websites | |
CN108009435A (en) | Data desensitization method, device and storage medium | |
US11695791B2 (en) | System for extracting, classifying, and enriching cyber criminal communication data | |
CN104769598A (en) | Systems and methods for detecting illegitimate applications | |
CN110119340A (en) | Method for monitoring abnormality, device, electronic equipment and storage medium | |
CN109495479A (en) | A kind of user's abnormal behaviour recognition methods and device | |
KR20150083627A (en) | Method for detecting malignant code of android by activity string analysis | |
US11580220B2 (en) | Methods and apparatus for unknown sample classification using agglomerative clustering | |
CN104640105A (en) | Method and system for mobile phone virus analyzing and threat associating | |
KR102516454B1 (en) | Method and apparatus for generating summary of url for url clustering | |
CN112738094A (en) | Expandable network security vulnerability monitoring method, system, terminal and storage medium | |
CN108512822B (en) | Risk identification method and device for data processing event | |
CN105425997B (en) | A kind of user terminal restart after interface display method and user terminal | |
CN112437034A (en) | False terminal detection method and device, storage medium and electronic device | |
CN103246846A (en) | Method and device for detecting safety of customized ROM (read only memory) | |
CN105227528A (en) | To detection method and the device of the attack of Web server group | |
EP4266200A1 (en) | Generating device, generating method, and generating program | |
CN115423030A (en) | Equipment identification method and device | |
CN109471920A (en) | A kind of method, apparatus of Text Flag, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181102 |