CN108734215A - Software classification method and device - Google Patents

Info

Publication number
CN108734215A
Authority
CN
China
Prior art keywords
software
code
sorted
family
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810489257.6A
Other languages
Chinese (zh)
Inventor
刘旭
胡逸漪
章丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Junpan Network Technology Co Ltd
Original Assignee
Shanghai Junpan Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Junpan Network Technology Co Ltd
Priority to CN201810489257.6A
Publication of CN108734215A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/74 Reverse engineering; Extracting design information from source code

Abstract

The application provides a software classification method and device. The method includes: obtaining the software code of multiple pieces of software to be classified; splitting the software code of each piece of software to be classified into multiple code gene fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code gene fragments of the multiple pieces of software to be classified; performing cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code gene fragments, to divide the software to be classified into multiple software families; and adding a corresponding family label to each software family. In this way, software can be classified according to the software code itself, so that the classification information is more accurate and cannot be forged.

Description

Software classification method and device
Technical field
This application relates to the technical field of software security, and in particular to a software classification method and device.
Background art
With the continuous development of information technology, more and more software is used on all kinds of electronic devices, and the content providers behind that software are increasingly varied. Accordingly, software management and software security are receiving growing attention. Classifying software effectively makes it easier to manage and makes the identification of, and protection against, malware more targeted. For example, if all software supplied by the same malware content provider is grouped into one class, software of that class can be identified and filtered in a targeted way. In the prior art, however, software is usually classified only by criteria outside the software code itself, such as the software type or the content provider declared in the software's accompanying information. Such declared information can be forged, which makes the software classification inaccurate.
Summary of the invention
To overcome the above deficiencies in the prior art, an object of the application is to provide a software classification method, the method including:
obtaining the software code of multiple pieces of software to be classified;
splitting the software code of each piece of software to be classified into multiple code gene fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code gene fragments of the multiple pieces of software to be classified;
performing cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code gene fragments, to divide the software to be classified into multiple software families; and
adding a corresponding family label to each software family.
Optionally, the step of splitting the software code of each piece of software to be classified into multiple code gene fragments according to the positions of the code that calls system APIs includes:
for each piece of software to be classified, obtaining the positions of the code that calls system APIs in its code, taking the part between two adjacent system API calls as one code gene fragment, and thereby splitting the code of that software into multiple code gene fragments.
Optionally, obtaining the software code of multiple pieces of software to be classified may include:
obtaining multiple pieces of software to be classified and, for each of them, disassembling the software with the IDA disassembler to obtain software code in asm format corresponding to that software.
Optionally, the step of performing cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code gene fragments, to divide the software to be classified into multiple software families, includes:
performing cluster analysis on the multiple pieces of software to be classified with the Affinity Propagation clustering algorithm according to the code gene fragments, to divide them into multiple software families.
Another object of the application is to provide a software classification method, the method including:
obtaining the software code of a target software;
splitting the software code of the target software into multiple target code gene fragments according to the positions of the code that calls system APIs in that software code;
performing cluster analysis on the target software and the software gene fragments in a preset software gene pool according to the target code gene fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code gene fragments of at least one piece of software, and each software family has a corresponding family label; and
obtaining the family label of the software family corresponding to the target software according to the cluster analysis result.
Optionally, the step of splitting the software code of the target software into multiple target code gene fragments according to the positions of the code that calls system APIs includes:
obtaining the positions of the code that calls system APIs in the software code of the target software, taking the part between two adjacent system API calls as one code gene fragment, and thereby splitting the software code of the target software into multiple code gene fragments.
Another object of the application is to provide a software classification device, the device including:
an acquisition module, configured to obtain the software code of multiple pieces of software to be classified;
a gene extraction module, configured to split the software code of each piece of software to be classified into multiple code gene fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code gene fragments of the multiple pieces of software to be classified;
a cluster module, configured to perform cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code gene fragments, to divide the software to be classified into multiple software families; and
an identification module, configured to add a corresponding family label to each software family.
Optionally, the gene extraction module is specifically configured to, for each piece of software to be classified, obtain the positions of the code that calls system APIs in its code, take the part between two adjacent system API calls as one code gene fragment, and thereby split the code of that software into multiple code gene fragments.
Another object of the application is to provide a software classification device, the device including:
an acquisition module, configured to obtain the software code of a target software;
a gene extraction module, configured to split the software code of the target software into multiple target code gene fragments according to the positions of the code that calls system APIs in that software code;
a cluster module, configured to perform cluster analysis on the target software and the software gene fragments in a preset software gene pool according to the target code gene fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code gene fragments of at least one piece of software, and each software family has a corresponding family label; and
an identification module, configured to obtain the family label of the software family corresponding to the target software according to the cluster analysis result.
Optionally, the gene extraction module is specifically configured to obtain the positions of the code that calls system APIs in the software code of the target software, take the part between two adjacent system API calls as one code gene fragment, and thereby split the software code of the target software into multiple code gene fragments.
Compared with the prior art, the application has the following beneficial effects:
The software classification method and device provided by the application split software into code gene fragments according to the system API interfaces called by the software code, and classify the software by cluster analysis on those code gene fragments. In this way, software can be classified according to the software code itself, so that the classification information is more accurate and cannot be forged.
Brief description of the drawings
To explain the technical solutions of the embodiments of the application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the electronic device provided by the embodiments of the application;
Fig. 2 is a flow chart of the software classification method provided by the first embodiment of the application;
Fig. 3 is a functional block diagram of the software classification device provided by the first embodiment of the application;
Fig. 4 is a flow chart of the software classification method provided by the second embodiment of the application;
Fig. 5 is a functional block diagram of the software classification device provided by the second embodiment of the application.
Reference numerals: 100 - electronic device; 110 (210) - software classification device; 111 - first acquisition module; 112 - first gene extraction module; 113 - first cluster module; 114 - first identification module; 211 - second acquisition module; 212 - second gene extraction module; 213 - second cluster module; 214 - second identification module; 120 - memory; 130 - processor.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the application. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be defined or explained again in subsequent drawings.
In the description of the application, it should also be noted that, unless otherwise expressly specified and limited, the terms "arranged", "installed", "connected" and "coupled" are to be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the application according to the specific circumstances.
Referring to Fig. 1, Fig. 1 is a block diagram of an electronic device 100 provided in this embodiment. The electronic device 100 includes a software classification device 110 (210), a memory 120, a processor 130 and a communication unit 140.
The memory 120, the processor 130 and the communication unit 140 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The software classification device 110 (210) includes at least one software function module that can be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the electronic device 100. The processor 130 executes the executable modules stored in the memory 120, such as the software function modules and computer programs included in the software classification device 110 (210).
The memory 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The memory 120 stores a program, and the processor 130 executes the program after receiving an execution instruction.
First embodiment
Referring to Fig. 2, Fig. 2 is a flow chart of a software classification method applied to the electronic device 100 shown in Fig. 1. The steps of the method are described in detail below.
Step S110: obtain the software code of multiple pieces of software to be classified.
In this embodiment, the software to be classified may be a file of PE type or a file of ELF type. After obtaining the multiple pieces of software to be classified, the electronic device 100 may disassemble them with the IDA disassembler into code to be processed in asm format. The IDA disassembler is an interactive disassembly tool that can convert software into assembly language.
Through step S110, the multiple pieces of software to be classified are converted into a unified asm format, which facilitates the cluster analysis in the subsequent steps.
Step S120: split the software code of each piece of software to be classified into multiple code gene fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code gene fragments of the multiple pieces of software to be classified.
In this embodiment, the software code of each piece of software to be classified needs to be split. The splitting principle is that, under normal operation (no external interruption and no internal crash) and regardless of the input, each resulting code fragment is either executed completely on its own or not executed at all. In other words, each fragment can be treated as a small indivisible whole, that is, the fragment has gene atomicity.
The inventors found that software calls many APIs while running. If the called API belongs to the software itself, the subsequent actions that depend on its return value can be carried out inside the software. If the called API is a system API, the software has to wait for the return value from the external operating system before it can continue with the subsequent steps. Consequently, the part between two adjacent system API calls, which itself calls no system API, can usually be executed completely.
Therefore, in this embodiment, the software code of each piece of software to be classified is split into multiple code gene fragments according to the positions of the code that calls system APIs. Specifically, the positions of the code that calls system APIs in the code to be processed are obtained, and the part between two adjacent system API calls is taken as one code gene fragment, so that the software code of each piece of software is split into multiple code gene fragments. After splitting, the code gene fragments of the multiple pieces of software to be classified form a software gene pool.
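The splitting rule of step S120 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the list `SYSTEM_APIS`, the regular expression for `call` instructions, and the assumption that each input line is a bare assembly instruction (without the address prefixes a real IDA listing carries) are all hypothetical. The sketch also keeps the code before the first and after the last system API call as fragments, which the text above does not specify.

```python
import re

# Hypothetical set of system API names; a real list would depend on the platform.
SYSTEM_APIS = {"CreateFileA", "ReadFile", "WriteFile", "CloseHandle"}

# Matches e.g. "call ds:CreateFileA" or "call __imp_ReadFile" and captures the callee name.
CALL_RE = re.compile(r"^\s*call\s+(?:ds:|cs:)?(?:__imp_)?(\w+)", re.IGNORECASE)


def is_system_api_call(line, system_apis=SYSTEM_APIS):
    """Return True if the assembly line is a call to one of the listed system APIs."""
    match = CALL_RE.match(line)
    return bool(match) and match.group(1) in system_apis


def split_into_gene_fragments(asm_lines, system_apis=SYSTEM_APIS):
    """Split assembly lines into fragments, using system API calls as boundaries.

    The boundary instruction itself is not part of any fragment, and empty
    fragments (two system API calls in a row) are dropped.
    """
    fragments, current = [], []
    for line in asm_lines:
        if is_system_api_call(line, system_apis):
            if current:
                fragments.append(tuple(current))
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        fragments.append(tuple(current))
    return fragments
```

Calls to the software's own routines, such as `call sub_401000`, stay inside a fragment, which mirrors the distinction drawn above between the software's own APIs and system APIs.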
Step S130: perform cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code gene fragments, to divide the software to be classified into multiple software families.
In this embodiment, cluster analysis is performed on the multiple pieces of software to be classified with the Affinity Propagation clustering algorithm according to the code gene fragments, dividing them into multiple software families.
With the clustering algorithm, the electronic device 100 automatically performs cluster analysis on the software in the software gene pool and divides the multiple pieces of software to be classified into multiple software families according to the similarity between their code gene fragments.
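As a rough illustration of step S130, the sketch below builds a pairwise similarity matrix and passes it to scikit-learn's `AffinityPropagation` with a precomputed affinity. The source does not say how similarity between two pieces of software is measured; the Jaccard index over fragment sets used here is an assumption, as are the function names.

```python
def jaccard(a, b):
    """Jaccard index of two fragment collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(fragment_sets):
    """Pairwise similarity between pieces of software, one fragment set each."""
    n = len(fragment_sets)
    return [
        [1.0 if i == j else jaccard(fragment_sets[i], fragment_sets[j]) for j in range(n)]
        for i in range(n)
    ]


def cluster_families(fragment_sets):
    """Group software into families with Affinity Propagation.

    Returns one integer cluster id per piece of software.
    """
    # Imported lazily so the similarity helpers work without scikit-learn installed.
    import numpy as np
    from sklearn.cluster import AffinityPropagation

    sim = np.array(similarity_matrix(fragment_sets))
    model = AffinityPropagation(affinity="precomputed", random_state=0)
    return model.fit(sim).labels_.tolist()
```

Affinity Propagation does not need the number of families in advance, which fits a setting where the number of software families is unknown before clustering.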
Step S140: add a corresponding family label to each software family.
After the classification in step S130, the electronic device 100 can, in response to a user operation, add a corresponding family label to each software family. The software gene pool formed in this way includes multiple software families; each software family includes the code gene fragments of at least one piece of software and has a corresponding family label. In subsequent use, unknown software that needs to be identified can be cluster-analysed together with the software in the software gene pool to determine which software family it belongs to.
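One possible in-memory shape for the labelled gene pool described above is sketched below. The class `SoftwareFamily`, the function `build_gene_pool` and the example label strings are hypothetical names introduced for illustration; the source does not prescribe a storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SoftwareFamily:
    """One software family: a label plus the gene fragments of its members."""
    label: str
    members: dict = field(default_factory=dict)  # software name -> frozenset of fragments


def build_gene_pool(fragments_by_software, cluster_ids, family_labels):
    """Combine clustering output and user-supplied labels into a gene pool.

    fragments_by_software: dict mapping software name -> iterable of fragments
    cluster_ids: one cluster id per software, in the same order as the dict
    family_labels: dict mapping cluster id -> family label chosen by the user
    """
    grouped = defaultdict(dict)
    for (name, fragments), cluster_id in zip(fragments_by_software.items(), cluster_ids):
        grouped[cluster_id][name] = frozenset(fragments)
    return [
        SoftwareFamily(label=family_labels[cluster_id], members=members)
        for cluster_id, members in sorted(grouped.items())
    ]
```

Keeping the fragments of every member under its family makes the pool directly usable when unknown software is later compared against it.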
Correspondingly, referring to Fig. 3, this embodiment also provides a software classification device 110. The software classification device 110 includes a first acquisition module 111, a first gene extraction module 112, a first cluster module 113 and a first identification module 114.
The first acquisition module 111 is configured to obtain the software code of multiple pieces of software to be classified.
In this embodiment, the first acquisition module 111 can be used to perform step S110 (210) shown in Fig. 2; for details of the first acquisition module 111, refer to the description of step S110 (210).
The first gene extraction module 112 is configured to split the software code of each piece of software to be classified into multiple code gene fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool containing the code gene fragments of the multiple pieces of software to be classified.
In this embodiment, the first gene extraction module 112 can be used to perform step S120 shown in Fig. 2; for details of the first gene extraction module 112, refer to the description of step S120.
The first cluster module 113 is configured to perform cluster analysis on the multiple pieces of software to be classified in the software gene pool according to the code gene fragments, to divide the software to be classified into multiple software families.
In this embodiment, the first cluster module 113 can be used to perform step S130 shown in Fig. 2; for details of the first cluster module 113, refer to the description of step S130.
The first identification module 114 is configured to add a corresponding family label to each software family.
In this embodiment, the first identification module 114 can be used to perform step S140 shown in Fig. 2; for details of the first identification module 114, refer to the description of step S140.
Optionally, the first gene extraction module 112 is specifically configured to, for each piece of software to be classified, obtain the positions of the code that calls system APIs in its code, take the part between two adjacent system API calls as one code gene fragment, and thereby split the code of that software into multiple code gene fragments.
Second embodiment
Referring to Fig. 4, Fig. 4 is a flow chart of a malware identification method provided in this embodiment and applied to the electronic device 100 shown in Fig. 1. The steps of the method are described in detail below.
Step S210: obtain the software code of a target software.
Step S220: split the software code of the target software into multiple target code gene fragments according to the positions of the code that calls system APIs in that software code.
The processing of the target software in steps S210 and S220 is similar to the processing of a single piece of software to be classified in steps S110 (210) and S120 of the first embodiment; refer to the description of steps S110 (210) and S120 in the first embodiment.
For example, in step S220, the electronic device 100 can obtain the positions of the code that calls system APIs in the software code of the target software, take the part between two adjacent system API calls as one code gene fragment, and thereby split the software code of the target software into multiple code gene fragments.
Step S230: perform cluster analysis on the target software and the software gene fragments in a preset software gene pool according to the target code gene fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code gene fragments of at least one piece of software, and each software family has a corresponding family label.
The software gene pool used in this embodiment may be the software gene pool provided by the first embodiment. In this embodiment, after the target code gene fragments of the target software have been extracted in steps S210 and S220, they are cluster-analysed together with the code gene fragments in the software gene pool, which reveals which software in the gene pool has code gene fragments that are inherently similar to those of the target software.
Step S240: obtain the family label of the software family corresponding to the target software according to the cluster analysis result.
In this embodiment, after the electronic device 100 has determined by cluster analysis which software family the target software belongs to, it can output the family label of that software family.
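Steps S230 and S240 can be approximated with the sketch below. It replaces the cluster analysis described above with a simpler nearest-member lookup, which is a deliberate simplification and not the method of the application: the target receives the label of the family containing the most similar known software. The Jaccard measure, the `min_similarity` threshold and the dictionary layout of `gene_pool` are all assumptions.

```python
def jaccard(a, b):
    """Jaccard index of two fragment collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_target(target_fragments, gene_pool, min_similarity=0.3):
    """Return (family_label, score) for the best-matching family.

    gene_pool: dict mapping family label -> dict of software name -> fragment set.
    Returns (None, score) when no family member is similar enough.
    """
    best_label, best_score = None, 0.0
    for label, members in gene_pool.items():
        for fragments in members.values():
            score = jaccard(target_fragments, fragments)
            if score > best_score:
                best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score
```

Returning `None` below the threshold leaves room for target software that belongs to none of the known families, a case the text above does not discuss explicitly.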
Correspondingly, referring to Fig. 5, this embodiment also provides a software classification device 210. The software classification device 210 includes a second acquisition module 211, a second gene extraction module 212, a second cluster module 213 and a second identification module 214.
The second acquisition module 211 is configured to obtain the software code of a target software.
In this embodiment, the second acquisition module 211 can be used to perform step S210 shown in Fig. 4; for details of the second acquisition module 211, refer to the description of step S210.
The second gene extraction module 212 is configured to split the software code of the target software into multiple target code gene fragments according to the positions of the code that calls system APIs in that software code.
In this embodiment, the second gene extraction module 212 can be used to perform step S220 shown in Fig. 4; for details of the second gene extraction module 212, refer to the description of step S220.
The second cluster module 213 is configured to perform cluster analysis on the target software and the software gene fragments in a preset software gene pool according to the target code gene fragments of the target software, wherein the software gene pool includes multiple software families, each software family includes the code gene fragments of at least one piece of software, and each software family has a corresponding family label.
In this embodiment, the second cluster module 213 can be used to perform step S230 shown in Fig. 4; for details of the second cluster module 213, refer to the description of step S230.
The second identification module 214 is configured to obtain the family label of the software family corresponding to the target software according to the cluster analysis result.
In this embodiment, the second identification module 214 can be used to perform step S240 shown in Fig. 4; for details of the second identification module 214, refer to the description of step S240.
Optionally, the second gene extraction module 212 is specifically configured to obtain the positions of the code that calls system APIs in the software code of the target software, take the part between two adjacent system API calls as one code gene fragment, and thereby split the software code of the target software into multiple code gene fragments.
In conclusion, the software classification method and device provided by the application split software into code gene fragments according to the system API interfaces called by the software code, and classify the software by cluster analysis on those code gene fragments. In this way, software can be classified according to the software code itself, so that the classification information is more accurate and cannot be forged.
In the embodiments provided herein, it should be understood that the disclosed device and method can also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flow charts and block diagrams in the drawings show the possible architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the application. In this regard, each block in a flow chart or block diagram may represent a module, a program segment or a part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flow charts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the function modules in the embodiments of the application may be integrated to form one independent part, may each exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software function modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of the application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.
The above is only a specific implementation of the application, and the scope of protection of the application is not limited to it. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the application shall fall within the scope of protection of the application. The scope of protection of the application shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. A software classification method, characterized in that the method comprises:
obtaining software code of a plurality of software to be classified;
splitting the software code of each software to be classified into a plurality of code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool comprising the code genetic fragments of the plurality of software to be classified;
performing cluster analysis on the plurality of software to be classified in the software gene pool according to the code genetic fragments, to divide the software to be classified into a plurality of software families;
adding a corresponding family label to each software family.
2. The method according to claim 1, characterized in that the step of splitting the software code of each software to be classified into a plurality of code genetic fragments according to the positions of the code that calls system APIs in that software code comprises:
for each software to be classified, obtaining the positions of the code that calls system APIs in the code of the software to be classified, and taking the part between the code of each two adjacent system API calls as one code genetic fragment, so as to split the code of each software to be classified into a plurality of code genetic fragments.
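The splitting rule of claim 2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `API_CALL` pattern, the function name `split_into_fragments`, and the `api_names` set are assumptions introduced here, and the sketch additionally assumes that the disassembly is available as a list of text lines and that an empty span between two back-to-back API calls is skipped.

```python
import re

# Hypothetical pattern: a "call" instruction whose operand is a symbol name,
# optionally prefixed the way disassemblers often print imports
# (for example "call ds:CreateFileW").
API_CALL = re.compile(
    r"^\s*call\s+(?:ds:|cs:)?(?:__imp_)?([A-Za-z_]\w*)\s*(?:;.*)?$"
)


def split_into_fragments(asm_lines, api_names):
    """Return the instruction runs lying between adjacent system API calls.

    asm_lines: disassembly text, one instruction per line.
    api_names: set of symbol names treated as system APIs (an assumption).
    """
    fragments = []
    current = []
    seen_call = False  # code before the first API call is not a fragment
    for line in asm_lines:
        match = API_CALL.match(line)
        if match and match.group(1) in api_names:
            # An API call closes the fragment opened by the previous call.
            if seen_call and current:
                fragments.append(tuple(current))
            current = []
            seen_call = True
        elif seen_call:
            text = line.strip()
            if text:
                current.append(text)
    # Code after the last API call has no closing call, so it is dropped.
    return fragments


if __name__ == "__main__":
    listing = [
        "push ebp",
        "call ds:CreateFileW",
        "mov eax, 1",
        "add eax, 2",
        "call ds:ReadFile",
        "call ds:CloseHandle",
        "xor eax, eax",
        "call sub_401000",
        "call ds:ExitProcess",
        "ret",
    ]
    apis = {"CreateFileW", "ReadFile", "CloseHandle", "ExitProcess"}
    for fragment in split_into_fragments(listing, apis):
        print(fragment)
```

Calls to local routines such as `sub_401000` are kept inside a fragment because they are not in `api_names`; only calls to the listed system APIs act as cut points.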
3. The method according to claim 1, characterized in that the step of performing cluster analysis on the plurality of software to be classified in the software gene pool according to the code genetic fragments, to divide the software to be classified into a plurality of software families, comprises:
performing cluster analysis on the plurality of software to be classified by means of the Affinity Propagation clustering algorithm according to the code genetic fragments, to divide the plurality of software to be classified into a plurality of software families.
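To illustrate the clustering step of claim 3, the following is a minimal pure-Python sketch of Affinity Propagation. The claim does not state how similarity between software is measured, so the Jaccard similarity over sets of code genetic fragments, the median-based preference, the damping factor, and the fixed iteration count are all assumptions made for this example; the function names are hypothetical.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard similarity between two collections of code fragments."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain Affinity Propagation on a square similarity matrix S.

    S[k][k] holds the "preference" of sample k to become an exemplar.
    Returns (exemplar indices, cluster label per sample).
    """
    n = len(S)
    if n < 2:
        return list(range(n)), [0] * n
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            total = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(total[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos_sum = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = pos_sum
                else:
                    new = min(0.0, R[k][k] + pos_sum - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A sample is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    labels = []
    for i in range(n):
        if i in exemplars:
            labels.append(exemplars.index(i))
        else:
            labels.append(max(range(len(exemplars)),
                              key=lambda c: S[i][exemplars[c]]))
    return exemplars, labels


def cluster_software(gene_sets):
    """Cluster software samples, each given as a set of code fragments."""
    n = len(gene_sets)
    S = [[jaccard(gene_sets[i], gene_sets[j]) for j in range(n)]
         for i in range(n)]
    off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
    preference = median(off_diag) if off_diag else 0.0  # assumed default
    for i in range(n):
        S[i][i] = preference
    return affinity_propagation(S)
```

Each resulting label identifies one software family, to which a family label can then be attached. The sketch runs a fixed number of iterations and has no convergence check or tie-breaking noise, both of which library implementations usually add; perfectly symmetric inputs may therefore yield no exemplars.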
4. The method according to claim 1, characterized in that obtaining the software code of the plurality of software to be classified comprises:
obtaining a plurality of software to be classified and, for each software to be classified, decompiling the software to be classified with the IDA disassembly tool to obtain software code in asm format corresponding to the software to be classified.
5. A software classification method, characterized in that the method comprises:
obtaining software code of target software;
splitting the software code of the target software into a plurality of target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software;
performing cluster analysis on the target software together with the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool comprises a plurality of software families, each software family comprises the code genetic fragments of at least one software, and each software family has a corresponding family label;
obtaining, according to the cluster analysis result, the family label of the software family to which the target software corresponds.
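The lookup of a family label for target software in claim 5 can be illustrated with a simplified stand-in. Instead of re-running a cluster analysis, the sketch below assigns the target to the family containing its most similar member (a nearest-neighbour rule under Jaccard similarity). The layout of `gene_pool`, the `min_similarity` threshold, and the function names are assumptions made for this example and are not taken from the application.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of code fragments."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_target(target_fragments, gene_pool, min_similarity=0.0):
    """Return (family label, similarity) for the best-matching family.

    gene_pool maps a family label to a list of fragment sets, one set per
    known software in that family (hypothetical layout).
    Returns (None, min_similarity) when no family scores above the threshold.
    """
    target = set(target_fragments)
    best_label, best_score = None, min_similarity
    for label, members in gene_pool.items():
        # Score a family by its single most similar member.
        score = max((jaccard(target, m) for m in members), default=0.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    pool = {
        "family_A": [{"f1", "f2", "f3"}, {"f1", "f2"}],
        "family_B": [{"g1", "g2"}],
    }
    print(classify_target({"f1", "f2", "f4"}, pool))
```

Scoring a family by its closest member is only one possible choice; averaging over members or comparing against a per-family exemplar would be equally plausible readings.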
6. The method according to claim 5, characterized in that the step of splitting the software code of the target software into a plurality of target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software comprises:
obtaining the positions of the code that calls system APIs in the software code of the target software, and taking the part between the code of each two adjacent system API calls as one code genetic fragment, so as to split the software code of the target software into a plurality of code genetic fragments.
7. A software classification device, characterized in that the device comprises:
an acquisition module, configured to obtain software code of a plurality of software to be classified;
a gene extraction module, configured to split the software code of each software to be classified into a plurality of code genetic fragments according to the positions of the code that calls system APIs in that software code, to obtain a software gene pool comprising the code genetic fragments of the plurality of software to be classified;
a cluster module, configured to perform cluster analysis on the plurality of software to be classified in the software gene pool according to the code genetic fragments, to divide the software to be classified into a plurality of software families;
a mark module, configured to add a corresponding family label to each software family.
8. The device according to claim 7, characterized in that the gene extraction module is specifically configured to, for each software to be classified, obtain the positions of the code that calls system APIs in the code of the software to be classified, and take the part between the code of each two adjacent system API calls as one code genetic fragment, so as to split the code of each software to be classified into a plurality of code genetic fragments.
9. A software classification device, characterized in that the device comprises:
an acquisition module, configured to obtain software code of target software;
a gene extraction module, configured to split the software code of the target software into a plurality of target code genetic fragments according to the positions of the code that calls system APIs in the software code of the target software;
a cluster module, configured to perform cluster analysis on the target software together with the software genetic fragments in a preset software gene pool according to the target code genetic fragments of the target software, wherein the software gene pool comprises a plurality of software families, each software family comprises the code genetic fragments of at least one software, and each software family has a corresponding family label;
a mark module, configured to obtain, according to the cluster analysis result, the family label of the software family to which the target software corresponds.
10. The device according to claim 9, characterized in that the gene extraction module is specifically configured to obtain the positions of the code that calls system APIs in the software code of the target software, and take the part between the code of each two adjacent system API calls as one code genetic fragment, so as to split the software code of the target software into a plurality of code genetic fragments.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810489257.6A CN108734215A (en) 2018-05-21 2018-05-21 Software classification method and device

Publications (1)

Publication Number Publication Date
CN108734215A true CN108734215A (en) 2018-11-02

Family

ID=63937745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810489257.6A Pending CN108734215A (en) 2018-05-21 2018-05-21 Software classification method and device

Country Status (1)

Country Link
CN (1) CN108734215A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005796A1 (en) * 2006-06-30 2008-01-03 Ben Godwood Method and system for classification of software using characteristics and combinations of such characteristics
US20110154495A1 (en) * 2009-12-21 2011-06-23 Stranne Odd Wandenor Malware identification and scanning
CN103902906A (en) * 2013-12-25 2014-07-02 武汉安天信息技术有限责任公司 Mobile terminal malicious code detecting method and system based on application icon
CN104866765A (en) * 2015-06-03 2015-08-26 康绯 Behavior characteristic similarity-based malicious code homology analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HANJIN et al.: "2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery", 30 December 2017 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508546A (en) * 2018-11-12 2019-03-22 杭州安恒信息技术股份有限公司 Software homology analysis method and device based on software genes
CN111290775A (en) * 2020-04-02 2020-06-16 麒麟软件有限公司 Automatic classification method and system for software package types of Linux system
WO2022121146A1 (en) * 2020-12-07 2022-06-16 中山大学 Method and apparatus for determining importance of code segment
CN113536308A (en) * 2021-06-11 2021-10-22 中国人民解放军战略支援部队信息工程大学 Binary code tracing method for multi-granularity information fusion under software gene view angle
CN113536308B (en) * 2021-06-11 2023-01-06 中国人民解放军战略支援部队信息工程大学 Binary code tracing method for multi-granularity information fusion under software gene view angle
CN114254316A (en) * 2021-11-29 2022-03-29 上海戎磐网络科技有限公司 Software identification method and device based on software gene and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181102