CN114254317A - Software processing method and device based on software gene and storage medium - Google Patents


Info

Publication number
CN114254317A
Authority
CN
China
Prior art keywords
software
family
genes
target
gene
Legal status
Granted
Application number
CN202111432157.8A
Other languages
Chinese (zh)
Other versions
CN114254317B (en)
Inventor
刘旭
章丽娟
胡逸漪
陈鹏
李朝阳
王禹翔
张甜
陈振兴
Current Assignee
Shanghai Roarpanda Network Technology Co ltd
Original Assignee
Shanghai Roarpanda Network Technology Co ltd
Application filed by Shanghai Roarpanda Network Technology Co ltd
Priority to CN202111432157.8A
Publication of CN114254317A
Application granted
Publication of CN114254317B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses a software processing method and device based on software genes, and a storage medium. The method comprises: extracting the sample software genes contained in sample software of a target software family; determining family software genes of the target software family according to the extracted sample software genes; removing universal hereditary software genes from the family software genes to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes contained both in the target software family and in other sample software, and the unique hereditary software genes are software genes unique to the target software family that indicate its family gene characteristics; and determining, from the unique hereditary software genes, key software genes for identifying the target software family.

Description

Software processing method and device based on software gene and storage medium
Technical Field
The present application relates to the field of software engineering and information security technologies, and in particular, to a software processing method and apparatus based on software genes, and a storage medium.
Background
With the rapid development of internet technology, network security problems keep emerging. In particular, organizations that conduct long-term network attacks for political or economic gain have appeared. The malware they develop is continuously improved and mutated, and its code forms unique hereditary features, giving rise to distinct malware families (such as APT families, ransomware families and industrial control malware families). The attacks of these malware families cause huge economic losses to individuals, enterprises and even countries. Quickly and accurately identifying malware and its family information is therefore of great importance for protecting property and for safeguarding network security and national security. The same problems also arise for non-malicious software families affected by piracy and infringement.
Taking malware families as an example, a traditional analysis method based on a software gene tag library comprises the following steps: acquiring the software to be analyzed; fragmenting the code of the software to be analyzed to obtain its software genome; normalizing each software gene in the software genome to obtain a target software genome; and determining, based on the software gene library, the preset software to which each software gene in the target software genome belongs, thereby determining the software family to which the software to be analyzed belongs.
However, the software gene tag library analysis method has the following defects: massive numbers of malware family samples must be analyzed in advance to construct correspondence data from each gene to the software family samples; for a new malware gene, no family attribution can be obtained because the tag library lacks corresponding data; and the massive tag library is difficult to construct and iterate, while the resulting database is too large to embed conveniently in products.
No effective solution has yet been proposed for the technical problems of heavy workload, low recognition rate, large data volume and operational difficulty that arise when analyzing software and its family information in the prior art.
Disclosure of Invention
Embodiments of the present application provide a software processing method and device based on software genes, and a storage medium, so as to at least solve the technical problems of heavy workload, low recognition rate, large data volume and operational difficulty in analyzing software and its family information in the prior art.
According to an aspect of the embodiments of the present application, a software processing method based on software genes is provided, comprising: extracting sample software genes contained in sample software of a target software family; determining family software genes of the target software family according to the extracted sample software genes, wherein the family software genes are the smallest inseparable, uniformly executed binary code segments contained in the sample software; removing universal hereditary software genes from the family software genes of the target software family to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes contained both in the target software family and in other sample software, and the unique hereditary software genes are software genes unique to the target software family that indicate its family gene characteristics; and determining, from the unique hereditary software genes of the target software family, key software genes for identifying the target software family.
According to another aspect of the embodiments of the present application, there is also provided a software processing method based on a software gene, including: acquiring software to be identified; extracting a software gene of software to be identified; and comparing the software gene of the software to be identified with the key software gene of the target software family to determine whether the software to be identified belongs to the target software family, wherein the key software gene is a unique hereditary software gene for identifying the target software family.
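The comparison step can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's matching rule: it assumes each software gene is represented as a hashable fingerprint string, that the key software genes are stored per family in a dictionary, and that software is attributed to the family with the most matching key genes once a hypothetical `min_hits` threshold is reached. The function and parameter names are illustrative only.

```python
from typing import Dict, FrozenSet, Iterable, Optional


def identify_family(
    genes: Iterable[str],
    key_genomes: Dict[str, FrozenSet[str]],
    min_hits: int = 1,
) -> Optional[str]:
    """Attribute software to the family whose key genes it shares most.

    genes:       gene fingerprints extracted from the software to be identified
    key_genomes: family name -> that family's key software genes (fkg)
    min_hits:    hypothetical minimum number of matching key genes
    Returns None when no family reaches min_hits.
    """
    observed = set(genes)
    best_family: Optional[str] = None
    best_hits = 0
    for family, key_genes in key_genomes.items():
        hits = len(observed & key_genes)  # key genes present in the software
        if hits >= min_hits and hits > best_hits:
            best_family, best_hits = family, hits
    return best_family
```

Raising `min_hits` trades recall for fewer false attributions; on a tie, the family listed first in `key_genomes` is kept.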
According to another aspect of the embodiments of the present application, a storage medium is further provided, comprising a stored program, wherein, when the program is run, a processor performs any one of the methods described above.
According to another aspect of the embodiments of the present application, a software processing apparatus based on software genes is further provided, comprising: a first extraction module for extracting sample software genes contained in sample software of a target software family; a first determination module for determining family software genes of the target software family according to the extracted sample software genes, wherein the family software genes are the smallest inseparable, uniformly executed binary code segments contained in the sample software; a second determination module for removing universal hereditary software genes from the family software genes of the target software family to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes contained both in the target software family and in other sample software, and the unique hereditary software genes are software genes unique to the target software family that indicate its family gene characteristics; and a third determination module for determining, from the unique hereditary software genes of the target software family, key software genes for identifying the target software family.
According to another aspect of the embodiments of the present application, there is also provided a software processing apparatus based on a software gene, including: the first acquisition module is used for acquiring software to be identified; the second extraction module is used for extracting software genes of the software to be identified; and the fourth determination module is used for comparing the software gene of the software to be identified with the key software gene of the target software family to determine whether the software to be identified belongs to the target software family, wherein the key software gene is a unique hereditary software gene for identifying the target software family.
According to another aspect of the embodiments of the present application, a software processing apparatus based on software genes is further provided, comprising: a first processor; and a first memory, coupled to the first processor, for providing the first processor with instructions for performing the following steps: extracting sample software genes contained in sample software of a target software family; determining family software genes of the target software family according to the extracted sample software genes, wherein the family software genes are the smallest inseparable, uniformly executed binary code segments contained in the sample software; removing universal hereditary software genes from the family software genes of the target software family to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes contained both in the target software family and in other sample software, and the unique hereditary software genes are software genes unique to the target software family that indicate its family gene characteristics; and determining, from the unique hereditary software genes of the target software family, key software genes for identifying the target software family.
According to another aspect of the embodiments of the present application, a software processing apparatus based on software genes is further provided, comprising: a second processor; and a second memory, coupled to the second processor, for providing the second processor with instructions for performing the following steps: acquiring software to be identified; extracting the software genes of the software to be identified; and comparing the software genes of the software to be identified with the key software genes of the target software family to determine whether the software to be identified belongs to the target software family, wherein the key software genes are unique hereditary software genes for identifying the target software family.
In the embodiments of the present application, a computing device identifies the family gene characteristics of a target software family by extracting the key software genes of that family. Because a software gene unifies material and information properties, it can express what the sample software of a software family inherits, so using software genes to identify the family attribution of software is more reasonable and more interpretable. In addition, different software families have different key software genes, so identifying the family of unknown software with key software genes is more accurate and less prone to misjudgment. Because the key software genes uniquely identify their target software family, this method, unlike the software gene tag library analysis method, requires neither correspondence data from each gene to each software family sample nor a massive tag library. Compared with the prior art, the technical solution of the embodiments therefore does not require setting up a running environment for each piece of software, performing complex preprocessing on each sample, or carrying out professional manual reverse analysis of the samples. This solves the technical problems of heavy workload, low recognition rate, large data volume and operational difficulty that arise when analyzing software and its family information in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a hardware block diagram of a computing device for implementing the method according to embodiment 1 of the present application;
FIG. 2 is a schematic flow chart of a software gene-based software processing method according to the first aspect of example 1 of the present application;
FIG. 3 is a schematic flow chart of a software gene-based software processing method according to the second aspect of example 1 of the present application;
FIG. 4 is a schematic diagram of a software gene-based software processing apparatus according to the first aspect of example 2 of the present application;
FIG. 5 is a schematic diagram of a software gene-based software processing apparatus according to the second aspect of example 2 of the present application;
FIG. 6 is a schematic diagram of a software gene-based software processing apparatus according to the first aspect of example 3 of the present application; and
FIG. 7 is a schematic diagram of a software gene-based software processing apparatus according to the second aspect of example 3 of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the described embodiments are merely exemplary of some, and not all, of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some nouns or terms appearing in the description of the embodiments of the present disclosure are explained as follows:
General inheritance: code similarity that arises because developers use the same computer architecture, operating system, programming language, common code libraries and the like. This inheritance is a general characteristic of software and cannot be used to distinguish software attributes or family affiliation.
Unique inheritance: code similarity that arises because developers within a software family use the same attack methods, private code libraries, hacker tools, programming conventions, development habits and the like. This inheritance is a unique characteristic of the software family and can be used to distinguish the family affiliation of software.
Example 1
According to this embodiment, a method embodiment of a software processing method based on software genes is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
The method embodiments provided by this embodiment may be executed in a mobile terminal, a computer terminal, a server or a similar computing device. FIG. 1 shows a hardware block diagram of a computing device for implementing the software gene-based software processing method. As shown in FIG. 1, the computing device may include one or more processors (including, but not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)), a memory for storing data, and a transmission device for communication. In addition, the computing device may also include a display, an input/output (I/O) interface, a Universal Serial Bus (USB) port (which may be one of the ports of the I/O interface), a network interface, a power supply and/or a camera. Those skilled in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the electronic device. For example, the computing device may include more or fewer components than shown in FIG. 1, or have a different configuration.
It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single, stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computing device. As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the software gene-based software processing method in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implementing the software gene-based software processing method of the application program. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory located remotely from the processor, which may be connected to the computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device is used to receive or transmit data via a network. Specific examples of such networks include wireless networks provided by the communication provider of the computing device. In one example, the transmission device includes a network interface controller (NIC) that can connect to other network devices through a base station to communicate with the internet. In another example, the transmission device may be a radio frequency (RF) module used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device.
It should be noted that in some alternative embodiments, the computing device shown in FIG. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both. FIG. 1 is only one specific example, intended to illustrate the types of components that may be present in the computing device described above.
Under the above operating environment, according to the first aspect of the present embodiment, a software processing method based on software genes is provided. Fig. 2 shows a flow diagram of the method, which, with reference to fig. 2, comprises:
S202: extracting sample software genes contained in sample software of a target software family;
S204: determining family software genes of the target software family according to the extracted sample software genes, wherein the family software genes are the smallest inseparable, uniformly executed binary code segments contained in the sample software;
S206: removing universal hereditary software genes from the family software genes of the target software family to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes contained both in the target software family and in other sample software, and the unique hereditary software genes are software genes unique to the target software family that indicate its family gene characteristics; and
S208: determining, from the unique hereditary software genes of the target software family, key software genes for identifying the target software family.
Specifically, the computing device first obtains the sample software of a target software family (e.g., a malware family) and then extracts the sample software genes contained in each sample. Sample software genes are divided into genes of binary software and genes of non-binary software. More specifically, the computing device cuts the assembly code of a binary sample, or the execution units of the abstract syntax tree (AST) of a non-binary sample, into the smallest inseparable, uniformly executed code segments, and takes these segments as the software genes of that sample (S202). The relationship between a software family and its sample software, and between a sample and its software genes, can be expressed as follows:
$$fs = \{s_1, s_2, \ldots, s_m\}$$
$$sg = \{g_1, g_2, \ldots, g_n\}$$
where $fs$ represents the family sample software set and $s_1, \ldots, s_m$ are the individual family samples; $sg$ denotes the software genome of a single sample, and $g_1, \ldots, g_n$ are the individual software genes in that genome.
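For binary samples, one plausible reading of "smallest inseparable, uniformly executed code segment" is the basic block: a straight-line run of instructions entered only at the top and left only at the bottom, so it executes either entirely or not at all. The sketch below assumes that reading, a textual x86-style assembly listing as input, and a partial, hypothetical list of control-transfer mnemonics. It masks hexadecimal constants so that relocated copies of the same code map to the same gene, and uses a SHA-256 fingerprint as the gene identifier. It does not cover AST-based extraction for non-binary samples, and its normalization rules are assumptions, not the application's extraction engine.

```python
import hashlib
import re
from typing import List, Set

# Partial, hypothetical list of x86-style mnemonics that transfer control and
# therefore end a basic block; a real extractor would need the full set.
BLOCK_TERMINATORS = {"jmp", "je", "jne", "jz", "jnz", "jg", "jl", "ja", "jb", "call", "ret"}
HEX_CONST = re.compile(r"0x[0-9a-f]+")


def normalize(instr: str) -> str:
    """Lower-case, collapse whitespace and mask hex constants (addresses, immediates)."""
    text = " ".join(instr.lower().split())
    return HEX_CONST.sub("imm", text)


def split_blocks(instructions: List[str]) -> List[List[str]]:
    """Cut a linear assembly listing into basic blocks of normalized instructions."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for raw in instructions:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):  # a label is a potential jump target, so it starts a new block
            if current:
                blocks.append(current)
            current = []
            continue
        current.append(normalize(line))
        if current[-1].split()[0] in BLOCK_TERMINATORS:  # control transfer ends the block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def extract_genes(instructions: List[str]) -> Set[str]:
    """Return one sample's software genome sg as a set of SHA-256 block fingerprints."""
    return {
        hashlib.sha256("\n".join(block).encode("utf-8")).hexdigest()
        for block in split_blocks(instructions)
    }
```

Each returned set plays the role of one genome $sg$. A production extractor would work on disassembler output and handle many more terminators and operand forms.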
Next, the computing device merges identical genes across the sample software, removes duplicates, and uses the result as the family software genes of the malware family (i.e., the target software family) (S204). The family software genes are the smallest inseparable, uniformly executed binary code segments contained in the sample software: each software gene is the smallest functional segment that cannot be cut further, and the code in it is either executed in its entirety or not at all.
The family software genome can be represented by the following formula:
$$fg = \{g_1, g_2, \ldots, g_t\}$$
where $fg$ denotes the merged, de-duplicated family software genome and $g_1, \ldots, g_t$ are the individual software genes in it.
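Expressed as a set operation, merging and de-duplication is the union of the per-sample genomes. Writing $sg_i$ for the genome of sample $s_i$ and $n_i = |sg_i|$ for its number of genes (notation introduced here for illustration):

$$fg = \bigcup_{i=1}^{m} sg_i, \qquad t = |fg| \le \sum_{i=1}^{m} n_i$$

Equality holds only when no two samples share a gene; a gene shared by several samples is counted once in $fg$, which is why $t$ is normally smaller than the total number of extracted genes.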
Further, the family software genes of the malware family (i.e., the target software family) include universal hereditary software genes and unique hereditary software genes. Universal hereditary software genes are software genes contained both in the sample software of the malware family and in the sample software of other software families or other non-family sample software; they are similar code that results from using the same computer architecture, operating system, programming language, common code libraries and the like. Because they appear not only in the family's sample software but also in other sample software, universal hereditary software genes cannot reflect software gene characteristics unique to the malware family. Unique hereditary software genes are software genes that distinguish the sample software of the malware family from other software families and other non-family sample software; they are code with unique characteristics that results from developers within the malware family sharing the same attack methods, private code libraries, hacker tools, programming conventions, development habits and the like. The unique hereditary software genes of the malware family appear only in the family's own sample software, and are therefore used to indicate the family gene characteristics of the target software family.
The computing device screens the family software genes of the malware family (i.e., the target software family) against a preset software gene library (e.g., a global software gene library) to separate them into universal hereditary software genes and unique hereditary software genes. It then removes the universal hereditary software genes to obtain the unique hereditary software genes of the malware family (S206). The unique hereditary software genes can be represented by the following formula:
$$fg' = \{g_1, g_2, \ldots, g_{t'}\}$$
where $fg'$ represents the unique hereditary software genome remaining after screening and $g_1, \ldots, g_{t'}$ are the individual unique hereditary software genes.
Further, for each unique hereditary software gene of the malware family (i.e., the target software family), the computing device counts how many of the family's sample software it covers, selects unique hereditary software genes according to these coverage counts, and uses the selected genes as key software genes. The key software genes are the identifying software genes of the malware family (S208). Here, a gene "covers" a sample when the sample contains that gene. For example, if both sample software 1 and sample software 2 contain unique hereditary software gene 1, then gene 1 covers samples 1 and 2. The set of samples covered by a gene is written as:
$$gs = \{s_1, s_2, \ldots, s_n\}$$
where $gs$ represents the set of sample software covered by a given software gene, and $s_1, \ldots, s_n$ are the individual samples it covers. The key software genome of the target software family is:
$$fkg = \{kg_1, kg_2, \ldots, kg_m\}$$
where $fkg$ denotes the key software genome of the software family and $kg_1, \ldots, kg_m$ are its key software genes.
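Coverage counting and key gene selection can be sketched as below. The description states only that key software genes are selected according to their coverage counts; the minimum-coverage threshold and the ranking by descending coverage used here are assumptions, as are the function names. `gene_coverage` builds the set $gs$ for each unique hereditary gene, and `select_key_genes` returns the key software genome $fkg$ as a list.

```python
from collections import defaultdict
from typing import Dict, List, Set


def gene_coverage(unique: Set[str], samples: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each unique hereditary gene, collect gs: the family samples that contain it."""
    coverage: Dict[str, Set[str]] = defaultdict(set)
    for sample_id, genome in samples.items():
        for gene in genome & unique:  # only unique hereditary genes are counted
            coverage[gene].add(sample_id)
    return dict(coverage)


def select_key_genes(coverage: Dict[str, Set[str]], min_coverage: int) -> List[str]:
    """Keep genes covering at least `min_coverage` samples, most widely shared first.

    The threshold rule is a hypothetical selection criterion.
    """
    ranked = sorted(coverage, key=lambda g: (-len(coverage[g]), g))
    return [g for g in ranked if len(coverage[g]) >= min_coverage]
```

A threshold keeps genes that recur across the family, which is what makes them useful identifiers. If it matters that every sample is covered by at least one key gene, a greedy set-cover selection would be a reasonable alternative criterion.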
As described in the background, the software gene tag library analysis method has the following defects: massive numbers of malware family samples must be analyzed in advance to construct correspondence data from each gene to the software family samples; for a new malware gene, no family attribution can be obtained because the tag library lacks corresponding data; and the massive tag library is difficult to construct and iterate, while the resulting database is too large to embed conveniently in products.
To address these technical problems, according to the technical solution of the embodiments of the present application, the computing device identifies the family gene characteristics of a target software family by extracting the key software genes of that family. Because a software gene unifies material and information properties, it can express what the sample software of a software family inherits, so using software genes to identify the family attribution of software is more reasonable and more interpretable. In addition, different software families have different key software genes, so identifying the family of unknown software with key software genes is more accurate and less prone to misjudgment. Because the key software genes uniquely identify their target software family, this method, unlike the software gene tag library analysis method, requires neither correspondence data from each gene to each software family sample nor a massive tag library. Compared with the prior art, the technical solution of the embodiments therefore does not require setting up a running environment for each piece of software, performing complex preprocessing on each sample, or carrying out professional manual reverse analysis of the samples. This solves the technical problems of heavy workload, low recognition rate, large data volume and operational difficulty that arise when analyzing software and its family information in the prior art.
Optionally, the operation of determining the family software genes of the target software family from the extracted sample software genes comprises: merging and deduplicating the sample software genes of all sample software of the target software family to determine the family software genes of the target software family.
Specifically, the computing device obtains all sample software $fs = \{s_1, s_2, \ldots, s_m\}$ of the malware family (i.e., the target software family). It then uses a preset software gene extraction engine to extract the sample software genes contained in each sample software $s_1, s_2, \ldots, s_m$. Next, the computing device compares the software genes of all the sample software, namely $sg_1 = \{g_1, g_2, \ldots, g_n\}, sg_2 = \{g_1, g_3, \ldots, g_{n-1}\}, \ldots, sg_m = \{g_4, g_5, \ldots, g_{n-2}\}$, to find identical sample software genes (e.g., $g_1$ in $sg_1$ and $g_1$ in $sg_2$). The computing device then merges and deduplicates these identical sample software genes. Finally, it takes the merged, deduplicated gene ($g_1$) together with the remaining sample software genes ($sg_1 = \{g_2, \ldots, g_n\}, sg_2 = \{g_3, \ldots, g_{n-1}\}, \ldots, sg_m = \{g_4, g_5, \ldots, g_{n-2}\}$) as the family software genes of the malware family (i.e., the target software family): $fg = \{g_1, g_2, \ldots, g_t\}$.
Therefore, by merging and deduplicating repeated sample software genes, this technical solution makes the set of sample software genes more compact, which facilitates subsequent operations on them and improves working efficiency.
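The merge-and-deduplicate step amounts to a set union over the per-sample gene sets. The following is a minimal Python sketch of that step. It assumes each sample's software genes have already been produced by the (unspecified) gene extraction engine and are represented as hashable identifiers such as strings. The function name `family_genes` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Set


def family_genes(sample_genes: Dict[str, Iterable[str]]) -> Set[str]:
    """Merge the gene sets sg_1..sg_m of all family samples, dropping duplicates.

    sample_genes maps a sample identifier (e.g. "s1") to the genes extracted
    from that sample. A gene found in several samples (such as g1 in both
    sg_1 and sg_2) appears only once in the returned family gene set fg.
    """
    fg: Set[str] = set()
    for genes in sample_genes.values():
        fg.update(genes)  # set union performs the merge and the deduplication
    return fg


# Hypothetical toy data loosely following the example in the text.
samples = {
    "s1": ["g1", "g2", "g3"],
    "s2": ["g1", "g3", "g4"],
    "s3": ["g4", "g5"],
}
print(sorted(family_genes(samples)))  # ['g1', 'g2', 'g3', 'g4', 'g5']
```

A set is used because the text only requires that each gene appear once in the family gene set. If the order of first appearance mattered, a `dict` could be used instead.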
Optionally, the operation of removing the generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family comprises: matching the family software genes of the target software family against the software genes of each piece of software in a preset software gene library to determine the generic inherited software genes of the target software family; and removing the determined generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family.
Specifically, the computing device matches the family software genes of the malware family (i.e., the target software family) against the software genes of each piece of software in a preset software gene library (e.g., a global software gene library). This determines which of the family software genes are generic inherited software genes and which are unique inherited software genes. The global software gene library stores all relevant information about all software genes and is used to query information such as the type or family of a software gene. The computing device then removes the determined generic inherited software genes from the family software genes of the malware family to obtain the unique inherited software genes of the malware family (i.e., the target software family). For example, suppose the generic inherited software genes among the family software genes $fg = \{g_1, g_2, \ldots, g_t\}$ of the malware family are $g_5$ and $g_6$. The computing device removes $g_5$ and $g_6$, so that the unique inherited software genes of the malware family are $fg' = \{g_1, g_2, g_3, g_4, g_7, \ldots, g_{t'}\}$.
Therefore, this technical solution removes the generic inherited software genes that the target software family shares with samples of other software families or with non-family samples. What remains are unique inherited software genes that carry the family's characteristics, which facilitates and speeds up the extraction of the key software genes.
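Separating generic from unique inherited genes can be sketched as a set intersection followed by a set difference. The sketch below makes one labeled assumption: the preset gene library is modeled as a mapping from a software identifier to its genes, and it contains only software outside the target family (other families and non-family software). Under that assumption, a family gene counts as generic if it occurs anywhere in the library. The real global library's schema (gene types, family tags) is not specified in the text, and `split_generic_unique` is a hypothetical name.

```python
from typing import Dict, Iterable, Set, Tuple


def split_generic_unique(
    fg: Set[str], gene_library: Dict[str, Iterable[str]]
) -> Tuple[Set[str], Set[str]]:
    """Return (generic, unique) genes of the family gene set fg.

    gene_library is assumed to hold only software outside the target family.
    A family gene that also occurs in any of that software is generic; the
    remaining family genes are unique to the target family.
    """
    seen_elsewhere: Set[str] = set()
    for genes in gene_library.values():
        seen_elsewhere.update(genes)
    generic = fg & seen_elsewhere  # genes shared with other software
    unique = fg - generic          # fg' in the text
    return generic, unique


# Hypothetical data mirroring the example: g5 and g6 are generic.
fg = {"g1", "g2", "g3", "g4", "g5", "g6", "g7"}
library = {"other_family_a": ["g5", "g9"], "benign_tool": ["g6", "g8"]}
generic, unique = split_generic_unique(fg, library)
print(sorted(generic))  # ['g5', 'g6']
print(sorted(unique))   # ['g1', 'g2', 'g3', 'g4', 'g7']
```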
Optionally, the operation of determining the key software genes for identifying the target software family from the unique inherited software genes of the target software family comprises: determining, for each unique inherited software gene, the number of family sample software it covers; and determining the key software genes of the target software family from the unique inherited software genes according to the number of family sample software covered by each unique inherited software gene.
Specifically, the computing device first determines the number of family sample software covered by each unique inherited software gene of the malware family (i.e., the target software family). For example, if sample software 1 and sample software 2 both contain unique inherited software gene 1, then unique inherited software gene 1 covers 2 family sample software (sample software 1 and sample software 2). The computing device then sorts the unique inherited software genes in descending order of the number of family sample software each one covers. For example, suppose the top 10 unique inherited software genes in this descending order together cover exactly all the sample software of the malware family. The computing device then determines these 10 unique inherited software genes to be the key software genes used to identify the malware family (i.e., the target software family).
Therefore, in this technical solution, the key software genes can be determined quickly from the number of family sample software covered by each unique inherited software gene, which makes determining the key software genes simpler and faster.
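Counting coverage and ranking genes can be sketched in a few lines of Python. The sketch counts, for each unique inherited gene, how many family samples contain it, and then sorts the genes in descending order of that count. The function name `coverage_ranking` is hypothetical. Breaking ties by gene name is an added assumption, made only so the output is deterministic; the text does not say how ties are ordered.

```python
from typing import Dict, Iterable, List, Set, Tuple


def coverage_ranking(
    unique_genes: Set[str], sample_genes: Dict[str, Iterable[str]]
) -> List[Tuple[str, int]]:
    """Count the family samples covered by each unique gene, sorted descending."""
    counts = {gene: 0 for gene in unique_genes}
    for genes in sample_genes.values():
        for gene in set(genes):  # count each sample at most once per gene
            if gene in counts:
                counts[gene] += 1
    # Descending by coverage; ties broken by gene name (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: "u1" appears in sample1 and sample2, as in the text's example.
samples = {
    "sample1": ["u1", "u2", "x"],
    "sample2": ["u1", "x"],
    "sample3": ["u3"],
}
print(coverage_ranking({"u1", "u2", "u3"}, samples))
# [('u1', 2), ('u2', 1), ('u3', 1)]
```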
Optionally, the operation of determining the key software genes of the target software family from the unique inherited software genes, according to the number of family sample software covered by each unique inherited software gene, comprises: sorting the unique inherited software genes by the number of family sample software each covers, where this number reflects the importance of the unique inherited software gene; selecting the unique inherited software genes one by one in that order, starting from the most important, until the selected genes cover all sample software of the target software family; and determining all the selected unique inherited software genes to be the key software genes of the target software family.
Specifically, the computing device uses a preset sorting algorithm to sort the unique inherited software genes of the malware family (i.e., the target software family) in descending order of the number of family sample software each covers. The computing device then selects unique inherited software genes for the malware family. For example, suppose the malware family has 30 sample software $s_1, s_2, \ldots, s_{30}$. The unique inherited software genes cover them as follows:
- $g_1$ covers family sample software $s_1, s_2, \ldots, s_{15}$, so it covers 15 family samples;
- $g_2$ covers $s_{16}, s_{17}, \ldots, s_{25}$, so it covers 10;
- $g_3$ covers $s_{26}, s_{27}, \ldots, s_{30}$, so it covers 5;
- …;
- $g_{t'}$ covers only $s_1$, so it covers 1.

The computing device selects unique inherited software genes in turn according to the sorting result. It starts with $g_1$, which covers the most family sample software after the descending sort, then selects $g_2$ and $g_3$. It stops once the selected genes ($g_1$, $g_2$ and $g_3$) cover all sample software $s_1, s_2, \ldots, s_{30}$ of the malware family.
The computing device then determines the selected unique inherited software genes ($g_1$, $g_2$ and $g_3$) to be the key software genes of the malware family (i.e., the target software family).
Therefore, by sorting the unique inherited software genes in descending order of the number of family sample software they cover and selecting just enough of them to cover all the sample software, the key software genes can be obtained quickly and kept compact.
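The selection rule can be sketched as follows: sort once, then take genes in order until every sample is covered. This is a minimal Python sketch under two assumptions. First, the coverage of each unique inherited gene is already known as a set of sample identifiers. Second, full coverage is reachable; the sketch raises an error if it is not, since the text does not say what should happen in that case. The name `select_key_genes` is hypothetical. The sketch follows the one-pass ordering described in the text. It is not the classic greedy set-cover heuristic, which would re-rank genes by newly covered samples after each pick.

```python
from typing import Dict, List, Set


def select_key_genes(coverage: Dict[str, Set[str]], all_samples: Set[str]) -> List[str]:
    """Take unique genes in descending order of coverage until all samples are covered.

    coverage maps each unique inherited gene to the family samples containing it.
    Genes are sorted once by coverage size (ties broken by name, an assumption)
    and selected in that order; selection stops as soon as the union of covered
    samples equals all_samples.
    """
    ranked = sorted(coverage, key=lambda g: (-len(coverage[g]), g))
    selected: List[str] = []
    covered: Set[str] = set()
    for gene in ranked:
        if covered >= all_samples:  # every family sample is already covered
            break
        selected.append(gene)
        covered |= coverage[gene]
    if not covered >= all_samples:
        raise ValueError("the unique genes do not cover every family sample")
    return selected


# The 30-sample example from the text.
samples = {f"s{i}" for i in range(1, 31)}
coverage = {
    "g1": {f"s{i}" for i in range(1, 16)},   # 15 samples
    "g2": {f"s{i}" for i in range(16, 26)},  # 10 samples
    "g3": {f"s{i}" for i in range(26, 31)},  # 5 samples
    "gt'": {"s1"},                           # 1 sample
}
print(select_key_genes(coverage, samples))  # ['g1', 'g2', 'g3']
```

Read literally, the text keeps taking genes in sorted order until coverage is complete, so the sketch does the same. A variant that skips genes adding no newly covered sample would yield a smaller key set in some cases.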
Optionally, the method further comprises: acquiring software to be identified; extracting the software genes of the software to be identified; and comparing the software genes of the software to be identified with the key software genes of the target software family to determine whether the software to be identified belongs to the target software family.
Specifically, when the computing device needs to determine whether a given piece of software belongs to the malware family (i.e., the target software family), it first acquires that software (i.e., the software to be identified). It then extracts the software genes of the software to be identified and acquires the key software genes of the malware family. Next, the computing device compares the software genes of the software to be identified with the key software genes of the malware family. If any one or more of the software genes of the software to be identified are identical to key software genes, the software to be identified is determined to belong to the malware family (i.e., the target software family). Otherwise, it is determined not to belong to the malware family.
Therefore, by comparing the software genes of the software to be identified with the extracted key software genes of the software family, this technical solution can effectively identify the attribution of the software to be identified. Because different software families have different key software genes, identifying the family attribution of unknown samples with key software genes is more accurate and less prone to misjudgment.
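The identification rule is an overlap test: the software belongs to the family if it shares at least one gene with the family's key genes. It can be sketched as a single set operation. The sketch assumes the software genes of the software to be identified have already been extracted. The name `belongs_to_family` is hypothetical.

```python
from typing import Iterable, Set


def belongs_to_family(software_genes: Iterable[str], key_genes: Set[str]) -> bool:
    """Return True if the software shares at least one key gene with the family."""
    # isdisjoint accepts any iterable and is True only when nothing is shared.
    return not key_genes.isdisjoint(software_genes)


key_genes = {"g1", "g2", "g3"}
print(belongs_to_family(["g7", "g2"], key_genes))  # True
print(belongs_to_family(["g7", "g9"], key_genes))  # False
```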
In addition, although this embodiment describes software processing based on software genes using a malware family as an example, the process also applies to non-malware software families. Using the method described in this application, the software genes of specified software can be compared with the key software genes of the target software family to determine whether the software belongs to that family. This in turn shows whether the software infringes the copyright of the owner of the target software family, which helps prevent infringement and software piracy. A detailed description is omitted here.
Thus, according to the first aspect of this embodiment, the computing device identifies the family gene characteristics of the target software family by extracting the key software genes of the target software family. Because a software gene unifies material and informational properties, it can express the inheritance among the sample software of a software family, so identifying the family attribution of software by means of software genes is more reasonable and more interpretable. In addition, different software families have different key software genes, so using key software genes to identify the family attribution of unknown software is more accurate and less prone to misjudgment. Because the key software genes uniquely identify the corresponding target software family, this method, unlike the software gene label library analysis method, needs neither correspondence data from each gene to the samples of each software family nor a massive label library. Therefore, compared with the prior art, the technical solution of the embodiments of the present application does not require setting up a running environment for each piece of software, complex preprocessing of each piece of sample software, or professional manual reverse analysis of the sample software. It thereby solves the prior-art technical problems of heavy workload, low recognition rate, large data volume, and difficult operation in analyzing software and its family information.
Further, according to a second aspect of this embodiment, a software processing method based on software genes is provided. Fig. 3 shows a flowchart of the method. Referring to Fig. 3, the method comprises:
S302: acquiring software to be identified;
S304: extracting the software genes of the software to be identified; and
S306: comparing the software genes of the software to be identified with the key software genes of the target software family to determine whether the software to be identified belongs to the target software family, wherein the key software genes are unique inherited software genes for identifying the target software family.
Specifically, when the computing device needs to determine whether a given piece of software belongs to the malware family (i.e., the target software family), it first acquires that software (i.e., the software to be identified). It then extracts the software genes of the software to be identified and acquires the key software genes of the malware family. The key software genes are software genes used to identify the target software family. Next, the computing device compares the software genes of the software to be identified with the key software genes of the malware family. If any one or more of the software genes of the software to be identified are identical to key software genes, the software to be identified is determined to belong to the malware family (i.e., the target software family). Otherwise, it is determined not to belong to the malware family.
In addition, the computing device may compare the software genes of the software to be identified with the key software genes of multiple software families to determine which software family the software to be identified belongs to, thereby obtaining its family information.
Therefore, according to the second aspect of this embodiment, the technical solution compares the software genes of the software to be identified with the extracted key software genes of the software family, and can thus effectively identify the attribution of the software to be identified. Because different software families have different key software genes, identifying the family attribution of unknown samples with key software genes is more accurate and less prone to misjudgment.
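Identification across several families, as described above, can be sketched by applying the same overlap test to each family's key genes. The sketch assumes the key genes of each family are held in a mapping from family name to gene set. The text does not say how to resolve a match against more than one family, so this sketch simply returns every matching family in sorted order. The name `identify_families` and the family names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def identify_families(
    software_genes: Iterable[str], family_key_genes: Dict[str, Set[str]]
) -> List[str]:
    """Return the names of all families whose key genes overlap the software's genes."""
    genes = set(software_genes)
    return sorted(name for name, keys in family_key_genes.items() if genes & keys)


families = {
    "family_a": {"a1", "a2"},
    "family_b": {"b1"},
}
print(identify_families(["x", "b1"], families))  # ['family_b']
print(identify_families(["x"], families))        # []
```

An empty result means the software matched no known family's key genes. A deployment might then treat it as unknown rather than as benign.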
Further, referring to Fig. 1, according to a third aspect of this embodiment, a storage medium is provided. The storage medium comprises a stored program, wherein, when the program is run, a processor performs any one of the methods described above.
Thus, according to this embodiment, the computing device identifies the family gene characteristics of the target software family by extracting the key software genes of the target software family. Because a software gene unifies material and informational properties, it can express the inheritance among the sample software of a software family, so identifying the family attribution of software by means of software genes is more reasonable and more interpretable. In addition, different software families have different key software genes, so using key software genes to identify the family attribution of unknown software is more accurate and less prone to misjudgment. Because the key software genes uniquely identify the corresponding target software family, this method, unlike the software gene label library analysis method, needs neither correspondence data from each gene to the samples of each software family nor a massive label library. Therefore, compared with the prior art, the technical solution of the embodiments of the present application does not require setting up a running environment for each piece of software, complex preprocessing of each piece of sample software, or professional manual reverse analysis of the sample software. It thereby solves the prior-art technical problems of heavy workload, low recognition rate, large data volume, and difficult operation in analyzing software and its family information.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
Fig. 4 shows a software gene-based software processing apparatus 400 according to the first aspect of this embodiment; the apparatus 400 corresponds to the method according to the first aspect of Example 1. Referring to Fig. 4, the apparatus 400 includes:
- a first extraction module 410, configured to extract the sample software genes contained in the sample software of a target software family;
- a first determining module 420, configured to determine the family software genes of the target software family according to the extracted sample software genes, where the family software genes are the smallest indivisible, uniformly executed binary code fragments contained in the sample software;
- a second determining module 430, configured to remove the generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family, where the generic inherited software genes are software genes that the target software family shares with other sample software, the unique inherited software genes are software genes unique to the target software family, and the unique inherited software genes indicate the family gene characteristics of the target software family; and
- a third determining module 440, configured to determine, from the unique inherited software genes of the target software family, the key software genes for identifying the target software family.
Optionally, the first determining module 420 includes: a first determining submodule, configured to merge and deduplicate the sample software genes of all sample software of the target software family to determine the family software genes of the target software family.
Optionally, the second determining module 430 includes: a second determining submodule, configured to match the family software genes of the target software family against the software genes of each piece of software in a preset software gene library to determine the generic inherited software genes of the target software family; and a third determining submodule, configured to remove the determined generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family.
Optionally, the third determining module 440 includes: a fourth determining submodule, configured to determine, for each unique inherited software gene, the number of family sample software it covers; and a fifth determining submodule, configured to determine the key software genes of the target software family from the unique inherited software genes according to the number of family sample software covered by each unique inherited software gene.
Optionally, the fifth determining submodule includes: a sorting unit, configured to sort the unique inherited software genes by the number of family sample software each covers, where this number reflects the importance of the unique inherited software gene; a gene selection unit, configured to select the unique inherited software genes one by one in that order, starting from the most important, until the selected genes cover all sample software of the target software family; and a first determination unit, configured to determine all the selected unique inherited software genes to be the key software genes of the target software family.
The apparatus 400 further comprises: an acquisition module, configured to acquire software to be identified; an extraction module, configured to extract the software genes of the software to be identified; and a determining module, configured to compare the software genes of the software to be identified with the key software genes of the target software family and determine whether the software to be identified belongs to the target software family.
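How the modules of apparatus 400 fit together can be illustrated with a hypothetical Python class. Its methods mirror modules 410 to 440 and the identification modules. The class name, method names, and the toy whitespace-splitting extractor are all illustrative assumptions. The real gene extraction engine is not described in the text, so the class takes it as a caller-supplied function. As in the sketch of the preset gene library above, the library is assumed to contain only software outside the target family.

```python
from typing import Callable, Dict, Iterable, List, Set


class SoftwareGeneProcessor:
    """Hypothetical grouping of the functions of apparatus 400's modules."""

    def __init__(
        self,
        extract: Callable[[bytes], Iterable[str]],
        gene_library: Dict[str, Iterable[str]],
    ) -> None:
        self.extract = extract  # stand-in for the unspecified gene extraction engine
        self.outside_genes: Set[str] = set()
        for genes in gene_library.values():  # library assumed to exclude the family
            self.outside_genes.update(genes)
        self.key_genes: Set[str] = set()

    def learn_family(self, samples: Dict[str, bytes]) -> List[str]:
        # Module 410: extract the genes of every family sample.
        sample_genes = {name: set(self.extract(blob)) for name, blob in samples.items()}
        # Module 420: merge and deduplicate into the family gene set.
        family = set().union(*sample_genes.values())
        # Module 430: drop generic genes that also occur outside the family.
        unique = family - self.outside_genes
        # Module 440: rank by coverage, then select until all samples are covered.
        coverage = {g: {n for n, gs in sample_genes.items() if g in gs} for g in unique}
        ranked = sorted(coverage, key=lambda g: (-len(coverage[g]), g))
        all_names = set(sample_genes)
        selected: List[str] = []
        covered: Set[str] = set()
        for gene in ranked:
            if covered >= all_names:
                break
            selected.append(gene)
            covered |= coverage[gene]
        self.key_genes = set(selected)
        return selected

    def identify(self, software: bytes) -> bool:
        # Acquisition, extraction and determining modules: any shared key gene matches.
        return not self.key_genes.isdisjoint(self.extract(software))


def toy_extract(blob: bytes) -> List[str]:
    """Toy extractor that treats whitespace-separated tokens as genes."""
    return blob.decode().split()


proc = SoftwareGeneProcessor(toy_extract, {"benign": ["common"]})
print(proc.learn_family({"s1": b"common k1 k2", "s2": b"common k1", "s3": b"k3 k2"}))
# ['k1', 'k2']
print(proc.identify(b"x k2"), proc.identify(b"common"))  # True False
```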
Furthermore, Fig. 5 shows a software gene-based software processing apparatus 500 according to the second aspect of this embodiment; the apparatus 500 corresponds to the method according to the second aspect of Example 1. Referring to Fig. 5, the apparatus 500 includes: a first obtaining module 510, configured to obtain software to be identified; a second extraction module 520, configured to extract the software genes of the software to be identified; and a fourth determining module 530, configured to compare the software genes of the software to be identified with the key software genes of the target software family and determine whether the software to be identified belongs to the target software family, where the key software genes are unique inherited software genes for identifying the target software family.
Thus, according to this embodiment, the computing device identifies the family gene characteristics of the target software family by extracting the key software genes of the target software family. Because a software gene unifies material and informational properties, it can express the inheritance among the sample software of a software family, so identifying the family attribution of software by means of software genes is more reasonable and more interpretable. In addition, different software families have different key software genes, so using key software genes to identify the family attribution of unknown software is more accurate and less prone to misjudgment. Because the key software genes uniquely identify the corresponding target software family, this method, unlike the software gene label library analysis method, needs neither correspondence data from each gene to the samples of each software family nor a massive label library. Therefore, compared with the prior art, the technical solution of the embodiments of the present application does not require setting up a running environment for each piece of software, complex preprocessing of each piece of sample software, or professional manual reverse analysis of the sample software. It thereby solves the prior-art technical problems of heavy workload, low recognition rate, large data volume, and difficult operation in analyzing software and its family information.
Example 3
Fig. 6 shows a software gene-based software processing apparatus 600 according to the first aspect of this embodiment; the apparatus 600 corresponds to the method according to the first aspect of Example 1. Referring to Fig. 6, the apparatus 600 includes: a first processor 610; and a first memory 620, coupled to the first processor 610 and configured to provide the first processor 610 with instructions for performing the following processing steps:
- extracting the sample software genes contained in the sample software of a target software family;
- determining the family software genes of the target software family according to the extracted sample software genes, wherein the family software genes are the smallest indivisible, uniformly executed binary code fragments contained in the sample software;
- removing the generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family, wherein the generic inherited software genes are software genes that the target software family shares with other sample software, the unique inherited software genes are software genes unique to the target software family, and the unique inherited software genes indicate the family gene characteristics of the target software family; and
- determining, from the unique inherited software genes of the target software family, the key software genes for identifying the target software family.
Optionally, the operation of determining the family software genes of the target software family from the extracted sample software genes comprises: merging and deduplicating the sample software genes of all sample software of the target software family to determine the family software genes of the target software family.
Optionally, the operation of removing the generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family comprises: matching the family software genes of the target software family against the software genes of each piece of software in a preset software gene library to determine the generic inherited software genes of the target software family; and removing the determined generic inherited software genes from the family software genes of the target software family to obtain the unique inherited software genes of the target software family.
Optionally, the operation of determining the key software genes for identifying the target software family from the unique inherited software genes of the target software family comprises: determining, for each unique inherited software gene, the number of family sample software it covers; and determining the key software genes of the target software family from the unique inherited software genes according to the number of family sample software covered by each unique inherited software gene.
Optionally, the operation of determining the key software genes of the target software family from the unique inherited software genes, according to the number of family sample software covered by each unique inherited software gene, comprises: sorting the unique inherited software genes by the number of family sample software each covers, where this number reflects the importance of the unique inherited software gene; selecting the unique inherited software genes one by one in that order, starting from the most important, until the selected genes cover all sample software of the target software family; and determining all the selected unique inherited software genes to be the key software genes of the target software family.
Optionally, the first memory 620 is further configured to provide the first processor 610 with instructions for performing the following processing steps: acquiring software to be identified; extracting the software genes of the software to be identified; and comparing the software genes of the software to be identified with the key software genes of the target software family to determine whether the software to be identified belongs to the target software family.
Furthermore, Fig. 7 shows a software gene-based software processing apparatus 700 according to the second aspect of this embodiment; the apparatus 700 corresponds to the method according to the second aspect of Example 1. Referring to Fig. 7, the apparatus 700 includes: a second processor 710; and a second memory 720, coupled to the second processor 710 and configured to provide the second processor 710 with instructions for performing the following processing steps: acquiring software to be identified; extracting the software genes of the software to be identified; and comparing the software genes of the software to be identified with the key software genes of the target software family to determine whether the software to be identified belongs to the target software family, wherein the key software genes are unique inherited software genes for identifying the target software family.
Thus, according to this embodiment, the computing device identifies the family gene characteristics of the target software family by extracting the key software genes of the target software family. Because a software gene unifies material and informational properties, it can express the inheritance among the sample software of a software family, so identifying the family attribution of software by means of software genes is more reasonable and more interpretable. In addition, different software families have different key software genes, so using key software genes to identify the family attribution of unknown software is more accurate and less prone to misjudgment. Because the key software genes uniquely identify the corresponding target software family, this method, unlike the software gene label library analysis method, needs neither correspondence data from each gene to the samples of each software family nor a massive label library. Therefore, compared with the prior art, the technical solution of the embodiments of the present application does not require setting up a running environment for each piece of software, complex preprocessing of each piece of sample software, or professional manual reverse analysis of the sample software. It thereby solves the prior-art technical problems of heavy workload, low recognition rate, large data volume, and difficult operation in analyzing software and its family information.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A software processing method based on software genes, characterized by comprising the following steps:
extracting sample software genes contained in sample software of a target software family;
determining family software genes of the target software family from the extracted sample software genes, wherein the family software genes are the smallest indivisible, consistently executed binary code fragments contained in the sample software;
removing universal hereditary software genes from the family software genes of the target software family to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes that the target software family shares with other sample software, the unique hereditary software genes are software genes unique to the target software family, and the unique hereditary software genes indicate the family gene characteristics of the target software family; and
determining, from the unique hereditary software genes of the target software family, key software genes for identifying the target software family.
2. The method of claim 1, wherein the operation of determining the family software genes of the target software family from the extracted sample software genes comprises:
merging and de-duplicating the sample software genes of all sample software of the target software family to determine the family software genes of the target software family.
3. The method of claim 1, wherein the operation of removing the universal hereditary software genes from the family software genes of the target software family to obtain the unique hereditary software genes of the target software family comprises:
matching the family software genes of the target software family against the software genes of each piece of software in a preset software gene library to determine the universal hereditary software genes of the target software family; and
removing the determined universal hereditary software genes from the family software genes of the target software family to obtain the unique hereditary software genes of the target software family.
4. The method of claim 1, wherein the operation of determining, from the unique hereditary software genes of the target software family, the key software genes for identifying the target software family comprises:
determining the number of family sample software covered by each unique hereditary software gene; and
determining the key software genes of the target software family from the unique hereditary software genes according to the number of family sample software corresponding to each unique hereditary software gene.
5. The method of claim 4, wherein the operation of determining the key software genes of the target software family from the unique hereditary software genes according to the number of family sample software corresponding to each unique hereditary software gene comprises:
sorting the unique hereditary software genes according to the number of family sample software corresponding to each of them, wherein the number of sample software represents the importance of a unique hereditary software gene;
selecting the most important unique hereditary software genes one by one in the sorted order until the selected unique hereditary software genes cover all sample software of the target software family; and
determining all selected unique hereditary software genes as the key software genes of the target software family.
6. The method of claim 5, further comprising:
acquiring software to be identified;
extracting a software gene of the software to be identified; and
comparing the software gene of the software to be identified with the key software genes of the target software family to determine whether the software to be identified belongs to the target software family.
7. A software processing method based on software genes, characterized by comprising the following steps:
acquiring software to be identified;
extracting a software gene of the software to be identified; and
comparing the software gene of the software to be identified with key software genes of a target software family to determine whether the software to be identified belongs to the target software family, wherein the key software genes are unique hereditary software genes for identifying the target software family.
8. A storage medium, comprising a stored program, wherein when the program is run, a processor performs the method of any one of claims 1 to 7.
9. A software processing apparatus based on software genes, comprising:
a first extraction module, configured to extract sample software genes contained in sample software of a target software family;
a first determination module, configured to determine family software genes of the target software family from the extracted sample software genes, wherein the family software genes are the smallest indivisible, consistently executed binary code fragments contained in the sample software;
a second determination module, configured to remove universal hereditary software genes from the family software genes of the target software family to obtain unique hereditary software genes of the target software family, wherein the universal hereditary software genes are software genes that the target software family shares with other sample software, the unique hereditary software genes are software genes unique to the target software family, and the unique hereditary software genes indicate the family gene characteristics of the target software family; and
a third determination module, configured to determine, from the unique hereditary software genes of the target software family, key software genes for identifying the target software family.
10. A software processing apparatus based on software genes, comprising:
a first acquisition module, configured to acquire software to be identified;
a second extraction module, configured to extract software genes of the software to be identified; and
a fourth determination module, configured to compare the software genes of the software to be identified with key software genes of a target software family to determine whether the software to be identified belongs to the target software family, wherein the key software genes are unique hereditary software genes for identifying the target software family.
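The key-gene extraction of claims 1 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each software gene is an opaque string identifier (gene extraction from binaries is out of scope), the preset software gene library of claim 3 is assumed to contain only software outside the target family, and ties in the sort of claim 5 are broken by gene identifier. The function names and the `skip_redundant` option are hypothetical.

```python
from typing import Dict, List, Set


def family_genes(samples: Dict[str, Set[str]]) -> Set[str]:
    """Claim 2: merge and de-duplicate the genes of every sample in the family."""
    merged: Set[str] = set()
    for genes in samples.values():
        merged |= genes
    return merged


def unique_genes(family: Set[str], gene_library: Dict[str, Set[str]]) -> Set[str]:
    """Claim 3: drop universal genes, i.e. family genes also found in the library.

    Assumption: gene_library maps software outside the target family to its genes.
    """
    universal: Set[str] = set()
    for genes in gene_library.values():
        universal |= family & genes
    return family - universal


def key_genes(
    samples: Dict[str, Set[str]],
    unique: Set[str],
    skip_redundant: bool = False,
) -> List[str]:
    """Claims 4 and 5: rank unique genes by coverage, select until all samples are covered."""
    all_samples = set(samples)
    # Claim 4: the set of family samples covered by each unique gene.
    coverage = {g: {name for name, genes in samples.items() if g in genes} for g in unique}
    # Claim 5: more covered samples means more important; ties broken by gene id (assumption).
    ranked = sorted(unique, key=lambda g: (-len(coverage[g]), g))
    selected: List[str] = []
    covered: Set[str] = set()
    for g in ranked:
        if covered == all_samples:
            break  # every family sample now contains at least one selected gene
        if skip_redundant and coverage[g] <= covered:
            continue  # hypothetical variant: skip genes that cover no new sample
        selected.append(g)
        covered |= coverage[g]
    return selected


if __name__ == "__main__":
    samples = {"s1": {"a", "b", "x"}, "s2": {"a", "c", "x"}, "s3": {"d", "x"}}
    library = {"other_software": {"x", "y"}}
    uniq = unique_genes(family_genes(samples), library)
    print(sorted(uniq))                                   # ['a', 'b', 'c', 'd']
    print(key_genes(samples, uniq))                       # ['a', 'b', 'c', 'd']
    print(key_genes(samples, uniq, skip_redundant=True))  # ['a', 'd']
```

With the default `skip_redundant=False`, genes are taken strictly in ranked order until every sample is covered, which is one literal reading of claim 5; setting it to `True` skips genes that add no newly covered sample, a greedy set-cover-style variant that typically yields fewer key genes. A sample that contains no unique gene can never be covered, in which case the loop simply ends when the ranked list is exhausted.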
CN202111432157.8A 2021-11-29 2021-11-29 Software processing method and device based on software genes and storage medium Active CN114254317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111432157.8A CN114254317B (en) 2021-11-29 2021-11-29 Software processing method and device based on software genes and storage medium

Publications (2)

Publication Number Publication Date
CN114254317A true CN114254317A (en) 2022-03-29
CN114254317B CN114254317B (en) 2023-06-16

Family

ID=80791333

Country Status (1)

Country Link
CN (1) CN114254317B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604364A (en) * 2009-07-10 2009-12-16 珠海金山软件股份有限公司 Computer rogue program categorizing system and sorting technique based on file instruction sequence
CN104331436A (en) * 2014-10-23 2015-02-04 西安交通大学 Rapid classification method of malicious codes based on family genetic codes
CN107657175A (en) * 2017-09-15 2018-02-02 北京理工大学 A kind of homologous detection method of malice sample based on image feature descriptor
CN108063768A (en) * 2017-12-26 2018-05-22 河南信息安全研究院有限公司 The recognition methods of network malicious act and device based on network gene technology
CN108694319A (en) * 2017-04-06 2018-10-23 武汉安天信息技术有限责任公司 A kind of malicious code family determination method and device
CN109508546A (en) * 2018-11-12 2019-03-22 杭州安恒信息技术股份有限公司 A kind of software homology analysis method and device based on software gene
CN112084500A (en) * 2020-09-15 2020-12-15 腾讯科技(深圳)有限公司 Method and device for clustering virus samples, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant