CN110837638A - Method, device and equipment for detecting lasso software and storage medium - Google Patents

Method, device and equipment for detecting lasso software and storage medium

Info

Publication number
CN110837638A
CN110837638A CN201911087368.5A CN201911087368A CN110837638A CN 110837638 A CN110837638 A CN 110837638A CN 201911087368 A CN201911087368 A CN 201911087368A CN 110837638 A CN110837638 A CN 110837638A
Authority
CN
China
Prior art keywords
neuron
data
target
neuron network
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911087368.5A
Other languages
Chinese (zh)
Other versions
CN110837638B (en)
Inventor
张宾
肖喜
黄重庆
张伟哲
黄兴森
武化龙
阿伦·库玛·桑格亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University and Peng Cheng Laboratory
Priority to CN201911087368.5A
Publication of CN110837638A
Application granted
Publication of CN110837638B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a method, a device, equipment and a storage medium for detecting ransomware. Feature data of the software to be detected is used as the input data of a ransomware classification model, and a first neuron network included in the ransomware classification model is adjusted based on the feature data to obtain a second neuron network. Through this adjustment, the second neuron network can adaptively satisfy a given quantization error constraint, and the ransomware classification model learns new types of ransomware without affecting the classification results of the model before adjustment. The method then triggers the ransomware classification model to output the ransomware classification based on the second neuron network. In summary, the method can identify ransomware of known types as well as ransomware of new types, and outputs the ransomware classification, thereby improving the accuracy of ransomware classification.

Description

Method, device and equipment for detecting ransomware, and storage medium
Technical Field
The present application relates to the field of network security, and in particular to a method, an apparatus, a device and a storage medium for detecting ransomware.
Background
With the development of the internet, network security problems have become increasingly prominent. Ransomware poses a significant security threat to users: it prevents them from using their computers, or encrypts their files, in an attempt to force them to pay a fee to regain access.
At present, ransomware detection methods classify target ransomware according to information about ransomware of known types to obtain its type. However, for new types of ransomware, the prior art cannot accurately identify the type.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for detecting ransomware, and aims to solve the problem that the prior art cannot accurately identify the type of ransomware.
In order to achieve the above object, the present application provides the following technical solutions:
A method for detecting ransomware comprises the following steps:
extracting feature data;
inputting the feature data into a preset ransomware classification model, wherein the ransomware classification model comprises a first neuron network;
adjusting the first neuron network based on the feature data to obtain a second neuron network;
and triggering the ransomware classification model to output the ransomware classification based on the second neuron network.
Optionally, adjusting the first neuron network based on the feature data to obtain the second neuron network comprises:
generating a new neuron for any first target data in the feature data, wherein the weight of the new neuron is the first target data, and the first target data is any one-dimensional data whose distance from every neuron in the first neuron network is greater than a first preset threshold;
and adding the new neuron into the first neuron network to obtain the second neuron network.
Optionally, adjusting the first neuron network based on the feature data to obtain the second neuron network further comprises:
connecting the first target neurons in the second neuron network, and setting the age of the edge connecting the first target neurons to a preset initial value, wherein the first target neurons are the two neurons in the first neuron network that are closest to the first target data;
in the second neuron network, increasing the age of each target edge by a unit value, wherein a target edge is an edge with one end connected to a first target neuron and the other end connected to a non-first target neuron;
deleting edges in the second neuron network whose age is greater than a second preset threshold.
Optionally, adjusting the first neuron network based on the feature data to obtain the second neuron network further comprises:
for any second target data in the feature data, increasing the weight of each second target neuron by a first value to obtain the second neuron network;
wherein the second target data is any one-dimensional data in the feature data that is not first target data; the second target neurons are the two neurons in the first neuron network that are closest to the second target data; and the first value is inversely related to the total number of times the neuron has become a first target neuron or a second target neuron.
Optionally, the method for determining the first preset threshold of any neuron includes:
if the neuron is connected with other neurons, the first preset threshold value is determined according to a preset maximum intra-class distance;
if the neuron is not connected with other neurons, the first preset threshold value is determined according to a preset minimum inter-class distance.
Optionally, extracting feature data comprises:
extracting encryption API function feature data, API call feature data, registry feature data, file and folder operation feature data, memory feature data, message feature data and traffic feature data.
Optionally, before inputting the feature data into the preset ransomware classification model, the method further comprises:
and performing dimension reduction processing on the feature data.
Optionally, after the adjusting the first neuron network based on the feature data to obtain the second neuron network, the method further includes:
clustering neurons in the second neuron network;
and removing noise neurons, wherein the noise neurons are neurons which do not belong to any one cluster.
A ransomware detection device, comprising:
a feature extraction unit, configured to extract feature data;
a data input unit, configured to input the feature data into a preset ransomware classification model, wherein the ransomware classification model comprises a first neuron network;
a network adjusting unit, configured to adjust the first neuron network based on the feature data to obtain a second neuron network;
and a data output unit, configured to trigger the ransomware classification model to output the ransomware classification based on the second neuron network.
Ransomware detection equipment, comprising: a memory and a processor;
the memory is used for storing a program;
the processor is configured to execute the program to implement the steps of the ransomware detection method described above.
A storage medium having stored thereon a program which, when executed by a processor, carries out the steps of the ransomware detection method described above.
According to the technical solution, the feature data of the software to be detected is used as the input data of the ransomware classification model, and the first neuron network included in the ransomware classification model is adjusted based on the feature data to obtain the second neuron network. Through this adjustment, the second neuron network can adaptively satisfy a given quantization error constraint, and the ransomware classification model learns new types of ransomware without affecting the classification results of the model before adjustment. The method then triggers the ransomware classification model to output the ransomware classification based on the second neuron network. In summary, the method can identify ransomware of known types as well as ransomware of new types, and outputs the ransomware classification, thereby improving the accuracy of ransomware classification.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a ransomware detection method disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of a method for adjusting the first neuron network disclosed in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a ransomware detection device disclosed in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of ransomware detection equipment disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a flowchart of a ransomware detection method disclosed in an embodiment of the present application. As shown in FIG. 1, the method may specifically include:
S101, extracting feature data.
Specifically, the feature data of the software to be detected is data that characterizes the software. The feature data may be of a single type or of several types; optionally, the method extracts several different types of feature data of the software to be detected. The types of feature data extracted in this step may therefore include: encryption API function feature data, API call feature data, registry feature data, file and folder operation feature data, memory feature data, message feature data and traffic feature data.
It is to be understood that each type of feature data described above can be represented as a one-dimensional vector, and the feature data obtained in this step is the set of all of these one-dimensional vectors.
It should be noted that the software to be detected may be known ransomware or unknown software.
S102, inputting the feature data into a preset ransomware classification model, wherein the ransomware classification model comprises a first neuron network.
Specifically, the feature data may be denoted as a feature data set $X$, which may include feature data of the software to be detected in multiple dimensions. If the number of types of feature data is $n$, then $X = \{x_1, x_2, \ldots, x_\lambda, \ldots, x_n\}$, where $x_\lambda$ represents the $\lambda$-th one-dimensional data ($1 \le \lambda \le n$).
Further, each one-dimensional data in $X$ is input into the preset ransomware classification model. The ransomware classification model is obtained by training in advance: the training input data is the feature data of ransomware of known types (the sample data), and the target output data is the type of that ransomware. The trained ransomware classification model comprises a first neuron network, which comprises a plurality of neurons generated from the sample data during training; the weight of any neuron is the sample data that generated it, and the model parameters of the ransomware classification model are determined by training on the sample data.
S103, adjusting the first neuron network based on the characteristic data to obtain a second neuron network.
Specifically, each neuron in the first neuron network is obtained by training on the sample data, and the weight of each neuron is the sample data that generated it. The set of neurons in the first neuron network is written as $C_1 = \{c_1, c_2, \ldots, c_i, \ldots, c_k\}$, where $k$ is the number of neurons. The weights of the neurons may be expressed as $W = \{w_1, w_2, \ldots, w_i, \ldots, w_k\}$, where $w_i$ is the weight of the $i$-th neuron $c_i$ ($1 \le i \le k$).
It will be appreciated that the distance between two neurons indicates their proximity: the smaller the distance, the closer the two neurons are. A first preset threshold may be determined, which serves as the distance bound for deciding whether feature data generates a new neuron.
The adjustment in this step may proceed as follows for any one-dimensional data in the feature data: when the distance between the data and every neuron in the first neuron network is greater than the first preset threshold, a new neuron is generated whose weight is that data, and the new neuron is added to the first neuron network. Since the feature data may comprise multiple one-dimensional data, this adjustment is repeated for each of them, resulting in the second neuron network.
Based on this, the neuron set of the second neuron network is $C_1 \cup R$, where $C_1$ is the set of neurons in the first neuron network and $R$ is the set of all new neurons generated from the feature data.
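The adjustment in S103 can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes Euclidean distance and a single fixed threshold (the description later makes the threshold adaptive per neuron), and the names `distance` and `adjust_network` are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def adjust_network(neurons, feature_set, threshold):
    """Return the weights of the second network, i.e. C1 plus the new neurons R.

    neurons     -- weight vectors of the first neuron network (C1)
    feature_set -- one-dimensional data x_1..x_n of the software to be detected
    threshold   -- fixed first preset threshold (a simplifying assumption)
    """
    second = [list(w) for w in neurons]  # copy so the first network is untouched
    for x in feature_set:
        # x generates a neuron only if it is farther than the threshold
        # from every neuron currently in the network.
        if all(distance(x, w) > threshold for w in second):
            second.append(list(x))  # the new neuron's weight is the data itself
    return second

c1 = [[0.0, 0.0], [1.0, 0.0]]
c2 = adjust_network(c1, [[0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], threshold=1.0)
print(c2)  # [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
```

Each item is checked against the network as already adjusted by the preceding items, so a second item close to a newly created neuron does not create a duplicate.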
S104, triggering the ransomware classification model to output the ransomware classification based on the second neuron network.
Specifically, the first neuron network is obtained by training on the sample data; that is, based on the first neuron network, the trained ransomware classification model can output the ransomware classification corresponding to the sample data. The second neuron network is obtained by adjusting the first neuron network based on the feature data of the software to be detected. Therefore, the method can trigger the ransomware classification model to output the classification of the software to be detected based on the second neuron network.
Assuming that the sample data includes feature data of M known types of ransomware, the classification result of the ransomware classification model may include:
When the software to be detected is ransomware: if its type is one of the M known types, the classification result is that type; if its type is a new type, the classification result is a preset classification failure identifier. When the software to be detected is not ransomware, the classification result is the preset classification failure identifier.
According to the technical solution, the feature data of the software to be detected is used as the input data of the ransomware classification model, and the first neuron network included in the ransomware classification model is adjusted based on the feature data to obtain the second neuron network. Through this adjustment, the second neuron network can adaptively satisfy a given quantization error constraint, and the ransomware classification model learns new types of ransomware without affecting the classification results of the model before adjustment. The method then triggers the ransomware classification model to output the ransomware classification based on the second neuron network. In summary, the method can identify ransomware of known types as well as ransomware of new types, and outputs the ransomware classification, thereby improving the accuracy of ransomware classification.
Next, taking ransomware running on a Windows system as the software to be detected, the methods for extracting the feature data mentioned in the above embodiment are described as follows:
The method for extracting the encryption API function feature data may be: collecting features of the encryption-related API classes, which optionally comprise feature data of the encryption API class (crypto), the system process management API class (process), the process service API class (services), the registry API class (registry) and the resource API class (resource).
The method for extracting the API call feature data may be: extracting API call frequency feature data by comparing the Windows application programming interface (API) calls of the ransomware process with a baseline of normal operating system behavior.
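As an illustration of the API call frequency feature, the sketch below turns a recorded call trace into a vector of deviations from a baseline. The patent does not say how the comparison is computed; the difference of relative frequencies used here, the function name `api_frequency_features`, and the baseline values are all assumptions.

```python
from collections import Counter

def api_frequency_features(calls, baseline, api_names):
    """Per API: observed share of calls minus the baseline share.

    calls     -- list of API names observed for the monitored process
    baseline  -- dict of API name -> share under normal behavior (assumed input)
    api_names -- fixed ordering that defines the feature vector
    """
    counts = Counter(calls)
    total = len(calls) or 1  # avoid division by zero for an empty trace
    return [counts[name] / total - baseline.get(name, 0.0) for name in api_names]

api_names = ["CryptEncrypt", "CreateFileW", "RegSetValueExW"]
trace = ["CryptEncrypt", "CryptEncrypt", "CreateFileW", "CryptEncrypt"]
baseline = {"CryptEncrypt": 0.25, "CreateFileW": 0.5, "RegSetValueExW": 0.25}
print(api_frequency_features(trace, baseline, api_names))  # [0.5, -0.25, -0.25]
```

A positive entry means the process calls that API more often than the baseline, which is the kind of signal a frequency feature is meant to expose.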
The method for extracting the registry feature data may be: by monitoring abnormal file system and registry activity, extracting feature data covering access to, reading of, modification of and deletion from the registry.
The method for extracting the file and folder operation feature data may be: acquiring file features and folder features by monitoring abnormal file and folder operations. Optionally, the file feature data may include a vector describing the sensitive files on which operations by the ransomware are captured, where sensitive files may be those with the "dll", "exe" and "jpg" extensions. The folder feature data includes count values of delete, move, read and traverse operations on sensitive paths.
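The file and folder features described above could be assembled roughly as follows. This is a sketch under assumptions: the event format `(operation, path)`, the prefix test for sensitive paths, and the name `file_folder_features` are hypothetical; only the three extensions and the four operation types come from the text.

```python
SENSITIVE_EXTENSIONS = ("dll", "exe", "jpg")                # named in the text
FOLDER_OPERATIONS = ("delete", "move", "read", "traverse")  # named in the text

def file_folder_features(events, sensitive_paths):
    """Build (file_vector, folder_vector) from monitored events.

    events          -- iterable of (operation, path) tuples (assumed format)
    sensitive_paths -- path prefixes treated as sensitive (assumed input)
    """
    ext_counts = dict.fromkeys(SENSITIVE_EXTENSIONS, 0)
    op_counts = dict.fromkeys(FOLDER_OPERATIONS, 0)
    for operation, path in events:
        ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
        if ext in ext_counts:
            ext_counts[ext] += 1  # an operation touched a sensitive file type
        if operation in op_counts and any(path.startswith(p) for p in sensitive_paths):
            op_counts[operation] += 1  # an operation inside a sensitive path
    file_vec = [ext_counts[e] for e in SENSITIVE_EXTENSIONS]
    folder_vec = [op_counts[o] for o in FOLDER_OPERATIONS]
    return file_vec, folder_vec

events = [("read", "C:/Users/a/pic.JPG"), ("delete", "C:/Users/a/pic.JPG"),
          ("read", "C:/Windows/x.dll"), ("move", "C:/Temp/notes.txt")]
print(file_folder_features(events, ["C:/Users/"]))  # ([1, 0, 2], [1, 0, 1, 0])
```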
The method for extracting the memory feature data may be: scanning and analyzing the memory image of the sandbox to obtain behavior tag feature data.
The method for extracting the message feature data may be: acquiring the message feature data generated on established HTTP, TCP or UDP connections.
The method for extracting the traffic feature data may be: extracting file sharing traffic feature data for access to documents in network shared volumes, and session-based network traffic feature data.
It should be noted that the feature data extracted by the method is not limited to the above listed types of feature data, and that any one or more types of software data capable of characterizing the software to be detected can be extracted as the feature data in this step. In addition, the method for extracting the feature data is not limited.
Further, before the feature data is input into the preset ransomware classification model in S102, the method may further include: performing dimension reduction on the feature data.
Specifically, a sparse autoencoder can be used for encoding and decoding; it is a neural network trained to reconstruct its input. Its hidden layer performs feature dimension reduction, removing the redundant information produced by an excessive number of ransomware features.
It should be noted that the sparse autoencoder is based on an unsupervised learning algorithm. The encoder creates a hidden layer containing a low-dimensional vector that captures the meaning of the input data; the decoder then reconstructs the input data from the low-dimensional vector of the hidden layer. The dimension reduction method is explained taking as input data the feature data set $X = \{x_1, x_2, \ldots, x_\lambda, \ldots, x_n\}$, where $x_\lambda$ represents the $\lambda$-th one-dimensional data ($1 \le \lambda \le n$).
The vector of the hidden layer performs the feature dimension reduction, and the loss function is given by formula (1):
[Formula (1): equation image not reproduced in this text]
In formula (1), $L_{sparse}$ is the overall loss function of the neural network, $\beta$ is the sparsity penalty parameter, $L$ is the arithmetic mean of the cost over all data, $\hat{\rho}$ is the average activation of the neurons in the network, $\rho$ is the expected activation of the neurons in the network, and KL denotes the KL divergence, which is optionally calculated according to formula (2):
[Formula (2): equation image not reproduced in this text]
In formula (2), $\hat{\rho}_j$ represents the average activation of the $j$-th neuron over the feature data set, and is calculated according to formula (3):
[Formula (3): equation image not reproduced in this text]
In formula (3), $a_j$ is the activation value of the $j$-th hidden layer unit for the one-dimensional data $x_\lambda$.
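The equation images for formulas (1) to (3) are not available in this text. The conventional sparse-autoencoder loss matches the symbol definitions given here, so a plausible reconstruction is shown below. The hat notation $\hat{\rho}_j$, the summation over hidden units $j$, and the averaging over the $n$ data items are assumptions based on that conventional form, not expressions taken from the patent.

```latex
\begin{align}
L_{sparse} &= L + \beta \sum_{j} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) \tag{1} \\
\mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) &= \rho \log\frac{\rho}{\hat{\rho}_j}
  + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \tag{2} \\
\hat{\rho}_j &= \frac{1}{n}\sum_{\lambda=1}^{n} a_j\left(x_\lambda\right) \tag{3}
\end{align}
```

Under this form the penalty is zero when every hidden unit's average activation $\hat{\rho}_j$ equals the expected activation $\rho$, and it grows as the two diverge, which is what pushes the hidden layer towards a sparse, low-dimensional code.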
The embodiment of the present application describes an alternative specific implementation for adjusting the first neuron network based on the feature data. Fig. 2 is a schematic flow chart of an adjustment method disclosed in an embodiment of the present application, which may specifically include:
S201, generating a new neuron for any first target data in the feature data, and adding the new neuron to the first neuron network.
The first target data is any one-dimensional data whose distance from every neuron in the first neuron network is greater than the first preset threshold. Taking any one-dimensional data $x$ in the feature data set $X$ as an example, an optional method for judging whether it is first target data is introduced below.
Assume that the set of neurons in the first neuron network is $C_1 = \{c_1, c_2, \ldots, c_i, \ldots, c_k\}$, where $k$ is the number of neurons. The weights of the neurons may be expressed as $W = \{w_1, w_2, \ldots, w_i, \ldots, w_k\}$, where $w_i$ is the weight of the $i$-th neuron $c_i$ ($1 \le i \le k$).
The method comprises the following steps:
A1, finding the two neurons in the neuron set $C_1$ that are most similar to $x$, denoted $c_{s1}$ and $c_{s2}$.
Specifically, to achieve data compression, $x$ is encoded with a fixed set of codebook vectors, namely the neuron weights $W = \{w_1, w_2, \ldots, w_i, \ldots, w_k\}$. With the $w_i$ fixed, encoding $x$ means finding the vector $w_{i(x)}$ closest to $x$ to represent $x$. The problem thus reduces to finding a set of vectors that minimizes the expected error $E(W)$ of signal reconstruction at the decoding end, as shown in formula (4):
[Formula (4): equation image not reproduced in this text]
In formula (4), $p(x)$ is the probability density function of the signal $x$. In general $E(W)$ has no closed-form optimal solution and must be optimized by an iterative algorithm. In this step, formula (4) is optimized by gradient descent to obtain an update formula, and by traversing the training data multiple times the algorithm converges to a local optimum, as shown in formula (5):
[Formula (5): equation image not reproduced in this text]
In formula (5), $w_{i(x)}$ is the codebook vector closest to the input vector $x^{(t)}$ at the $t$-th iteration, and $\alpha(t)$ is the update step size.
Further, a neighborhood function is added to formula (5) to control the sensitivity of different codebook vectors to the input, so that a spatially ordered mapping can be formed. The update formula of the codebook vectors then becomes formula (6):
[Formula (6): equation image not reproduced in this text]
In formula (6), $\sigma$ is a parameter for adjusting the neighborhood function $h$. Typically, $h$ allows only those codebook vectors closest to the input pattern to participate in the competition and move towards the input data.
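The equation images for formulas (4) to (6) are not available in this text. A conventional vector-quantization form consistent with the surrounding definitions is given below as an assumption, not as the patent's exact expressions. The squared Euclidean error, the superscript iteration index, and writing the neighborhood function as $h_\sigma(i, i(x))$ are choices made for this sketch.

```latex
\begin{align}
E(W) &= \int \left\| x - w_{i(x)} \right\|^{2} p(x)\, dx \tag{4} \\
w_{i(x)}^{(t+1)} &= w_{i(x)}^{(t)} + \alpha(t)\left[ x^{(t)} - w_{i(x)}^{(t)} \right] \tag{5} \\
w_{i}^{(t+1)} &= w_{i}^{(t)} + \alpha(t)\, h_\sigma\!\left(i, i(x)\right)\left[ x^{(t)} - w_{i}^{(t)} \right] \tag{6}
\end{align}
```

In this form (5) moves only the winning codebook vector towards the input, while (6) also moves the other vectors, each scaled by how strongly $h_\sigma$ couples it to the winner.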
Based on this method, two winning codebook vectors are obtained; they correspond to the two neurons $c_{s1}$ and $c_{s2}$ in the neuron set $C_1$ that are closest to $x$.
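For a fixed set of weights, the two winners can also be found by a direct nearest-neighbor search. The sketch below does exactly that; it is a brute-force substitute for illustration and not the iterative codebook update of formulas (5) and (6). Euclidean distance and the name `two_nearest` are assumptions.

```python
import math

def two_nearest(x, weights):
    """Return indices (s1, s2) of the closest and second-closest weight vectors."""
    if len(weights) < 2:
        raise ValueError("at least two neurons are required")

    def dist(w):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))

    # Sort neuron indices by distance to x and keep the first two.
    order = sorted(range(len(weights)), key=lambda i: dist(weights[i]))
    return order[0], order[1]

weights = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0]]
print(two_nearest([0.9, 0.0], weights))  # (2, 0)
```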
A2, judging whether both of the following hold: the distance between $x$ and $c_{s1}$ is greater than the first preset threshold of $c_{s1}$, and the distance between $x$ and $c_{s2}$ is greater than the first preset threshold of $c_{s2}$. If so, $x$ is judged to be first target data; otherwise, $x$ is judged to be second target data.
It will be understood that, since $c_{s1}$ and $c_{s2}$ are the two neurons closest to $x$, it suffices that the distances between $x$ and these two neurons exceed their first preset thresholds for the distance between $x$ and every neuron to be regarded as exceeding the first preset threshold.
The embodiment of the application provides a method for determining the first preset threshold, in which the first preset threshold is adaptively updated for each neuron. That is, a first preset threshold is calculated for each neuron. Taking $c_{s1}$ as an example, the calculation of its threshold $T_{s1}$ is introduced as follows:
If neuron $c_{s1}$ is connected to other neurons in $C_1$, the first preset threshold is determined according to the preset maximum intra-class distance; optionally, it may be an estimate of the maximum intra-class distance, that is,
[Formula: equation image not reproduced in this text]
where $N_{s1}$ is the set of neurons connected to $c_{s1}$.
If neuron $c_{s1}$ is not connected to any other neuron in $C_1$, the first preset threshold is determined according to the preset minimum inter-class distance; optionally, it may be an estimate of the minimum inter-class distance, that is,
[Formula: equation image not reproduced in this text]
where $C_1 \setminus \{c_{s1}\}$ is the set of neurons in $C_1$ other than $c_{s1}$.
Based on this, the first preset threshold $T_{s1}$ of $c_{s1}$ and the first preset threshold $T_{s2}$ of $c_{s2}$ are obtained, and it is further judged whether $\|x - w_{s1}\| > T_{s1}$ and $\|x - w_{s2}\| > T_{s2}$ both hold; if so, the one-dimensional data $x$ is determined to be first target data.
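The adaptive threshold and the test in A2 can be sketched as follows. The two threshold formulas are not reproduced in this text, so the sketch assumes the natural reading of the description: for a connected neuron, $T_{s1} = \max_{c_j \in N_{s1}} \|w_{s1} - w_j\|$; for an isolated neuron, $T_{s1} = \min_{c_j \in C_1 \setminus \{c_{s1}\}} \|w_{s1} - w_j\|$. Euclidean distance, the edge representation, and the function names are likewise assumptions.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def first_threshold(i, weights, edges):
    """Adaptive first preset threshold T_i of neuron i (needs >= 2 neurons).

    weights -- list of weight vectors
    edges   -- set of frozenset({a, b}) index pairs, one per connection
    """
    neighbours = [j for j in range(len(weights))
                  if j != i and frozenset((i, j)) in edges]
    if neighbours:
        # Connected: largest distance to a neighbour (max intra-class estimate).
        return max(dist(weights[i], weights[j]) for j in neighbours)
    # Isolated: smallest distance to any other neuron (min inter-class estimate).
    return min(dist(weights[i], weights[j])
               for j in range(len(weights)) if j != i)

def is_first_target(x, s1, s2, weights, edges):
    """True when x is farther from both winners than their own thresholds."""
    return (dist(x, weights[s1]) > first_threshold(s1, weights, edges)
            and dist(x, weights[s2]) > first_threshold(s2, weights, edges))

weights = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
edges = {frozenset((0, 1))}
print(first_threshold(0, weights, edges))                    # 1.0
print(first_threshold(2, weights, edges))                    # 9.0
print(is_first_target([4.0, 0.0], 1, 0, weights, edges))     # True
```

Here neurons 0 and 1 are connected, so each uses the distance to its neighbour as threshold, while the isolated neuron 2 uses the distance to its nearest other neuron.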
It should be noted that the size of the first preset threshold has an important influence on the generation of new neurons. If the first preset threshold is too small, one-dimensional data is too easily regarded as a new pattern and generates a neuron; if it is too large, too few neurons are generated. Therefore, the method adaptively and continuously updates the first preset threshold of each neuron, so that the threshold can adapt to the continuously changing input patterns.
Optionally, other methods may be used to determine whether one-dimensional data is first target data, e.g., traversing the first neuron network, computing the distance $T_i$ between each neuron and the one-dimensional data, and determining whether each distance $T_i$ is greater than the first preset threshold. The present application does not limit the determination method.
A3, when the one-dimensional data $x$ is determined to be first target data, generating a new neuron $r$ based on $x$ and setting the weight of $r$ to $x$. The new neuron is added to the first neuron network to obtain the second neuron network, whose neuron set is $C_1 \cup \{r\}$.
S202, connecting the first target neuron, and setting the age of the edge connected with the first target neuron to be a preset initial value.
The first target neurons are the two neurons in the first neuron network that are closest to the first target data; they can be obtained as in step A1 above. For example, if $c_{s1}$ and $c_{s2}$ are the first target neurons for the first target data $x$, this step connects $c_{s1}$ and $c_{s2}$ and sets the age of the edge between them to the preset initial value 0, i.e., $age_{s1,s2} = 0$.
S203, increasing the age of the target edge by one unit value.
The target edge is an edge with one end connected to a first target neuron and the other end connected to a non-first target neuron. For example, if the first target neuron $c_{s1}$ is connected to a non-first target neuron $c_q$, the edge connecting $c_{s1}$ and $c_q$ is a target edge. It should be noted that the first target neuron and the non-first target neuron are defined with respect to the same one-dimensional data.
Optionally, the unit value is 1, i.e., $age'_{s1,q} = age_{s1,q} + 1$.
S204, deleting edges with an age greater than a second preset threshold in the second neuron network.
The second preset threshold is set according to requirements. Each time new one-dimensional data is input, a new second neuron network is generated, in which the number or the structure of the neurons changes accordingly. When the age of the edge between two neurons exceeds the second preset threshold, the two neurons have not become first target neurons at the same time again during the updating process, i.e., they are no longer close to each other, so the connection between them can be removed.
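Steps S202 to S204 can be sketched as a single update of an edge table. This is a hedged illustration under stated assumptions: edges are stored in a dictionary keyed by `frozenset({i, j})` with the age as the value, and the names `update_edges`, `max_age`, `initial_age` and `unit` are hypothetical stand-ins for the second preset threshold, the preset initial value and the unit value.

```python
def update_edges(edges, s1, s2, max_age, initial_age=0, unit=1):
    """Apply S202-S204 for winners s1 and s2; edges maps frozenset({i, j}) -> age."""
    winners = {s1, s2}
    updated = {}
    for pair, age in edges.items():
        # S203: an edge with exactly one endpoint among the winners is a target edge.
        if len(pair & winners) == 1:
            age += unit
        updated[pair] = age
    # S202: connect the two winners and set the age of their edge to the initial value.
    updated[frozenset(winners)] = initial_age
    # S204: drop every edge whose age exceeds the second preset threshold.
    return {pair: age for pair, age in updated.items() if age <= max_age}
```

The function returns a new dictionary rather than mutating its input, which keeps each update easy to inspect.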
It is to be understood that the feature data may include a plurality of one-dimensional data, and the first neuron network is adjusted by the above method based on each one-dimensional data in turn. That is, the second neuron network obtained by adjusting the first neuron network based on the current one-dimensional data serves as the first neuron network for the one-dimensional data input next. Therefore, after all the one-dimensional data have been input into the lasso software classification model, the neuron set of the resulting second neuron network is $C_2 = C_1 \cup R$, where $C_1$ is the set of neurons in the first neuron network and $R$ is the set of all new neurons generated from the feature data.
It should be noted that data in the feature data that is not first target data is defined as second target data. For second target data in any dimension, the method adjusts the first neuron network based on the second target data as follows:
When the one-dimensional data is second target data, the weight of any second target neuron is increased by a first value.
The second target neurons are the two neurons in the first neuron network that are closest to the second target data, and they may be obtained as in step A1. For any neuron, the first value corresponding to that neuron is inversely related to the total number of times that the neuron has become a first target neuron or a second target neuron.
Optionally, the first value is the learning rate parameter $\varepsilon(\tau)$ of each neuron, where, for any neuron, $\tau$ is the total number of times that the neuron has become a first target neuron or a second target neuron. The learning rate parameter $\varepsilon(\tau)$ is calculated by the following formula (7).
[Formula (7): Figure BDA0002265829690000121]
The form of $\varepsilon(\tau)$ must satisfy a constraint, which is given by the following formula (8).
[Formula (8): Figure BDA0002265829690000131]
Formula (8) ensures that the movement of each neuron eventually stabilizes and converges rather than oscillating continuously. A first value that satisfies formula (8) ensures that each neuron retains a certain learning ability while gradually stabilizing.
Based on this, the weight of any second target neuron $c_i$ of the one-dimensional data x is increased by the first value as shown in the following formula (9).
$w_i' = w_i + \varepsilon(\tau_i)(x - w_i)$ (9)
Here, $w_i'$ is the updated weight, $w_i$ is the weight of $c_i$ before the update, and $\varepsilon(\tau_i)$ is the learning rate parameter of the second target neuron $c_i$, i.e., the first value. It will be appreciated that $x - w_i$ represents the local quantization error.
It is understood that, in this case, no new neurons are generated; instead, the weights of neurons in the first neuron network are changed, thereby obtaining the second neuron network.
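The weight update of formula (9) can be sketched as follows. Formulas (7) and (8) are not reproduced in this text, so the schedule $\varepsilon(\tau) = 1/\tau$ used here is only an assumed example of a rate that decreases with $\tau$; it is not taken from the source. The sketch also assumes NumPy float vectors, Euclidean distance, and that $\tau$ counts the current win. The names `learning_rate`, `adjust_second_target` and `wins` are hypothetical.

```python
import numpy as np

def learning_rate(tau):
    """Assumed example schedule eps(tau) = 1 / tau; not the source's formula (7)."""
    return 1.0 / tau

def adjust_second_target(weights, wins, x):
    """Move the two neurons closest to x toward x, following formula (9).

    weights: float array of shape (n, d); wins: integer array of win counts (tau).
    Both arrays are updated in place and returned.
    """
    dists = np.linalg.norm(weights - x, axis=1)
    for i in np.argsort(dists)[:2]:
        wins[i] += 1  # tau_i includes the current win, so it is never zero
        weights[i] += learning_rate(wins[i]) * (x - weights[i])  # formula (9)
    return weights, wins
```

Because the rate shrinks as a neuron wins more often, frequently selected neurons move less with each new datum, which is the stabilizing behavior the text attributes to formula (8).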
Furthermore, because noise exists in the input feature data, the method can also cluster the neurons in the second neuron network after the second neuron network is obtained, and remove the noise neurons.
Specifically, noise or outliers tend to exist in the input feature data, so some neurons may lie in low-density regions; such neurons can be found and deleted. Therefore, this step treats neurons that do not belong to any cluster as noise neurons, and deletes the noise neurons together with all of their connections.
In addition, at the junction of two clusters which are relatively close to each other, the density of neurons is also significantly lower than that of the central region of the cluster, so that the neurons of this part and the connections related to the neurons can also be deleted, thereby stably distinguishing two different clusters.
It should be noted that the denoising process may be performed periodically. For example, denoising may be performed once each time one-dimensional data is input and the second neuron network is obtained, or once whenever the number of input one-dimensional data reaches an integer multiple of a preset period.
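The denoising step can be sketched as follows. The text does not specify the clustering method, so this sketch assumes that a cluster is a connected component of the neuron graph and that a component with fewer than `min_cluster_size` neurons counts as belonging to no cluster. Both the rule and the names `remove_noise_neurons` and `min_cluster_size` are assumptions for illustration only.

```python
from collections import defaultdict

def remove_noise_neurons(neurons, edges, min_cluster_size=2):
    """Return (kept_neurons, kept_edges); edges is a set of frozenset({i, j})."""
    adjacency = defaultdict(set)
    for pair in edges:
        i, j = tuple(pair)
        adjacency[i].add(j)
        adjacency[j].add(i)
    kept, seen = set(), set()
    for start in neurons:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_cluster_size:
            kept |= component
    # Delete every connection that touches a removed neuron.
    kept_edges = {pair for pair in edges if pair <= kept}
    return kept, kept_edges
```

With the default `min_cluster_size=2`, only isolated neurons are removed; a larger value also removes small groups, which is one possible way to thin the sparse boundary between two nearby clusters.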
It should be noted that, in the training process of the lasso software classification model, the first neuron network may be obtained with the adjustment method described above. The initial neuron network includes two initial neurons, each of which corresponds to a piece of sample data and takes that sample data as its weight. Further sample data is then input, and the initial neuron network is adjusted by the above adjustment method to obtain the first neuron network. Based on the first neuron network, the lasso software classification model to be trained is triggered to output a target classification result, namely the type of lasso software to which the sample data belongs. Training of the lasso software classification model is thereby completed.
An embodiment of the present application further provides a detection device for lasso software, which is described below; the detection device described below and the detection method described above may be cross-referenced.
Fig. 3 shows a schematic structural diagram of a detection device for lasso software according to an embodiment of the present application. As shown in Fig. 3, the device may include:
a feature extraction unit 301, configured to extract feature data;
a data input unit 302, configured to input the feature data into a preset lasso software classification model, where the lasso software classification model includes a first neuron network;
a network adjusting unit 303, configured to adjust the first neuron network based on the feature data to obtain a second neuron network;
and a data output unit 304, configured to trigger the lasso software classification model to output the lasso software classification based on the second neuron network.
Optionally, when adjusting the first neuron network based on the feature data to obtain the second neuron network, the network adjusting unit is specifically configured to:
generate a new neuron for any dimension of first target data in the feature data, wherein the weight of the new neuron is the first target data, and the first target data is any one-dimensional data whose distance from any one neuron in the first neuron network is greater than a first preset threshold;
and add the new neuron to the first neuron network to obtain the second neuron network.
Optionally, when adjusting the first neuron network based on the feature data to obtain the second neuron network, the network adjusting unit is further specifically configured to:
connect the first target neurons in the second neuron network, and set the age of the edge connecting the first target neurons to a preset initial value, wherein the first target neurons are the two neurons in the first neuron network that are closest to the first target data;
in the second neuron network, increase the age of a target edge by a unit value, wherein the target edge is an edge with one end connected to a first target neuron and the other end connected to a non-first target neuron;
and delete edges in the second neuron network whose age is greater than a second preset threshold.
Optionally, when adjusting the first neuron network based on the feature data to obtain the second neuron network, the network adjusting unit is further specifically configured to:
for any dimension of second target data in the feature data, increase the weight of a second target neuron by a first value to obtain the second neuron network;
wherein the second target data is any dimension of data in the feature data that is not the first target data; the second target neurons are the two neurons in the first neuron network that are closest to the second target data; and the first value is inversely related to the total number of times the neuron has become a first target neuron or a second target neuron.
Optionally, the apparatus further comprises a threshold determining unit, configured to determine the first preset threshold of any one neuron according to the following manner:
if the neuron is connected with other neurons, the first preset threshold value is determined according to a preset maximum intra-class distance;
if the neuron is not connected with other neurons, the first preset threshold value is determined according to a preset minimum inter-class distance.
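The two-case threshold rule can be sketched as follows. The text only says that the threshold is determined "according to" a maximum intra-class distance or a minimum inter-class distance, so this sketch adopts one possible reading as an assumption: for a connected neuron, the largest distance to any neuron it is connected to; for an unconnected neuron, the smallest distance to any other neuron. The names `neuron_threshold` and `neighbors` are hypothetical.

```python
import numpy as np

def neuron_threshold(i, weights, neighbors):
    """First preset threshold of neuron i; neighbors is the set of indices connected to i."""
    dists = np.linalg.norm(weights - weights[i], axis=1)
    if neighbors:
        # Connected: assumed to be the largest distance to a connected neuron (intra-class).
        return float(max(dists[j] for j in neighbors))
    # Unconnected: assumed to be the smallest distance to any other neuron (inter-class).
    others = np.delete(dists, i)
    return float(others.min())
```

The sketch requires at least two neurons in `weights`, which matches a network that starts from two initial neurons.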
Optionally, when extracting the feature data, the feature extraction unit is specifically configured to:
extract encryption API function characteristic data, API call characteristic data, registry characteristic data, file and folder operation characteristic data, memory characteristic data, message characteristic data and flow characteristic data.
Optionally, the apparatus further comprises a data dimension reduction unit, configured to perform dimension reduction processing on the feature data before the feature data is input into the preset lasso software classification model.
Optionally, the apparatus further comprises a network denoising unit, configured to, after the first neuron network is adjusted based on the feature data to obtain the second neuron network:
cluster the neurons in the second neuron network; and remove noise neurons, which are neurons that do not belong to any cluster.
The detection device for the lasso software provided by the embodiment of the application can be applied to detection equipment of the lasso software, such as a PC terminal, a cloud platform, a server cluster and the like. Referring to fig. 4, a schematic structural diagram of a detection device of the lasso software is shown, where the device may include: at least one processor 401, at least one communication interface 402, at least one memory 403 and at least one communication bus 404;
in the embodiment of the present application, the number of the processor 401, the communication interface 402, the memory 403 and the communication bus 404 is at least one, and the processor 401, the communication interface 402 and the memory 403 complete communication with each other through the communication bus 404;
the processor 401 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, or the like;
the memory 403 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory) or the like, such as at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
extracting characteristic data;
inputting the characteristic data into a preset lasso software classification model, wherein the lasso software classification model comprises a first neuron network;
adjusting the first neuron network based on the feature data to obtain a second neuron network;
and triggering the lasso software classification model to output the lasso software classification based on the second neuron network.
Optionally, for the detailed functions and the extended functions of the program, reference may be made to the description above.
Embodiments of the present application further provide a storage medium, where a program suitable for execution by a processor may be stored, where the program is configured to:
extracting characteristic data;
inputting the characteristic data into a preset lasso software classification model, wherein the lasso software classification model comprises a first neuron network;
adjusting the first neuron network based on the feature data to obtain a second neuron network;
and triggering the lasso software classification model to output the lasso software classification based on the second neuron network.
Optionally, for the detailed functions and the extended functions of the program, reference may be made to the description above.
The functions described in the method of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be cross-referenced.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for detecting lasso software, comprising:
extracting characteristic data;
inputting the characteristic data into a preset lasso software classification model, wherein the lasso software classification model comprises a first neuron network;
adjusting the first neuron network based on the feature data to obtain a second neuron network;
and triggering the lasso software classification model to output the lasso software classification based on the second neuron network.
2. The method of claim 1, wherein the adjusting the first neuron network to obtain the second neuron network based on the feature data comprises:
generating a new neuron for any dimension of first target data in the feature data, wherein the weight of the new neuron is the first target data, and the first target data is any one-dimensional data whose distance from any one neuron in the first neuron network is greater than a first preset threshold;
and adding the new neuron into the first neuron network to obtain the second neuron network.
3. The method of claim 2, wherein the adjusting the first neuron network based on the feature data to obtain the second neuron network further comprises:
connecting first target neurons in the second neuron network, and setting the age of an edge connecting the first target neurons to a preset initial value; the first target neurons are the two neurons in the first neuron network which are closest to the first target data;
in the second neuron network, increasing the age of a target edge by a unit value, wherein the target edge is an edge with one end connected with the first target neuron and the other end connected with a non-first target neuron;
deleting edges in the second neuron network whose age is greater than a second preset threshold.
4. The method of claim 2, wherein the adjusting the first neuron network based on the feature data to obtain the second neuron network further comprises:
for any dimension of second target data in the feature data, increasing the weight of a second target neuron by a first value to obtain the second neuron network;
the second target data is any dimension of data in the feature data that is not the first target data; the second target neurons are the two neurons in the first neuron network that are closest to the second target data, and the first value is inversely related to the total number of times the neuron has become a first target neuron or a second target neuron.
5. The method of claim 2, wherein the first predetermined threshold of any neuron is determined by:
if the neuron is connected with other neurons, the first preset threshold value is determined according to a preset maximum intra-class distance;
if the neuron is not connected with other neurons, the first preset threshold value is determined according to a preset minimum inter-class distance.
6. The method according to any one of claims 1-5, wherein said extracting feature data comprises:
extracting encryption API function characteristic data, API call characteristic data, registry characteristic data, file and folder operation characteristic data, memory characteristic data, message characteristic data and flow characteristic data.
7. The method according to any one of claims 1-5, further comprising, prior to said inputting said feature data into a preset lasso software classification model:
and performing dimension reduction processing on the feature data.
8. The method of claim 1, further comprising, after the adjusting the first neuron network based on the feature data to obtain the second neuron network:
clustering neurons in the second neuron network;
and removing noise neurons, wherein the noise neurons are neurons which do not belong to any one cluster.
9. A detecting device for lasso software, comprising:
a feature extraction unit for extracting feature data;
a data input unit, configured to input the feature data into a preset lasso software classification model, wherein the lasso software classification model comprises a first neuron network;
a network adjusting unit, configured to adjust the first neuron network based on the feature data to obtain a second neuron network;
and a data output unit, configured to trigger the lasso software classification model to output the lasso software classification based on the second neuron network.
10. A lasso software detection apparatus, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the lasso software detection method according to any one of claims 1 to 8.
11. A storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the steps of the method of detecting lasso software according to any of claims 1 to 8.
CN201911087368.5A 2019-11-08 2019-11-08 Method, device and equipment for detecting lasso software and storage medium Active CN110837638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911087368.5A CN110837638B (en) 2019-11-08 2019-11-08 Method, device and equipment for detecting lasso software and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911087368.5A CN110837638B (en) 2019-11-08 2019-11-08 Method, device and equipment for detecting lasso software and storage medium

Publications (2)

Publication Number Publication Date
CN110837638A true CN110837638A (en) 2020-02-25
CN110837638B CN110837638B (en) 2020-09-01

Family

ID=69574759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911087368.5A Active CN110837638B (en) 2019-11-08 2019-11-08 Method, device and equipment for detecting lasso software and storage medium

Country Status (1)

Country Link
CN (1) CN110837638B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400715A (en) * 2020-06-04 2020-07-10 鹏城实验室 Classification engine diagnosis method, classification engine diagnosis device and computer-readable storage medium
CN111600893A (en) * 2020-05-19 2020-08-28 山石网科通信技术股份有限公司 Lexus software defense method, device, storage medium, processor and host

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102651088A (en) * 2012-04-09 2012-08-29 南京邮电大学 Classification method for malicious code based on A_Kohonen neural network
CN104077524A (en) * 2013-03-25 2014-10-01 腾讯科技(深圳)有限公司 Training method used for virus identification and virus identification method and device
CN104834857A (en) * 2015-03-27 2015-08-12 清华大学深圳研究生院 Method and device for detecting Android malicious software in batch
CN105989288A (en) * 2015-12-31 2016-10-05 武汉安天信息技术有限责任公司 Deep learning-based malicious code sample classification method and system
CN106203103A (en) * 2016-06-23 2016-12-07 百度在线网络技术(北京)有限公司 The method for detecting virus of file and device
CN107273747A (en) * 2017-05-22 2017-10-20 中国人民公安大学 The method for extorting software detection
CN107392024A (en) * 2017-08-08 2017-11-24 微梦创科网络科技(中国)有限公司 A kind of recognition methods of rogue program and device
CN108334781A (en) * 2018-03-07 2018-07-27 腾讯科技(深圳)有限公司 Method for detecting virus, device, computer readable storage medium and computer equipment
CN108460277A (en) * 2018-02-10 2018-08-28 北京工业大学 A kind of automation malicious code mutation detection method
CN108985055A (en) * 2018-06-26 2018-12-11 东北大学秦皇岛分校 A kind of detection method and system of Malware
US20180365420A1 (en) * 2017-06-16 2018-12-20 AO Kaspersky Lab System and method of detecting malicious files with the use of elements of static analysis
CN109165688A (en) * 2018-08-28 2019-01-08 暨南大学 A kind of Android Malware family classification device construction method and its classification method
CN109241738A (en) * 2018-07-09 2019-01-18 四川大学 It is a kind of that software detection technology is extorted based on deep learning
CN109711160A (en) * 2018-11-30 2019-05-03 北京奇虎科技有限公司 Application program detection method, device and nerve network system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BIN ZHANG等: ""Ransomware classification using patch-based CNN and self-attention network on embedded N-grams of opcodes"", 《FUTURE GENERATION COMPUTER SYSTEMS》 *

Also Published As

Publication number Publication date
CN110837638B (en) 2020-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant