CN111444507A - Method, device, equipment and storage medium for determining whether shell-added software is falsely reported - Google Patents

Method, device, equipment and storage medium for determining whether shell-added software is falsely reported

Info

Publication number
CN111444507A
CN111444507A (application CN202010540255.2A)
Authority
CN
China
Prior art keywords
target
software
processed
vector
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010540255.2A
Other languages
Chinese (zh)
Other versions
CN111444507B (en)
Inventor
张伟哲
乔延臣
方滨兴
张宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory
Priority to CN202010540255.2A
Publication of CN111444507A
Application granted
Publication of CN111444507B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a method, a device, equipment and a storage medium for determining whether shell-added software is falsely reported. The method comprises the following steps: when data to be processed of target software is detected, acquiring a target classification result indicating whether malicious code exists in the data to be processed; determining a first target decision significant vector with which the data to be processed maps to the target classification result, and a second target decision significant vector with which the data to be processed maps to the malicious-code result; acquiring a mean square error baseline, and determining, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is falsely reported. The mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag. The method and the device solve the problem in the prior art that false positives on shell-added software flagged as containing malicious code are not identified, which lowers the accuracy of malicious code identification.

Description

Method, device, equipment and storage medium for determining whether shell-added software is falsely reported
Technical Field
The application relates to the technical field of artificial intelligence in financial technology (Fintech), and in particular to a method, a device, equipment and a storage medium for determining whether shell-added software is falsely reported.
Background
At present, for software security, malicious code often needs to be detected and removed. One detection approach is signature (feature-code) based scanning. To evade signature-based detection, malicious code developers process their code with techniques such as packing (shell adding), so that the resulting new malicious code files no longer exhibit detectable signatures. To improve the detection of such new malicious code, the features of the shelled region of software are therefore often used for malicious code detection. However, using the features of the shelled region increases false detection results. For example, according to research by Rahbrina et al. in 2014, 58% of malicious code and 54% of normal software use known shells, 69 of these shells (including INNO, UPX and the like) are used by malicious code and normal software at the same time, and 96.7% of shelled normal software was determined to be malicious code. Anti-virus vendors then continue to train their detection engines and the like with files that were identified as malicious code but are actually normal software, which causes even more false positives. That is, identifying whether shell-added software flagged as containing malicious code is a false positive is currently a problem that urgently needs to be solved.
Disclosure of Invention
The main purpose of the application is to provide a method, a device, equipment and a storage medium for determining whether shell-added software is falsely reported, aiming to solve the technical problem that the prior art does not identify whether shell-added software flagged as containing malicious code is a false positive, which results in low malicious code identification accuracy.
In order to achieve the above object, the present application provides a method for determining whether shell-added software is falsely reported, the method comprising:
when data to be processed of target software is detected, acquiring a target classification result indicating whether malicious code exists in the data to be processed;
determining a first target decision significant vector with which the data to be processed maps to the target classification result, and determining a second target decision significant vector with which the data to be processed maps to the malicious-code result;
acquiring a mean square error baseline for determining whether a false positive exists, and determining, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is falsely reported;
wherein the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag.
Optionally,
the step in which the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag comprises:
the mean square error baseline is obtained based on a preset coding model, where the preset coding model is a target model, meeting a preset condition, obtained by training a preset basic model on a training set that carries preset false alarm tags and comprises the shelled decision vectors and malicious code decision vectors of shell-added software.
Optionally, before the step of acquiring a mean square error baseline for determining whether a false positive exists, the method comprises:
acquiring a training set, carrying preset false alarm tags, that comprises the shelled decision vectors and malicious code decision vectors of shell-added software, and training a preset basic model to obtain a target model meeting a preset condition, wherein the preset condition comprises convergence of a preset loss function;
and setting the target model as the preset coding model.
Optionally, the step of determining a first target decision significant vector with which the data to be processed maps to the target classification result comprises:
acquiring the mapping values with which the data to be processed maps to each classification result, to obtain the maximum mapping value, which points to the target classification result;
determining a first perturbation strength of each byte vector in the data to be processed on the maximum mapping value;
and determining, based on the first perturbation strength, the first target decision significant vector pointing to the target classification result.
Optionally, before the step of acquiring, when data to be processed of target software is detected, a target classification result indicating whether malicious code exists in the data to be processed, the method comprises:
acquiring a portable executable (PE) file of the target software;
determining the file size of the PE file, and obtaining a comparison result of the file size and a preset size;
and preprocessing the portable executable PE file based on the comparison result to obtain the data to be processed.
Optionally, the step of determining a second target decision significant vector with which the data to be processed maps to the malicious-code result comprises:
partitioning the data to be processed to obtain partitioned data to be processed;
and determining a second perturbation strength of each piece of partitioned data to be processed on the classification result, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result.
Optionally, the step of determining a second perturbation strength of each piece of partitioned data to be processed on the classification result, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result, comprises:
determining the second perturbation strength of each piece of partitioned data to be processed on the classification result, and determining the second target perturbation strength with the largest perturbation strength, to obtain the target partitioned data to be processed corresponding to the second target perturbation strength;
and determining a third perturbation strength of each byte vector in the target partitioned data to be processed on the mapping to the malicious-code result, and determining the second target decision significant vector pointing to the malicious-code result.
Optionally, the step of determining, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is falsely reported comprises:
determining a target mean square error of the first target decision significant vector and the second target decision significant vector;
and if the target mean square error is smaller than the mean square error baseline, determining that the shell-added software is falsely reported.
The application also provides a device for determining whether shell-added software is falsely reported, the device comprising:
a first acquisition module, configured to acquire, when data to be processed of target software is detected, a target classification result indicating whether malicious code exists in the data to be processed;
a first determining module, configured to determine a first target decision significant vector with which the data to be processed maps to the target classification result, and to determine a second target decision significant vector with which the data to be processed maps to the malicious-code result;
a second acquisition module, configured to acquire a mean square error baseline for determining whether a false positive exists, and to determine, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is falsely reported;
wherein the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag.
Optionally,
the second acquisition module is configured to:
obtain the mean square error baseline based on a preset coding model, where the preset coding model is a target model, meeting a preset condition, obtained by training a preset basic model on a training set that carries preset false alarm tags and comprises the shelled decision vectors and malicious code decision vectors of shell-added software.
Optionally, the device for determining whether shell-added software is falsely reported comprises:
a third acquisition module, configured to acquire a training set, carrying preset false alarm tags, that comprises the shelled decision vectors and malicious code decision vectors of shell-added software, and to train a preset basic model to obtain a target model meeting a preset condition, wherein the preset condition comprises convergence of a preset loss function;
and a setting module, configured to set the target model as the preset coding model.
Optionally, the first determining module comprises:
a first acquisition unit, configured to acquire the mapping values with which the data to be processed maps to each classification result, to obtain the maximum mapping value, which points to the target classification result;
a first determining unit, configured to determine a first perturbation strength of each byte vector in the data to be processed on the maximum mapping value;
a second determining unit, configured to determine, based on the first perturbation strength, the first target decision significant vector pointing to the target classification result.
Optionally, the device for determining whether shell-added software is falsely reported comprises:
a fourth acquisition module, configured to acquire a portable executable (PE) file of the target software;
a second determining module, configured to determine the file size of the PE file and obtain a comparison result of the file size and a preset size;
and a fifth acquisition module, configured to preprocess the portable executable PE file based on the comparison result to obtain the data to be processed.
Optionally, the first determining module comprises:
a partitioning unit, configured to partition the data to be processed to obtain partitioned data to be processed;
and a third determining unit, configured to determine a second perturbation strength of each piece of partitioned data to be processed on the classification result, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result.
Optionally, the third determining unit comprises:
a first determining subunit, configured to determine the second perturbation strength of each piece of partitioned data to be processed on the classification result, and to determine the second target perturbation strength with the largest perturbation strength, to obtain the target partitioned data to be processed corresponding to the second target perturbation strength;
and a second determining subunit, configured to determine a third perturbation strength of each byte vector in the target partitioned data to be processed on the mapping to the malicious-code result, and to determine the second target decision significant vector pointing to the malicious-code result.
Optionally, the device for determining whether shell-added software is falsely reported further comprises:
a third determining module, configured to determine a target mean square error of the first target decision significant vector and the second target decision significant vector;
and a fourth determining module, configured to determine that the shell-added software is falsely reported if the target mean square error is smaller than the mean square error baseline.
The application also provides a device for determining whether shell-added software is falsely reported. The device is a physical device and comprises a processor, wherein the processor executes a program for determining whether shell-added software is falsely reported, and when the program is executed, the steps of the method for determining whether shell-added software is falsely reported can be implemented.
The application also provides a storage medium, wherein the storage medium stores a program for implementing the method for determining whether shell-added software is falsely reported, and when the program is executed by a processor, the steps of the method for determining whether shell-added software is falsely reported are implemented.
When data to be processed of target software is detected, a target classification result indicating whether malicious code exists in the data to be processed is acquired; a first target decision significant vector with which the data to be processed maps to the target classification result is determined, and a second target decision significant vector with which the data to be processed maps to the malicious-code result is determined; a mean square error baseline for determining whether a false positive exists is acquired, and whether the shell-added software is falsely reported is determined based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector; the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag. In the present application, after the target software is classified as to whether malicious code exists, whether the classification result is a false positive is further identified. Specifically, the first target decision significant vector with which the data to be processed maps to the target classification result is determined, and the second target decision significant vector with which the data to be processed maps to the malicious-code result is determined, so that whether the shell-added software is falsely reported is accurately determined based on the mean square error baseline for determining whether a false positive exists (which is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag, and can therefore accurately distinguish false positives), the first target decision significant vector and the second target decision significant vector. That is, in this embodiment, false positives of malicious code on shell-added software are identified, and the accuracy of malicious code identification is thereby improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below; it is apparent that those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flowchart of a first embodiment of the method for determining whether shell-added software is falsely reported according to the present application;
FIG. 2 is a detailed flowchart of a refinement of the step of determining the first target decision significant vector in the first embodiment of the method for determining whether shell-added software is falsely reported;
FIG. 3 is a schematic diagram of the device structure of a hardware operating environment according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a scenario of the method for determining whether shell-added software is falsely reported.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the method for determining whether shell-added software is falsely reported, referring to FIG. 1, the method comprises:
Step S10, when data to be processed of target software is detected, acquiring a target classification result indicating whether malicious code exists in the data to be processed;
Step S20, determining a first target decision significant vector with which the data to be processed maps to the target classification result, and determining a second target decision significant vector with which the data to be processed maps to the malicious-code result;
Step S30, acquiring a mean square error baseline for determining whether a false positive exists, and determining, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is falsely reported;
wherein the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag.
Step S10, when data to be processed of target software is detected, acquiring a target classification result indicating whether malicious code exists in the data to be processed;
In this embodiment, the method for determining whether shell-added software is falsely reported is applied to a system for determining whether shell-added software is falsely reported, and the system belongs to a device for determining whether shell-added software is falsely reported. The system is in communication connection with each piece of software through an interface, determines whether malicious code exists in each piece of software to obtain a determination result, and, after obtaining the determination result, further identifies whether the determination result is a false positive to obtain an identification result. It should be noted that the determination of whether malicious code exists in each piece of software need not be made in this system; that is, based on a determination result of whether malicious code exists produced by another device or system for each piece of software, the system only determines whether that determination result is a false positive.
When the data to be processed of the target software is detected, a vector to be processed of the data to be processed is acquired, and the target classification result of the vector to be processed is obtained. Specifically, when the system detects the data to be processed (the vector to be processed) of the target software, the data to be processed is input into a preset classification model in the system (as shown in FIG. 4), and the target classification result is obtained. The preset classification model may be a trained multi-layer perceptron (MLP) model. The MLP is a feed-forward artificial neural network; by exploiting the feature-learning capability and the non-linear feature-expression capability of a deep neural network, the model can learn directly from the raw input how to separate the classes, so the vector to be processed can be fed directly into the MLP classifier to obtain the target classification result.
The preset classification model is a trained model for classifying the data to be processed, and the classification results comprise: the class in which malicious code exists and the class in which no malicious code exists.
In this embodiment, possible false positives include: software of the class containing malicious code being falsely reported as software of the class containing no malicious code, or software of the class containing no malicious code being falsely reported as software of the class containing malicious code. In this embodiment, the case in which software containing no malicious code is falsely reported as software containing malicious code is taken as the example for the specific description.
Before the step of acquiring, when the data to be processed of the target software is detected, a target classification result indicating whether malicious code exists in the data to be processed, the method comprises the following steps:
Step S01, acquiring a portable executable (PE) file of the target software;
In this embodiment, this is the specific process of obtaining the data to be processed. To improve recognition efficiency, only the data to be processed of the target software may be analysed instead of all of its data; for example, only the code of the shelled region in the target software (the data to be processed) may be analysed instead of all of the software's code.
Specifically, the portable executable (PE) file of the target software is first acquired; in particular, the portable executable (PE) file data of the shelled region (the data to be processed) is acquired.
Step S02, determining the file size of the PE file, and obtaining a comparison result of the file size and a preset size;
The file size of the PE file is determined, and a comparison result of the file size and a preset size is obtained. The preset size may be 2 MB (the reason for choosing 2 MB is that statistics over the sizes of 270,000 malicious code files show that 96.41% of the PE files are smaller than 1 MB and only 3.59% are larger than 1 MB; setting the preset size to 2 MB covers a sufficient number of features while avoiding excessive data processing). For a given PE file, the file size of the PE file is determined, and the comparison result of the file size and the preset size is obtained; the comparison result may be: larger than the preset size, or smaller than the preset size.
Step S03, preprocessing the portable executable PE file based on the comparison result to obtain the data to be processed.
The portable executable PE file is preprocessed based on the comparison result to obtain the data to be processed. Specifically, if the file is smaller than 2 MB, zero bytes are appended to the end of the PE file to pad it to 2 MB; if the file is larger than 2 MB, the portion of the PE file beyond 2 MB is truncated, yielding the data to be processed.
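For illustration only (this code does not appear in the patent), a minimal Python sketch of the pad-or-truncate preprocessing and the byte-vector conversion described above might look as follows; the function name and the use of NumPy are assumptions:

```python
import numpy as np

MAX_LEN = 2 * 1024 * 1024  # 2 MB = 2097152 bytes, the preset size described above


def pe_to_vector(path: str) -> np.ndarray:
    """Read a PE file and return a fixed-length vector of byte values in [0, 255]."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) < MAX_LEN:
        # smaller than 2 MB: pad with zero bytes up to 2 MB
        raw = raw + b"\x00" * (MAX_LEN - len(raw))
    else:
        # larger than 2 MB: truncate the part beyond 2 MB
        raw = raw[:MAX_LEN]
    return np.frombuffer(raw, dtype=np.uint8).astype(np.float32)
```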
It should be noted that the input to the preset classification model may be a vector to be processed, and therefore the data to be processed needs to be converted. For example, a PE file sample S is converted into a one-dimensional vector, recorded as:

Xs = [x1, x2, …, xn], xi ∈ [0, 255]

Each PE file consists of a large number of bytes, and each byte can be expressed as a decimal number from 0 to 255, so each PE file S can be converted into a one-dimensional vector [x1, x2, …, xn], where xi is the value of the i-th byte in the file, xi ∈ [0, 255], and n is the total number of bytes in the file, with a maximum length of 2097152 (2 MB). The one-dimensional vector is then used as the input vector of the multi-layer perceptron (the preset classification model), which outputs the class of the shell applied to the PE file. The multi-layer perceptron comprises an input layer, several fully connected layers, an output layer and the like; the input layer comprises 2097152 neurons, and the fully connected layers use the preset ReLU activation function

f(x) = max(0, x)

Meanwhile, a preset Dropout function is used to avoid overfitting; the output layer uses a Softmax function to obtain the shell classes, and the multi-layer perceptron uses multi-class cross-entropy as its loss function.
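For concreteness, the following is a minimal PyTorch sketch, not taken from the patent, of such a byte-level MLP classifier; the hidden-layer widths, dropout rate and class count are illustrative assumptions, while the 2097152-dimensional input and the ReLU/Dropout/Softmax/cross-entropy choices follow the description above:

```python
import torch
import torch.nn as nn

INPUT_LEN = 2097152   # 2 MB byte vector, as described above
NUM_CLASSES = 10      # number of shell classes K (illustrative value)


class ShellMLP(nn.Module):
    def __init__(self, hidden=(1024, 256), dropout=0.5):
        super().__init__()
        layers, prev = [], INPUT_LEN
        for width in hidden:                          # fully connected layers with ReLU
            layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout)]
            prev = width
        layers.append(nn.Linear(prev, NUM_CLASSES))   # output layer (logits)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


model = ShellMLP()
# Multi-class cross-entropy loss; the Softmax is applied implicitly inside
# CrossEntropyLoss, or explicitly via torch.softmax(logits, dim=-1) when the
# per-class mapping values are needed.
criterion = nn.CrossEntropyLoss()
```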
Step S20, determining a first objective decision significant vector of the target classification result to which the mapping of the to-be-processed data points, and determining a second objective decision significant vector of the result of malicious code to which the mapping of the to-be-processed data points;
In this embodiment, after the data to be processed is obtained, the first target decision significant vector with which the data to be processed maps to the target classification result is determined. Specifically, a preset gradient back-propagation interpretation mechanism, or a preset deep-learning back-propagation mechanism, is used to propagate the prediction signal of the multi-layer perceptron (the shell classifier) from the output-layer neurons back to the input layer, layer by layer, so as to derive the decision significant vector that determines the shell class of the PE sample. The specific derivation may be: determining the classification score of the target classification result based on the Softmax function used by the output layer, determining the degree of influence, or perturbation strength, of each byte (vector element) on the classification score of the target classification result, and thereby determining the first target decision significant vector with which the data to be processed maps to the target classification result.
In this embodiment, the second target decision significant vector with which the data to be processed maps to the malicious-code result is also determined. Specifically, the segment decision significant vector of each segment of the data to be processed is first determined; then, based on the segment decision significant vectors, the degree of influence, or perturbation strength, of each corresponding byte on the mapping of the data to be processed to the malicious-code result is determined, and the second target decision significant vector of the data to be processed is thereby determined.
Step S30, obtaining a mean square error baseline for determining whether there is a false positive, and determining a result of determining whether there is a false positive in the shell software based on the mean square error baseline, the first target decision significant vector, and the second target decision significant vector;
wherein the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag.
It should be noted that the root cause of shelled normal software being falsely reported as malicious code is that the shelled decision vectors and the malicious code decision vectors have a certain similarity in the probability distribution of the features of the shelled region. In this embodiment, a mean square error baseline is therefore determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag (since the baseline is obtained, or trained, from shell-added software carrying preset false alarm tags, it can serve as an accurate reference for classifying whether a false positive exists), and whether a false positive exists is then determined.
In this embodiment, specifically, the mean square error baseline for determining whether a false positive exists may be preset; for different systems for determining whether shell-added software is falsely reported, the preset mean square error baseline may differ, and the preset baseline is obtained by prior training.
The step in which the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag comprises:
the mean square error baseline is obtained based on a preset coding model, where the preset coding model is a target model, meeting a preset condition, obtained by training a preset basic model on a training set that carries preset false alarm tags and comprises the shelled decision vectors and malicious code decision vectors of shell-added software.
In this embodiment, the mean square error baseline is obtained based on the preset coding model. The preset coding model comprises a coding layer (a preset encoder) and a decoding layer. Overall, the trained encoder is used to obtain the outputs for all malicious code decision vectors in the training set, and the mean square error between each output and the corresponding shelled decision vector in the training set is then calculated. To avoid the influence of noise, all the mean square errors are sorted from small to large, and the error value at the 99% position (a preset position) may be set as the mean square error baseline for falsely reported normal shelled software. The baseline is related to the anti-virus engine; different anti-virus engines have different mean square error baselines.
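The baseline computation just described can be sketched as follows (a non-authoritative illustration; `coding_model` stands for the trained preset coding model described here, and all function and parameter names are assumptions):

```python
import torch


def compute_mse_baseline(coding_model, malicious_vectors, shelled_vectors, position=0.99):
    """Run every malicious code decision vector Ms through the trained coding model,
    compute the mean square error between its output and the corresponding shelled
    decision vector Ps, sort the errors from small to large, and take the error at
    the 99% position as the mean square error baseline."""
    with torch.no_grad():
        errors = sorted(
            float(((coding_model(m) - p) ** 2).mean())
            for m, p in zip(malicious_vectors, shelled_vectors)
        )
    return errors[max(0, int(position * len(errors)) - 1)]
```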
Before the step of acquiring a mean square error baseline for determining whether a false positive exists, the method comprises:
Step A1, acquiring a training set, carrying preset false alarm tags, that comprises the shelled decision vectors and malicious code decision vectors of shell-added software, and training a preset basic model to obtain a target model meeting a preset condition, wherein the preset condition comprises convergence of a preset loss function;
Step A2, setting the target model as the preset coding model.
It should be noted that, in this embodiment, the preset condition comprises convergence of the loss function or reaching a preset number of training iterations; the following description takes the case in which the preset condition comprises convergence of the preset loss function as the specific example.
Specifically, the preset coding model is a neural network model trained through unsupervised learning. For an input vector X, let g(·) denote the coding-layer (encoder) function and f(·) denote the decoding-layer (decoder) function. Since the training goal of the preset coding model is to make its loss function

L(X, f(g(X)))

minimal or convergent (the preset condition), that is

min L(X, f(g(X)))

the trained preset coding model is obtained.
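A minimal PyTorch sketch of such a coding model is given below (illustrative only; the bottleneck width, the Sigmoid output and the use of MSE as the loss are assumptions consistent with the mean-square-error usage described here, and VEC_LEN stands for the assumed length of a decision vector):

```python
import torch
import torch.nn as nn

VEC_LEN = 2097152  # length of a decision vector; assumed equal to the byte-vector length


class CodingModel(nn.Module):
    """Preset coding model: a coding layer g(.) followed by a decoding layer f(.)."""

    def __init__(self, bottleneck=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(VEC_LEN, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, VEC_LEN), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))


coding_model = CodingModel()
loss_fn = nn.MSELoss()  # train until this loss converges (the preset condition)
```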
In this embodiment, to train the encoder, after the training set of shell-added software carrying preset false alarm tags is obtained, the malicious code decision vector Ms in the training set is used as the input of the coding layer; the output is

f(g(Ms))

and the loss function is

L(Ps, f(g(Ms)))

where Ps is the corresponding shelled decision vector in the training set. The training objective is to minimize this loss function or make it converge, that is

min L(Ps, f(g(Ms)))

After training, the preset coding model is obtained, and the mean square error baseline for falsely reported normal shelled software (files) is determined with the preset coding model (the baseline is related to the anti-virus engine; different anti-virus engines have different mean square error baselines). A PE file (or piece of software) that has been judged to be malicious code by any detection engine is then input to the preset coding model: the model's output for the file's target malicious code decision vector Ms (the second target decision significant vector) is compared with the file's target shelled decision vector Ps (the first target decision significant vector) by computing their mean square error, and if this mean square error is lower than the preset mean square error baseline, the file is considered a false positive.
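Putting the pieces together, the per-file decision can be sketched as follows (a hedged illustration; `coding_model` and `mse_baseline` are the hypothetical objects from the previous sketches, and Ms and Ps are the vectors derived in the later steps):

```python
import torch


def is_false_positive(coding_model, mse_baseline: float,
                      Ms: torch.Tensor, Ps: torch.Tensor) -> bool:
    """A file flagged as malicious is treated as a false positive when the mean square
    error between the coding model's output for its malicious code decision vector Ms
    and its shelled decision vector Ps is below the mean square error baseline."""
    with torch.no_grad():
        target_mse = float(((coding_model(Ms) - Ps) ** 2).mean())
    return target_mse < mse_baseline
```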
When data to be processed of target software is detected, a target classification result indicating whether malicious code exists in the data to be processed is acquired; a first target decision significant vector with which the data to be processed maps to the target classification result is determined, and a second target decision significant vector with which the data to be processed maps to the malicious-code result is determined; a mean square error baseline for determining whether a false positive exists is acquired, and whether the shell-added software is falsely reported is determined based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector; the mean square error baseline is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag. In the present application, after the target software is classified as to whether malicious code exists, whether the classification result is a false positive is further identified. Specifically, the first target decision significant vector with which the data to be processed maps to the target classification result is determined, and the second target decision significant vector with which the data to be processed maps to the malicious-code result is determined, so that whether the shell-added software is falsely reported is accurately determined based on the mean square error baseline for determining whether a false positive exists (which is determined from the shelled decision vector and the corresponding malicious code decision vector of each piece of shell-added software carrying a preset false alarm tag, and can therefore accurately distinguish false positives), the first target decision significant vector and the second target decision significant vector. That is, in this embodiment, false positives of malicious code on shell-added software are identified, and the accuracy of malicious code identification is thereby improved.
Further, referring to FIG. 2, based on the first embodiment of the present application, in another embodiment of the present application, the step of determining the first target decision significant vector with which the data to be processed maps to the target classification result comprises:
Step S21, acquiring the mapping values with which the data to be processed maps to each classification result, to obtain the maximum mapping value, which points to the target classification result;
In this embodiment, the classification results comprise false positive and non-false-positive, and acquiring the mapping values with which the data to be processed maps to each classification result comprises: acquiring the mapping value with which the data to be processed maps to the false-positive class, and acquiring the mapping value with which the data to be processed maps to the non-false-positive class; for example, the mapping value to the false-positive class may be 90%, and the mapping value to the non-false-positive class may be 10%.
It should be noted that, from the input Xs, the prediction is one of the K shell classes. Specifically, based on the Softmax function used by the output layer, there is one mapping function for each shell class k ∈ K, denoted here

Fk(Xs)

Based on Fk(Xs), the input Xs is mapped to a score in the class space; the classification result depends on which class (false positive or non-false-positive) has the largest mapping value, and the maximum mapping value is obtained, namely the predicted class is

argmax over k ∈ K of Fk(Xs)
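In code, this decision is simply an argmax over the class scores (a sketch under the assumptions of the earlier classifier example):

```python
import torch


def predict_class(model, x: torch.Tensor) -> int:
    """Return the index of the class with the maximum mapping value for input x."""
    with torch.no_grad():
        scores = torch.softmax(model(x), dim=-1)   # mapping values for every class k
        return int(scores.argmax())                # class with the largest mapping value
```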
Step S22, determining a first perturbation strength of each byte vector in the data to be processed on the maximum mapping value;
In this embodiment, after the maximum mapping value is obtained, the first perturbation strength of each byte vector in the data to be processed on the maximum mapping value is determined. Specifically, because the preset activation functions such as ReLU and Softmax are differentiable, Fk(Xs) is differentiable, and its derivative with respect to Xs, denoted here Wk, can be found:

Wk = ∂Fk(Xs) / ∂Xs

Wk represents the effect of a perturbation of each byte of Xs on the class-k mapping value (the change in its magnitude, i.e. the first perturbation strength).
Step S23, determining, based on the first perturbation strength, the first target decision significant vector pointing to the target classification result.
Using Wk, the decision significant vector of the input Xs corresponding to class k is solved. In the process of obtaining the decision significant vector, a large amount of noise is usually introduced; for this reason, a method based on the preset SmoothGrad technique is further used: random noise is added to the input Xs multiple times, and the decision significant vectors are then solved with the preset back-propagation method described above. It should be noted that, because multiple decision significant vectors are obtained in this way, all of the solved significant vectors are averaged, and the averaged result is used as the first target decision significant vector of the model input Xs, recorded as:

Ps = [p1, p2, …, pn], pi ∈ [0, 1]

where Ps represents the first target decision significant vector of the input Xs, and pi represents the degree of influence of the i-th byte on the shell classification result, in the range [0, 1]; 0 means no effect on the result and 1 means a decisive effect on the result.
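The gradient-based saliency of steps S21 to S23 could be sketched in PyTorch roughly as follows (an illustration only; the noise scale, sample count and the normalization to [0, 1] are assumptions, and `model` is the hypothetical ShellMLP from the earlier sketch):

```python
import torch


def decision_saliency(model, x: torch.Tensor, n_samples: int = 20,
                      noise_std: float = 0.1) -> torch.Tensor:
    """SmoothGrad-style saliency: average the gradient of the maximum class score
    with respect to the input bytes over several noisy copies of the input."""
    model.eval()
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + noise_std * torch.randn_like(x)).requires_grad_(True)
        scores = model(noisy)            # class mapping values for the noisy input
        scores.max().backward()          # back-propagate the maximum mapping value
        grads += noisy.grad.abs()
    grads /= n_samples
    # scale to [0, 1] so that each pi reflects the influence of byte i
    return (grads - grads.min()) / (grads.max() - grads.min() + 1e-12)
```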
In this embodiment, the mapping values with which the data to be processed maps to each classification result are acquired, to obtain the maximum mapping value, which points to the target classification result; the first perturbation strength of each byte vector in the data to be processed on the maximum mapping value is determined; and the first target decision significant vector pointing to the target classification result is determined based on the first perturbation strength. In this embodiment, the first target decision significant vector pointing to the target classification result is thus accurately determined based on the first perturbation strength.
Further, based on the first embodiment of the present application, in another embodiment of the present application, the step of determining the second target decision significant vector with which the data to be processed maps to the malicious-code result comprises:
Step B1, partitioning the data to be processed to obtain partitioned data to be processed;
In this embodiment, the portable executable PE file has been preprocessed based on the comparison result to obtain the data to be processed, that is, the data to be processed has been preprocessed to a size of 2097152 bytes. It should be noted that a PE file (unlike an image file) has a relatively fixed structure: a PE file generally comprises a header, sections, an import/export table, a resource segment, additional data, and the like.
Using the structure of the PE file, the data to be processed is first decomposed into several segments or regions according to the PE header, the section table, each section, the data between sections, the resources, the additional data, and the like, yielding the partitioned data to be processed; the segment boundaries can be obtained by parsing the PE structure, as sketched below.
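As an illustration of how the segment boundaries might be derived from the PE structure (the patent does not prescribe a library; the third-party `pefile` package is an assumed choice, and the clipping to 2 MB follows the preprocessing above):

```python
import pefile


def pe_segments(path: str, max_len: int = 2097152):
    """Return (start, end) byte ranges for the PE header and each section,
    clipped to the 2 MB window used for the byte vector."""
    pe = pefile.PE(path)
    segments = [(0, pe.OPTIONAL_HEADER.SizeOfHeaders)]        # PE header region
    for section in pe.sections:
        start = section.PointerToRawData
        end = start + section.SizeOfRawData
        segments.append((min(start, max_len), min(end, max_len)))
    return [(s, e) for s, e in segments if s < e]
```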
Step B2, determining a second perturbation strength of each piece of partitioned data to be processed on the classification result, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result.
The second perturbation strength of each piece of partitioned data to be processed on the classification result is determined, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result. Specifically, each piece of partitioned data to be processed, or segment, is represented as

Sj = Xs[ks : ke], where 0 ≤ ks < ke ≤ 2097152

so that Sj is a sub-vector of the original vector Xs. The influence of each segment (piece of partitioned data to be processed) on the prediction is then determined; the importance of a feature is determined by observing the change in the model prediction after the feature is removed from the input. Denoting by Xs\Sj the vector with the features of segment j removed, Rj(Xs) represents the influence after the features of segment j are removed (the second perturbation strength), so as to obtain the second target decision significant vector.
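One possible reading of this segment-level occlusion step is sketched below (purely illustrative; the patent only says that the change in the prediction after removal is observed, so using the drop in the predicted-class score as the influence measure is an assumed concrete choice):

```python
import torch


def most_influential_segment(model, x: torch.Tensor, segments) -> int:
    """For each segment (ks, ke), zero it out and measure how much the score of the
    originally predicted class drops; return the index of the segment with the
    largest drop (the target partitioned data to be processed)."""
    with torch.no_grad():
        base_scores = model(x)
        base_class = int(base_scores.argmax())
        drops = []
        for ks, ke in segments:
            occluded = x.clone()
            occluded[ks:ke] = 0.0              # remove the segment's features
            drops.append(float(base_scores[base_class] - model(occluded)[base_class]))
    return max(range(len(segments)), key=lambda j: drops[j])
```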
The step of determining the second perturbation strength of each piece of partitioned data to be processed on the classification result, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result, comprises:
Step C1, determining the second perturbation strength of each piece of partitioned data to be processed on the classification result, and determining the second target perturbation strength with the largest perturbation strength, to obtain the target partitioned data to be processed corresponding to the second target perturbation strength;
Step C2, determining a third perturbation strength of each byte vector in the target partitioned data to be processed on the mapping to the malicious-code result, and determining the second target decision significant vector pointing to the malicious-code result.
The second perturbation strength of each piece of partitioned data to be processed on the classification result is determined, and the second target perturbation strength with the largest perturbation strength is determined, to obtain the target partitioned data to be processed corresponding to the second target perturbation strength. That is, through step B1, step B2 and the like above, the target partitioned data to be processed with the largest influence or perturbation strength is first determined; the third perturbation strength of each byte vector in the target partitioned data to be processed on the mapping to the malicious-code result is then determined, and the second target decision significant vector pointing to the malicious-code result is determined. Specifically, the third perturbation strength of each byte vector in the target partitioned data to be processed on the mapping to the malicious-code result is determined by:

Ri(Xs) = 1 if f(Xs\xi) differs from f(Xs), and Ri(Xs) = 0 if it does not

where f(Xs) is the decision function of the model, xi is the i-th dimensional feature, Xs\xi is the vector with the i-th dimensional feature removed, and Ri(Xs) is the degree of influence after the i-th dimensional feature is removed: if the result after removal differs from the result on the original input, the degree of influence is 1, and if the result is the same, it is 0. Finally, from the influence of each dimensional feature, the decision vector with which the malicious code detection engine predicts the input vector Xs as malicious code is generated, recorded as:

Ms = [R1(Xs), R2(Xs), …, Rn(Xs)], Ri(Xs) ∈ {0, 1}
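Under the same assumptions as the previous sketches, the byte-level occlusion producing the malicious code decision vector Ms could look like this (illustrative; `detector` stands for the decision function f of the malicious code detection engine, which is assumed here to return class scores):

```python
import torch


def malicious_decision_vector(detector, x: torch.Tensor, ks: int, ke: int) -> torch.Tensor:
    """Within the most influential segment [ks, ke), set Ri(Xs) to 1 if removing
    byte i changes the detector's decision, and to 0 otherwise; bytes outside the
    segment keep an influence of 0."""
    with torch.no_grad():
        base_decision = int(detector(x).argmax())
        Ms = torch.zeros_like(x)
        for i in range(ks, ke):
            occluded = x.clone()
            occluded[i] = 0.0                      # remove the i-th dimensional feature
            if int(detector(occluded).argmax()) != base_decision:
                Ms[i] = 1.0                        # Ri(Xs) = 1: the decision changed
    return Ms
```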
In this embodiment, the data to be processed is partitioned to obtain the partitioned data to be processed, and the second perturbation strength of each piece of partitioned data to be processed on the classification result is determined, so as to determine the second target decision significant vector with which the data to be processed maps to the malicious-code result. In this embodiment, the second target decision significant vector with which the data to be processed maps to the malicious-code result is thus accurately determined.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of the device in a hardware operating environment according to an embodiment of the present application.
As shown in FIG. 3, the device for determining whether shell-added software is falsely reported may comprise: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may optionally also be a storage device separate from the processor 1001.
Optionally, the device for determining whether shell-added software is falsely reported may further comprise a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. The user interface may comprise a display (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally also comprise a standard wired interface and a wireless interface. The network interface may optionally comprise a standard wired interface and a wireless interface (e.g., a WI-FI interface).
Those skilled in the art will appreciate that the structure of the device for determining whether shell-added software is falsely reported shown in FIG. 3 does not constitute a limitation on the device; the device may comprise more or fewer components than shown, a combination of certain components, or a different arrangement of components.
As shown in FIG. 3, the memory 1005, as a computer storage medium, may comprise an operating system, a network communication module, and a program for determining whether shell-added software is falsely reported. The operating system is a program that manages and controls the hardware and software resources of the device and supports the running of the program for determining whether shell-added software is falsely reported as well as other software and/or programs. The network communication module is used to implement communication among the components in the memory 1005 and communication with other hardware and software in the system for determining whether shell-added software is falsely reported.
In the device for determining whether shell-added software is falsely reported shown in FIG. 3, the processor 1001 is configured to execute the program, stored in the memory 1005, for determining whether shell-added software is falsely reported, and to implement the steps of the method for determining whether shell-added software is falsely reported described in any of the above embodiments.
The specific implementation of the device for determining whether shell-added software is falsely reported is substantially the same as the embodiments of the method for determining whether shell-added software is falsely reported described above and is not repeated here.
The application also provides a judging device for judging whether the shell adding software is misinformed, wherein the judging device for judging whether the shell adding software is misinformed comprises:
the first acquisition module is used for acquiring a target classification result of whether malicious codes exist in data to be processed when the data to be processed of target software is detected;
a first determining module, configured to determine a first target decision significant vector of the result that the mapping of the to-be-processed data points to the target classification result, and determine a second target decision significant vector of the result that the mapping of the to-be-processed data points to malicious code;
a second obtaining module, configured to obtain a mean square error baseline used for determining whether the software is misinformed, and determine a result of determining whether the software is misinformed based on the mean square error baseline, the first target decision significant vector, and the second target decision significant vector;
the mean square error base line is determined by the decision vector of each piece of shell adding software with a preset false alarm tag and the decision vector of the corresponding malicious code.
Alternatively,
the second obtaining module is configured to:
the mean square error baseline is obtained based on a preset coding model, and the preset coding model is a target model meeting preset conditions obtained after a preset basic model is trained based on a training set which is provided with a preset false alarm label and comprises a shell decision vector and a malicious code decision vector and used for shell software.
Optionally, the device for determining whether the software for adding the shell is false comprises:
a third obtaining module, configured to obtain a training set of a shelled software with a preset false alarm tag, where the training set includes a shelled decision vector and a malicious code decision vector, and train a preset base model to obtain a target model meeting a preset condition, where the preset condition includes preset loss function convergence;
and the setting module is used for setting the target model as the preset coding model.
Optionally, the first determining module includes:
a first acquisition unit, configured to acquire the mapping value of the to-be-processed data for each classification result, so as to obtain a maximum mapping value pointing to the target classification result;
a first determining unit, configured to determine a first perturbation strength of each byte vector in the to-be-processed data on the maximum mapping value;
and a second determining unit, configured to determine, based on the first perturbation strength, the first target decision significant vector pointing to the target classification result.
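One way such a perturbation-based computation could be realised is sketched below; the classifier.class_scores interface, the zeroing-out perturbation of each byte vector, and the final normalisation are illustrative assumptions.

```python
import numpy as np

def first_decision_vector(x: np.ndarray, classifier, target_class: int) -> np.ndarray:
    """Illustrative occlusion-style estimate of the first target decision
    significant vector: the perturbation strength of each byte vector on the
    maximum mapping value pointing to the target classification result.

    x is the byte-vector representation of the to-be-processed data
    (one row per byte vector); classifier.class_scores(x) is assumed to
    return one mapping value per classification result.
    """
    max_mapping = classifier.class_scores(x)[target_class]

    strengths = np.zeros(len(x))
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] = 0                        # zero out the i-th byte vector
        strengths[i] = max_mapping - classifier.class_scores(perturbed)[target_class]

    # The decision significant vector is taken here as the normalised
    # perturbation strengths (an assumption; any monotone mapping would do).
    norm = np.linalg.norm(strengths)
    return strengths / norm if norm > 0 else strengths
```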
Optionally, the apparatus for determining whether the shell-added software is a false alarm comprises:
a fourth acquisition module, configured to acquire the portable executable (PE) file of the target software;
a second determining module, configured to determine the file size of the PE file and obtain a comparison result between the file size and a preset size;
and a fifth acquisition module, configured to preprocess the PE file based on the comparison result to obtain the to-be-processed data.
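A plausible form of this preprocessing is sketched below. The preset size of 1 MiB, the truncation/zero-padding policy chosen from the comparison result, and the reshaping into 256-wide byte vectors are assumptions introduced for the example.

```python
import numpy as np

PRESET_SIZE = 2 ** 20   # assumed preset size (1 MiB); the patent leaves the value open

def preprocess_pe(pe_path: str, preset_size: int = PRESET_SIZE) -> np.ndarray:
    """Illustrative preprocessing of the portable executable (PE) file.

    The file size is compared with the preset size; based on the comparison
    result the raw bytes are truncated or zero-padded to a fixed length and
    turned into the byte-vector representation used as the to-be-processed data.
    """
    with open(pe_path, "rb") as f:
        raw = f.read()

    if len(raw) >= preset_size:                 # comparison result: at least the preset size
        raw = raw[:preset_size]                 # truncate
    else:                                       # comparison result: smaller than the preset size
        raw = raw + b"\x00" * (preset_size - len(raw))   # zero-pad

    # One byte per element, scaled to [0, 1]; reshape into byte vectors of width 256 (assumed).
    data = np.frombuffer(raw, dtype=np.uint8).astype(np.float32) / 255.0
    return data.reshape(-1, 256)
```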
Optionally, the first determining module includes:
a partitioning unit, configured to partition the to-be-processed data to obtain to-be-processed partition data;
and a third determining unit, configured to determine a second perturbation strength of each piece of to-be-processed partition data on the classification result, so as to determine the second target decision significant vector for the mapping of the to-be-processed data to the malicious-code result.
Optionally, the third determining unit includes:
a first determining subunit, configured to determine the second perturbation strength of each piece of to-be-processed partition data on the classification result, and to determine the second target perturbation strength with the largest perturbation strength, so as to obtain the target to-be-processed partition data corresponding to the second target perturbation strength;
and a second determining subunit, configured to determine a third perturbation strength of each byte vector in the target to-be-processed partition data on the mapping result pointing to malicious code, and to determine the second target decision significant vector pointing to the malicious-code result.
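The two-stage search described by these subunits might look as follows; the partition count, the index of the malicious-code class, and the zeroing perturbation are again illustrative assumptions.

```python
import numpy as np

def second_decision_vector(x: np.ndarray, classifier,
                           n_partitions: int = 16,
                           malicious_class: int = 1) -> np.ndarray:
    """Illustrative two-stage perturbation for the second target decision
    significant vector: first find the partition whose perturbation disturbs
    the malicious-code mapping value the most, then score each byte vector
    inside that partition.
    """
    base = classifier.class_scores(x)[malicious_class]
    parts = np.array_split(np.arange(len(x)), n_partitions)

    # Second perturbation strength of each to-be-processed partition.
    part_strengths = []
    for idx in parts:
        perturbed = x.copy()
        perturbed[idx] = 0
        part_strengths.append(base - classifier.class_scores(perturbed)[malicious_class])
    target_part = parts[int(np.argmax(part_strengths))]   # partition with the largest strength

    # Third perturbation strength of each byte vector in the target partition.
    strengths = np.zeros(len(x))
    for i in target_part:
        perturbed = x.copy()
        perturbed[i] = 0
        strengths[i] = base - classifier.class_scores(perturbed)[malicious_class]

    norm = np.linalg.norm(strengths)
    return strengths / norm if norm > 0 else strengths
```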
Optionally, the apparatus for determining whether the shell-added software is a false alarm comprises:
a third determining module, configured to determine a target mean square error between the first target decision significant vector and the second target decision significant vector;
and a fourth determining module, configured to determine that the shell-added software is a false alarm if the target mean square error is smaller than the mean square error baseline.
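Written out, with $s^{(1)}$ and $s^{(2)}$ denoting the first and second target decision significant vectors of length $n$, the comparison performed by these two modules is:

$$\mathrm{MSE}\bigl(s^{(1)}, s^{(2)}\bigr) = \frac{1}{n}\sum_{i=1}^{n}\bigl(s^{(1)}_i - s^{(2)}_i\bigr)^2, \qquad \text{false alarm} \iff \mathrm{MSE}\bigl(s^{(1)}, s^{(2)}\bigr) < \text{baseline}.$$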
The specific implementation of the apparatus for determining whether the shell-added software is a false alarm is substantially the same as that of the embodiments of the method for determining whether the shell-added software is a false alarm, and is not repeated here.
An embodiment of the present application further provides a storage medium. The storage medium stores one or more programs, and the one or more programs can further be executed by one or more processors to implement the steps of the method for determining whether the shell-added software is a false alarm described in any of the above.
The specific implementation of the storage medium of the present application is substantially the same as that of the above embodiments of the method for determining whether the shell-added software is a false alarm, and is not repeated here.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent structural or equivalent process modification made on the basis of the contents of the specification and the drawings, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (11)

1. A method for determining whether shell-added software is a false alarm, characterized in that the method comprises the following steps:
when to-be-processed data of target software is detected, acquiring a target classification result indicating whether malicious code exists in the to-be-processed data;
determining a first target decision significant vector for the mapping of the to-be-processed data to the target classification result, and determining a second target decision significant vector for the mapping of the to-be-processed data to the malicious-code result;
acquiring a mean square error baseline used for determining whether a false alarm exists, and determining, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is a false alarm;
wherein the mean square error baseline is determined from the shelling decision vector of each piece of shell-added software carrying a preset false alarm tag and the corresponding malicious code decision vector.
2. The method for determining whether shell-added software is a false alarm according to claim 1, wherein
the step in which the mean square error baseline is determined from the shelling decision vector of each piece of shell-added software carrying a preset false alarm tag and the corresponding malicious code decision vector comprises:
obtaining the mean square error baseline based on a preset coding model, wherein the preset coding model is a target model, meeting a preset condition, that is obtained by training a preset base model on a training set comprising the shelling decision vectors and malicious code decision vectors of shell-added software carrying the preset false alarm tag.
3. The method for determining whether shell-added software is a false alarm according to claim 2, wherein before the step of acquiring the mean square error baseline used for determining whether a false alarm exists, the method comprises:
acquiring a training set comprising the shelling decision vectors and malicious code decision vectors of shell-added software carrying a preset false alarm tag, and training a preset base model to obtain a target model meeting a preset condition, wherein the preset condition comprises convergence of a preset loss function;
and setting the target model as the preset coding model.
4. The method for determining whether shell-added software is a false alarm according to claim 1, wherein the step of determining the first target decision significant vector for the mapping of the to-be-processed data to the target classification result comprises:
acquiring the mapping value of the to-be-processed data for each classification result, so as to obtain a maximum mapping value pointing to the target classification result;
determining a first perturbation strength of each byte vector in the to-be-processed data on the maximum mapping value;
and determining, based on the first perturbation strength, the first target decision significant vector pointing to the target classification result.
5. The method for determining whether shell-added software is a false alarm according to claim 1, wherein before the step of acquiring, when the to-be-processed data of the target software is detected, the target classification result indicating whether malicious code exists in the to-be-processed data, the method comprises:
acquiring the portable executable (PE) file of the target software;
determining the file size of the PE file, and obtaining a comparison result between the file size and a preset size;
and preprocessing the PE file based on the comparison result to obtain the to-be-processed data.
6. The method for determining whether shell-added software is a false alarm according to claim 1, wherein the step of determining the second target decision significant vector for the mapping of the to-be-processed data to the malicious-code result comprises:
partitioning the to-be-processed data to obtain to-be-processed partition data;
and determining a second perturbation strength of each piece of to-be-processed partition data on the classification result, so as to determine the second target decision significant vector for the mapping of the to-be-processed data to the malicious-code result.
7. The method for determining whether shell-added software is a false alarm according to claim 6, wherein the step of determining the second perturbation strength of each piece of to-be-processed partition data on the classification result, so as to determine the second target decision significant vector for the mapping of the to-be-processed data to the malicious-code result, comprises:
determining the second perturbation strength of each piece of to-be-processed partition data on the classification result, and determining the second target perturbation strength with the largest perturbation strength, so as to obtain the target to-be-processed partition data corresponding to the second target perturbation strength;
and determining a third perturbation strength of each byte vector in the target to-be-processed partition data on the mapping result pointing to malicious code, and determining the second target decision significant vector pointing to the malicious-code result.
8. The method for determining whether shell-added software is a false alarm according to any one of claims 2-7, wherein the step of determining, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is a false alarm comprises:
determining a target mean square error between the first target decision significant vector and the second target decision significant vector;
and if the target mean square error is smaller than the mean square error baseline, determining that the shell-added software is a false alarm.
9. An apparatus for determining whether shell-added software is a false alarm, characterized by comprising:
a first acquisition module, configured to acquire, when the to-be-processed data of the target software is detected, a target classification result indicating whether malicious code exists in the to-be-processed data;
a first determining module, configured to determine a first target decision significant vector for the mapping of the to-be-processed data to the target classification result, and to determine a second target decision significant vector for the mapping of the to-be-processed data to the malicious-code result;
a second acquisition module, configured to acquire a mean square error baseline used for determining whether a false alarm exists, and to determine, based on the mean square error baseline, the first target decision significant vector and the second target decision significant vector, whether the shell-added software is a false alarm;
wherein the mean square error baseline is determined from the shelling decision vector of each piece of shell-added software carrying a preset false alarm tag and the corresponding malicious code decision vector.
10. A device for determining whether shell-added software is a false alarm, characterized by comprising: a memory, a processor, and a program, stored on the memory, for implementing the method for determining whether shell-added software is a false alarm, wherein
the memory is used for storing the program for implementing the method for determining whether shell-added software is a false alarm;
and the processor is configured to execute the program for implementing the method for determining whether shell-added software is a false alarm, so as to implement the steps of the method for determining whether shell-added software is a false alarm according to any one of claims 1 to 8.
11. A storage medium having stored thereon a program for implementing the method for determining whether shell-added software is a false alarm, wherein the program is executed by a processor to implement the steps of the method for determining whether shell-added software is a false alarm according to any one of claims 1 to 8.
CN202010540255.2A 2020-06-15 2020-06-15 Method, device, equipment and storage medium for judging whether shell-added software is misinformed Active CN111444507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010540255.2A CN111444507B (en) 2020-06-15 2020-06-15 Method, device, equipment and storage medium for judging whether shell-added software is misinformed

Publications (2)

Publication Number Publication Date
CN111444507A true CN111444507A (en) 2020-07-24
CN111444507B CN111444507B (en) 2020-11-03

Family

ID=71655538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010540255.2A Active CN111444507B (en) 2020-06-15 2020-06-15 Method, device, equipment and storage medium for judging whether shell-added software is misinformed

Country Status (1)

Country Link
CN (1) CN111444507B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587527A (en) * 2009-07-08 2009-11-25 北京东方微点信息技术有限责任公司 Method and apparatus for scanning virus program
CN107798243A (en) * 2017-11-25 2018-03-13 国网河南省电力公司电力科学研究院 The detection method and device of terminal applies
CN109165688A (en) * 2018-08-28 2019-01-08 暨南大学 A kind of Android Malware family classification device construction method and its classification method
CN109561084A (en) * 2018-11-20 2019-04-02 四川长虹电器股份有限公司 URL parameter rejecting outliers method based on LSTM autoencoder network
CN110472417A (en) * 2019-08-22 2019-11-19 东北大学秦皇岛分校 Malware operation code analysis method based on convolutional neural networks
WO2019220241A1 (en) * 2018-05-15 2019-11-21 International Business Machines Corporation Malware detection
CN110647745A (en) * 2019-07-24 2020-01-03 浙江工业大学 Detection method of malicious software assembly format based on deep learning
CN111047594A (en) * 2019-11-06 2020-04-21 安徽医科大学 Tumor MRI weak supervised learning analysis modeling method and model thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAOYUEACE: "Classic neural network loss functions: cross-entropy and mean square error" (in Chinese), CSDN *

Also Published As

Publication number Publication date
CN111444507B (en) 2020-11-03

Similar Documents

Publication Publication Date Title
Xu et al. Reasoning-rcnn: Unifying adaptive global reasoning into large-scale object detection
Zhang et al. MoWLD: a robust motion image descriptor for violence detection
US9727821B2 (en) Sequential anomaly detection
Jha et al. Intrusion detection system using support vector machine
TW201926106A (en) URL attack detection method and apparatus, and electronic device
CN109472209B (en) Image recognition method, device and storage medium
WO2021052201A1 (en) Data theft prevention method and related product
CN113297572A (en) Deep learning sample-level anti-attack defense method and device based on neuron activation mode
Barros et al. Malware‐SMELL: A zero‐shot learning strategy for detecting zero‐day vulnerabilities
WO2014146463A1 (en) Behaviour recognition method based on hidden structure reasoning
CN116015703A (en) Model training method, attack detection method and related devices
CN112613032B (en) Host intrusion detection method and device based on system call sequence
Tuncer et al. Automated malware recognition method based on local neighborhood binary pattern
Alohali et al. Optimal Deep Learning Based Ransomware Detection and Classification in the Internet of Things Environment.
CN111444507B (en) Method, device, equipment and storage medium for judging whether shell-added software is misinformed
CN116756578A (en) Vehicle information security threat aggregation analysis and early warning method and system
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN112507912B (en) Method and device for identifying illegal pictures
CN114095268A (en) Method, terminal and storage medium for network intrusion detection
Rahman et al. An exploratory analysis of feature selection for malware detection with simple machine learning algorithms
Zhou et al. Multimodal fraudulent website identification method based on heterogeneous model ensemble
Zheng et al. Defence against adversarial attacks using clustering algorithm
CN112312590A (en) Equipment communication protocol identification method and device
CN111666902A (en) Training method of pedestrian feature extraction model, pedestrian recognition method and related device
Felemban et al. EDIR: Efficient distributed image retrieval of novel objects in mobile networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant