CN109726554B - Malicious program detection method and device - Google Patents

Publication number
CN109726554B
CN109726554B CN201711037144.4A CN201711037144A CN109726554B CN 109726554 B CN109726554 B CN 109726554B CN 201711037144 A CN201711037144 A CN 201711037144A CN 109726554 B CN109726554 B CN 109726554B
Authority
CN
China
Prior art keywords
character string
program
random value
random
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711037144.4A
Other languages
Chinese (zh)
Other versions
CN109726554A (en
Inventor
高坤
邰靖宇
刘宇豪
潘宣辰
马志远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Antiy Information Technology Co ltd
Original Assignee
Wuhan Antiy Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Antiy Information Technology Co ltd filed Critical Wuhan Antiy Information Technology Co ltd
Priority to CN201711037144.4A priority Critical patent/CN109726554B/en
Publication of CN109726554A publication Critical patent/CN109726554A/en
Application granted granted Critical
Publication of CN109726554B publication Critical patent/CN109726554B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the invention provide a method and a device for detecting a malicious program, and related applications. The method comprises: acquiring at least one character string from a preset position of a program to be detected; performing a randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected; and, when the random value of the program to be detected is greater than a preset threshold, judging that the program to be detected is a malicious program. The technical solution can accurately and effectively identify malicious code that uses random character strings to evade security software, and it avoids the runaway growth of the feature library that results from the reliance of traditional scanning on that library. It has high processing efficiency, does not depend on specific features, and can effectively detect automatically generated malicious programs.

Description

Malicious program detection method and device
Technical Field
The invention belongs to the field of program detection, and particularly relates to a method and a device for detecting a malicious program.
Background
With the rapid development of the mobile internet in recent years, platform security problems have grown steadily. The Android platform is the most prominent case: a black-market industry chain driven by huge profits hides beneath its apparently prosperous ecosystem. As the Android ecosystem grows, the related black-market industry chain grows more rampant, and the number of viruses on the Android platform is increasing almost exponentially.
Traditional detection and removal of malicious programs relies mainly on a feature library. The feature library is composed of feature codes extracted from malicious program samples collected by vendors, where a feature code can be understood as a code fragment found in a malicious program that distinguishes it from normal software. During scanning, the engine reads a file and matches it against all the feature codes in the feature library; if the program code of the file hits a feature code, the file can be judged to be a malicious program.
For example, the patent "A virus APK identification method and device" of Beijing Qihu technology GmbH (application number: 201210076889.2, publication number: 102663286B) uses opcodes and class and function names as features. To evade detection, attackers apply obfuscation so that the corresponding features become random character strings. A large number of random character strings thus replace the original function names, which causes the feature library to keep expanding. The larger the feature library, the lower the matching efficiency, and the traditional feature library eventually fails.
Disclosure of Invention
In view of the above problems, the present invention is proposed to provide a malicious program detection method and apparatus that overcome the above problems.
In a first aspect, an embodiment of the present invention provides a method for detecting a malicious program, including:
acquiring at least one character string of a preset position of a program to be detected;
performing randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected;
and when the random value of the program to be detected is greater than the preset threshold value, judging that the program to be detected is a malicious program.
Further, the method for presetting the threshold value comprises the following steps:
predefining two character string sets, namely a non-random character string set and a malicious random character string set, and performing the randomness calculation on all character strings in the two sets according to the predefined rule, wherein the minimum random value in the non-random character string set is a first random value and the maximum random value in the malicious random character string set is a second random value; when the first random value is less than the second random value, the threshold is the second random value, or the average of the first random value and the second random value.
Further, the method for presetting the threshold value further comprises the following steps:
predefining a normal random character string set, and performing the randomness calculation on all character strings in that set according to the predefined rule, wherein the maximum random value is a third random value; when the third random value is larger than the first random value and smaller than the second random value, the threshold is the third random value.
Further, acquiring at least one character string from the preset position of the program to be detected comprises: extracting a character string from at least one of the package name, the signature, the program name, the version number, the file name and the file content of the program to be detected.
Further, the method for calculating the randomness of the character string according to the predefined rule includes: an N-Gram algorithm and an information entropy algorithm.
Further, acquiring at least one character string from the preset position of the program to be detected comprises: using the line feed character as the delimiter, so that each line of characters is one character string.
Further, the characters are at least one of English letters, digits and symbols, or a mixture thereof.
Further, when the program to be detected is judged to be a malicious program, at least one character string acquired from the preset position of the program to be detected is added to the malicious random character string set.
In a second aspect, an embodiment of the present invention provides an apparatus for detecting a malicious program, including:
the acquisition module is used for acquiring at least one character string at a preset position of the program to be detected;
the random value calculation module is used for carrying out randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected;
and the comparison and judgment module is used for judging the program to be detected as a malicious program when the random value of the program to be detected is greater than a preset threshold value.
In a third aspect, an embodiment of the present invention provides a device for detecting a malicious program, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring at least one character string of a preset position of a program to be detected;
performing randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected;
and when the random value of the program to be detected is greater than the preset threshold value, judging that the program to be detected is a malicious program.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium, where instructions of the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method for detecting a malicious program as described above.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
Embodiments of the invention provide a method and a device for detecting a malicious program, which acquire at least one character string from a preset position of a program to be detected, perform a randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected, and, when the random value of the program to be detected is greater than the preset threshold, judge that the program to be detected is a malicious program. The technical solution can accurately and effectively identify malicious code that uses random character strings to evade security software, and it avoids the runaway growth of the feature library that results from the reliance of traditional scanning on that library. It has high processing efficiency, does not depend on specific features, and can effectively detect automatically generated malicious programs.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a method for detecting a malicious program according to an embodiment of the present invention;
FIG. 2A is a flow chart of threshold generation provided by an embodiment of the present invention;
FIG. 2B is a flow chart of another threshold generation provided by an embodiment of the present invention;
FIG. 3 is a diagram illustrating random values of character string sets according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of calculating randomness of a character string according to an embodiment of the present invention;
FIG. 5 is a block diagram of a malicious program detection apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
According to practical experience, the signatures, package names, and so on of normal applications need to remain stable to support normal updates and iterations. Malicious code, by contrast, tends to use random character strings so that security vendors cannot detect it by using signatures, package names, and the like as features. In addition, the signature and package name of a normal application often carry personal information about the author or company, which conforms to the statistical regularities of text.
According to the statistical law of text (letter frequency analysis), in any written language different letters and letter combinations appear with different frequencies, and any piece of text written in that language has approximately the same characteristic letter distribution. For example, in English the letter E appears frequently while X appears rarely. Similarly, the combinations ST, NG, TH, and QU occur very frequently, whereas NZ and QJ occur very rarely.
Therefore, a non-random character string should satisfy this law, whereas a random character string generally does not. Consequently, the random value of a non-random character string should not be high, while the random value of a random character string will be high.
Based on the above theory, in the present application a malicious character string is always a random character string. A non-malicious character string may be random (i.e., normal random) or non-random; if it is random, its randomness is not higher than that of a malicious character string.
A program that detects randomness only distinguishes random character strings from non-random character strings. In describing the technical solution of the invention, random character strings comprise normal random character strings (such as "Eye two a went bar") and malicious random character strings (such as "gffvdugghjguysertftyfy"), and a non-random character string can be understood as a character string with normal semantics (such as "I went to opacity").
The embodiment of the invention provides a method for detecting a malicious program, which comprises the following steps of S101 to S103:
S101, acquiring at least one character string from a preset position of a program to be detected.
To counter security vendors, malicious programs or applications usually evade detection by means such as packing (shelling) and obfuscation, so that detection methods that use fixed character strings such as class names, package names, method names, and string constants as feature codes lose their former generality. Meanwhile, to hide typical features of malicious code authors, such as the signature and package name of a malicious file (for example, an Android Package (APK) file), authors increasingly use random character strings, such as random package names and signatures, to avoid detection by security software. Security vendors are therefore left in an extremely passive position, facing a rapidly expanding feature library. Conventional developers, by contrast, keep their products consistent and stable at the level of signature, package name, code, and so on. It is therefore preferable to extract character strings from any of positions such as the package name, signature, program name, version number, file name, or file content of the program to be detected. To improve detection accuracy, character strings may be taken from as many of these positions as possible.
For example, an APK (which may be regarded as a zip-format file), once decompressed, contains:
1. .dex files, in dex format;
2. .arsc files, in arsc format;
3. .xml files, in xml format;
4. files in other formats.
The content of these files can be obtained, as can information about the APK package: version number, name, package name, signature, icon, etc. The source code of the program or application may also be obtained through decompilation or other means, and character strings may then be obtained from the source code. The embodiments of the present disclosure do not limit the manner in which the character string is obtained.
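Since the text treats an APK as a zip-format file, its entries can be enumerated with a standard zip reader. The following is a minimal Python sketch under that assumption, not the patent's implementation; `group_entries_by_extension` and `build_demo_archive` are hypothetical helper names, and the demo archive is an in-memory stand-in rather than a real APK.

```python
import io
import zipfile
from collections import defaultdict
from typing import Dict, List


def group_entries_by_extension(archive_bytes: bytes) -> Dict[str, List[str]]:
    """Group the entry names of a zip-format archive by file extension."""
    groups: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            base = name.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[-1].lower() if "." in base else ""
            groups[ext].append(name)
    return dict(groups)


def build_demo_archive() -> bytes:
    """Build a tiny stand-in archive in memory (hypothetical, not a real APK)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("classes.dex", b"")
        archive.writestr("resources.arsc", b"")
        archive.writestr("AndroidManifest.xml", b"")
        archive.writestr("res/layout/main.xml", b"")
    return buffer.getvalue()


if __name__ == "__main__":
    for ext, names in group_entries_by_extension(build_demo_archive()).items():
        print(ext, names)
```

File names collected this way are one possible source of candidate character strings; package name and signature would need an APK-aware parser, which this sketch does not attempt.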
In this embodiment, the characters may be English letters, digits, symbols, or a mixture thereof, such as:
"abcdefghijklmnopqrstuvwxyz 012345679 ═ @ -" or "acegikmoqssuwy", or a character string ordered from small to large by ASCII code, and the like, which is not limited in the embodiments of the present disclosure.
The line feed character is used as the delimiter, and each line of characters is one character string. For example, the signature of an app can be regarded as a single character string:
CN=JieLv,OU=HangzhouFeiniu Science&Technology Co.Ltd.,O=HangzhouFeiniu Science&TechnologyCo.Ltd.,L=HangZhou,ST=ZheJiang,C=86
An article that is N lines long is regarded as N character strings.
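The line-based splitting described above can be sketched in Python as follows. The function name `split_into_strings` is hypothetical, and skipping blank lines is an assumption made here; the text does not say how empty lines are treated.

```python
from typing import List


def split_into_strings(text: str) -> List[str]:
    """Treat each line as one character string (blank lines skipped by assumption)."""
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    article = "first line\nsecond line\n\nthird line"
    print(split_into_strings(article))
```

A one-line value such as the signature shown above yields a single character string.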
S102, performing randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected.
The method for performing randomness calculation on the character string includes an N-Gram algorithm, an information entropy algorithm, and the like, which is not limited in this embodiment.
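The text names an information entropy algorithm without specifying it. One common reading, assumed here, is the Shannon entropy of the character distribution of the string; the function name `shannon_entropy` and the choice of bits per character are illustrative, not taken from the source.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for sample in ("aaaa", "abab", "abcd"):
        print(sample, shannon_entropy(sample))
```

Entropy computed this way depends on string length as well as character mix, so any threshold would have to be calibrated on strings of comparable length.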
S103, when the random value of the program to be detected is larger than a preset threshold value, judging that the program to be detected is a malicious program.
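Steps S101 to S103 can be sketched as a small decision function. The source does not say how the values of several character strings are combined into one random value for the program; taking the maximum is an assumption of this sketch, and `distinct_ratio` is only a toy stand-in for whichever randomness calculation is chosen.

```python
from typing import Callable, Iterable


def program_random_value(strings: Iterable[str], score: Callable[[str], float]) -> float:
    """Combine per-string scores into one value for the program (assumed: the maximum)."""
    values = [score(s) for s in strings]
    if not values:
        raise ValueError("no character strings were extracted")
    return max(values)


def is_malicious(strings: Iterable[str], score: Callable[[str], float], threshold: float) -> bool:
    """S103: flag the program when its random value is strictly greater than the threshold."""
    return program_random_value(strings, score) > threshold


def distinct_ratio(text: str) -> float:
    """Toy stand-in scorer: share of distinct characters in the string."""
    return len(set(text)) / len(text) if text else 0.0


if __name__ == "__main__":
    print(is_malicious(["aaaa", "abcd"], distinct_ratio, threshold=0.9))
```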
There are various methods for presetting the threshold, such as:
Method one:
S1031, predefining two character string sets from a plurality of character strings with known attributes, namely a non-random character string set and a malicious random character string set.
S1032, performing the randomness calculation on all character strings in the two sets according to the same rule as in S102, where the minimum random value in the non-random character string set is the first random value and the maximum random value in the malicious random character string set is the second random value. The threshold may be the second random value, which gives high accuracy, or the average of the first random value and the second random value, which gives high efficiency and covers the random values of most character strings. The average may be calculated by a simple arithmetic mean, a weighted arithmetic mean, a moving average, an exponentially smoothed average, or the like, which is not limited in the embodiments of the present disclosure.
With reference to FIG. 3, it can be understood that if the first random value is greater than the second random value, the non-random character string set and the random character string set used as the training set are not representative and cannot be used.
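Method one can be sketched as follows, taking lists of already computed random values as input. The function name `threshold_method_one` is hypothetical, the simple arithmetic mean is used for the average option, and rejecting the case where the two values are equal is an assumption that follows the claim wording "less than".

```python
from typing import Sequence


def threshold_method_one(
    non_random_values: Sequence[float],
    malicious_values: Sequence[float],
    use_average: bool = False,
) -> float:
    """Return the second random value, or the mean of the first and second."""
    first = min(non_random_values)   # minimum over the non-random set
    second = max(malicious_values)   # maximum over the malicious random set
    if not first < second:
        raise ValueError("training sets are not representative")
    return (first + second) / 2 if use_average else second


if __name__ == "__main__":
    print(threshold_method_one([1.0, 2.0], [5.0, 9.0]))
    print(threshold_method_one([1.0, 2.0], [5.0, 9.0], use_average=True))
```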
Method two:
S1031', predefining three character string sets from a plurality of character strings with known attributes, namely a non-random character string set, a malicious random character string set, and a normal random character string set.
S1032', performing the randomness calculation on all character strings in the three sets according to the predefined rule, where the minimum random value in the non-random character string set is the first random value, the maximum random value in the malicious random character string set is the second random value, and the maximum random value in the normal random character string set is the third random value; the third random value is the threshold.
Of course, to improve detection accuracy, the above method is preferably combined with other detection means.
Referring to FIG. 3, it can be understood that this method can be used only when the third random value is greater than the first random value and less than the second random value; otherwise the training set is not representative and is not suitable for use.
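Method two differs only in the third set and the validity check. A minimal sketch, again over precomputed random values and with the hypothetical name `threshold_method_two`:

```python
from typing import Sequence


def threshold_method_two(
    non_random_values: Sequence[float],
    malicious_values: Sequence[float],
    normal_random_values: Sequence[float],
) -> float:
    """Return the third random value if it lies strictly between the first and second."""
    first = min(non_random_values)      # minimum over the non-random set
    second = max(malicious_values)      # maximum over the malicious random set
    third = max(normal_random_values)   # maximum over the normal random set
    if not first < third < second:
        raise ValueError("training sets are not representative")
    return third


if __name__ == "__main__":
    print(threshold_method_two([1.0, 2.0], [6.0, 9.0], [3.0, 4.0]))
```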
The character string sets thus established support learning well and are processed efficiently. The technical solution of the invention does not use a feature library; by comparing the random value of the program to be detected with the preset threshold, it effectively identifies malicious code that uses random character strings to evade security software. It avoids the runaway growth of the feature library caused by traditional scanning, has high processing efficiency, does not depend on specific features, and can effectively detect automatically generated malicious programs. Depending on the detection result, the user may be reminded to scan, remove, or uninstall the application, or the application may be quarantined directly.
The technical solution of the invention can detect software installed on a computer as well as applications installed on a mobile phone; the programs mentioned in the embodiments of the present disclosure include, but are not limited to, software and applications installed on various terminals.
In one embodiment, the randomness calculation for the character strings in step S101 of FIG. 1 and in FIGS. 2A and 2B is performed using an N-Gram algorithm. As shown in FIG. 4, it includes the following steps:
S201, segmenting the character string to obtain all N-character segments of the character string, where N is a positive integer;
S202, looking up the occurrence frequency of each N-character segment of the character string in a preset feature array to obtain a frequency array for the character string, where the frequency array contains the frequency corresponding to each N-character segment of the character string;
S203, calculating the average value of the frequency array, and using the average value to calculate the exponent of the constant e, which gives the random value corresponding to the character string.
The feature array preset in step S202 may be generated as follows:
performing pattern matching on a preset ordered character string to generate the feature array; the feature array contains the occurrence frequency of each N-character segment in the ordered character string.
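Steps S201 and S202, together with the generation of the feature array, can be sketched in Python as follows. The names `ngrams`, `build_feature_array` and `frequency_array` are hypothetical, the reference text is passed in as a list of lines, and the `default` value for segments absent from the feature array is an assumption, since the source does not say how unseen segments are handled.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def ngrams(text: str, n: int = 2) -> List[str]:
    """S201: all overlapping n-character segments of the string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_feature_array(reference_lines: Iterable[str], n: int = 2) -> Counter:
    """Count how often each n-character segment occurs in the reference text."""
    counts: Counter = Counter()
    for line in reference_lines:
        counts.update(ngrams(line, n))
    return counts


def frequency_array(text: str, features: Mapping[str, int], n: int = 2, default: int = 0) -> List[int]:
    """S202: look up the reference frequency of every segment of the string."""
    return [features.get(segment, default) for segment in ngrams(text, n)]


if __name__ == "__main__":
    features = build_feature_array(["this is a dog"])
    print(frequency_array("is a", features))
```

A `Counter` keyed by segment plays the role of the two-dimensional feature array for N = 2 and extends to larger N without changing the code.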
When an N-Gram algorithm is adopted, for example, a 2-Gram calculation gives the occurrence frequency of each pair of adjacent characters, and these frequencies are collected to generate the feature array; 3-Gram can also be used for pattern matching. In theory, as long as the character string is long enough, a larger N is better because more information is taken into account; in practice, however, data sparseness arises easily, the law of large numbers no longer holds, and the calculated probabilities are distorted. In addition, if N is large the parameter space becomes too large, the curse of dimensionality sets in, and the method cannot be used in practice. Assuming that the size of the symbol set (vocabulary) is 100,000, the number of parameters of the N-Gram model is 100,000^N. With so many parameters, the memory available for the calculation is insufficient. In a specific implementation, 2-Gram is sufficient to solve the problem, 3-Gram is generally not used, and N greater than or equal to 4 is rarer still. The value of N is not limited in the embodiments of the present disclosure.
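As a worked check of the parameter count above, let V denote the size of the symbol set (V is a label introduced here for illustration). With V = 100,000 the number of N-Gram parameters grows as follows:

```latex
V^{N}\Big|_{V = 10^{5}}:\qquad
N = 2 \;\Rightarrow\; 10^{10},\qquad
N = 3 \;\Rightarrow\; 10^{15},\qquad
N = 4 \;\Rightarrow\; 10^{20}.
```

Each increment of N multiplies the table size by a further factor of 10^5, which is why small values of N are preferred.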
For example, take the characters contained in "abcdefghijklmnopqrstuvwxyz 012345679 ═ @ -" as the statistical reference, and take 2-Gram as an example. The number of occurrences of each pair of adjacent characters (the "2" in "2-Gram segmentation") is counted over a large number of articles with normal semantics, for example aa, ab, ac, …, ba, bb, bc, …, and so on. In Table 1 below, the row gives the first character of a pair and the column gives the second: the value in row a, column a is 31, meaning that "aa" occurs 31 times in the statistics, and the value in row b, column c is 168, meaning that "bc" occurs 168 times. The frequencies are recorded and collected to generate the feature array. This essentially gives the probability with which two adjacent characters should appear in text with normal semantics. The calculation with 3-Gram segmentation is similar to that with 2-Gram. Table 1 shows the segmentation and frequency results; the last row and column appear to correspond to the space character.
Table 1
          a        b        c        ……    @     -      (space)
a         31       7910     16166    ……    10    336    26888
b         5708     429      168      ……    10    55     642
c         17916    10       3090     ……    10    40     3023
……        ……       ……       ……       ……    ……    ……     ……
@         10       10       10       ……    10    10     10
-         1058     468      605      ……    10    6049   58
(space)   119739   45051    41880    ……    10    55     34700
For another example, in the sentence "this is a dog", "th" appears 1 time, "hi" appears 1 time, "is" appears 2 times, "_i" appears 1 time, "s_" appears 2 times, "_a" appears 1 time, "a_" appears 1 time, "_d" appears 1 time, "do" appears 1 time, and "og" appears 1 time, where "_" (underscore) represents a space, which is also counted as a character. The occurrence frequency of each of these two-character segments is looked up in the table above, the average of the values obtained is calculated, and the exponent of the constant e is then calculated from this average to obtain the random value corresponding to the character string "this is a dog". See Formula 1:
$$e^{x} = N$$
(where the constant e is approximately 2.71828, N is the average value, and x is the exponent, i.e., the random value; equivalently, x = ln N)
Formula 1
In this embodiment, so that small changes in frequency produce clearly observable changes in the output, the exponent of the constant e is calculated from the frequency average. Other approaches may also be adopted; for example, the frequency average may be used directly as the random value, in which case the random values are comparatively large. The idea is exactly the same as in the present solution, and the embodiments of the present disclosure do not limit this.
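Step S203 and the alternative just mentioned can be sketched as follows. The sketch reads Formula 1 literally, solving e^x = N for x, so that x = ln N with N the mean of the frequency array; this reading is an assumption, as are the function name `random_value` and the `use_log` switch. The function only computes the value and makes no assumption about how it is oriented relative to the threshold.

```python
import math
from typing import Sequence


def random_value(frequencies: Sequence[float], use_log: bool = True) -> float:
    """S203: mean of the frequency array, then x such that e**x equals that mean."""
    if not frequencies:
        raise ValueError("frequency array is empty")
    mean = sum(frequencies) / len(frequencies)
    if not use_log:
        return mean  # alternative from the text: use the average directly
    if mean <= 0:
        raise ValueError("mean frequency must be positive to solve e**x = N")
    return math.log(mean)


if __name__ == "__main__":
    freqs = [10, 336, 26888]  # three entries from row a of the frequency table
    print(random_value(freqs), random_value(freqs, use_log=False))
```

Because the logarithm compresses large averages, the direct-average variant yields much larger numbers for the same input, which matches the remark that its values are comparatively large.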
In one embodiment, when the program to be detected is judged to be a malicious program, at least one acquired character string of the preset position of the program to be detected is added into a malicious random character string set. Therefore, the malicious sample library can be expanded to improve the accuracy of subsequent malicious program detection.
Based on the same inventive concept, an embodiment of the present invention further provides a device for detecting a malicious program. Because the principle by which the device solves the problem is similar to that of the method for detecting a malicious program in the foregoing embodiment, the implementation of the device may refer to the implementation of the method, and repeated description is omitted.
The following is a device for detecting a malicious program according to an embodiment of the present invention, which can be used to execute the embodiment of the method for detecting a malicious program.
Referring to fig. 5, the apparatus includes:
an obtaining module 41, configured to obtain at least one character string of a preset position of a program to be detected;
a random value calculation module 42, configured to perform randomness calculation on the character string according to a predefined rule, so as to generate a random value of the program to be detected;
and the comparison and judgment module 43 is configured to judge that the program to be detected is a malicious program when the random value of the program to be detected is greater than the preset threshold.
In one embodiment, as shown in FIG. 5, the apparatus further includes a threshold calculation module 44, configured to predefine two character string sets, namely a non-random character string set and a malicious random character string set, and to perform the randomness calculation on all character strings in the two sets according to the predefined rule, where the minimum random value in the non-random character string set is the first random value and the maximum random value in the malicious random character string set is the second random value. The threshold may be set to the second random value or to the average of the first random value and the second random value.
In one embodiment, referring to FIG. 5, the threshold calculation module 44 is further configured to predefine a normal random character string set and to perform the randomness calculation on all character strings in that set according to the predefined rule, where the maximum random value is the third random value and the threshold is the third random value.
In one embodiment, the acquisition module 41 acquires at least one character string from the preset position of the program to be detected by extracting a character string from at least one of the package name, the signature, the program name, the version number, the file name and the file content of the program to be detected. The characters are at least one of English letters, digits and symbols. The line feed character is used as the delimiter, and each line of characters is one character string.
In one embodiment, the random value calculation module 42 and/or the threshold calculation module 44 performs the randomness calculation on the character string according to an N-Gram algorithm, an information entropy algorithm, or the like.
In one embodiment, the random value calculation module 42 or the threshold calculation module 44 is configured to perform the randomness calculation on the character string by: segmenting the character string to obtain all N-character segments of the character string, where N is a positive integer; looking up the occurrence frequency of each N-character segment of the character string in a preset feature array to obtain a frequency array for the character string, where the frequency array contains the frequency corresponding to each N-character segment of the character string; and calculating the average value of the frequency array and using the average value to calculate the exponent of the constant e, which gives the random value corresponding to the character string.
The preset feature array is generated in the following way:
performing pattern matching on a preset ordered character string to generate the feature array; the feature array contains the occurrence frequency of each N-character segment in the ordered character string.
In one embodiment, when the comparison and judgment module 43 judges that the program to be detected is a malicious program, at least one character string of the preset position of the program to be detected acquired by the acquisition module 41 is added to the malicious random character string set of the threshold calculation module 44.
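The cooperation of modules 41 to 44, including the feedback of flagged strings into the malicious random character string set, can be sketched as a single class. Everything here is illustrative: the class name `Detector`, the use of the maximum per-string score as the random value of the program, and the toy scorer in the usage lines are assumptions, not details from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Detector:
    score: Callable[[str], float]   # stands in for random value calculation module 42
    threshold: float                # as produced by threshold calculation module 44
    malicious_strings: Set[str] = field(default_factory=set)

    def acquire(self, raw_text: str) -> List[str]:
        """Acquisition module 41: one character string per non-empty line."""
        return [line for line in raw_text.splitlines() if line]

    def judge(self, raw_text: str) -> bool:
        """Comparison and judgment module 43, with feedback into the malicious set."""
        strings = self.acquire(raw_text)
        if not strings:
            return False
        flagged = max(self.score(s) for s in strings) > self.threshold
        if flagged:
            self.malicious_strings.update(strings)
        return flagged


if __name__ == "__main__":
    detector = Detector(score=lambda s: float(len(set(s))), threshold=3.0)
    print(detector.judge("aaa\nabcd"), detector.malicious_strings)
```

Recomputing the threshold from the enlarged malicious set is left out of the sketch; it would reuse whichever threshold method was chosen.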
According to a third aspect of the embodiments of the present disclosure, an embodiment of the present disclosure provides a device for detecting a malicious program, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring at least one character string of a preset position of a program to be detected;
performing randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected;
and when the random value of the program to be detected is greater than the preset threshold value, judging that the program to be detected is a malicious program.
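The three configured steps can be tied together in a short sketch. The source does not say how the random values of several strings are combined into the random value of the program; taking the maximum is an assumption, as are the entropy-based scorer and the names `program_random_value` and `is_malicious`.

```python
import math
from collections import Counter

def entropy(s):
    """Character-level Shannon entropy in bits; stand-in randomness measure."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def program_random_value(strings, score=entropy):
    """Assumed aggregation: the highest random value among the strings."""
    return max((score(s) for s in strings), default=0.0)

def is_malicious(strings, threshold, score=entropy):
    """Flag the program when its random value exceeds the threshold."""
    return program_random_value(strings, score) > threshold
```

For example, `is_malicious(["aaaa"], 1.0)` is false because the entropy is 0, while `is_malicious(["abcd"], 1.0)` is true because the entropy is 2 bits per character.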
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium, wherein instructions stored in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the method for detecting a malicious program described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A method for detecting a malicious program, comprising:
acquiring at least one character string of a preset position of a program to be detected;
performing randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected; the method for calculating the randomness of the character string according to the predefined rule comprises the following steps: an N-Gram algorithm; wherein the N-Gram algorithm comprises the following steps:
segmenting the character string to obtain all N-character segments corresponding to the character string; N is a positive integer;
matching the occurrence frequency of all N-character segments of the character string in a preset feature array to obtain a frequency array corresponding to the character string, wherein the frequency array comprises the frequency corresponding to all the N-character segments of the character string;
calculating the average value of the frequency array, and raising the constant e to the power of the average value of the frequency array to obtain a random value corresponding to the character string;
when the random value of the program to be detected is larger than a preset threshold value, judging the program to be detected as a malicious program;
the method for presetting the threshold value comprises the following steps: predefining two character string sets, including a non-random character string set and a malicious random character string set, and respectively performing randomness calculation on all character strings in the two character string sets according to a predefined rule, wherein the minimum random value in the non-random character string set is a first random value, and the maximum random value in the malicious random character string set is a second random value; when the first random value is less than the second random value, the threshold is: a second random value; or, an average of the first random value and the second random value; the method for presetting the threshold value further comprises the following steps: predefining a normal random string set, and performing randomness calculation on all the strings in the string set according to the predefined rule, wherein the maximum random value is a third random value, and when the third random value is larger than the first random value and smaller than the second random value, the threshold value is the third random value;
and when the program to be detected is judged to be a malicious program, adding at least one acquired character string at the preset position of the program to be detected into a malicious random character string set.
2. The method as claimed in claim 1, wherein the method of obtaining at least one string of preset positions of the program to be detected comprises: and extracting a character string from at least one of the package name, the signature, the program name, the version number, the file name and the file content of the program to be detected.
3. The method of claim 1, wherein the method of obtaining at least one string of characters at a preset position of a program to be detected comprises: using the line feed character as a delimiter, so that each line of characters is one character string.
4. The method of claim 1, wherein the characters are at least one of English letters, numerals, symbols, or a mixture thereof.
5. An apparatus for detecting a malicious program, comprising: the acquisition module is used for acquiring at least one character string at a preset position of the program to be detected;
the random value calculation module is used for carrying out randomness calculation on the character string according to a predefined rule to generate a random value of the program to be detected; the method for calculating the randomness of the character string by the random value calculation module and the threshold value calculation module according to the predefined rule comprises the following steps: an N-Gram algorithm; wherein the N-Gram algorithm comprises the following steps:
segmenting the character string to obtain all N-character segments corresponding to the character string; N is a positive integer;
matching the occurrence frequency of all N-character segments of the character string in a preset feature array to obtain a frequency array corresponding to the character string, wherein the frequency array comprises the frequency corresponding to all the N-character segments of the character string;
calculating the average value of the frequency array, and raising the constant e to the power of the average value of the frequency array to obtain a random value corresponding to the character string;
the comparison and judgment module is used for judging the program to be detected as a malicious program when the random value of the program to be detected is greater than a preset threshold value;
the device also comprises a threshold value calculation module, which is used for predefining two character string sets, including a non-random character string set and a malicious random character string set, and respectively carrying out randomness calculation on all character strings in the two character string sets according to a predefined rule, wherein the minimum random value in the non-random character string set is a first random value, and the maximum random value in the malicious random character string set is a second random value; when the first random value is less than the second random value, the threshold is: the second random value; or, an average of the first random value and the second random value; the threshold value calculation module is further configured to predefine a normal random string set, and perform randomness calculation on all strings in the string set according to the predefined rule, where the maximum random value is a third random value; the threshold value is: the third random value;
and when the comparison and judgment module judges that the program to be detected is a malicious program, adding at least one character string of the preset position of the program to be detected, which is acquired by the acquisition module, into the malicious random character string set of the threshold calculation module.
6. The apparatus of claim 5, wherein the method for acquiring at least one character string of the preset position of the program to be detected by the acquisition module comprises: and extracting a character string from at least one of the package name, the signature, the program name, the version number, the file name and the file content of the program to be detected.
7. The apparatus of claim 5, wherein the method for acquiring at least one character string of the preset position of the program to be detected by the acquisition module comprises: using the line feed character as a delimiter, so that each line of characters is one character string.
8. The apparatus of claim 5, wherein the characters are at least one of English letters, numerals, and symbols.
9. A non-transitory computer-readable storage medium, wherein instructions, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the method of detecting a malicious program according to claim 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711037144.4A CN109726554B (en) 2017-10-30 2017-10-30 Malicious program detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711037144.4A CN109726554B (en) 2017-10-30 2017-10-30 Malicious program detection method and device

Publications (2)

Publication Number Publication Date
CN109726554A CN109726554A (en) 2019-05-07
CN109726554B true CN109726554B (en) 2021-05-18

Family

ID=66291896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711037144.4A Active CN109726554B (en) 2017-10-30 2017-10-30 Malicious program detection method and device

Country Status (1)

Country Link
CN (1) CN109726554B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084489A (en) * 2020-09-11 2020-12-15 北京天融信网络安全技术有限公司 Suspicious application detection method and device
CN112860958B (en) * 2021-01-15 2024-01-26 北京百家科技集团有限公司 Information display method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810998A (en) * 2013-12-05 2014-05-21 中国农业大学 Method for off-line speech recognition based on mobile terminal device and achieving method
CN104731775A (en) * 2015-02-26 2015-06-24 北京捷通华声语音技术有限公司 Method and device for converting spoken languages to written languages

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8069484B2 (en) * 2007-01-25 2011-11-29 Mandiant Corporation System and method for determining data entropy to identify malware
US8713681B2 (en) * 2009-10-27 2014-04-29 Mandiant, Llc System and method for detecting executable machine instructions in a data stream
US8918836B2 (en) * 2012-04-23 2014-12-23 Microsoft Corporation Predicting next characters in password generation
US9589129B2 (en) * 2012-06-05 2017-03-07 Lookout, Inc. Determining source of side-loaded software
CN102779249B (en) * 2012-06-28 2015-07-29 北京奇虎科技有限公司 Malware detection methods and scanning engine
CN104376260B (en) * 2014-11-20 2017-06-30 东华大学 A kind of malicious code visual analysis method based on shannon entropy
CN105975857A (en) * 2015-11-17 2016-09-28 武汉安天信息技术有限责任公司 Method and system for deducing malicious code rules based on in-depth learning method
CN105809034A (en) * 2016-03-07 2016-07-27 成都驭奔科技有限公司 Malicious software identification method
CN106599686B (en) * 2016-10-12 2019-06-21 四川大学 A kind of Malware clustering method based on TLSH character representation

Also Published As

Publication number Publication date
CN109726554A (en) 2019-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant