CN116992449B - Method and device for determining similar sample files, electronic equipment and storage medium - Google Patents

Method and device for determining similar sample files, electronic equipment and storage medium

Info

Publication number
CN116992449B
Authority
CN
China
Prior art keywords
target
file
sample
name
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311254331.3A
Other languages
Chinese (zh)
Other versions
CN116992449A (en
Inventor
吕经祥
李石磊
肖新光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Antiy Network Technology Co Ltd
Original Assignee
Beijing Antiy Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Antiy Network Technology Co Ltd
Priority to CN202311254331.3A
Publication of CN116992449A
Application granted
Publication of CN116992449B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for determining similar sample files, an electronic device, and a storage medium, and relates to the field of data processing. The method comprises: in response to receiving a target malicious file, acquiring a plurality of target sample files; acquiring a first name string list F and a second name string list set G; determining a name matching degree list set E; determining a sample matching degree list H; and determining, from the b target sample files, at least one target similar sample file corresponding to the target malicious file. Because the sample matching degree is determined from the name strings that each target data source assigns to the target malicious file and to each target sample file, the computing power occupied by the system is reduced and the efficiency of determining target similar sample files is improved.

Description

Method and device for determining similar sample files, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and apparatus for determining a similar sample file, an electronic device, and a storage medium.
Background
Current malicious file detection rules are determined from the file features of malicious files of the same type, and malicious files of the same type are identified by collecting and statistically analyzing the file features of every history sample file. Because each history sample file has a large number of file features, collecting and analyzing them occupies substantial system resources, so existing methods for determining similar sample files greatly increase the computing power used by the system when the number of history sample files is large.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for determining similar sample files, an electronic device, and a storage medium, which at least partially solve the technical problem in the prior art of excessive consumption of system computing power. The technical solution adopted by the present invention is as follows:
according to one aspect of the present application, there is provided a method of determining a similar sample file, the method comprising the steps of:
in response to receiving a target malicious file, acquiring a plurality of target sample files;
acquiring the name string set by each target data source for the target malicious file to obtain a first name string list $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; where $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $F_j$ is the name string set by the $j$-th target data source for the target malicious file;
acquiring the name string set by each target data source for each target sample file to obtain a second name string list set $G = (G_1, G_2, \ldots, G_a, \ldots, G_b)$; $G_a = (G_{a1}, G_{a2}, \ldots, G_{aj}, \ldots, G_{am})$; where $a = 1, 2, \ldots, b$; $b$ is the number of target sample files; $G_a$ is the name string list corresponding to the $a$-th target sample file; $G_{aj}$ is the name string set by the $j$-th target data source for the $a$-th target sample file;
determining a name matching degree list set $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the first name string list $F$ and the second name string list set $G$; $E_a = (E_{a1}, E_{a2}, \ldots, E_{aj}, \ldots, E_{am})$; where $E_a$ is the name matching degree list corresponding to the $a$-th target sample file and the target malicious file; $E_{aj}$ is the name matching degree between $G_{aj}$ and $F_j$;
determining a sample matching degree list $H = (H_1, H_2, \ldots, H_a, \ldots, H_b)$ according to the name matching degree list set $E$; where $H_a$ is the sample matching degree between the $a$-th target sample file and the target malicious file, obtained according to $E_a$;
and determining at least one target similar sample file corresponding to the target malicious file from the b target sample files according to the sample matching degree list H.
In an exemplary embodiment of the present application, $E_{aj}$ is determined by the following steps:
splitting $F_j$ according to the preset characters corresponding to the $j$-th target data source to obtain $i$ first candidate strings corresponding to $F_j$;
splitting $G_{aj}$ according to the preset characters corresponding to the $j$-th target data source to obtain $i$ second candidate strings corresponding to $G_{aj}$;
determining $E_{aj}$ according to the $i$ first candidate strings corresponding to $F_j$ and the $i$ second candidate strings corresponding to $G_{aj}$.
In an exemplary embodiment of the present application, determining $E_{aj}$ according to the $i$ first candidate strings corresponding to $F_j$ and the $i$ second candidate strings corresponding to $G_{aj}$ comprises:
sorting the $i$ first candidate strings corresponding to $F_j$ according to a preset string order to obtain a first candidate string list $F_{j1}, F_{j2}, \ldots, F_{jz}, \ldots, F_{ji}$; where $z = 1, 2, \ldots, i$; $F_{jz}$ is the $z$-th first candidate string corresponding to $F_j$ after sorting;
sorting the $i$ second candidate strings corresponding to $G_{aj}$ according to the preset string order to obtain a second candidate string list $G_{aj1}, G_{aj2}, \ldots, G_{ajz}, \ldots, G_{aji}$; where $G_{ajz}$ is the $z$-th second candidate string corresponding to $G_{aj}$ after sorting;
if $G_{ajz}$ is identical to $F_{jz}$, determining 1 as the string matching degree $J_{ajz}$ between $G_{ajz}$ and $F_{jz}$; otherwise, determining 0 as the string matching degree $J_{ajz}$ between $G_{ajz}$ and $F_{jz}$;
determining $E_{aj}$ according to the string matching degrees $J_{ajz}$.
In an exemplary embodiment of the present application, $H_a$ is determined by the following formula:
In an exemplary embodiment of the present application, determining at least one target similar sample file corresponding to the target malicious file from the $b$ target sample files includes:
traversing the sample matching degree list $H$, and if $H_a \geq H_0$, determining the $a$-th target sample file as a target similar sample file corresponding to the target malicious file; where $H_0$ is a preset sample matching degree threshold.
In one exemplary embodiment of the present application, obtaining a number of target sample files includes:
acquiring file information of a target malicious file;
and determining a plurality of target sample files from the plurality of history sample files according to the file information.
In one exemplary embodiment of the present application, determining a plurality of target sample files from a plurality of history sample files includes:
traversing each history sample file, and determining the history sample file as a target sample file if the file information of the history sample file is the same as the file information of the target malicious file.
According to an aspect of the present application, there is provided a similar sample file determining apparatus including:
the sample file acquisition module is used for acquiring a plurality of target sample files when receiving the target malicious files;
a first name string acquisition module, configured to acquire the name string set by each target data source for the target malicious file to obtain a first name string list $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; where $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $F_j$ is the name string set by the $j$-th target data source for the target malicious file;
a second name string acquisition module, configured to acquire the name string set by each target data source for each target sample file to obtain a second name string list set $G = (G_1, G_2, \ldots, G_a, \ldots, G_b)$; $G_a = (G_{a1}, G_{a2}, \ldots, G_{aj}, \ldots, G_{am})$; where $a = 1, 2, \ldots, b$; $b$ is the number of target sample files; $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $G_a$ is the name string list corresponding to the $a$-th target sample file; $G_{aj}$ is the name string set by the $j$-th target data source for the $a$-th target sample file;
a name matching degree determining module, configured to determine a name matching degree list set $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the first name string list $F$ and the second name string list set $G$; $E_a = (E_{a1}, E_{a2}, \ldots, E_{aj}, \ldots, E_{am})$; where $E_a$ is the name matching degree list corresponding to the $a$-th target sample file and the target malicious file; $E_{aj}$ is the name matching degree between $G_{aj}$ and $F_j$;
a sample matching degree determining module, configured to determine a sample matching degree list $H = (H_1, H_2, \ldots, H_a, \ldots, H_b)$ according to the name matching degree list set $E$; where $H_a$ is the sample matching degree between the $a$-th target sample file and the target malicious file, obtained according to $E_a$;
and the similar sample file determining module is used for determining at least one target similar sample file corresponding to the target malicious file from the b target sample files according to the sample matching degree list H.
According to one aspect of the present application, there is provided a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the aforementioned similar sample file determination method.
According to one aspect of the present application, there is provided an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
The invention has at least the following beneficial effects:
According to the invention, a plurality of target sample files are acquired in response to a received target malicious file. The name matching degree between the target malicious file and each target sample file is determined from the name strings that each target data source sets for the target malicious file and for each target sample file; the sample matching degree between the target malicious file and each target sample file is determined from these name matching degrees; and the target similar sample files corresponding to the target malicious file are selected from the target sample files according to the sample matching degrees. Because the sample matching degree is determined from the name strings that the target data sources assign to the target malicious file and the target sample files, the computing power occupied by the system is reduced and the efficiency of determining target similar sample files is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for determining a similar sample file according to an embodiment of the present invention;
FIG. 2 is a block diagram of a similar sample file determining apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
A method for determining a similar sample file, as shown in fig. 1, the method comprising the steps of:
Step S100, acquiring a plurality of target sample files in response to receiving a target malicious file;
The target malicious file is the malicious file for which similar sample files are to be found; upon receiving the target malicious file, a plurality of target sample files are determined from a plurality of history sample files.
A target sample file may be any history sample file, or a history sample file that is set as required or that meets a preset condition.
Further, in step S100, a plurality of target sample files are acquired, including:
step S110, acquiring file information of a target malicious file;
The file information of the target malicious file includes its file format, file type, encoding mode, and the like.
Step S120, determining a plurality of target sample files from a plurality of history sample files according to the file information.
The history sample files are sample files that have already undergone detection, including both malicious and non-malicious sample files; a plurality of target sample files are determined from the history sample files by comparing the file information of each history sample file with that of the target malicious file.
In step S120, a plurality of target sample files are determined from a plurality of history sample files, including:
step S121, traversing each history sample file, and if the file information of the history sample file is the same as the file information of the target malicious file, determining the history sample file as the target sample file.
A history sample file whose file information is the same as that of the target malicious file is determined as a target sample file; in this way the very large set of history sample files is preliminarily screened by file information to determine the target sample files.
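The screening in step S121 can be sketched as a simple equality filter. This is a minimal illustration, not the patented implementation: the `FileInfo` and `SampleFile` types, their field names, and the example values are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical fields standing in for "file format, file type, encoding mode".
    file_format: str
    file_type: str
    encoding: str


@dataclass
class SampleFile:
    name: str
    info: FileInfo


def select_target_samples(target_info, history):
    """Step S121 sketch: keep history samples whose file information equals the target's."""
    return [sample for sample in history if sample.info == target_info]


target_info = FileInfo("PE", "executable", "binary")
history = [
    SampleFile("h1", FileInfo("PE", "executable", "binary")),
    SampleFile("h2", FileInfo("ELF", "executable", "binary")),
    SampleFile("h3", FileInfo("PE", "executable", "binary")),
]
targets = select_target_samples(target_info, history)
print([sample.name for sample in targets])
```

Comparing whole `FileInfo` records keeps the filter a single pass over the history samples, which matches the text's intent of a cheap preliminary screen.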
Further, a second embodiment of the method for determining a target sample file includes:
Step S122, splitting the name string corresponding to the target malicious file according to the preset characters corresponding to each target data source, to obtain the number of target candidate strings corresponding to each target data source;
Each target data source is a supplier of history sample files and has its own detection rule, which it uses to perform malicious detection on files to be detected. Each target data source also has several preset characters, which serve as the separator characters in that source's name strings. A name string is the virus name of the virus in the corresponding target malicious file and includes an attack type string, a virus family string, an application platform string, a virus variant string, and the like. Because each target data source extracts name strings differently, the order of the information in the name strings that different target data sources produce for the same file may differ. The name string corresponding to the target malicious file is therefore split by the preset characters of each target data source to obtain several target candidate strings for each target data source, where the target candidate strings are the attack type string, the virus family string, the application platform string, the virus variant string, and the like.
Step S123, splitting the name string corresponding to each history sample file to obtain the number of history sample strings corresponding to each history sample file;
Because the number of history sample files is large, splitting the name string of every history sample file with the preset characters of every target data source would greatly increase the amount of data processing. The name strings of the history sample files are therefore split only by an existing public splitting rule to obtain the number of history sample strings for each history sample file, which reduces the amount of data processing while still allowing the very large set of history sample files to be preliminarily screened.
Step S124, if the number of history sample strings is greater than or equal to the minimum of the numbers of target candidate strings and less than or equal to the maximum of the numbers of target candidate strings, determining the history sample file corresponding to that number of history sample strings as a target sample file.
If the number of history sample strings lies between the minimum and the maximum of the numbers of target candidate strings, the corresponding history sample file meets the target candidate string count criterion and is determined as a target sample file.
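Steps S122 to S124 can be sketched as a count-range filter. The sketch below is illustrative only: the data source names, the separator characters, the "public" separator set, and the sample name strings are all assumptions made for the example.

```python
import re


def count_parts(name, separators):
    """Split a name string on any separator character and count the non-empty parts."""
    pattern = "[" + re.escape(separators) + "]"
    return len([part for part in re.split(pattern, name) if part])


def select_by_count(target_names, source_separators, history_names, public_separators):
    """Keep history samples whose part count lies within the target's min..max range."""
    # S122: one count per data source, using that source's own separators.
    counts = [count_parts(target_names[src], seps) for src, seps in source_separators.items()]
    low, high = min(counts), max(counts)
    # S123 and S124: split history names with one shared rule, then range-check.
    return [
        file_id
        for file_id, name in history_names.items()
        if low <= count_parts(name, public_separators) <= high
    ]


# Hypothetical data sources and separator characters.
source_separators = {"vendor_a": "/.", "vendor_b": ":-"}
target_names = {"vendor_a": "Trojan/Win32.Agent.abc", "vendor_b": "Win32:Trojan-Agent"}
history_names = {
    "h1": "Worm.Win32.Foo.a",
    "h2": "Generic",
    "h3": "Trojan-Downloader.Win32.Agent.xyz.gen",
    "h4": "Backdoor/Linux.Mirai",
}
selected = select_by_count(target_names, source_separators, history_names, "/.:-")
print(selected)
```

In this example the target yields 4 parts for one source and 3 for the other, so only history names with 3 or 4 parts survive the screen.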
Step S200, acquiring the name string set by each target data source for the target malicious file to obtain a first name string list $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; where $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $F_j$ is the name string set by the $j$-th target data source for the target malicious file;
Step S300, acquiring the name string set by each target data source for each target sample file to obtain a second name string list set $G = (G_1, G_2, \ldots, G_a, \ldots, G_b)$; $G_a = (G_{a1}, G_{a2}, \ldots, G_{aj}, \ldots, G_{am})$; where $a = 1, 2, \ldots, b$; $b$ is the number of target sample files; $G_a$ is the name string list corresponding to the $a$-th target sample file; $G_{aj}$ is the name string set by the $j$-th target data source for the $a$-th target sample file;
Step S400, determining a name matching degree list set $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the first name string list $F$ and the second name string list set $G$; $E_a = (E_{a1}, E_{a2}, \ldots, E_{aj}, \ldots, E_{am})$; where $E_a$ is the name matching degree list corresponding to the $a$-th target sample file and the target malicious file; $E_{aj}$ is the name matching degree between $G_{aj}$ and $F_j$;
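The shapes of F and G from steps S200 and S300 can be sketched as a list of m strings and a list of b such lists. The `labels` lookup table and the vendor names below are hypothetical; the text does not specify how the name strings are retrieved from each data source.

```python
def build_name_lists(sources, target_id, sample_ids, labels):
    """Return (F, G) for one target malicious file and b target sample files.

    labels maps (source, file_id) to a name string; it is a hypothetical
    stand-in for however the name strings are actually fetched.
    """
    F = [labels[(src, target_id)] for src in sources]  # length m
    G = [[labels[(src, sid)] for src in sources] for sid in sample_ids]  # b lists of length m
    return F, G


sources = ["vendor_a", "vendor_b"]
labels = {
    ("vendor_a", "target"): "Trojan/Win32.Agent.abc",
    ("vendor_b", "target"): "Win32:Trojan-Agent",
    ("vendor_a", "s1"): "Trojan/Win32.Agent.xyz",
    ("vendor_b", "s1"): "Win32:Trojan-Agent",
    ("vendor_a", "s2"): "Worm/Linux.Mirai.a",
    ("vendor_b", "s2"): "Linux:Worm-Mirai",
}
F, G = build_name_lists(sources, "target", ["s1", "s2"], labels)
print(F)
print(G)
```

Keeping the source order identical in F and in every row of G means that `F[j]` and `G[a][j]` always come from the same data source, which is what the per-source comparison in step S400 relies on.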
where $E_{aj}$ is determined by the following steps:
Step S410, splitting $F_j$ according to the preset characters corresponding to the $j$-th target data source to obtain $i$ first candidate strings corresponding to $F_j$;
The first candidate strings are the strings obtained by splitting the name string of the target malicious file according to the preset characters.
Step S420, splitting $G_{aj}$ according to the preset characters corresponding to the $j$-th target data source to obtain $i$ second candidate strings corresponding to $G_{aj}$;
The second candidate strings are the strings obtained by splitting the name string of the target sample file according to the preset characters.
Step S430, determining $E_{aj}$ according to the $i$ first candidate strings corresponding to $F_j$ and the $i$ second candidate strings corresponding to $G_{aj}$.
Further, in step S430, determining $E_{aj}$ according to the $i$ first candidate strings corresponding to $F_j$ and the $i$ second candidate strings corresponding to $G_{aj}$ comprises:
Step S431, sorting the $i$ first candidate strings corresponding to $F_j$ according to a preset string order to obtain a first candidate string list $F_{j1}, F_{j2}, \ldots, F_{jz}, \ldots, F_{ji}$; where $z = 1, 2, \ldots, i$; $F_{jz}$ is the $z$-th first candidate string corresponding to $F_j$ after sorting;
The preset string order is a preset arrangement order of the different types of strings. Because the order of the target candidate strings that each target data source produces for the same file differs, the candidate strings are reordered so that the strings at the same position are of the same type, which facilitates string matching.
Step S432, sorting the $i$ second candidate strings corresponding to $G_{aj}$ according to the preset string order to obtain a second candidate string list $G_{aj1}, G_{aj2}, \ldots, G_{ajz}, \ldots, G_{aji}$; where $G_{ajz}$ is the $z$-th second candidate string corresponding to $G_{aj}$ after sorting;
Step S433, if $G_{ajz}$ is identical to $F_{jz}$, determining 1 as the string matching degree $J_{ajz}$ between $G_{ajz}$ and $F_{jz}$; otherwise, determining 0 as the string matching degree $J_{ajz}$ between $G_{ajz}$ and $F_{jz}$;
Step S434, determining $E_{aj}$ according to the string matching degrees $J_{ajz}$.
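Steps S410 to S434 can be sketched as split, reorder, and compare. Two things in the sketch are assumptions and not taken from the text: the formula that combines the $J_{ajz}$ values is not reproduced in this document, so the code uses their mean as a placeholder; and `TYPE_ORDER`, the per-source `layout`, and the separator characters are hypothetical.

```python
import re

# Hypothetical stand-in for the "preset string order" (order of string types).
TYPE_ORDER = ["attack_type", "platform", "family", "variant"]


def split_name(name, separators):
    """S410/S420: split a name string on the data source's preset characters."""
    pattern = "[" + re.escape(separators) + "]"
    return [part for part in re.split(pattern, name) if part]


def order_parts(parts, layout):
    """S431/S432: reorder parts so that position z always holds the same string type.

    layout[k] names the type of the k-th split part for this data source.
    """
    by_type = dict(zip(layout, parts))
    return [by_type.get(t, "") for t in TYPE_ORDER if t in layout]


def name_matching_degree(f_j, g_aj, separators, layout):
    f_parts = order_parts(split_name(f_j, separators), layout)
    g_parts = order_parts(split_name(g_aj, separators), layout)
    # S433: J_ajz is 1 when the z-th strings are identical, otherwise 0.
    j_values = [1 if f and f == g else 0 for f, g in zip(f_parts, g_parts)]
    # S434 placeholder: the mean of J_ajz is an ASSUMPTION, not the source formula.
    return sum(j_values) / len(j_values) if j_values else 0.0


layout_a = ["attack_type", "platform", "family", "variant"]
e_aj = name_matching_degree("Trojan/Win32.Agent.abc", "Trojan/Win32.Agent.xyz", "/.", layout_a)
print(e_aj)
```

Here three of the four typed strings agree (attack type, platform, family) and the variant differs, so the placeholder mean is 0.75.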
Step S500, determining a sample matching degree list $H = (H_1, H_2, \ldots, H_a, \ldots, H_b)$ according to the name matching degree list set $E$; where $H_a$ is the sample matching degree between the $a$-th target sample file and the target malicious file, obtained according to $E_a$;
the sample matching degree indicates the similarity degree between the target malicious file and the corresponding target sample file, and the larger the sample matching degree is, the more similar the target malicious file and the corresponding target sample file are.
Step S600, determining at least one target similar sample file corresponding to the target malicious file from the b target sample files according to the sample matching degree list H.
Further, in step S600, determining at least one target similar sample file corresponding to the target malicious file from the b target sample files includes:
Step S610, traversing the sample matching degree list $H$, and if $H_a \geq H_0$, determining the $a$-th target sample file as a target similar sample file corresponding to the target malicious file; where $H_0$ is a preset sample matching degree threshold.
Comparing the sample matching degree with a preset sample matching degree threshold, and if the sample matching degree is greater than or equal to the preset sample matching degree threshold, determining the corresponding target sample file as a target similar sample file corresponding to the target malicious file.
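Steps S500 and S610 can be sketched as an aggregation followed by a threshold test. The formula for $H_a$ is not reproduced in this document, so the mean of $E_a$ used below is an assumption, as are the example values and the threshold 0.7.

```python
def sample_matching_degrees(E):
    """S500 placeholder: H_a as the mean of E_a (an ASSUMPTION, not the source formula)."""
    return [sum(e_a) / len(e_a) for e_a in E]


def select_similar(H, h0):
    """S610: indices a of target sample files with H_a >= H_0."""
    return [a for a, h_a in enumerate(H) if h_a >= h0]


# Hypothetical name matching degrees for b = 3 samples and m = 3 data sources.
E = [
    [1.0, 0.75, 0.5],
    [0.25, 0.0, 0.5],
    [1.0, 1.0, 0.75],
]
H = sample_matching_degrees(E)
similar = select_similar(H, 0.7)
print(H, similar)
```

The indices returned are zero-based positions in the target sample file list, so index 0 corresponds to the first target sample file.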
Further, after determining the target similar sample file, the method for determining the malicious detection rule through the target malicious file and the target similar sample file is as follows:
Step S700, sorting the target similar sample files in descending order of their sample matching degrees to obtain a sorted similar sample file list $T_1, T_2, \ldots, T_n, \ldots, T_q$; where $n = 1, 2, \ldots, q$; $q$ is the number of target similar sample files; $T_n$ is the $n$-th target similar sample file after sorting by sample matching degree;
The target similar sample files are sorted by sample matching degree to obtain the sorted similar sample file list; the later a file appears in the list, the lower its similarity to the target malicious file.
Step S710, let n=1;
Step S711, if $n \leq q$, obtaining a candidate detection rule according to the file features included in $T_1, \ldots, T_n$ of the sorted similar sample file list and the file features included in the target malicious file;
Step S712, performing malicious detection on $T_{n+1}, \ldots, T_q$ according to the candidate detection rule to obtain the corresponding $q - n$ malicious detection results;
Step S713, if every malicious detection result indicates that the corresponding target similar sample file is a malicious file, determining the candidate detection rule as an initial detection rule; otherwise, letting $n = n + 1$ and returning to step S711.
To further reduce the amount of data processing, candidate detection rules are determined by taking the file features of the target similar sample files in descending order of sample matching degree, together with the file features of the target malicious file. Each candidate detection rule is then verified by performing malicious detection on $T_{n+1}, \ldots, T_q$. Because the target similar sample files are similar sample files of the target malicious file, they should all be detected as malicious files; if they are, the candidate detection rule passes verification and is determined as the initial detection rule. Otherwise, the file features of the target similar sample file with the next-highest sample matching degree are added, a new candidate detection rule is determined and verified, and so on until verification passes or the file features of all target similar sample files have been used.
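The loop in steps S710 to S713 can be sketched as follows. The text does not say how a rule is built from file features or how detection is performed, so the sketch assumes a toy model: features are sets of strings, a rule is the intersection of the feature sets used to build it, and a file is flagged as malicious when it contains every feature in the rule. `build_rule` and `is_malicious` are hypothetical callables.

```python
def derive_initial_rule(target_features, ranked_similar, build_rule, is_malicious):
    """ranked_similar holds the features of T_1..T_q, sorted by descending matching degree."""
    q = len(ranked_similar)
    for n in range(1, q + 1):  # S710: start at n = 1; S711: continue while n <= q
        rule = build_rule([target_features] + ranked_similar[:n])
        remaining = ranked_similar[n:]  # T_{n+1}..T_q, i.e. q - n files
        # S712/S713: accept the rule only if every remaining file is flagged.
        if all(is_malicious(rule, features) for features in remaining):
            return rule
    return None  # only reached when there are no similar sample files


# Toy model (ASSUMPTION): rule = shared features; flagged = file contains the whole rule.
def build_rule(feature_sets):
    return set.intersection(*feature_sets)


def is_malicious(rule, features):
    return bool(rule) and rule <= features


target_features = {"a", "b", "c"}
ranked_similar = [{"a", "b", "c", "d"}, {"a", "b", "x"}, {"a", "b", "y"}]
rule = derive_initial_rule(target_features, ranked_similar, build_rule, is_malicious)
print(rule)
```

With n = 1 the rule {a, b, c} fails on the second file, so the loop advances to n = 2, where the rule {a, b} flags the one remaining file. When n reaches q there is nothing left to verify, so the last candidate is accepted, which corresponds to the case where all file features have been used.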
Step S720, carrying out malicious detection on a plurality of preset verification sample files according to the initial detection rules to obtain detection accuracy corresponding to the initial detection rules;
the verification sample file is a sample file for rule verification.
Step S730, if the detection accuracy is smaller than a preset detection accuracy threshold, a supplementary sample file is obtained;
if the detection accuracy is smaller than the preset detection accuracy threshold, the detection accuracy of the initial detection rule is lower, and then the initial detection rule is redetermined by acquiring a supplementary sample file.
Further, in step S730, the method for acquiring the supplementary sample file includes:
Step S731, acquiring the determination time $t$ of the initial detection rule;
Step S732, sequentially acquiring the files to be detected received from $t$ to the current time to obtain a set of files to be detected $Y = (Y_1, Y_2, \ldots, Y_k, \ldots, Y_u)$; where $k = 1, 2, \ldots, u$; $u$ is the number of files to be detected received from $t$ to the current time; $Y_k$ is the $k$-th file to be detected received from $t$ to the current time;
Step S733, letting $k = 1$;
Step S734, if $k \leq u$, performing similarity detection on the file features included in $Y_k$ according to the initial detection rule to obtain a corresponding similarity detection result;
Step S735, if the similarity detection result indicates that $Y_k$ is a similar file, determining $Y_k$ as the supplementary sample file; otherwise, letting $k = k + 1$ and returning to step S734.
A detection accuracy below the preset detection accuracy threshold may be caused by the history sample files having been acquired too early and therefore having low reference value, so the first similar file received after the determination time of the initial detection rule is selected as the supplementary sample file.
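Steps S731 to S735 amount to scanning, in order of arrival, the files received since the rule was determined and returning the first one the rule considers similar. In the sketch below, the `(timestamp, features)` pairs, the integer timestamps, and the `is_similar` predicate are assumptions made for illustration.

```python
def find_supplementary_sample(received, t, now, is_similar):
    """received is a list of (timestamp, file_features) pairs in any order."""
    # S732: files received between t and the current time, in order of arrival.
    window = sorted(
        (item for item in received if t <= item[0] <= now),
        key=lambda item: item[0],
    )
    # S733 to S735: return the first file the initial rule considers similar.
    for _, features in window:
        if is_similar(features):
            return features
    return None  # no similar file has been received since t


# Toy model (ASSUMPTION): a file is "similar" when it contains every rule feature.
initial_rule = {"a", "b"}
received = [
    (5, {"a", "b", "old"}),
    (12, {"x", "y"}),
    (15, {"a", "b", "k"}),
    (20, {"a", "b"}),
]
supplement = find_supplementary_sample(received, 10, 30, lambda f: initial_rule <= f)
print(supplement)
```

The file at timestamp 5 is ignored because it predates the determination time, which reflects the stated motivation of preferring recent samples.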
Step S740, re-determining the initial detection rule according to the supplementary sample file, the target malicious file, and the plurality of target similar sample files, and determining the initial detection rule as the malicious detection rule if the detection accuracy corresponding to the initial detection rule is greater than or equal to the preset detection accuracy threshold.
The initial detection rule is re-determined from the determined supplementary sample file, the target malicious file and the plurality of target similar sample files, and is then verified against the verification sample file. If the corresponding detection accuracy is still smaller than the preset detection accuracy threshold, a new supplementary sample file is acquired and the initial detection rule is re-determined again. This is repeated until the detection accuracy corresponding to the initial detection rule is greater than or equal to the preset detection accuracy threshold, which indicates that the initial detection rule meets the detection verification requirement; the initial detection rule is then determined as the malicious detection rule.
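The re-determination loop described above can be sketched as follows, under stated assumptions. `build_rule` (derives a rule from a sample list) and `evaluate` (returns the detection accuracy of a rule on the verification sample file) are hypothetical callbacks, and the `max_rounds` bound is an addition of this sketch, not part of the described method, which simply repeats until the threshold is met.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Rule = TypeVar("Rule")
Sample = TypeVar("Sample")


def refine_rule(
    build_rule: Callable[[List[Sample]], Rule],
    evaluate: Callable[[Rule], float],
    base_samples: Iterable[Sample],
    supplementary: Iterable[Sample],
    threshold: float,
    max_rounds: int = 10,
) -> Optional[Rule]:
    """Re-determine the rule with one more supplementary sample per round."""
    samples = list(base_samples)
    extras = iter(supplementary)
    rule = build_rule(samples)
    for _ in range(max_rounds):
        # Accept the rule once its accuracy reaches the preset threshold.
        if evaluate(rule) >= threshold:
            return rule
        extra = next(extras, None)
        if extra is None:
            return None  # no further supplementary sample is available
        samples.append(extra)
        rule = build_rule(samples)
    # Check the rule built in the last round before giving up.
    return rule if evaluate(rule) >= threshold else None
```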
According to the method, a plurality of target sample files are obtained in response to the received target malicious file. The name matching degree between the target malicious file and each target sample file is determined from the name string set by each target data source for the target malicious file and the name string set by each target data source for that target sample file. The sample matching degree between the target malicious file and each target sample file is determined from the corresponding name matching degrees, and the target similar sample files corresponding to the target malicious file are determined from the plurality of target sample files according to the sample matching degrees. Because the sample matching degree is determined from name strings that the target data sources have already extracted for the target malicious file and the target sample files, the target similar sample files can be determined more efficiently while the computing power consumed by the system is reduced.
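For illustration only, the name matching degree between two name strings from the same target data source might be computed as sketched below. The names are split on the preset characters of that data source and the resulting fields are compared position by position, each comparison yielding 1 or 0. Averaging those values into a single degree is an assumption of this sketch, since the aggregation formula is not reproduced in this text; the function names and the example separators `/` and `.` are likewise hypothetical. The sorting by a preset string order is omitted on the assumption that both names already list their fields in the same order.

```python
import re
from typing import List, Sequence


def split_name(name: str, separators: Sequence[str]) -> List[str]:
    """Split a virus name string on a data source's preset segmentation characters."""
    if not separators:
        return [name]
    pattern = "|".join(re.escape(sep) for sep in separators)
    return [part for part in re.split(pattern, name) if part]


def name_match_degree(f_j: str, g_aj: str, separators: Sequence[str]) -> float:
    """Fraction of positionally equal fields between F_j and G_aj (assumed mean)."""
    first = split_name(f_j, separators)
    second = split_name(g_aj, separators)
    if not first or len(first) != len(second):
        # The described method assumes both names yield the same number i of fields.
        raise ValueError("name strings must split into the same number of fields")
    # J_z = 1 if the z-th fields are identical, otherwise 0.
    matches = [1 if f == g else 0 for f, g in zip(first, second)]
    return sum(matches) / len(matches)
```

With these assumptions, `name_match_degree("Trojan/Win32.Agent.abc", "Trojan/Win32.Agent.xyz", ["/", "."])` compares four fields, three of which agree.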
A similar sample file determining apparatus 100, as shown in fig. 2, includes:
a sample file obtaining module 110, configured to obtain a plurality of target sample files when receiving a target malicious file;
a first name string obtaining module 120, configured to obtain the name string set by each target data source for the target malicious file, so as to obtain a first name string list $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; wherein $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $F_j$ is the name string set by the $j$-th target data source for the target malicious file;
a second name string obtaining module 130, configured to obtain the name string set by each target data source for each target sample file, so as to obtain a second name string list set $G = (G_1, G_2, \ldots, G_a, \ldots, G_b)$; $G_a = (G_{a1}, G_{a2}, \ldots, G_{aj}, \ldots, G_{am})$; wherein $a = 1, 2, \ldots, b$; $b$ is the number of target sample files; $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $G_a$ is the name string list corresponding to the $a$-th target sample file; $G_{aj}$ is the name string set by the $j$-th target data source for the $a$-th target sample file;
a name matching degree determining module 140, configured to determine a name matching degree list set $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the first name string list $F$ and the second name string list set $G$; $E_a = (E_{a1}, E_{a2}, \ldots, E_{aj}, \ldots, E_{am})$; wherein $E_a$ is the name matching degree list corresponding to the $a$-th target sample file and the target malicious file; $E_{aj}$ is the name matching degree between $G_{aj}$ and $F_j$;
a sample matching degree determining module 150, configured to determine a sample matching degree list $H = (H_1, H_2, \ldots, H_a, \ldots, H_b)$ according to the name matching degree list set $E$; wherein $H_a$ is the sample matching degree between the $a$-th target sample file and the target malicious file, obtained according to $E_a$;
the similar sample file determining module 160 is configured to determine, according to the sample matching degree list H, at least one target similar sample file corresponding to the target malicious file from the b target sample files.
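The data flow through modules 120 to 160 can be sketched as a single function, given the lists $F$ and $G$ already obtained. This is a minimal sketch under stated assumptions: `select_similar_samples`, `exact_match`, and the dictionary keyed by a sample identifier are hypothetical; the per-source matching function is passed in as a parameter; taking $H_a$ as the mean of $E_a$ is an assumption, because the formula for $H_a$ is not reproduced in this text; and `h0` stands for a preset sample matching degree threshold.

```python
from typing import Callable, Dict, List, Sequence


def exact_match(g: str, f: str) -> float:
    """Placeholder name matching degree: 1.0 for identical names, otherwise 0.0."""
    return 1.0 if g == f else 0.0


def select_similar_samples(
    F: Sequence[str],
    G: Dict[str, Sequence[str]],
    h0: float,
    name_match: Callable[[str, str], float] = exact_match,
) -> List[str]:
    """Return identifiers of target sample files whose H_a is at least h0."""
    if not F:
        raise ValueError("at least one target data source is required")
    similar = []
    for sample_id, G_a in G.items():
        if len(G_a) != len(F):
            raise ValueError("each sample needs one name string per data source")
        # E_a: one name matching degree per target data source j.
        E_a = [name_match(g_aj, f_j) for g_aj, f_j in zip(G_a, F)]
        # H_a: assumed here to be the mean of E_a.
        H_a = sum(E_a) / len(E_a)
        if H_a >= h0:
            similar.append(sample_id)
    return similar
```

A finer-grained matching function, such as one comparing the segmented fields of each name, can be supplied through `name_match` without changing the rest of the flow.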
Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention as described in the specification, when said program product is run on the electronic device.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
An electronic device according to this embodiment of the invention is described below. The electronic device is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present invention.
The electronic device is in the form of a general purpose computing device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting the various system components, including the memory and the processor.
Wherein the memory stores program code that is executable by the processor to cause the processor to perform steps according to various exemplary embodiments of the invention described in the "exemplary methods" section of this specification.
The storage may include readable media in the form of volatile storage, such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).
The storage may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus may be one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., router, modem, etc.) that enables the electronic device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter. As shown, the network adapter communicates with other modules of the electronic device over a bus. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with an electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method for determining a similar sample file, the method comprising the steps of:
in response to receiving a target malicious file, acquiring a plurality of target sample files;
acquiring the name string extracted by each target data source for the target malicious file to obtain a first name string list $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; wherein $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $F_j$ is the name string extracted by the $j$-th target data source for the target malicious file; each target data source corresponds to a detection rule, and the target data source carries out malicious detection on the target malicious file through the corresponding detection rule; each target data source corresponds to a plurality of preset characters, the preset characters are expressed as segmentation characters in the corresponding name string, and each target data source extracts the name string of the target malicious file through the corresponding preset characters; the name string is a character string of the virus name of a virus in the corresponding target malicious file, and comprises an attack type string, a virus family string, an application platform string and a virus variant string of the virus;
acquiring the name string set by each target data source for each target sample file to obtain a second name string list set $G = (G_1, G_2, \ldots, G_a, \ldots, G_b)$; $G_a = (G_{a1}, G_{a2}, \ldots, G_{aj}, \ldots, G_{am})$; wherein $a = 1, 2, \ldots, b$; $b$ is the number of target sample files; $G_a$ is the name string list corresponding to the $a$-th target sample file; $G_{aj}$ is the name string set by the $j$-th target data source for the $a$-th target sample file;
determining a name matching degree list set $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the first name string list $F$ and the second name string list set $G$; $E_a = (E_{a1}, E_{a2}, \ldots, E_{aj}, \ldots, E_{am})$; wherein $E_a$ is the name matching degree list corresponding to the $a$-th target sample file and the target malicious file; $E_{aj}$ is the name matching degree between $G_{aj}$ and $F_j$;
determining a sample matching degree list $H = (H_1, H_2, \ldots, H_a, \ldots, H_b)$ according to the name matching degree list set $E$; wherein $H_a$ is the sample matching degree between the $a$-th target sample file and the target malicious file, obtained according to $E_a$;
and determining at least one target similar sample file corresponding to the target malicious file from b target sample files according to the sample matching degree list H.
2. The method according to claim 1, characterized in that $E_{aj}$ is determined by the following steps:
splitting $F_j$ according to the preset characters corresponding to the $j$-th target data source to obtain $i$ first candidate strings corresponding to $F_j$;
splitting $G_{aj}$ according to the preset characters corresponding to the $j$-th target data source to obtain $i$ second candidate strings corresponding to $G_{aj}$;
determining $E_{aj}$ according to the $i$ first candidate strings corresponding to $F_j$ and the $i$ second candidate strings corresponding to $G_{aj}$.
3. The method according to claim 2, wherein said determining $E_{aj}$ according to the $i$ first candidate strings corresponding to $F_j$ and the $i$ second candidate strings corresponding to $G_{aj}$ comprises:
sorting the $i$ first candidate strings corresponding to $F_j$ according to a preset string order to obtain a first candidate string list $F_{j1}, F_{j2}, \ldots, F_{jz}, \ldots, F_{ji}$; wherein $z = 1, 2, \ldots, i$; $F_{jz}$ is the $z$-th first candidate string corresponding to $F_j$ obtained after sorting;
sorting the $i$ second candidate strings corresponding to $G_{aj}$ according to the preset string order to obtain a second candidate string list $G_{aj1}, G_{aj2}, \ldots, G_{ajz}, \ldots, G_{aji}$; wherein $G_{ajz}$ is the $z$-th second candidate string corresponding to $G_{aj}$ obtained after sorting;
if $G_{ajz}$ is identical to $F_{jz}$, determining 1 as the string matching degree $J_{ajz}$ between $G_{ajz}$ and $F_{jz}$; otherwise, determining 0 as the string matching degree $J_{ajz}$ between $G_{ajz}$ and $F_{jz}$;
Determination of
4. The method according to claim 3, characterized in that $H_a$ is determined by the following formula:
5. The method of claim 1, wherein determining at least one target similar sample file corresponding to the target malicious file from the b target sample files comprises:
traversing the sample matching degree list $H$, and if $H_a \ge H_0$, determining the $a$-th target sample file as a target similar sample file corresponding to the target malicious file; wherein $H_0$ is a preset sample matching degree threshold.
6. The method of claim 1, wherein the obtaining a number of target sample files comprises:
acquiring file information of the target malicious file;
and determining a plurality of target sample files from the plurality of history sample files according to the file information.
7. The method of claim 6, wherein determining a plurality of target sample files from a plurality of history sample files comprises:
traversing each history sample file, and determining the history sample file as a target sample file if the file information of the history sample file is the same as the file information of the target malicious file.
8. A similar sample file determining apparatus, comprising:
the sample file acquisition module is used for acquiring a plurality of target sample files when receiving the target malicious files;
a first name string acquisition module for acquiring the name string extracted by each target data source for the target malicious file to obtain a first name string list $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; wherein $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $F_j$ is the name string extracted by the $j$-th target data source for the target malicious file; each target data source corresponds to a detection rule, and the target data source carries out malicious detection on the target malicious file through the corresponding detection rule; each target data source corresponds to a plurality of preset characters, the preset characters are expressed as segmentation characters in the corresponding name string, and each target data source extracts the name string of the target malicious file through the corresponding preset characters; the name string is a character string of the virus name of a virus in the corresponding target malicious file, and comprises an attack type string, a virus family string, an application platform string and a virus variant string of the virus;
a second name string obtaining module for obtaining the name string set by each target data source for each target sample file to obtain a second name string list set $G = (G_1, G_2, \ldots, G_a, \ldots, G_b)$; $G_a = (G_{a1}, G_{a2}, \ldots, G_{aj}, \ldots, G_{am})$; wherein $a = 1, 2, \ldots, b$; $b$ is the number of target sample files; $j = 1, 2, \ldots, m$; $m$ is the number of target data sources; $G_a$ is the name string list corresponding to the $a$-th target sample file; $G_{aj}$ is the name string set by the $j$-th target data source for the $a$-th target sample file;
a name matching degree determining module for determining a name matching degree list set $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the first name string list $F$ and the second name string list set $G$; $E_a = (E_{a1}, E_{a2}, \ldots, E_{aj}, \ldots, E_{am})$; wherein $E_a$ is the name matching degree list corresponding to the $a$-th target sample file and the target malicious file; $E_{aj}$ is the name matching degree between $G_{aj}$ and $F_j$;
a sample matching degree determining module for determining a sample matching degree list $H = (H_1, H_2, \ldots, H_a, \ldots, H_b)$ according to the name matching degree list set $E$; wherein $H_a$ is the sample matching degree between the $a$-th target sample file and the target malicious file, obtained according to $E_a$;
and the similar sample file determining module is used for determining at least one target similar sample file corresponding to the target malicious file from the b target sample files according to the sample matching degree list H.
9. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the method of any one of claims 1-7.
10. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311254331.3A CN116992449B (en) 2023-09-27 2023-09-27 Method and device for determining similar sample files, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116992449A CN116992449A (en) 2023-11-03
CN116992449B true CN116992449B (en) 2024-01-23

Family

ID=88532560

Country Status (1)

Country Link
CN (1) CN116992449B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446687A (en) * 2016-10-14 2017-02-22 北京奇虎科技有限公司 Detection method and device of malicious sample
CN108268772A (en) * 2016-12-30 2018-07-10 武汉安天信息技术有限责任公司 The screening technique and system of malice sample
US10243977B1 (en) * 2017-06-21 2019-03-26 Symantec Corporation Automatically detecting a malicious file using name mangling strings
CN112182569A (en) * 2019-07-03 2021-01-05 腾讯科技(深圳)有限公司 File identification method, device, equipment and storage medium
CN112818347A (en) * 2021-02-22 2021-05-18 深信服科技股份有限公司 File label determination method, device, equipment and storage medium
CN115562992A (en) * 2022-10-09 2023-01-03 北京安天网络安全技术有限公司 File detection method and device, electronic equipment and storage medium
CN116522338A (en) * 2023-04-18 2023-08-01 深圳市深信服信息安全有限公司 File processing method, equipment and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090065977A (en) * 2007-12-18 2009-06-23 삼성에스디에스 주식회사 A virus detecting method to determine a file's virus infection
RU2747464C2 (en) * 2019-07-17 2021-05-05 Акционерное общество "Лаборатория Касперского" Method for detecting malicious files based on file fragments
US20230289443A1 (en) * 2022-03-11 2023-09-14 Nutanix, Inc. Malicious activity detection, validation, and remediation in virtualized file servers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant