WO2008098495A1 - Method and device for determining an object file - Google Patents


Info

Publication number
WO2008098495A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
file
detected
string
feature string
Application number
PCT/CN2008/070223
Inventor
Jie Bai
Wei Li
Zhengyu Lu
Original Assignee
Jie Bai
Wei Li
Zhengyu Lu
Application filed by Jie Bai, Wei Li and Zhengyu Lu
Publication of WO2008098495A1 (patent/WO2008098495A1/fr)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata

Definitions

  • The present invention relates to a method and apparatus for determining whether a data set is a desired data set.
  • For example, searching a face-image database for the person shown in a simulated (composite) portrait is an attempt to determine whether the target image exists in the database. Likewise, searching a suspicious program for the characteristic instructions or data of a virus is an attempt to determine whether the suspicious program is a virus program. The image represented by the portrait and the virus's characteristic instructions both involve many pieces of feature data, and that data is scattered (discrete). For this reason, existing methods have difficulty searching for and identifying the target data set quickly and efficiently. In practice there is also the following case: the feature set of a virus program contains, say, 10 features, and the various known variants of that virus are each characterized by a different combination of those features.
  • The problem addressed by the present invention is to provide a method and apparatus for determining an object file. The method and apparatus can identify the detected data set quickly and accurately, and can therefore decide quickly and accurately whether the file to be detected is the target file.
  • The method for determining an object file includes selecting, in a storage unit, a set of feature strings for determining the target file, where the set includes at least one feature string for determining the target file;
  • The strings in the set may also be searched for in the file to be detected, according to a rule, by the following steps. Select an unselected feature string from the set, and repeat until the accumulated result satisfies a first rule. For each selected feature string, scan the file to be detected, obtain the valid position(s) of that string in the file, and add them to the accumulated result. When the search of the file to be detected ends, the accumulated result is taken as the search result.
  • The first rule is one of the following:
    ◦ in the accumulated result, the total number of valid positions of the found feature strings in the file to be detected reaches a set value;
    ◦ in the accumulated result, the number of feature strings effectively found in the file to be detected reaches a set value;
    ◦ in the accumulated result, the positional relationship of the feature strings effectively found in the file to be detected satisfies the set sequence feature and/or interval feature.
  • The first condition is one of the following:
    ◦ in the search result, the number of feature strings effectively found in the file to be detected reaches a set value;
    ◦ in the search result, the positional relationship of the feature strings effectively found in the file to be detected satisfies the set sequence feature and/or interval feature.
  • The method further includes determining a feature character for each feature string and a second rule for constructing the corresponding feature string from that feature character. The file to be detected is then scanned as follows:
    ◦ search the file for the feature character until the whole file has been searched;
    ◦ for each occurrence found, construct the corresponding feature string according to the second rule;
    ◦ if construction succeeds, take the position of the constructed feature string as a valid position.
    Construction is deemed successful if the characters of the constructed feature string are identical to those of the feature string being searched for. It is also deemed successful if the proportion of matching characters reaches a set value.
  • The apparatus for determining an object file includes a storage unit that stores a complete set of feature strings, and further includes:
  • a feature string selection unit that selects, in the storage unit, a set of feature strings for determining the target file, the set including at least one feature string for determining the target file;
  • a file scanning unit configured to search the file to be detected for the strings in the set according to a rule, and obtain a search result;
  • a determining unit configured to determine whether the result satisfies the first condition; and
  • a target file determining unit configured to determine that the file to be detected is a target file when the result satisfies the first condition.
  • In the invention, a set of feature strings for determining a target file is first selected. The strings in the set are then searched for in the file to be detected according to a rule. Once the search result is obtained, it is checked against the first condition to decide whether the file to be detected is the target file. Because search rules and judgment conditions are part of the determination process, they can constrain the feature-string search. For example, they can allow fuzzy search and can take into account the nature of the file to be detected and the purpose of the detection. The search and judgment are therefore more targeted, and the detected data set can be identified quickly and accurately.
  • FIG. 1 is a flow chart of a first embodiment of the method of the present invention;
  • FIG. 2 is a flow chart, used in the embodiment of FIG. 1, of searching for a set of feature strings in a file to be detected;
  • FIG. 3 is a diagram of the database structure used to store feature strings in the embodiment of FIG. 1;
  • FIG. 4 is a block diagram of an embodiment of the apparatus according to the present invention.
  • The method of finding a target file has a very wide range of applications. By searching for features, it can determine whether a file is a copy of another file, whether a file has been infected by a virus program, and so on. A copy of a file contains features of the original file, and an infected file contains the features of the infection. Such features are usually voluminous, uncertain and widely dispersed. It is therefore difficult to know which of them will appear in the file to be detected, and even more difficult to know in what form and at what position they appear.
  • For example, a suspicious program may contain an instruction feature that deletes data. However, it is uncertain which parameters that instruction carries and which additional conditions would establish that the program is a virus. Using existing methods to screen a large number of such suspicious programs quickly therefore either takes a long time or yields a lower detection success rate.
  • Figure 1 is a flow chart of a first embodiment of the method of the present invention.
  • Any data or file can be converted or reduced to a character string that follows a certain encoding rule, such as an ASCII-encoded string. This embodiment therefore uses the search for character strings as its example.
  • The input to the flow of FIG. 1 is a file to be detected, obtained in advance.
  • In step 11, the set of feature strings is first determined. This is a preprocessing step and the basis for the subsequent steps.
  • A feature string is also referred to as a "fingerprint".
  • The set of feature strings contains the feature strings used to determine the target file, and these are the basis for finding identical or similar files. In simple cases a single feature string is sometimes enough to decide whether the file to be detected is the target file. More often, many feature strings are required.
  • The set contains at least one feature string.
  • In this embodiment, the set is stored in a database in the form of a two-dimensional table that holds the feature strings and their corresponding auxiliary data.
  • The database can also be replaced by other data structures, such as a linear queue.
  • A typical database structure is shown in FIG. 3. The fields "number", "character string" and "length" are the most basic, and the "length" field can be omitted if all selected feature strings have the same length. The other fields support further improvements of the embodiment described in FIG. 1, or other embodiments, and can be regarded as auxiliary data that improves the performance of the method. Note that step 11 selects from the database the set of feature strings used to determine one particular object file; different target files are judged with different feature strings.
  • The database with the structure shown in FIG. 3 is used to store the complete set of feature strings for detecting one particular target file. To store the complete sets for detecting all the various target files, only one more field is needed, identifying the number or name of the target file that each feature string is used to detect.
  • Feature strings can also be managed through a feature-group field, among other approaches.
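As a rough illustration of a FIG. 3-style table with the extra target-file field, here is a minimal Python sketch. The names `FeatureRecord`, `target_id` and `select_feature_set` are hypothetical. The auxiliary fields of FIG. 3 are not reproduced, and the sample rows are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    """One row of a FIG. 3-style table; field names are illustrative only."""
    number: int      # "number" field
    string: bytes    # "character string" field
    target_id: str   # added field: which target file this string helps detect

    @property
    def length(self) -> int:
        # "length" field, derived from the string; it could be dropped
        # if every selected feature string had the same length.
        return len(self.string)


def select_feature_set(table: list[FeatureRecord], target_id: str) -> list[FeatureRecord]:
    """Step 11: pick the rows of the complete table used for one target file."""
    return [row for row in table if row.target_id == target_id]


# Hypothetical rows, not taken from the patent.
TABLE = [
    FeatureRecord(1, b"ACDS", "virus-A"),
    FeatureRecord(2, b"DFS", "virus-A"),
    FeatureRecord(3, b"XYZW", "virus-B"),
]
print([row.number for row in select_feature_set(TABLE, "virus-A")])  # [1, 2]
```

Deriving `length` rather than storing it is a design choice made here for brevity. A real table would more likely store it alongside the other auxiliary fields.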
  • Determining feature strings may also require manual participation. For example, there are many methods and procedures for analyzing the feature information of illegal network access or of a virus, and many ways to analyze the attributes or type of that information. The analysis can be performed manually or by computer.
  • The feature strings determined from the feature information can be selected and stored according to actual conditions and needs.
  • As an example of analyzing a virus program with computer tools to obtain its feature strings, tool programs such as DEBUG and PROVIEW, together with a dedicated test system, are used to analyze the virus's feature information.
  • A dedicated test system is used because the object being analyzed may be a virus, which could well continue to spread, or even attack, during the analysis.
  • Specifically, DEBUG or another disassembler can print the virus code as a disassembly listing, from which the code segments carrying feature information are analyzed. Dynamic analysis can also be used to observe which system calls or operations the virus components perform, and by what means and procedures, in order to analyze the virus's sequence of operational behaviors.
  • The feature information of such a virus, or the feature strings corresponding to its operational behavior, are stored in a database so that they can later be used to decide whether other programs to be detected are virus programs.
  • When determining the set of feature strings, step 11 mainly takes into account two factors: the discreteness of the feature data, and the uncertainty of its position in the file to be detected. It decomposes the feature data into different, shorter subsets of feature characters, that is, feature strings.
  • In step 12, the strings in the set are searched for in the file to be detected according to a rule, and the search result is obtained.
  • The rules differ for different target files. They mainly weigh search speed against search accuracy and reflect a search strategy based on the nature of the target file being sought. The simplest rule is to search the file to be detected for every feature string in the set.
  • Specifically, the feature strings in the set are searched for in the file to be detected according to the rule as follows:
    ◦ select one feature string from the set at a time, until every feature string in the set has been selected once;
    ◦ for each selected feature string, scan the file to be detected and obtain the valid position of that string in the file;
    ◦ take the positions of all feature strings in the file as the search result.
  • Step 13 determines whether the search result satisfies a preset condition. If it does, step 14 determines that the file to be detected is a target file, and subsequent processing follows. Otherwise, the determination ends in step 15.
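Steps 12 to 15 can be sketched in Python as follows, under stated assumptions. The "simplest rule" searches for every feature string. The preset condition is assumed to be "at least `min_found` feature strings have a valid position", which is one of the conditions listed for step 13. The function names and the sample data are hypothetical.

```python
def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Every 0-based offset where pattern occurs in data (overlaps included)."""
    offsets = []
    hit = data.find(pattern)
    while hit != -1:
        offsets.append(hit)
        hit = data.find(pattern, hit + 1)
    return offsets


def search(data: bytes, feature_set: list[bytes]) -> dict[bytes, list[int]]:
    """Step 12 with the simplest rule: look for every string in the set.
    An empty list plays the role of an "empty" (not found) result."""
    return {feature: find_all(data, feature) for feature in feature_set}


def is_target(data: bytes, feature_set: list[bytes], min_found: int) -> bool:
    """Steps 13-15: the assumed condition is that at least min_found
    feature strings have a valid position in the file."""
    result = search(data, feature_set)
    found = sum(1 for offsets in result.values() if offsets)
    return found >= min_found


sample = b"QWERTACDSYASDFGHZXCVB"
features = [b"ACDS", b"DFS", b"ZXC"]
print(is_target(sample, features, 2))  # True: ACDS and ZXC are found, DFS is not
print(is_target(sample, features, 3))  # False
```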
  • A specific example of scanning the file to be detected in step 12 is shown in FIG. 2.
  • In step 21, an unselected feature string is read from the set of feature strings. In step 22, the file to be detected is scanned with that string, that is, the string is searched for in the file. Step 23 determines whether the feature string has been found:
    ◦ if the part of the file scanned so far does not contain the feature string, the process returns to step 22 and continues scanning;
    ◦ if it does contain the string, the scan result is recorded in step 24. In this example, the valid position of the feature string in the file is recorded, including its offset in the file, its length, and so on.
  • A valid position is a position at which the feature string is contained in its entirety in the file to be detected. If the feature string is not completely contained in the file, that is, it is not found, the feature string has no valid position.
  • However, for later management and judgment, "empty" is still recorded in the valid-position field of the management table, indicating that the string was not found in the file to be detected.
  • For example, suppose a scan result table has three fields: "feature string number", "start position" and "length". Take feature string ACDS numbered 1, feature string DFS numbered 2, and the file to be detected QWERTACDSYASDFGHZXCVB:
    ◦ the entry recorded for ACDS is "1" "6" "4";
    ◦ the entry recorded for DFS is "2" "0" "3", where the "0" in the middle means "empty", that is, DFS was not found.
  • Step 23 also checks whether the scan has reached the end of the file. If it has, scanning stops and the process proceeds to step 26 to determine and obtain the final search result. This branch is obvious and is not drawn in FIG. 2.
  • Step 25 determines whether every feature string in the set has been scanned. If not, the process returns to step 21 to select the next unselected feature string and continue scanning. Otherwise, the final search result is determined and obtained in step 26. Feature strings may be selected in any order, as long as no string is selected twice.
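The FIG. 2 loop can be sketched in Python as below; it reproduces the scan result table from the ACDS/DFS example above. The sketch records only the first valid position of each string, uses 1-based start positions with 0 for "empty" as in that example, and relies on `str.find`. `ScanRow` and `scan_file` are hypothetical names.

```python
from typing import NamedTuple


class ScanRow(NamedTuple):
    number: int  # feature string number
    start: int   # 1-based start of the first valid position; 0 means "empty"
    length: int  # length of the feature string


def scan_file(text: str, feature_set: dict[int, str]) -> list[ScanRow]:
    """FIG. 2 loop: one scan per feature string, each string selected once."""
    table = []
    for number, feature in feature_set.items():    # step 21 (the loop itself is step 25)
        offset = text.find(feature)                # steps 22-23: scan until found or end of file
        start = offset + 1 if offset != -1 else 0  # step 24: record result, 0 when not found
        table.append(ScanRow(number, start, len(feature)))
    return table                                   # step 26: final search result


print(scan_file("QWERTACDSYASDFGHZXCVB", {1: "ACDS", 2: "DFS"}))
# [ScanRow(number=1, start=6, length=4), ScanRow(number=2, start=0, length=3)]
```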
  • Finding the valid position of every feature string in the file to be detected is not always necessary, and in some cases it lowers search efficiency.
  • For example, suppose a set contains eight feature strings. Finding four of them with a certain order and interval may already be enough to conclude whether the file to be detected is the target file.
  • The second embodiment of the present invention provides another method of searching the file to be detected for the strings in the set.
  • In this method, an unselected feature string is selected from the set, and the file to be detected is scanned for each selected string to obtain its valid position(s). These valid positions are added to an accumulated result. If the accumulated result satisfies a predetermined rule, the file can already be judged on the basis of the current accumulated result. The search of the file can then end, and the accumulated result is taken as the search result for further processing.
  • In other words, if the accumulated result satisfies the predetermined rule, the search can end early without every feature string in the set being selected. This makes searching the file to be detected for the feature strings in the set more efficient.
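A minimal Python sketch of the second embodiment's early termination follows. It assumes the predetermined rule is "the number of feature strings effectively found reaches a set value", here 2. `search_early_stop` and `two_found` are hypothetical names, and the sample data reuses the earlier example file.

```python
from typing import Callable

Accumulated = dict[str, list[int]]  # feature string -> valid offsets ([] = "empty")


def find_all(data: str, pattern: str) -> list[int]:
    """Every 0-based offset where pattern occurs in data."""
    offsets, hit = [], data.find(pattern)
    while hit != -1:
        offsets.append(hit)
        hit = data.find(pattern, hit + 1)
    return offsets


def search_early_stop(data: str, feature_set: list[str],
                      rule: Callable[[Accumulated], bool]) -> Accumulated:
    """Accumulate valid positions one feature string at a time and stop
    as soon as rule(accumulated) holds."""
    accumulated: Accumulated = {}
    for feature in feature_set:
        accumulated[feature] = find_all(data, feature)  # [] still records an "empty" result
        if rule(accumulated):
            break  # the remaining feature strings are never scanned
    return accumulated


def two_found(acc: Accumulated) -> bool:
    """Assumed rule: at least two distinct feature strings have been found."""
    return sum(1 for offsets in acc.values() if offsets) >= 2


result = search_early_stop("QWERTACDSYASDFGHZXCVB", ["ACDS", "ZXC", "DFS", "QWE"], two_found)
print(list(result))  # ['ACDS', 'ZXC']: "DFS" and "QWE" were skipped
```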
  • When results are accumulated for management or for other needs such as statistics, a result should be recorded even when the search result is "empty", indicating that the string was not found. When there is a definite search result, the position parameters of the feature string in the file to be detected must be recorded.
  • Either only the first valid position or all valid positions may be recorded, depending on actual needs. One such case arises when the feature strings in the set have explicit positional or relational characteristics.
  • For example, a virus program has the complete set of feature strings A1, A2, A3, A4 and A5.
  • If a program to be detected contains a combination and order of these feature strings described in feature set 1 below, the program is regarded as a virus program. If it contains a combination and order described in feature set 2, it is regarded as a specific virus program.
  • Feature set 1 is: {A1 A3 A4, A1 A3 A4 A5, A1 A1 A3 A4 A5 A1, A2 A4 A5, A1 A3 A5, A3 A4 A5, A1 A4 A5, A1 A2 A3 A4 A5};
  • Feature set 2 is: {A1 A3 A5, A3 A4 A5, A1 A4 A5, A1 A2 A3 A4 A5};
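One way to check the two feature sets above is sketched below in Python. The semantics are an assumption: the ordered sequence of all found feature strings, sorted by offset, must equal one of the listed patterns exactly. Recording all valid positions matters for patterns such as A1 A1 A3 A4 A5 A1. The names `ordered_sequence` and `classify` are hypothetical.

```python
FEATURE_SET_1 = {
    ("A1", "A3", "A4"), ("A1", "A3", "A4", "A5"), ("A1", "A1", "A3", "A4", "A5", "A1"),
    ("A2", "A4", "A5"), ("A1", "A3", "A5"), ("A3", "A4", "A5"),
    ("A1", "A4", "A5"), ("A1", "A2", "A3", "A4", "A5"),
}
FEATURE_SET_2 = {
    ("A1", "A3", "A5"), ("A3", "A4", "A5"), ("A1", "A4", "A5"), ("A1", "A2", "A3", "A4", "A5"),
}


def ordered_sequence(positions: dict[str, list[int]]) -> tuple[str, ...]:
    """Flatten all valid positions and list feature-string names in file order."""
    hits = sorted((offset, name) for name, offsets in positions.items() for offset in offsets)
    return tuple(name for _, name in hits)


def classify(positions: dict[str, list[int]]) -> str:
    sequence = ordered_sequence(positions)
    if sequence in FEATURE_SET_2:  # checked first because it is the more specific set
        return "specific virus"
    if sequence in FEATURE_SET_1:
        return "virus"
    return "no match"


print(classify({"A1": [0], "A3": [10], "A5": [20]}))                     # specific virus
print(classify({"A2": [5], "A4": [9], "A5": [30]}))                      # virus
print(classify({"A1": [0, 4, 40], "A3": [10], "A4": [20], "A5": [30]}))  # virus
```

A subsequence ("the pattern appears somewhere in order") interpretation is also possible. Under that reading, however, several entries of feature set 1 would become redundant, which is why exact equality is used here.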
  • In some cases, a feature string must appear several times in the file to be detected before it can be decided whether the program to be detected is a virus program. In such cases all valid positions of the feature string in the file must be recorded.
  • In other cases, a single appearance of a feature string in the file to be detected is enough to decide whether the program to be detected is a virus program.
  • The valid position includes at least an offset parameter. If the feature strings differ in length, it must also include a length parameter.
  • It may further include a parameter indicating the allowed discrete relationship (spacing) of specific characters within the feature string, and so on.
  • Searching the file to be detected for the feature strings in the set by exact matching may, in some cases, reduce search accuracy.
  • For example, a feature string of a virus program may change in its variant viruses, which increases the likelihood of misjudging a program to be detected.
  • Similarly, a feature string determined from an original text file may also change, whether because of different carried parameters or for other reasons.
  • A virus's feature string is the most representative string selected after careful analysis of the virus sample. It is sufficient to distinguish the virus from other viruses and from other variants of the same virus, and to separate virus programs from normal, non-virus programs so that non-virus programs are not treated as viruses.
  • However, the specific composition of a feature string may vary because of differences in carried parameters.
  • The feature information may therefore contain one or more "fuzzy" bytes. In this case, the feature string is still recognized as long as the bytes other than the fuzzy bytes match exactly, and a judgment can then be made on whether the program to be detected is a virus program. For example: Given the characteristic string: "E9 7C 00 10 ?
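The fuzzy-byte matching described above can be sketched in Python as follows. Two assumptions apply: a signature is written as space-separated hex bytes, and `?` marks one fuzzy byte. The signature `AA ? CC` and the sample bytes are made up and are not the patent's (truncated) example. `parse_signature` and `fuzzy_find` are hypothetical names.

```python
from typing import List, Optional


def parse_signature(signature: str) -> List[Optional[int]]:
    """'AA ? CC' -> [0xAA, None, 0xCC]; '?' is taken to mark one fuzzy byte."""
    return [None if token == "?" else int(token, 16) for token in signature.split()]


def fuzzy_find(data: bytes, signature: str) -> List[int]:
    """Offsets where every non-fuzzy byte of the signature matches exactly."""
    pattern = parse_signature(signature)
    hits = []
    for start in range(len(data) - len(pattern) + 1):
        window = data[start:start + len(pattern)]
        # A fuzzy byte (None) accepts any value; all other bytes must be equal.
        if all(p is None or b == p for b, p in zip(window, pattern)):
            hits.append(start)
    return hits


sample = bytes([0x00, 0xAA, 0x11, 0xCC, 0xAA, 0x22, 0xCC])
print(fuzzy_find(sample, "AA ? CC"))  # [1, 4]
```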
  • The predetermined rule supporting fuzzy search is that, in the accumulated result, the positional relationship of the feature strings effectively found in the file to be detected satisfies the set sequence feature, the set interval feature, or both.
  • The sequence feature includes combinations and/or permutations of feature strings.
  • The sequence and interval features differ for object files of different purposes and different natures.
  • The interval feature reflects the discrete relationship of the constituent characters of a feature string. For example, the gap between constituent characters should not exceed a certain number of characters, as determined by the purpose of the detection and the nature of the target file.
  • The predetermined rule supporting fuzzy search may also be one of the following:
    ◦ in the accumulated result, the total number of valid positions of the found feature strings in the file to be detected reaches a set value;
    ◦ in the accumulated result, the number of feature strings effectively found in the file to be detected reaches a set value.
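The candidate rules above can be written as predicates over an accumulated result, as in this minimal Python sketch. The two counting rules follow the text directly. The order-and-interval check is an assumed form: it compares the first occurrence of each named string and requires consecutive first occurrences to start no more than `max_gap` characters apart. All names and data are hypothetical.

```python
from typing import Dict, List

Accumulated = Dict[str, List[int]]  # name -> ascending list of valid offsets


def total_positions_reached(acc: Accumulated, set_value: int) -> bool:
    """Total number of valid positions of all found strings reaches the set value."""
    return sum(len(offsets) for offsets in acc.values()) >= set_value


def distinct_found_reached(acc: Accumulated, set_value: int) -> bool:
    """Number of feature strings found at least once reaches the set value."""
    return sum(1 for offsets in acc.values() if offsets) >= set_value


def order_and_interval_ok(acc: Accumulated, order: List[str], max_gap: int) -> bool:
    """Sequence feature: first occurrences follow `order`.
    Interval feature (assumed form): consecutive first occurrences start
    at most max_gap characters apart."""
    firsts = []
    for name in order:
        if not acc.get(name):
            return False  # a required string was not found at all
        firsts.append(acc[name][0])
    return all(0 < b - a <= max_gap for a, b in zip(firsts, firsts[1:]))


acc = {"A1": [0], "A3": [10, 50], "A5": [25], "A2": []}
print(total_positions_reached(acc, 4), distinct_found_reached(acc, 3))  # True True
print(order_and_interval_ok(acc, ["A1", "A3", "A5"], max_gap=20))       # True
print(order_and_interval_ok(acc, ["A1", "A3", "A5"], max_gap=12))       # False
```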
  • Which of these methods is used as the determination rule is likewise decided according to actual needs.
  • Such rules may differ for object files of different purposes and different natures, and many such rules need to be determined in advance.
  • The "purpose" here is often equivalent to the "requirement".
  • For example, whether a file to be detected is a target file may be decided according to the required probability or likelihood. This achieves the required balance of efficiency, false positive rate and miss rate.
  • The "nature" of a file also affects how different object files are determined.
  • The distribution and/or composition of the feature strings in one file may follow a different regularity from that in another file. A typical difference lies in discreteness, which may arise from factors such as different carried parameters or different operand data.
  • Compared with the first embodiment, this embodiment can therefore further improve efficiency, reduce the miss rate, and increase the accuracy of determining whether a file to be detected is a target file.
  • Only after step 13 has determined whether the search result satisfies the predetermined condition can it be decided whether the file to be detected is the target file.
  • The condition here may be one of the following, among others:
    ◦ in the search result, the number of feature strings effectively found in the file to be detected reaches a set value;
    ◦ in the search result, the positional relationship of the feature strings effectively found in the file to be detected satisfies the set sequence feature and/or interval feature.
    Note that these "conditions" may be the same as, or different from, the "rules" described above. In simple determinations of whether a file to be detected is a target file, they may be the same.
  • In that case, step 13 may directly reuse the judgment result of the preceding steps, or be omitted altogether.
  • For more complex determinations of whether a file to be detected is the target file, they may differ.
  • Preferably, the earlier "rule" emphasizes efficiency while the condition in step 13 emphasizes accuracy, which improves the overall performance of the method embodiment.
  • The conditions described here may also vary according to actual needs.
  • The first and second embodiments address efficiency and reliability through fuzzy search. When there are many feature strings, however, the method of scanning the file to be detected can be improved further to speed up the feature-string search.
  • A third embodiment of the present invention provides such an improvement. Unlike the first and second embodiments, the third embodiment adds a step of determining a feature character for each feature string and a second rule for constructing the corresponding feature string from that feature character. This step is equivalent to an "initialization".
  • Each feature string has a relatively stable part. For example, in a destructive instruction such as a delete instruction, the instruction itself usually does not change. What usually changes are the carried parameters or the operand data, and their position within the feature string. The most stable part of each feature string is therefore chosen as its feature character. Because feature characters are short, searching for them locates feature strings quickly.
  • The feature character can also be referred to as an "anchor".
  • Which characters of a feature string are suitable as its feature character usually varies from one feature string to another, and so does their position within the string. Therefore, once the feature character of a feature string has been chosen, the second rule for constructing the corresponding feature string from it is determined by two things: the position of the feature character within the feature string, and the composition of the feature string itself. This embodiment thus provides a way to locate feature strings quickly through "anchors" and to "assemble" them quickly.
  • For example, feature string 1 is (represented as ASCII codes) 23 4E 6F 55 77 09 0A 9D 34 8C.
    ◦ If "6F 55" is used as the feature character, the rule for constructing the corresponding feature string from it may be "2, 6". This means that the two characters "23 4E" in front of "6F 55" and the six characters "77 09 0A 9D 34 8C" after it form the corresponding feature string.
    ◦ If "23 4E" is used as the feature character, the rule may be "0, 8". This means that the 0 characters in front of "23 4E" (that is, none) and the 8 characters "6F 55 77 09 0A 9D 34 8C" after it form the corresponding feature string.
    Other choices of feature character work in the same way. The feature string can thus be located quickly in the file to be detected through its feature character, and it can quickly be decided whether the located string is the feature string being sought.
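The second rule in the example above can be computed from the feature string and its chosen anchor, as in this Python sketch. The rule is represented as a (characters before, characters after) pair, and `second_rule` and `assemble` are hypothetical names. The sketch reproduces the "2, 6" and "0, 8" rules for feature string 1.

```python
from typing import Optional, Tuple


def second_rule(feature: bytes, anchor: bytes) -> Tuple[int, int]:
    """(characters before the anchor, characters after it) within the feature string."""
    before = feature.find(anchor)
    if before == -1:
        raise ValueError("the anchor must be part of the feature string")
    return before, len(feature) - before - len(anchor)


def assemble(data: bytes, anchor_offset: int, anchor_len: int,
             rule: Tuple[int, int]) -> Optional[bytes]:
    """Cut the candidate feature string around an anchor found at anchor_offset."""
    before, after = rule
    start, end = anchor_offset - before, anchor_offset + anchor_len + after
    if start < 0 or end > len(data):
        return None  # the string would run past the start or end of the file
    return data[start:end]


FEATURE_1 = bytes.fromhex("23 4E 6F 55 77 09 0A 9D 34 8C")
print(second_rule(FEATURE_1, bytes.fromhex("6F 55")))  # (2, 6)
print(second_rule(FEATURE_1, bytes.fromhex("23 4E")))  # (0, 8)

sample = b"\x00\x01" + FEATURE_1 + b"\xff"
hit = sample.find(bytes.fromhex("6F 55"))              # 4
print(assemble(sample, hit, 2, (2, 6)) == FEATURE_1)   # True
```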
  • The file to be detected is then scanned as follows:
  • search the file to be detected for the feature character until the whole file has been searched. For each occurrence of the feature character, construct the corresponding feature string according to the second rule. If construction succeeds, take the position of the constructed feature string as a valid position.
  • This scanning method is suitable for finding all positions of the feature string in the file to be detected. When only the position of the first occurrence is needed, the file can be scanned as follows. Search the file for the feature character. If it is found, construct the corresponding feature string according to the second rule. If construction succeeds, take the position of the constructed feature string as a valid position and end the search.
  • In the third embodiment, successful construction of a feature string is also called an exact match: the characters of the feature string constructed according to the second rule are identical to those of the feature string being searched for.
  • With fuzzy matching, the feature string is deemed successfully constructed when the proportion of characters of the constructed string that match the feature string being searched for reaches a set value.
  • With fuzzy matching, a more specific parameter indicating the discrete relationship of the characters can also be added to the second rule.
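The anchor-based scan, with exact or fuzzy construction and with all positions or only the first, can be sketched in Python as follows. The fuzzy criterion is assumed to be the fraction of positions at which the constructed string equals the target, compared against `threshold`; `threshold=1.0` gives the exact match. `VARIANT` is a made-up variant with one changed byte, and all function names are hypothetical.

```python
from typing import List


def match_ratio(candidate: bytes, feature: bytes) -> float:
    """Fraction of positions at which two equal-length byte strings agree."""
    return sum(a == b for a, b in zip(candidate, feature)) / len(feature)


def anchor_scan(data: bytes, feature: bytes, anchor: bytes,
                threshold: float = 1.0, first_only: bool = False) -> List[int]:
    """Find the anchor, build the feature string around it, and keep valid starts."""
    before = feature.find(anchor)                # second rule: characters before the anchor
    if before == -1:
        raise ValueError("the anchor must be part of the feature string")
    after = len(feature) - before - len(anchor)  # second rule: characters after the anchor
    valid = []
    hit = data.find(anchor)
    while hit != -1:                             # search for the feature character
        start, end = hit - before, hit + len(anchor) + after
        if start >= 0 and end <= len(data) and match_ratio(data[start:end], feature) >= threshold:
            valid.append(start)                  # construction succeeded: a valid position
            if first_only:
                break                            # only the first occurrence is needed
        hit = data.find(anchor, hit + 1)
    return valid


FEATURE_1 = bytes.fromhex("23 4E 6F 55 77 09 0A 9D 34 8C")
VARIANT = bytes.fromhex("23 4E 6F 55 77 09 0B 9D 34 8C")  # hypothetical: one byte changed
sample = FEATURE_1 + b"\x00" + VARIANT
anchor = bytes.fromhex("6F 55")
print(anchor_scan(sample, FEATURE_1, anchor))                                  # [0]
print(anchor_scan(sample, FEATURE_1, anchor, threshold=0.9))                   # [0, 11]
print(anchor_scan(sample, FEATURE_1, anchor, threshold=0.9, first_only=True))  # [0]
```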
  • FIG. 4 is a block diagram of an embodiment of the apparatus of the present invention, including:
  • a storage unit 41 storing a complete set of feature strings;
  • a feature string selection unit 42 for selecting, in the storage unit 41, a set of feature strings for determining the target file, the set including at least one feature string for determining the target file;
  • a file scanning unit 43 configured to search the file to be detected for the strings in the set according to a rule, to obtain a search result;
  • a determining unit 44 configured to determine whether the result satisfies the first condition; and
  • a target file determining unit 45 configured to determine that the file to be detected is a target file when the result satisfies the first condition.
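Purely as an illustration, units 41 to 45 of FIG. 4 could map onto the methods of one Python class, as sketched below. The mapping, the class name `ObjectFileDetector`, the simplest scanning rule (first offset of each string, -1 if absent) and the assumed first condition ("at least `min_found` strings found") are all hypothetical, not the patent's implementation.

```python
from typing import Dict, List


class ObjectFileDetector:
    """Units 41-45 of FIG. 4 mapped onto one class (an illustrative mapping)."""

    def __init__(self, storage: Dict[str, List[bytes]]):
        self.storage = storage  # storage unit 41: target id -> complete feature-string set

    def select(self, target_id: str) -> List[bytes]:
        return self.storage[target_id]  # feature string selection unit 42

    def scan(self, data: bytes, feature_set: List[bytes]) -> Dict[bytes, int]:
        # file scanning unit 43, simplest rule: first offset of each string, -1 if absent
        return {feature: data.find(feature) for feature in feature_set}

    def satisfies(self, result: Dict[bytes, int], min_found: int) -> bool:
        # determining unit 44, assumed first condition: enough strings were found
        return sum(1 for offset in result.values() if offset != -1) >= min_found

    def is_target(self, data: bytes, target_id: str, min_found: int) -> bool:
        # target file determining unit 45
        return self.satisfies(self.scan(data, self.select(target_id)), min_found)


detector = ObjectFileDetector({"virus-A": [b"ACDS", b"DFS", b"ZXC"]})
print(detector.is_target(b"QWERTACDSYASDFGHZXCVB", "virus-A", 2))  # True
```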
  • The storage unit 41 stores the complete set of feature strings and, usually, other auxiliary data for a certain or
  • The set of feature strings contains the feature strings for determining the target file, and these are the basis for finding identical or similar files.
  • In this embodiment the set is stored in a database in the form of a two-dimensional table. However, the database can also be replaced by other data structures, such as a linear queue.
  • A typical database structure is shown in FIG. 3.
  • The fields "number", "character string" and "length" are the most basic. The other fields hold auxiliary data that improves the performance of the apparatus described in the embodiment.
  • A feature string should normally satisfy conditions such as uniqueness and being as short as possible.
  • Based on the type of target file to be detected, the feature string selection unit 42 selects from the storage unit 41 the feature strings used to determine the target file, and these form the set of feature strings.
  • The file scanning unit 43 operates directly on the file being scanned, searching it for the strings in the set according to the rule to obtain the search result.
  • the rules described here are different for different target files, mainly considering the speed of search and the accuracy of the search, and the search strategy based on the nature of the target file being looked up.
  • the feature strings in the set are searched for in the file to be detected, according to the rule, by the following steps: the feature strings in the set are selected one at a time until each has been selected once; for each selected feature string, the file to be detected is scanned to obtain the valid positions of that feature string in the file; and the positions of all the feature strings in the file to be detected are taken as the search result.
  • the determining unit 44 is configured to determine whether the search result satisfies a preset condition; if it does, the target file determining unit 45 determines that the file to be detected is the target file.
  • a typical process by which the file scanning unit 43 scans a file to be detected is as follows. Read a not-yet-selected feature string from the feature string set and scan the file to be detected with it, i.e., look for that feature string in the file. Then determine whether the feature string has been found, continuing the scan while the scanned portion does not contain it. Once the scanned portion is found to contain the feature string, record the scan result. In this example, the valid position of the feature string in the file to be detected is recorded; the valid position includes the offset of the feature string in the file, its length, and so on.
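The exhaustive scan described above selects every feature string once and records all of its valid positions as (offset, length) pairs. A minimal Python sketch, assuming the file has already been read into a `bytes` object, might look like this. The function names are hypothetical.

```python
def find_all_positions(data: bytes, pattern: bytes) -> list[tuple[int, int]]:
    """Return every valid position of `pattern` in `data` as (offset, length) pairs."""
    positions = []
    start = data.find(pattern)
    while start != -1:
        positions.append((start, len(pattern)))
        start = data.find(pattern, start + 1)  # advance by one so overlapping hits are kept
    return positions


def search_all(data: bytes, feature_set: dict[int, bytes]) -> dict[int, list[tuple[int, int]]]:
    """Select each feature string once; an empty list records that it was not found."""
    return {number: find_all_positions(data, pattern) for number, pattern in feature_set.items()}
```

Keeping an empty list for strings that are absent preserves the "not found" outcome in the search result. This is useful when results are later counted or kept for statistics.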
  • looking up the valid positions of every feature string in the file to be detected, as the file scanning unit 43 does above, is not always necessary, and in some cases it lowers search efficiency.
  • a second embodiment of the apparatus of the present invention provides another method. Feature strings not yet selected are taken from the set one at a time. For each selected feature string, the file to be detected is scanned to obtain the valid positions of that feature string in the file, and these valid positions are accumulated into a cumulative result. If the cumulative result satisfies a predetermined rule, the file to be detected can already be judged on the basis of the current cumulative result. Scanning of the file can then end, and the cumulative result is taken as the search result for further processing.
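The second embodiment's early termination can be sketched in Python by passing the predetermined rule in as a function. This function is evaluated on the cumulative result after each feature string is processed. The rule interface and the names below are assumptions made for illustration.

```python
from typing import Callable

Positions = dict[int, list[int]]  # feature-string number -> offsets found so far


def search_until_decided(
    data: bytes,
    feature_set: dict[int, bytes],
    rule: Callable[[Positions], bool],
) -> tuple[Positions, bool]:
    """Accumulate valid positions string by string and stop as soon as `rule` holds.

    Returns the cumulative result and whether the scan ended early.
    """
    cumulative: Positions = {}
    for number, pattern in feature_set.items():
        offsets = []
        start = data.find(pattern)
        while start != -1:
            offsets.append(start)
            start = data.find(pattern, start + 1)
        cumulative[number] = offsets      # an empty list records "not found"
        if rule(cumulative):
            return cumulative, True       # enough evidence: end the scan early
    return cumulative, False
```

For example, with `rule = lambda acc: sum(1 for v in acc.values() if v) >= 2`, the scan stops as soon as two different feature strings have been found. Any remaining strings in the set are never searched for.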
  • when search results are accumulated, the result of each lookup should be recorded for management or other purposes such as statistics; if nothing is found, the result is recorded as "empty". If the feature string is found, its positional parameters in the file to be detected must, of course, be recorded.
  • either only the first valid position or all valid positions may be recorded, depending on actual needs. Preferably, the feature strings in a set have explicit positional or relational features.
  • a predetermined rule supporting fuzzy lookup is: in the cumulative result, the positional relationship of the feature strings validly found in the file to be detected satisfies a set sequence feature, a set interval feature, or both.
  • the sequence features include combinations and/or permutations of feature strings.
  • the sequence features and interval features differ according to the purpose and nature of the object files to be determined.
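As one hypothetical instance of a sequence-plus-interval rule, the Python sketch below checks the first valid offset of each feature string. The strings listed in `order` must all be present and appear in that order (a permutation-style sequence feature). The distance between consecutive start offsets must lie within `[min_gap, max_gap]`. Measuring gaps between start offsets is an assumption; the description does not say how intervals are measured.

```python
from typing import Optional


def satisfies_sequence_and_interval(
    first_offsets: dict[int, int],
    order: list[int],
    min_gap: int = 0,
    max_gap: Optional[int] = None,
) -> bool:
    """Check an assumed sequence/interval rule on first valid offsets.

    Sequence feature: every string numbered in `order` is found, in that order.
    Interval feature: consecutive start offsets differ by a value in
    [min_gap, max_gap]; max_gap=None means no upper bound.
    """
    if any(n not in first_offsets for n in order):
        return False
    offsets = [first_offsets[n] for n in order]
    for prev, nxt in zip(offsets, offsets[1:]):
        gap = nxt - prev
        if gap <= 0 or gap < min_gap:  # wrong order (or same spot) breaks the sequence
            return False
        if max_gap is not None and gap > max_gap:
            return False
    return True
```

A combination-only rule, where the strings must all appear but in any order, would drop the ordering check and keep just the membership test.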
  • the predetermined rule supporting fuzzy search may also be either of the following: in the cumulative result, the total number of valid positions of the found feature strings in the file to be detected reaches a set value; or, in the cumulative result, the number of feature strings validly found in the file to be detected reaches a set value.
  • which of these methods is used as the determination rule is likewise decided according to actual needs.
  • such rules may differ for object files of different purposes and natures, and many of them have to be defined in advance.
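The two count-based rules can be written as small Python predicates over a cumulative result. The cumulative result is assumed here to map each feature-string number to its list of valid offsets. The function names are illustrative.

```python
def total_positions_reached(cumulative: dict[int, list[int]], threshold: int) -> bool:
    """Rule A: the total number of valid positions of all found strings reaches threshold."""
    return sum(len(offsets) for offsets in cumulative.values()) >= threshold


def distinct_strings_reached(cumulative: dict[int, list[int]], threshold: int) -> bool:
    """Rule B: the number of distinct feature strings found reaches threshold."""
    return sum(1 for offsets in cumulative.values() if offsets) >= threshold
```

The two rules behave differently on repetitive files. A single feature string that occurs many times can satisfy rule A while contributing only one to rule B.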
  • only after the determining unit 44 has judged whether the predetermined condition is met can the search result be used to decide whether the file to be detected is the target file.
  • the condition described here may be: in the search result, the number of feature strings validly found in the file to be detected reaches a set value; or, in the search result, the positional relationship of the feature strings validly found in the file to be detected satisfies the set sequence feature and/or interval feature; and so on.
  • a third embodiment of the apparatus of the present invention improves the efficiency of finding feature strings.
  • the feature string selection unit 42 determines the feature characters of each feature string, together with a second rule by which the corresponding feature string is constructed from its feature characters. Based on these feature characters and the second rule, the file to be detected can be scanned according to the following steps:
  • search for the feature characters until the whole file to be detected has been searched. For each feature character found, construct the corresponding feature string according to the second rule. If the feature string is constructed successfully, take the position of the successfully constructed feature string as a valid position.
  • this method of scanning a file to be detected is suitable for finding all locations of the feature strings in the file. Where only the location of the first feature string needs to be found, the file to be detected may be scanned as follows. Search the file for the feature characters. When a feature character is found, construct the corresponding feature string according to the second rule. If the feature string is constructed successfully, take the position of the successfully constructed feature string as a valid position and end the search.
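The description does not define the second rule, so the Python sketch below assumes a simple one. Each feature string has a single feature character at a known index `anchor`. A hit on that character at offset `p` implies a candidate string starting at `p - anchor`, and construction "succeeds" if the bytes there equal the feature string. The `first_only` flag switches between the all-locations and first-location variants described above. All names are hypothetical.

```python
def scan_by_feature_char(
    data: bytes, pattern: bytes, anchor: int, first_only: bool = False
) -> list[int]:
    """Find `pattern` by first searching for its feature character pattern[anchor].

    Assumed second rule: a hit at offset p implies a candidate string starting at
    p - anchor; construction succeeds if those bytes equal `pattern`.
    """
    feature_char = pattern[anchor:anchor + 1]
    valid = []
    p = data.find(feature_char, anchor)  # hits before `anchor` cannot start a full string
    while p != -1:
        start = p - anchor
        if data[start:start + len(pattern)] == pattern:  # construction succeeded
            valid.append(start)
            if first_only:
                break  # only the first valid position is needed
        p = data.find(feature_char, p + 1)
    return valid
```

The speed-up depends on choosing a feature character that is rare in typical files, so that few candidate constructions are attempted.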
  • the checksum of the feature string newly constructed from a feature character can be calculated and compared with the checksum calculated in advance for the feature string that serves as the basis of the search.
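The checksum comparison can be sketched in Python with CRC-32 from the standard `zlib` module. This is a stand-in chosen for illustration, since the description does not name a checksum algorithm. The checksums of the feature strings are computed once before scanning, and each newly constructed candidate is checked against them.

```python
import zlib


def precompute_checksums(feature_set: dict[int, bytes]) -> dict[int, int]:
    """Checksums of the feature strings, computed once before scanning (CRC-32 assumed)."""
    return {n: zlib.crc32(s) for n, s in feature_set.items()}


def candidate_matches(data: bytes, start: int, length: int, expected_crc: int) -> bool:
    """Compare the checksum of a newly constructed candidate with the precomputed one."""
    if start < 0 or start + length > len(data):
        return False  # the candidate would run past the file boundaries
    return zlib.crc32(data[start:start + length]) == expected_crc
```

Different strings can share a checksum, so equal checksums do not strictly prove a match. An implementation that must avoid false matches could follow a checksum hit with a byte-wise comparison.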
  • a method of identifying unwanted programs may employ the following features: converting instructions and/or parameters that damage the system into feature strings, and using all of these feature strings as a feature string set that includes at least one feature string for determining an unwanted program;
  • identifying a harmful program is in effect judging whether a program of unknown nature, or one suspected of being harmful, is a file containing specific content. That is, the question is whether it contains feature strings with a destructive effect, such as destructive instructions and/or parameters.
  • the search can be given a certain degree of fuzziness, which increases the accuracy of program judgment and reduces false positives and missed detections. For example, a suspicious program entering the system through the network can be scanned. If a predetermined command and/or parameter that damages the system, or behavior harmful to the system, is found in the suspicious program, the program can be determined to be unwanted and further measures can be taken against it.
  • in the same way, a program can be identified: once its identifying features have been formed into feature strings, the program can be recognized easily; and so on.
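The unwanted-program application reduces to two steps: turning destructive instructions or parameters into feature strings, then applying a threshold rule to a suspicious program. The Python sketch below assumes the instructions are given as text and encoded as UTF-8. The example commands in the usage note are illustrative only, and the function names are hypothetical.

```python
def to_feature_strings(commands: list[str]) -> dict[int, bytes]:
    """Convert destructive instructions/parameters (given as text) into numbered feature strings."""
    return {i: cmd.encode("utf-8") for i, cmd in enumerate(commands, start=1)}


def looks_unwanted(program: bytes, feature_set: dict[int, bytes], min_found: int = 1) -> bool:
    """Flag a suspicious program if at least `min_found` distinct feature strings occur in it."""
    found = sum(1 for s in feature_set.values() if s in program)
    return found >= min_found
```

For example, `to_feature_strings(["rm -rf /", "dd if=/dev/zero of=/dev/sda"])` yields a two-string set. Raising `min_found` above 1 is one way to give the judgment the tolerance described above, so that a single incidental match does not condemn a program.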

Abstract

The invention relates to a method for determining a target file, based on a storage unit that stores a complete set of feature strings. The method comprises the following steps. First, select in the storage unit a set of feature strings for determining a target file (11), the set including at least one feature string for determining the target file. Second, search the file to be detected, according to a rule, for the feature strings in the set and obtain a search result (12). Third, judge whether the search result satisfies a first condition (13). If it does, determine that the file to be detected is the target file (14). The invention further relates to an apparatus for determining a target file.
PCT/CN2008/070223 2007-02-14 2008-01-31 Procédé et dispositif de détermination d'un fichier objet WO2008098495A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB2007100802322A CN100485691C (zh) 2007-02-14 2007-02-14 一种目标文件的确定方法和装置
CN200710080232.2 2007-02-14

Publications (1)

Publication Number Publication Date
WO2008098495A1 true WO2008098495A1 (fr) 2008-08-21

Family

ID=38700957

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/070223 WO2008098495A1 (fr) 2007-02-14 2008-01-31 Procédé et dispositif de détermination d'un fichier objet

Country Status (2)

Country Link
CN (1) CN100485691C (fr)
WO (1) WO2008098495A1 (fr)

Also Published As

Publication number Publication date
CN100485691C (zh) 2009-05-06
CN101013445A (zh) 2007-08-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08706600

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION UNDER RULE 112(1) EPC, EPO FORM 1205A DATED 20/11/09

122 Ep: pct application non-entry in european phase

Ref document number: 08706600

Country of ref document: EP

Kind code of ref document: A1