TW201232320A

TW201232320A - Method and system for detecting malware

Info

Publication number: TW201232320A
Application number: TW100102219A
Authority: TW
Inventors: Shi-Jinn Horng; Yu-Cheng Liu
Original assignee: Univ Nat Taiwan Science Tech
Priority date: 2011-01-21
Filing date: 2011-01-21
Publication date: 2012-08-01
Also published as: TWI533156B

Abstract

The invention discloses a method and a system for detecting malware, which determine whether a target program is a packed malicious program. The method comprises the following steps: training a support vector machine (SVM) with a plurality of preset programs and using the trained SVM to determine whether the target program is packed; if the target program is packed, unpacking the target program and then performing a virus scanning process; if the target program is not packed, performing the virus scanning process directly. The invention thus uses static analysis to extract features from the target program in order to determine whether the target program is malware.

Description

VI. Description of the Invention:

[Technical Field]

The present invention relates to a malware detection method and system, and in particular to a malware detection method and system capable of statically analyzing whether a target program is packed.

[Prior Art]

With the rapid development of electronic technology and the rapid growth of the Internet user population, people's way of life is very different from that of a few decades ago, and the Internet has become an indispensable part of daily life. However, the Internet has also brought many information security hazards, such as hacker intrusion, spam, phishing sites, and malware. Among these, malware has evolved rapidly in recent years, and new malware appears at an ever faster pace. The main types of malware at present are spyware, Trojans, and parasitic viruses. These malicious programs can be installed on a computer without the user's knowledge, and they cannot be noticed while they carry out their tasks.

Conventional antivirus techniques use specific signatures to determine whether a file is malicious. For a file that has been packed, however, the signature can no longer be recognized. Moreover, malware achieves its purpose as long as it is not detected by antivirus software. Most current malware has been packed, so accurately determining whether a program is packed is a focus of current research and improvement.

[Summary of the Invention]

The present invention discloses a malware detection method that can accurately determine whether a target program is packed without executing the target program, and that provides the corresponding unpacking and virus-scanning procedures.

The present invention proposes a malware detection method for determining whether a target program is a malicious program that has undergone a packing procedure. The method comprises the following steps: training a support vector machine with a plurality of preset programs; inputting the target program into the trained support vector machine to determine whether the target program has undergone a packing procedure; when the target program is determined to have undergone the packing procedure, executing an unpacking procedure on the target program and then executing a virus-scanning procedure; and when the target program is determined not to have undergone the packing procedure, executing the virus-scanning procedure on the target program.
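The method steps above can be sketched as a small control flow. This is a minimal illustration, not the patented implementation: the function name `detect_malware`, the three callables, and the toy packer marker are all hypothetical stand-ins for the trained support vector machine, an unpacking tool, and an antivirus engine.

```python
from typing import Callable


def detect_malware(
    program: bytes,
    is_packed: Callable[[bytes], bool],
    unpack: Callable[[bytes], bytes],
    scan: Callable[[bytes], bool],
) -> bool:
    """Return True when the scanner flags the (possibly unpacked) program."""
    if is_packed(program):
        # Packed: run the unpacking procedure first.
        program = unpack(program)
    # Not packed, or already unpacked: run the virus-scanning procedure.
    return scan(program)


# Toy stand-ins so the sketch runs (assumptions, for illustration only).
MARKER = b"PACKED:"  # hypothetical marker left by a toy "packer"


def toy_is_packed(program: bytes) -> bool:
    return program.startswith(MARKER)


def toy_unpack(program: bytes) -> bytes:
    # The toy packer stores the payload reversed after the marker.
    return program[len(MARKER):][::-1]


def toy_scan(program: bytes) -> bool:
    # A signature scanner that only recognizes the plain byte pattern.
    return b"EVIL" in program
```

In this toy setting the signature scanner misses the packed sample when applied directly, which mirrors the motivation given in the prior-art discussion: the signature becomes visible only after unpacking.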

In an exemplary embodiment, the step of training the support vector machine with the preset programs further comprises: extracting at least p first features from each preset program; sequentially arranging the p extracted first features into a first feature vector; converting the first feature vector, by a first algorithm, into a corresponding first coordinate on a hyperplane; and dividing the hyperplane into a first region and a second region according to the first coordinates corresponding to the preset programs, with a margin (edge distance) between the first region and the second region.

In addition, the present invention discloses a malware detection system that can accurately determine whether a target program is packed without executing the target program, and that provides the corresponding unpacking and virus-scanning procedures.

The present invention proposes a malware detection system for determining whether a target program is a malicious program that has undergone a packing procedure. The system comprises a storage medium and a processing unit. The storage medium holds a plurality of preset programs. The processing unit is coupled to the storage medium and has a support vector machine; it trains the support vector machine according to the preset programs and inputs the target program into the trained support vector machine to determine whether the target program has undergone a packing procedure. When the processing unit determines that the target program has undergone the packing procedure, the processing unit executes an unpacking procedure on the target program and then executes a virus-scanning procedure; when the processing unit determines that the target program has not undergone the packing procedure, the processing unit executes the virus-scanning procedure on the target program.

Generally speaking, present-day software, whether legitimate or malicious, uses packing software to pack and encrypt executable files. The present invention screens the features of packed programs and detects those features. By using a support vector machine for the training and detection of packed programs, the present invention can effectively detect whether the target program has been packed and which tool can be used to unpack it, which greatly improves the detection rate of antivirus software and thereby protects the safety of computer users.

The advantages and spirit of the present invention can be further understood from the following detailed description and the accompanying drawings.

[Embodiments]

Please refer to FIG. 1, which is a functional block diagram of a malware detection system according to an exemplary embodiment of the present invention. As shown in FIG. 1, the malware detection system 1 of the present invention comprises a storage medium 10 and a processing unit 12. The processing unit 12 is coupled to the storage medium 10 and has a support vector machine 124 and a feature analysis unit 122. The components of the malware detection system 1 are described below.

The storage medium 10 holds a plurality of preset programs. In practice, the preset programs include both packed and unpacked programs, and the packed programs may be packed with different methods and/or techniques. The preset programs thereby provide packing information to the support vector machine 124 for training. After each target program has been analyzed and judged, it can be added to the storage medium 10, whether or not it is a packed program, in order to update the decision criteria of the support vector machine 124.

The processing unit 12 trains the support vector machine 124 according to the preset programs and inputs the target program into the trained support vector machine 124 to determine whether the target program has undergone a packing procedure. When the processing unit 12 determines that the target program has undergone the packing procedure, the processing unit 12 executes an unpacking procedure on the target program and then executes a virus-scanning procedure; when the processing unit 12 determines that the target program has not undergone the packing procedure, the processing unit 12 executes the virus-scanning procedure on the target program.

The processing unit 12 further comprises a feature analysis unit 122, which extracts at least p first features from each preset program. The processing unit 12 sequentially arranges the p extracted first features into a first feature vector and then converts the first feature vector, by a first algorithm, into a corresponding first coordinate on a hyperplane. The first coordinates of the preset programs divide the hyperplane into a first region and a second region, with a margin (edge distance) between the first region and the second region. In practice, the support vector machine 124 uses the features of the preset programs to train a hyperplane and then uses the trained hyperplane to decide which class a piece of data belongs to.
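The training described above (ordered feature vectors in, separating hyperplane out) can be sketched in a few lines. The patent does not say what the "first algorithm" or the training procedure is, so this sketch swaps in plain stochastic subgradient descent on the regularized hinge loss of a linear SVM. The feature names, the toy values, and the convention that +1 means "packed" are assumptions made only for illustration.

```python
import random

# Hypothetical feature names; the patent only says "at least p features".
FEATURE_ORDER = ["section_entropy", "writable_exec_sections"]


def to_feature_vector(features, order=FEATURE_ORDER):
    """Arrange named feature values into a vector in a fixed order."""
    return [float(features[name]) for name in order]


def train_hyperplane(vectors, labels, epochs=300, lr=0.01, lam=0.01, seed=0):
    """Fit w and b of the hyperplane w.x + b = 0 (labels are +1 or -1)."""
    rng = random.Random(seed)
    w = [0.0] * len(vectors[0])
    b = 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            # L2 regularization step: shrink w slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y * score < 1.0:
                # The sample violates the margin: move the hyperplane.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy preset programs with features already scaled to [-1, 1].
packed = [
    {"section_entropy": 0.8, "writable_exec_sections": 0.9},
    {"section_entropy": 0.6, "writable_exec_sections": 0.7},
    {"section_entropy": 0.9, "writable_exec_sections": 0.5},
]
unpacked = [
    {"section_entropy": -0.7, "writable_exec_sections": -0.8},
    {"section_entropy": -0.5, "writable_exec_sections": -0.9},
    {"section_entropy": -0.9, "writable_exec_sections": -0.4},
]
vectors = [to_feature_vector(f) for f in packed + unpacked]
labels = [1] * len(packed) + [-1] * len(unpacked)
w, b = train_hyperplane(vectors, labels)
```

A production system would more likely rely on an established SVM library and real header features; the point here is only the shape of the data flow from preset programs to a trained hyperplane.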

In detail, the support vector machine 124 aims to solve the following problem: find a hyperplane that separates two different sets. Because real data may be high-dimensional, the hyperplane refers to a plane in a high-dimensional space. To reduce errors at test time, the support vector machine should find the separating hyperplane with the maximum margin. The optimal hyperplane is therefore one that not only classifies the data correctly but also has the largest margin. For example, the optimal hyperplane can be obtained using a Lagrangian formulation with Lagrange multipliers; a person of ordinary skill in the art may freely vary this calculation, and the present invention is not limited thereto.

In practice, the feature analysis unit 122 extracts the features of the unpacked programs and the packed programs from the storage medium 10 and passes them to the support vector machine 124 for training. After extracting the features from these programs, the feature analysis unit 122 arranges the extracted feature values, in order, into feature vectors. The support vector machine 124 converts each feature value into a corresponding value between -1 and 1 and then performs the training, after which the trained processing unit 12 can classify the target program.
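For reference, the maximum-margin problem described above is commonly written as follows. This is the standard textbook hard-margin formulation, given here as an illustration; the symbols (training vectors $x_i \in \mathbb{R}^p$, labels $y_i \in \{+1, -1\}$, hyperplane $w \cdot x + b = 0$, multipliers $\alpha_i$) are assumptions and do not appear in the patent text.

```latex
% Primal problem: the margin between the two classes is 2 / ||w||,
% so maximizing the margin means minimizing ||w||^2.
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, N.

% Lagrangian with multipliers \alpha_i \ge 0:
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{N} \alpha_i \bigl[\, y_i\,(w \cdot x_i + b) - 1 \,\bigr].

% Stationarity conditions:
w = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i,
\qquad
\sum_{i=1}^{N} \alpha_i\, y_i = 0.

% Dual problem obtained by substituting the conditions back into L:
\max_{\alpha \ge 0}\ \sum_{i=1}^{N} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i\, \alpha_j\, y_i\, y_j\, (x_i \cdot x_j)
\quad \text{subject to} \quad
\sum_{i=1}^{N} \alpha_i\, y_i = 0.
```

Only the training vectors with $\alpha_i > 0$ (the support vectors) determine $w$, which is why the trained hyperplane depends on the samples closest to the boundary.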
When the processing unit 12 determines whether the target program has been packed, the feature analysis unit 122 extracts at least p second features from the target program. The processing unit 12 sequentially arranges the p extracted second features into a second feature vector, converts the second feature vector, by the first algorithm, into a corresponding second coordinate on the hyperplane, and then determines whether the second coordinate belongs to the first region or the second region.

In other words, the feature analysis unit 122 arranges the p features of each program into one feature vector. If there are N programs, the processing unit 12 obtains N feature vectors, and the support vector machine 124 then uses these N feature vectors to train a hyperplane.
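The scaling to the range -1 to 1 and the region decision for a target program can be sketched as follows. This is a hedged illustration: min-max scaling is one common way to obtain values in [-1, 1] and is an assumption here, as are the function names and the choice to number the regions 1 and 2 by the sign of the decision value.

```python
def fit_scaler(training_vectors):
    """Record the per-feature minimum and maximum over the training set."""
    lows = [min(column) for column in zip(*training_vectors)]
    highs = [max(column) for column in zip(*training_vectors)]
    return lows, highs


def scale(vector, lows, highs):
    """Map each feature value into [-1, 1] using the training-set range."""
    scaled = []
    for value, low, high in zip(vector, lows, highs):
        if high == low:
            scaled.append(0.0)  # constant feature carries no information
            continue
        s = 2.0 * (value - low) / (high - low) - 1.0
        # Clamp values that fall outside the range seen during training.
        scaled.append(max(-1.0, min(1.0, s)))
    return scaled


def region_of(vector, w, b):
    """Return 1 (first region) or 2 (second region) for a scaled vector."""
    score = sum(wj * xj for wj, xj in zip(w, vector)) + b
    return 1 if score >= 0 else 2


# Example: two raw training vectors define the scaling range.
lows, highs = fit_scaler([[0, 10], [4, 30]])
target = scale([2, 20], lows, highs)
```

The target program must be scaled with the range recorded at training time, not with its own values, so that its coordinate is comparable with the trained hyperplane. Which region corresponds to "packed" depends on how the training labels were assigned.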

When the target program is determined to have been packed, a corresponding tool, for example one found on the Internet, can be used to unpack it. The unpacked program can then be scanned by antivirus software, and the packed target program can be placed in the storage medium 10. Before the storage medium 10 is updated, it should be checked whether an unpacked program has been misjudged as a packed program; otherwise, placing an unpacked program in the storage medium 10 as a packed program would degrade the detection performance of the malware detection system 1. On the other hand, if the target program is not packed, the processing unit 12 can send it directly to the antivirus software for scanning.

The malware detection method of the present invention is described below. Please refer to FIG. 1 and FIG. 2 together; FIG. 2 is a flowchart of a malware detection method according to an exemplary embodiment of the present invention. As shown in the figure, in step S20 the processing unit 12 feeds the plurality of preset programs stored in the storage medium 10 into the support vector machine 124 for training. The actual training process of the support vector machine 124 is well understood by a person of ordinary skill in the art and is not described further here.
Next, in step S22, after the support vector machine 124 has been trained, the processing unit 12 inputs the target program into the trained support vector machine 124 to determine whether the target program has undergone a packing procedure. When the processing unit 12 determines that the target program has undergone the packing procedure, the processing unit 12 executes the unpacking procedure on the target program in the following step and then, as described in step S26, executes a virus-scanning procedure to scan the target program. On the other hand, when the processing unit 12 determines that the target program has not undergone the packing procedure, the flow proceeds directly to step S26, and the processing unit 12 executes the virus-scanning procedure on the target program.

In practice, when detecting the features of packed programs, the feature analysis unit 122 can select several classes of features from the File Header, the Optional Header, and the Section Header, and then perform feature selection among those classes. For example, the feature analysis unit 122 can screen the features using the Kullback-Leibler divergence, after which the support vector machine 124 performs the training and detection.

In summary, malware has drawn attention in recent years because it evolves and spreads so quickly. Although the detection techniques of the major antivirus vendors differ slightly, most still rely on signature matching (Pattern Match). Signature matching, however, cannot identify new malware or packed malware for which no signature exists. The malware detection method and system of the present invention can effectively detect whether a target program has been packed and identify which tool can be used to unpack it. Combined with existing antivirus software, this can greatly improve the detection rate of antivirus software and protect the safety of computer users.

The above detailed description of the preferred embodiments is intended to describe the features and spirit of the present invention more clearly, and not to limit the scope of the present invention to the preferred embodiments disclosed above. On the contrary, the intent is to cover various changes and equivalent arrangements within the scope of the claims of the present invention.
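The feature screening with the Kullback-Leibler divergence mentioned in the embodiment can be sketched as follows. The patent does not state how the divergence is applied, so this is an assumption: each candidate feature is discretized, its value distribution is estimated separately over packed and unpacked preset programs, and features whose two distributions differ most are ranked first. The feature names and sample values are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """D_KL(P || Q) = sum over v of P(v) * log(P(v) / Q(v))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def histogram(values, bins, smoothing=1.0):
    """Estimate a smoothed distribution over the given discrete bins."""
    counts = Counter(values)
    total = len(values) + smoothing * len(bins)
    return [(counts[b] + smoothing) / total for b in bins]


def rank_features(packed, unpacked, bins):
    """Rank feature names by divergence between the two classes.

    packed / unpacked map a feature name to its discretized values
    observed in packed and unpacked preset programs respectively.
    """
    scores = {}
    for name in packed:
        p = histogram(packed[name], bins)
        q = histogram(unpacked[name], bins)
        scores[name] = kl_divergence(p, q)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical binary header features (1 = present, 0 = absent).
packed_samples = {
    "nonstandard_section_name": [1, 1, 1, 0],
    "has_debug_directory": [0, 1, 0, 1],
}
unpacked_samples = {
    "nonstandard_section_name": [0, 0, 0, 1],
    "has_debug_directory": [0, 1, 1, 0],
}
ranking = rank_features(packed_samples, unpacked_samples, bins=[0, 1])
```

The additive smoothing keeps every estimated probability above zero, so the logarithm is always defined even when a value never occurs in one class. Features at the top of the ranking would be kept for the feature vectors passed to the support vector machine.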

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram of a malware detection system according to an exemplary embodiment of the present invention.

FIG. 2 is a flowchart of a malware detection method according to an exemplary embodiment of the present invention.

[Description of Main Reference Numerals]

1: malware detection system
10: storage medium
12: processing unit
122: feature analysis unit
124: support vector machine
S20~S26: process steps

Claims

VII. Scope of the Patent Application:

1. A malware detection method for determining whether a target program is a malicious program that has undergone a packing procedure, the method comprising the following steps: training a support vector machine with a plurality of preset programs; inputting the target program into the trained support vector machine to determine whether the target program has undergone a packing procedure; when the target program is determined to have undergone the packing procedure, executing an unpacking procedure on the target program and then executing a virus-scanning procedure; and when the target program is determined not to have undergone the packing procedure, executing the virus-scanning procedure on the target program.

2. The malware detection method of claim 1, wherein the step of training the support vector machine with the preset programs further comprises the following steps: extracting at least p first features from each of the preset programs; sequentially arranging the p extracted first features into a first feature vector; converting the first feature vector, by a first algorithm, into a corresponding first coordinate on a hyperplane; and dividing the hyperplane into a first region and a second region according to the first coordinates corresponding to the preset programs, wherein the first region and the second region have a margin distance between them.

3. The malware detection method of claim 2, wherein the step of determining whether the target program has undergone the packing procedure further comprises the following steps: extracting at least p second features from the target program; sequentially arranging the p extracted second features into a second feature vector; converting the second feature vector, by the first algorithm, into a corresponding second coordinate on the hyperplane; and determining whether the second coordinate on the hyperplane belongs to the first region or the second region.

4. The malware detection method of claim 1, wherein, when the target program is determined to have undergone the packing procedure, the type of the packing procedure is further determined and a corresponding unpacking procedure is provided.

5. A malware detection system for determining whether a target program is a malicious program that has undergone a packing procedure, the malware detection system comprising: a storage medium holding a plurality of preset programs; and a processing unit coupled to the storage medium and having a support vector machine, the processing unit training the support vector machine according to the preset programs and inputting the target program into the trained support vector machine to determine whether the target program has undergone a packing procedure; wherein, when the processing unit determines that the target program has undergone the packing procedure, the processing unit executes an unpacking procedure on the target program and then executes a virus-scanning procedure; and when the processing unit determines that the target program has not undergone the packing procedure, the processing unit executes the virus-scanning procedure on the target program.

6. The malware detection system of claim 5, wherein the processing unit further comprises a feature analysis unit, the feature analysis unit extracts at least p first features from each of the preset programs, the processing unit sequentially arranges the p extracted first features into a first feature vector, the processing unit then converts the first feature vector, by a first algorithm, into a corresponding first coordinate on a hyperplane, and the first coordinates corresponding to the preset programs divide the hyperplane into a first region and a second region, the first region and the second region having a margin distance between them.

7. The malware detection system of claim 6, wherein the feature analysis unit extracts at least p second features from the target program, the processing unit sequentially arranges the p extracted second features into a second feature vector, the processing unit then converts the second feature vector, by the first algorithm, into a corresponding second coordinate on the hyperplane, and the processing unit then determines whether the second coordinate on the hyperplane belongs to the first region or the second region.

8. The malware detection system of claim 5, wherein, when the processing unit determines that the target program has undergone the packing procedure, the processing unit further determines the type of the packing procedure and provides a corresponding unpacking procedure.