WO2019027106A1

WO2019027106A1 - System for analyzing degree of risk for malicious code distribution site by using machine learning

Info

Publication number: WO2019027106A1
Application number: PCT/KR2017/014233
Authority: WO
Inventors: 이대호; 최성수; 신경아; 박승필; 이형; 진세민; 이준호
Original assignee: F1 Security Co., Ltd. (주식회사 에프원시큐리티)
Priority date: 2017-08-01
Filing date: 2017-12-06
Publication date: 2019-02-07
Also published as: KR101809159B1

Abstract

The present invention detects malicious code distribution sites and analyzes whether malicious code distribution is actually active on each website found to be a distribution site in order to calculate a degree of risk. It uses the repeatability and recency of the distribution site, whether the malicious code is active or inactive, and similar factors as variables, and learns continuously from the resulting degree of risk, thereby improving the accuracy of the risk analysis.

Description

Risk Analysis System for Malicious Code Distribution Sites Using Machine Learning

The present invention relates to a risk analysis system for malicious code and, more particularly, to a risk analysis system that analyzes malicious code on websites, periodically checks whether code that distributes malicious programs has been inserted into a website, and detects such code before the malicious code is distributed, notifying the system administrator or the control system.

With the development of networks and computers, malicious code has also grown explosively. Alongside newly emerging malicious code, variants built from existing malicious code account for a large share.

Malicious code refers to software designed to infiltrate a system or network without the user's knowledge, install itself, cause damage, and illegally obtain information from the computer system. Various studies on malicious code analysis and detection are being actively conducted to cope with these threats, but in reality they face many limitations against malicious code that becomes more intelligent and sophisticated day by day.

Methods of analyzing malicious code can be roughly divided into static analysis and dynamic analysis. Static analysis examines malicious code without executing it; binary pattern matching, data-flow analysis, and code-flow analysis are typical static techniques. Because it never executes the malicious code, static analysis is safe and fast, but accurate analysis is difficult.

Dynamic analysis is a newer approach proposed to overcome the drawbacks of static analysis. It analyzes malicious code by running it in a controllable environment such as a virtual machine, and it has the advantage of revealing the code's actual behavior accurately regardless of obfuscation such as runtime packing (execution compression). Its disadvantage is that running the actual malicious code may contaminate the experimental environment, and observing the behavior takes a lot of time.

Such conventional techniques cannot detect and analyze newer attack methods in which hackers compromise a website and plant malicious code on it for distribution, so Internet users become infected by malicious code embedded in websites. If malicious code appears on a company's website or homepage, the company's image may suffer, the number of visiting customers may decrease, and other damage follows.

Therefore, a first object of the present invention, which solves this problem, is to provide a risk analysis system for malicious code that detects in advance when a target web server is being abused through the insertion of malicious code distribution code into its website.

A second object of the present invention is to provide a risk analysis system for malicious code that periodically checks whether code for distributing malicious programs has been inserted into a website and detects such code, notifying a system administrator or control system before the malicious code is distributed.

A third object of the present invention is to provide a risk analysis system for malicious code that detects malicious code distribution sites, analyzes whether malicious code distribution is actually active on each website found to be a distribution site, derives a risk probability from variables such as the repeatability and recency of the distribution site and whether the malicious code is activated, and learns continuously from these results so that the risk analysis becomes accurate.

To achieve the first to third objects, the present invention provides a malicious code distribution risk analysis system comprising: a web crawling unit that generates copies of the linked pages of a URL accessed by a user, up to a predetermined depth; an HTML parsing unit that parses the HTML of the generated copies; a malicious code distribution pattern analyzing unit that analyzes malicious code distribution patterns using the HTML parsed by the HTML parsing unit; a malicious code distribution URL detecting unit that detects malicious code distribution URLs using the distribution patterns analyzed by the malicious code distribution pattern analyzing unit; a malicious code DB unit that stores the contents of the malicious code distribution URLs; and a malicious code URL access blocking unit that blocks access to malicious code distribution URLs.

The risk analysis system for malicious code may further include a script obfuscation processing unit that decodes the malicious code distribution pattern so that it can be analyzed.

The risk analysis system for malicious code may further include a DOM and BOM generation unit that generates a DOM (Document Object Model) or a BOM (Browser Object Model) using the HTML parsed by the HTML parsing unit.

The risk analysis system for malicious code may further include a script engine unit that generates a scenario script based on the copies generated by the web crawling unit and performs operations according to the generated script.

The risk analysis system for malicious code may further include a risk analysis unit for analyzing the risk of malicious code using at least one of a malicious code distribution pattern and a malicious code distribution URL.

The risk analysis unit may include an RNN learning module that uses a recurrent neural network (RNN) to derive a risk probability from variables including the recency, repeatability, and malicious code activation of a malicious code distribution site.

The malicious code DB unit may store the risk probability derived by the RNN learning module and update the content of the malicious code distribution URL by reflecting the risk probability.

According to the risk analysis system for malicious code using machine learning of the present invention described above, when code for spreading malicious code is inserted into a website, it is possible to detect in advance that the target web server is being abused for malicious code distribution and to respond accordingly.

The system also periodically inspects websites for malicious program distribution code, so that such code can be detected and reported to the system administrator or the control system before malicious code is distributed.

In addition, malicious code distribution sites are detected, and each website found to be a distribution site is analyzed to determine whether malicious code distribution is actually active. Based on the result, a risk probability is derived from the repeatability and recency of the distribution site and whether the malicious code is activated, and continuous learning from these results improves the accuracy of the risk analysis.

In addition, damage to corporate websites and their users can be prevented, and corporate reliability and customer satisfaction can be improved.

In addition, a synergy effect can be expected in combination with security monitoring (control) services, and providing a secure Internet environment and stable web services for companies and public institutions can increase customer trust as well as direct and indirect profit.

FIG. 1 is a diagram showing a schematic configuration of a risk analysis system for malicious code, which is an embodiment of the present invention.

FIG. 2 is a diagram showing an example of code before and after decoding by the script obfuscation processing unit, which is a component of the present invention.

FIG. 3 is a diagram showing a schematic configuration of a risk analysis unit, which is an embodiment of the present invention.

FIG. 4 is a diagram schematically showing a learning method of an RNN learning module, which is a component of the present invention.

FIG. 5 is a diagram illustrating a result of a risk probability derivation according to a learning method of an RNN learning module, which is one configuration of the present invention.

FIG. 6 is a block diagram specifically illustrating a malicious code distribution risk analysis system according to another embodiment of the present invention.

FIG. 7 is a block diagram illustrating a procedure for distributing malicious code on a website and a process for monitoring malicious code distribution.

FIG. 8 is a configuration diagram showing in detail the monitoring server shown in FIG.

FIG. 9 is a diagram illustrating the method of analyzing malicious code and malicious code distribution patterns used by the MC spread pattern analyzing engine and the MC blocking unit shown in FIG.

FIG. 10 is a flowchart illustrating the HTML content analysis operation of the HTML content analysis engine shown in FIGS. 8 and 9.

FIG. 11 is a flowchart illustrating the malicious program distribution pattern detection operation of the MC distribution pattern analysis engine shown in FIGS. 8 and 9.

FIG. 12 is a diagram showing an example of detecting a malicious code distribution pattern.

FIG. 13 is a flowchart showing the malicious code decoding and detection operation of the MC decoding processing engine shown in FIGS. 8 and 9.

FIG. 14 is a diagram illustrating a malicious code decoding process and an example of detection.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that an inventor may appropriately define the concepts of terms to best describe his or her own invention, they should be construed with meanings and concepts consistent with the technical idea of the present invention.

Throughout the specification, when a part is said to "comprise" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. In addition, terms such as "...unit", "...module", and "...device" refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

The terms used in the embodiments of the present invention will be briefly described, and these embodiments will be described in detail.

Although the terms used in the embodiments of the present invention have been selected in consideration of their functions in the present invention, they may vary depending on the intention of those skilled in the art, precedent, and the like. In certain cases, some terms have been arbitrarily selected by the applicant, and their meanings are described in detail in the description of the corresponding embodiments. Therefore, the terms used in the embodiments should be defined based on their meanings and the overall content of the embodiments, not simply on their names.

In an embodiment of the present invention, terms including ordinal numbers such as first, second, etc. may be used to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

Further, in the embodiments of the present invention, the singular expressions include plural expressions unless the context clearly indicates otherwise.

Furthermore, in the embodiments of the present invention, terms such as "comprises" or "has" are intended to specify the presence of the stated features, numbers, steps, operations, elements, parts, or combinations thereof, and should not be understood to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Also, in the embodiments of the present invention, a 'module' or 'unit' performs at least one function or operation and may be implemented in hardware, software, or a combination of hardware and software. In addition, a plurality of 'modules' or 'units' may be integrated into at least one module and implemented by at least one processor, except for a 'module' or 'unit' that needs to be implemented in specific hardware.

Further, in the embodiments of the present invention, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is connected with another element interposed between them.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing a schematic configuration of a risk analysis system 10 for malicious code, which is an embodiment of the present invention, and FIG. 2 is a diagram showing an example of code before and after decoding by the script obfuscation processing unit, which is a component of the present invention.

Referring to FIG. 1, the risk analysis system 10 may include a web crawling unit 100, an HTML parsing unit 200, a malicious code distribution pattern analyzing unit 300, a malicious code distribution URL detecting unit 400, a malicious code DB unit 500, a malicious code URL access blocking unit 600, a script obfuscation processing unit 700, a DOM and BOM generation unit 800, a script engine unit 900, and a risk analysis unit 1000.

The web crawling unit 100 can generate a copy of the link page up to a predetermined depth of the URL the user has accessed and index the generated copy.

The web crawling unit 100 accesses websites and homepages operated by a web providing server (not shown), uses the content they provide up to the predetermined depth, and obtains the distribution code and execution code (e.g., script code) generated when that content is provided.

When the risk probability of a specific URL is equal to or greater than a predetermined threshold value, the web crawling unit 100 can generate and index copies of all linked pages of that URL, including its sub-links beyond the predetermined depth.

That is, when the risk probability of a specific URL is equal to or greater than the predetermined threshold value, the web crawling unit 100 sets the predetermined depth to the full depth, generates copies of all linked pages of that URL, and indexes them.

In this case, the web crawling unit 100 accesses the websites and homepages operated by the web providing server (not shown), uses all the content provided across the entire website and homepage, and obtains the distribution code and execution code (e.g., script code) generated when that content is provided.

In other words, when the risk probability of a specific URL is equal to or greater than the predetermined threshold value, the web crawling unit 100 crawls the entire website, not just the predetermined depth, separately from the blocking performed by the malicious code URL access blocking unit 600. The malicious code distribution pattern analyzing unit 300 then analyzes malicious code distribution patterns based on the resulting distribution code and execution code, so the distribution patterns can be analyzed more accurately.

In addition, detecting malicious code distribution URLs over this wider scope allows malicious code distribution to be detected more comprehensively.
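The crawling behavior described above can be sketched as a breadth-first traversal that stops expanding pages at the predetermined depth unless the URL's risk probability has reached the threshold. This is a minimal sketch under stated assumptions, not the patented implementation: `fetch_links` is a hypothetical callback standing in for downloading a page and extracting its links, the 0.5 default threshold is illustrative, and storing the page copies themselves is left out, so the returned index only records the depth at which each URL was first reached.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_depth: int,
          risk_probability: float = 0.0,
          threshold: float = 0.5) -> Dict[str, int]:
    """Breadth-first crawl returning an index of URL -> depth at which it was first reached.

    fetch_links is a hypothetical callback that returns the links found on a page.
    The 0.5 default threshold is an illustrative assumption, not a value from the patent.
    """
    # A URL whose risk probability reaches the threshold is crawled to full depth.
    depth_limit: Optional[int] = None if risk_probability >= threshold else max_depth
    index: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = index[url]
        if depth_limit is not None and depth >= depth_limit:
            continue  # do not expand pages that sit at the predetermined depth
        for link in fetch_links(url):
            if link not in index:  # visit each URL once, at its shallowest depth
                index[link] = depth + 1
                queue.append(link)
    return index


# Hypothetical in-memory site used instead of real HTTP requests.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
}


def fake_fetch(url: str) -> Iterable[str]:
    return SITE.get(url, [])


shallow = crawl("https://example.com/", fake_fetch, max_depth=1)
full = crawl("https://example.com/", fake_fetch, max_depth=1, risk_probability=0.9)
print(sorted(shallow), len(full))
```

Because the traversal is breadth-first, each URL is indexed at its shallowest depth, and already-indexed URLs are skipped, which keeps link cycles from looping forever.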

The HTML parsing unit 200 may parse the HTML of the generated copy.

More specifically, the HTML parsing unit 200 receives the HTTP/HTTPS URL of the HTML content generated by the operation of the script engine unit 900 and sequentially parses and analyzes the HTML content.
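Sequential parsing of the fetched HTML might look like the following sketch, which uses Python's standard `html.parser` module to record every start tag with its attributes in document order and to capture inline script bodies for later decoding. The `TagCollector` and `parse_html` names are hypothetical, and building a full DOM and BOM, as the DOM and BOM generation unit 800 does, is beyond this example.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class TagCollector(HTMLParser):
    """Records start tags with their attributes in document order, plus inline script bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[Tuple[str, Dict[str, Optional[str]]]] = []
        self.scripts: List[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (e.g. <embed ... />) are recorded but never open a script body.
        self.tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data  # script text may arrive in several chunks


def parse_html(html: str) -> TagCollector:
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser


page = parse_html('<body><iframe src="http://evil.test/x" width="0" height="0"></iframe>'
                  '<script>var a = 1;</script></body>')
print([tag for tag, _ in page.tags], page.scripts)
```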

The script obfuscation processing unit 700 may decode the malicious code distribution pattern in order to analyze the malicious code distribution pattern.

More specifically, the script obfuscation processing unit 700 loads, in real time, the URLs sequentially generated by the operation of the script engine unit 900, the scripts of the distribution code generated when HTML content is provided, and the scripts of the malicious code.

The HTML parsing unit 200 sequentially parses and analyzes these URLs and the scripts of the distribution code and the malicious code, and the DOM and BOM generation unit 800 can generate a DOM or BOM using the HTML parsed by the HTML parsing unit 200.

The script obfuscation processing unit 700 compares the scripts of the DOM or BOM with the decoding information for droppers, keyloggers, data leakage files, malicious program distribution scripts, and malicious code scripts stored in the malicious code DB unit 500, and decodes them in real time to detect droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, and malicious program distribution code.

In addition, the script obfuscation processing unit 700 decodes in real time at least one of the URLs sequentially generated by the script engine unit 900, the distribution code generated when HTML content is provided, and the script code.
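The decoding step can be illustrated with a small deobfuscator that repeatedly expands three common JavaScript obfuscation idioms, `unescape("...")`, `String.fromCharCode(...)`, and `atob("...")`, until the script stops changing (or a round limit is hit), and then checks the result against signature strings. This is an assumption-laden sketch: the patent does not specify which encodings unit 700 handles, `%uXXXX` escapes are not covered because `urllib.parse.unquote` only decodes `%XX` sequences, and the `signatures` argument is a hypothetical stand-in for the decoding information held in the malicious code DB unit 500.

```python
import base64
import binascii
import re
from typing import Iterable
from urllib.parse import unquote

# Three common JavaScript obfuscation idioms (an illustrative subset, not the patent's list).
_UNESCAPE = re.compile(r"""unescape\(\s*["']([^"']*)["']\s*\)""")
_FROM_CHAR_CODE = re.compile(r"String\.fromCharCode\(\s*([\d\s,]+)\)")
_ATOB = re.compile(r"""atob\(\s*["']([A-Za-z0-9+/=]+)["']\s*\)""")


def _decode_char_codes(match: re.Match) -> str:
    return "".join(chr(int(n)) for n in match.group(1).split(",") if n.strip())


def _decode_base64(match: re.Match) -> str:
    try:
        return base64.b64decode(match.group(1)).decode("latin-1")
    except binascii.Error:
        return match.group(0)  # leave malformed base64 untouched


def deobfuscate(script: str, max_rounds: int = 5) -> str:
    """Expand unescape(), String.fromCharCode() and atob() calls until the text is stable."""
    for _ in range(max_rounds):
        decoded = _UNESCAPE.sub(lambda m: unquote(m.group(1)), script)
        decoded = _FROM_CHAR_CODE.sub(_decode_char_codes, decoded)
        decoded = _ATOB.sub(_decode_base64, decoded)
        if decoded == script:
            break
        script = decoded
    return script


def matches_signature(script: str, signatures: Iterable[str]) -> bool:
    """Hypothetical check of decoded text against known-bad substrings."""
    decoded = deobfuscate(script)
    return any(sig in decoded for sig in signatures)


print(deobfuscate('document.write(unescape("%3Ciframe%20src%3Dhttp%3A//evil.test%3E"))'))
```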

Then, according to the detection result, the malicious code URL access blocking unit 600 may block the inflow of, and connections to, the generated URLs, the distribution code generated when HTML content is provided, and the script code.

The malicious code distribution pattern analyzing unit 300 can analyze the malicious code distribution pattern using the HTML parsed by the HTML parsing unit 200.

That is, the malicious code distribution pattern analyzing unit 300 can analyze the malicious code distribution pattern using the decoded HTML after the script obfuscation processing unit 700 decodes the HTML parsed by the HTML parsing unit 200.

In addition, the malicious code distribution pattern analyzing unit 300 analyzes in real time the content provided on websites and homepages and the distribution code and execution code generated when that content is provided, and detects droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, malicious program distribution code, and malicious program distribution patterns.

More specifically, the malicious code distribution pattern analyzing unit 300 compares execution code, such as the URLs sequentially generated by the operation of the script engine unit 900, the distribution code generated when content is provided, and script code, with the pattern information used for malicious code distribution stored in the malicious code DB unit 500, referring to the value of at least one of the src, width, and height attributes, to check whether a malicious code distribution pattern is present.

The malicious code distribution URL detection unit 400 can detect a malicious code distribution URL using the malicious code distribution pattern analyzed by the malicious code distribution pattern analysis unit 300.

More specifically, the malicious code distribution URL detection unit 400 detects malicious code, malicious code access URLs, malicious program distribution URLs, malicious program distribution code, and malware distribution patterns in the URLs and code generated when HTML content is provided.

In addition, the malicious code distribution pattern analyzing unit 300 may use the decoding result of the script obfuscation processing unit 700 to detect at least one of the iframe (inline frame) tag, embed tag, object tag, link tag, script tag, and JavaScript tag used for malicious code distribution.

In addition, the malicious code distribution pattern analysis unit 300 refers to and compares at least one attribute value among src, width, and height in the HTML content to check for and detect the presence or absence of a pattern used for malicious code distribution.

In addition, the malicious code DB unit 500 may store the content of the malicious code distribution URL.

The content of a malicious code distribution URL stored in the malicious code DB unit 500 may include the physical location of its server, user information, business type information, and the like.

When a specific URL is determined to be a malicious code distribution URL, the malicious code URL access blocking unit 600 may block a user terminal (not shown) from accessing that URL.

In addition, the malicious code URL access blocking unit 600 can notify an administrator terminal (not shown), via wireless communication, that the specific URL has been determined to be a malicious code distribution URL.

This lets administrators know in advance whether code that distributes malicious programs has been injected into a website, so they can take precautions before a problem occurs.
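Blocking and administrator notification could be approximated by a blocklist keyed on a normalized URL, as in this hypothetical `UrlBlocker` sketch. The list of notification strings stands in for the wireless message to the administrator terminal, and the normalization (lower-cased scheme and host, path only, ignoring port, query, and fragment) is an assumption chosen for brevity rather than anything the description prescribes.

```python
from typing import List, Set
from urllib.parse import urlsplit


class UrlBlocker:
    """Keeps a set of blocked URLs and queues administrator notifications.

    Hypothetical sketch: `notifications` stands in for the wireless message sent to the
    administrator terminal; normalization ignores port, query string and fragment.
    """

    def __init__(self) -> None:
        self._blocked: Set[str] = set()
        self.notifications: List[str] = []

    @staticmethod
    def _normalize(url: str) -> str:
        parts = urlsplit(url)
        # urlsplit lower-cases the scheme, and .hostname is lower-cased as well.
        return f"{parts.scheme}://{parts.hostname or ''}{parts.path or '/'}"

    def mark_malicious(self, url: str) -> None:
        key = self._normalize(url)
        if key not in self._blocked:  # notify only the first time a URL is blocked
            self._blocked.add(key)
            self.notifications.append(f"Malicious code distribution URL detected: {key}")

    def is_blocked(self, url: str) -> bool:
        return self._normalize(url) in self._blocked


blocker = UrlBlocker()
blocker.mark_malicious("HTTP://Evil.Test/drop.js?v=1")
blocker.mark_malicious("http://evil.test/drop.js")  # same normalized URL, no second notice
print(blocker.is_blocked("http://evil.test/drop.js?v=2"), blocker.notifications)
```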

Here, a user terminal (not shown) and an administrator terminal (not shown) may be implemented by various electronic devices such as a smart phone, a smart watch, a desktop PC, a tablet PC, a notebook PC and the like.

More specifically, the malicious code URL access blocking unit 600 can block URLs and malicious code access URLs according to the malicious code distribution pattern analysis result of the malicious code distribution pattern analyzing unit 300 and the malicious program distribution code detection result of the malicious code distribution URL detection unit 400.

The script engine unit 900 may generate the scenario script according to the copy generated by the web crawler unit 100 and may perform an operation according to the generated script.

The script engine unit 900 can also sequentially perform operations using the distribution code and execution code (for example, script code) that the web crawling unit 100 obtains from the web providing server (not shown).

The risk analysis unit 1000 can analyze the risk of malicious code using at least one of a malicious code distribution pattern and a malicious code distribution URL.

In addition, the risk analysis unit 1000 can use a recurrent neural network (RNN) to derive a risk probability from variables including the recency, repeatability, and malicious code activation of a malicious code distribution site.

The malicious code DB unit 500 may store the risk probability derived by the risk analysis unit 1000 and update the content of the malicious code distribution URL by reflecting the risk probability.
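The storage of distribution-URL content and the risk-probability update could be modeled as below. This is an in-memory sketch: the field names mirror the content listed for the malicious code DB unit 500 (server location, user information, business type), while the `MaliciousCodeDB` class, its methods, and the retained score history are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DistributionUrlRecord:
    """Content stored per malicious code distribution URL (field names are illustrative)."""
    url: str
    server_location: Optional[str] = None
    user_info: Optional[str] = None
    business_type: Optional[str] = None
    risk_probability: float = 0.0
    risk_history: List[float] = field(default_factory=list)


class MaliciousCodeDB:
    """Hypothetical in-memory stand-in for the malicious code DB unit 500."""

    CONTENT_FIELDS = ("server_location", "user_info", "business_type")

    def __init__(self) -> None:
        self._records: Dict[str, DistributionUrlRecord] = {}

    def _record(self, url: str) -> DistributionUrlRecord:
        if url not in self._records:
            self._records[url] = DistributionUrlRecord(url=url)
        return self._records[url]

    def upsert(self, url: str, **content: Optional[str]) -> DistributionUrlRecord:
        record = self._record(url)
        for name, value in content.items():
            if name not in self.CONTENT_FIELDS:
                raise KeyError(f"unknown content field: {name}")
            setattr(record, name, value)
        return record

    def update_risk(self, url: str, probability: float) -> DistributionUrlRecord:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("risk probability must lie in [0, 1]")
        record = self._record(url)
        record.risk_history.append(probability)  # past scores kept, e.g. for later retraining
        record.risk_probability = probability    # latest score reflected in the URL's content
        return record


db = MaliciousCodeDB()
db.upsert("http://evil.test/", server_location="KR", business_type="shopping")
db.update_risk("http://evil.test/", 0.4)
record = db.update_risk("http://evil.test/", 0.8)
print(record)
```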

If the risk probability is equal to or greater than the predetermined threshold, the risk analysis unit 1000 controls the web crawling unit 100 to create and index copies of linked pages for the IP band, host, and similar industries associated with the malicious code distribution URL.

If the risk probability of a specific URL is equal to or greater than the predetermined threshold value, the risk analysis unit 1000 can derive IP information for similar domains, and the malicious code URL access blocking unit 600 can block access to the specific URL.
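The expansion to the IP band of a risky URL can be sketched with Python's standard `ipaddress` module. Treating the "IP band" as a /24 network, the 0.5 threshold, and the `known_hosts` mapping (for example, collected from earlier DNS resolutions) are all assumptions made for illustration.

```python
import ipaddress
from typing import Dict, List


def ip_band(ip: str, prefix: int = 24):
    """Return the network (IP band) containing ip; the /24 default is an assumption."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def expansion_targets(risky_ip: str, known_hosts: Dict[str, str], risk_probability: float,
                      threshold: float = 0.5, prefix: int = 24) -> List[str]:
    """Hosts in the same IP band as the risky URL, to queue for crawling when risk >= threshold."""
    if risk_probability < threshold:
        return []
    band = ip_band(risky_ip, prefix)
    return sorted(host for host, ip in known_hosts.items()
                  if ipaddress.ip_address(ip) in band)


# Hypothetical host -> IP mapping using documentation address ranges.
known = {"a.test": "203.0.113.5", "b.test": "203.0.113.200", "c.test": "198.51.100.7"}
print(ip_band("203.0.113.77"), expansion_targets("203.0.113.77", known, 0.9))
```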

FIG. 3 is a diagram showing a schematic configuration of the risk analysis unit, which is a component of the present invention; FIG. 4 is a diagram schematically showing the learning method of the RNN learning module, which is a component of the present invention; and FIG. 5 is a diagram showing the result of risk probability derivation according to the learning method of the RNN learning module, which is a component of the present invention.

Referring to FIG. 3, the risk analysis unit 1000 may include an RNN learning module 1010 and a distribution site risk classification module 1020.

The RNN learning module 1010 can use a recurrent neural network (RNN) to derive a risk probability from variables including the recency, repeatability, and malicious code activation of a malicious code distribution site.

Here, the recency of a malicious code distribution site indicates whether malicious code has been distributed from it within a predetermined period before the current time; its repeatability means the number of times malicious code has been distributed from it; and malicious code activation indicates whether the actual malicious program is downloaded (verified by comparing file hash values) or executed.
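The three input variables defined above could be computed from a site's observation history as follows. The `Observation` record, the 30-day recency window, and the use of SHA-256 for the hash comparison are assumptions; the description says only that activation is verified by comparing hash values or by checking execution. Repeatability is returned as a raw count here and would plausibly be normalized before being fed to a network.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class Observation:
    """One observed distribution event for a site (hypothetical record layout)."""
    seen_at: datetime        # when distribution from the site was observed
    payload: bytes = b""     # file the site delivered, empty if none was captured
    executed: bool = False   # whether the payload was seen executing, e.g. in a sandbox


def risk_features(observations: Iterable[Observation], known_malware_sha256: Set[str],
                  now: datetime, window: timedelta = timedelta(days=30)) -> List[float]:
    """Return [X1 recency, X2 repeatability, X3 activation] for one distribution site."""
    obs = list(observations)
    x1 = 1.0 if any(now - o.seen_at <= window for o in obs) else 0.0
    x2 = float(len(obs))  # raw count of distribution events
    x3 = 1.0 if any(
        o.executed or (bool(o.payload)
                       and hashlib.sha256(o.payload).hexdigest() in known_malware_sha256)
        for o in obs) else 0.0
    return [x1, x2, x3]


bad = b"MZ-fake-payload"
known_hashes = {hashlib.sha256(bad).hexdigest()}
now = datetime(2017, 8, 1)
recent = risk_features([Observation(datetime(2017, 7, 25), bad),
                        Observation(datetime(2017, 3, 1))], known_hashes, now)
stale = risk_features([Observation(datetime(2017, 3, 1), b"benign")], known_hashes, now)
print(recent, stale)
```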

Referring to FIG. 4, the RNN learning module 1010 inputs the recency, repeatability, and malicious code activation variables (X1, X2, X3) described above to the input layer, and can derive the output values y1, y2, and y3 of the output layer for a specific URL using a predetermined algorithm or a pre-input function.

Referring to FIG. 5, the RNN learning module 1010 can derive a risk probability from the derived output values and determine whether the risk is high or low by comparing the risk probability with a predetermined threshold.

The distribution site risk classification module 1020 can classify a specific URL as high risk or low risk according to the risk determination result of the RNN learning module 1010.
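A forward pass of a small Elman-style recurrent network, followed by the threshold classification, is sketched below in plain Python. It reads FIG. 4 one possible way, feeding one feature vector [X1, X2, X3] per time step and treating each sigmoid output y_t as a risk probability. The weights are hand-picked and untrained, training by backpropagation through time is omitted, and the 0.5 threshold is an assumption, so the printed numbers demonstrate only the mechanics, not real risk scores.

```python
import math
from typing import List, Sequence


class TinyRNN:
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = sigmoid(w_hy . h_t + b_y)."""

    def __init__(self, W_xh, W_hh, b_h, w_hy, b_y):
        self.W_xh, self.W_hh, self.b_h = W_xh, W_hh, b_h
        self.w_hy, self.b_y = w_hy, b_y

    def forward(self, sequence: Sequence[Sequence[float]]) -> List[float]:
        h = [0.0] * len(self.b_h)  # initial hidden state
        outputs = []
        for x in sequence:  # one feature vector [X1, X2, X3] per time step
            # The comprehension reads the previous h; the new state replaces it afterwards.
            h = [math.tanh(sum(w * xi for w, xi in zip(self.W_xh[j], x))
                           + sum(u * hk for u, hk in zip(self.W_hh[j], h))
                           + self.b_h[j])
                 for j in range(len(self.b_h))]
            z = sum(w * hj for w, hj in zip(self.w_hy, h)) + self.b_y
            outputs.append(1.0 / (1.0 + math.exp(-z)))  # y_t read as a risk probability
        return outputs


def classify(risk_probability: float, threshold: float = 0.5) -> str:
    return "high" if risk_probability >= threshold else "low"


# Hand-picked, untrained weights purely for illustration (2 hidden units, 3 inputs).
rnn = TinyRNN(W_xh=[[1.0, 0.5, 2.0], [0.5, 0.2, 1.0]],
              W_hh=[[0.1, 0.0], [0.0, 0.1]],
              b_h=[-1.0, -0.5], w_hy=[2.0, 1.5], b_y=-1.0)
risky = rnn.forward([[1.0, 3.0, 1.0]] * 3)
benign = rnn.forward([[0.0, 1.0, 0.0]] * 3)
print([round(p, 3) for p in risky], classify(risky[-1]), classify(benign[-1]))
```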

The malicious code DB unit 500 may be controlled to store the risk probability derived by the RNN learning module 1010 and to update the content of the malicious code distribution URL by reflecting that risk probability.

When the risk probability is equal to or greater than a preset threshold value, the distribution site risk classification module 1020 can control the web crawling unit 100 to create and index copies of linked pages for the IP band, host, and similar industries associated with the malicious code distribution URL.

In addition, when the risk probability of a specific URL is equal to or greater than the predetermined threshold value, the distribution site risk classification module 1020 can derive IP information for similar domains and can control the malicious code URL access blocking unit 600 to block access to the specific URL.

Because the risk analysis unit 1000 calculates risk using machine learning as described above, the accuracy and efficiency of the risk calculation are improved, and errors in malicious code detection can be significantly reduced.

In addition, whereas risk calculations in existing technology are updated manually, the risk analysis unit 1000 has the advantage of updating automatically through machine learning.

FIG. 6 is a block diagram specifically illustrating a malicious code distribution risk analysis system according to another embodiment of the present invention, and FIG. 7 is a block diagram illustrating a process for distributing malicious code on a website and a process for monitoring malicious code distribution.

Referring to FIG. 6, the risk analysis system for malicious code according to the present invention includes a web providing server 1200 that provides website and homepage services to the communication terminal 1100 of an Internet user, and a monitoring server 1300 that monitors the websites and homepages provided by the web providing server 1200 in real time, detects droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, malicious program distribution code, and malicious program distribution patterns distributed from a zombie server 1400 or a hacker server 1500, blocks the malicious code, malicious code access URLs, and malicious program distribution code, and generates a notification message.

The web providing server 1200 may be a server of an Internet hosting provider or a hosting server operated by a company or an individual; it publishes website and homepage addresses over the Internet and operates the websites and homepages. When an Internet user accesses a website or homepage through a communication terminal 1100 such as a notebook computer, a tablet, a mobile communication device, or a PC, website and homepage services are provided.

The communication terminal 1100 may be a notebook computer, a smartphone, a tablet PC, a mobile communication device, a personal computer, or the like. Customers and Internet users connect to the network through the communication terminal 1100 to access and use websites and homepages.

However, hackers who distribute malicious code may use a zombie server 1400, a hacker server 1500, or a PC to embed droppers, malicious code, and the distribution code and distribution patterns of malicious programs in a website or homepage. Such malicious code and distribution patterns may be embedded in iframe tags, embed tags, object tags, link tags, script tags, and JavaScript tags.

As shown in FIG. 7, when a website or homepage provided by the web providing server 1200 is accessed through a communication terminal 1100 such as a notebook computer, a tablet, a mobile communication device, or a PC, the terminal receives an instruction to download malicious code that performs malicious behavior (for example, keylogging or data leakage). The communication terminal 1100 then downloads the malicious code, and the downloaded file is executed, exposing the terminal to malicious acts such as the periodic leakage of personal information and other data stored on it.

The monitoring server 1300 connects in real time to the websites and homepages provided by the web providing server 1200 and monitors, in real time, the various content, distribution code, and execution code they provide. The monitoring server 1300 analyzes this content, distribution code, and execution code and detects, in real time, droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, malicious program distribution code, and malicious program distribution patterns distributed from the zombie server 1400 or the hacker server 1500. Then, according to the detection result, it blocks the droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, malicious program distribution code, and malicious program distribution patterns with respect to the web providing server 1200, the zombie server 1400, and the hacker server 1500, and generates a notification message.

FIG. 8 is a configuration diagram showing in detail the monitoring server shown in FIG. FIG. 9 is a diagram illustrating in detail the method of analyzing malicious code and malicious code distribution patterns used by the MC spread pattern analyzing engine and the MC blocking unit shown in FIG. 8.

Referring to FIG. 8, the monitoring server 1300 includes: a web agent 1310 that accesses the websites and homepages operated by the web providing server 1200, uses the content they provide, and receives the distribution code and execution code (for example, script code) generated when that content is provided; an MC analysis unit 1320 that analyzes in real time the content provided on the websites and homepages and the distribution code and execution code generated when that content is provided, and detects droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, malicious program distribution code, and malicious program distribution patterns introduced by the zombie server 1400 and the hacker server 1500; an MC blocking unit 1330 that, according to the detection result of the MC analysis unit 1320, blocks the droppers, keyloggers, data leakage files, malicious code, malicious code access URLs, malicious program distribution code, and malicious program distribution patterns with respect to the web providing server 1200, the zombie server 1400, and the hacker server 1500 and generates a notification message; and a database unit 1340 that stores and updates malicious program distribution pattern information, malicious code access URL information, malicious program distribution code information, access information and user information of the communication terminal 1100, and operator information for operating the web providing server 1200, and shares this information with the MC analysis unit 1320 and the MC blocking unit 1330 in real time.

The web agent 1310 accesses the web sites and homepages operated by the web providing server 1200 in real time and maintains a real-time connection to them. It sequentially executes the various actions available on those web sites and homepages, such as using each of the contents they provide in turn and receiving the distributed codes and execution codes (for example, script codes).

As shown in FIG. 9, the MC analysis unit 1320 analyzes the contents provided on the web sites and homepages as the web agent 1310 executes its operations, and detects droppers, keyloggers, data leakage files, malicious codes, malicious code access URLs, malicious program distribution codes, and malicious program distribution patterns originating from the zombie server 1400 or the hacker server 1500. To this end, the MC analysis unit 1320 includes an HTML content analysis engine 1321, which sequentially analyzes HTML content to detect malicious program distribution codes written in HTML, and an MC distribution pattern analysis engine 1322, which detects the malicious program distribution patterns used when a malicious program is distributed.

Referring to FIG. 10, the HTML content analysis engine 1321 receives the HTTP/HTTPS URL of the HTML content through the web agent 1310 and sequentially parses the HTML content to construct the HTML DOM. It then checks the HTTP/HTTPS URL and whether HTML content exists at it, and detects any malicious program distribution code written in HTML that is included at the HTTP/HTTPS URL.
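The URL extraction step can be sketched with Python's standard `html.parser`. This is a minimal illustration, not the engine 1321 itself: the class name `LinkCollector`, the function `collect_urls`, and the choice of `src`, `href`, `data`, and `action` as URL-bearing attributes are assumptions, and the parser records a flat list of elements rather than a full HTML DOM.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: attributes that commonly carry a URL the browser will fetch or follow.
URL_ATTRS = {"src", "href", "data", "action"}


class LinkCollector(HTMLParser):
    """Hypothetical helper: records each element and the HTTP/HTTPS URLs it references."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.elements = []  # (tag, attribute dict) in document order
        self.urls = []      # absolute HTTP/HTTPS URLs in document order

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        self.elements.append((tag, attr_map))
        for name, value in attr_map.items():
            if name in URL_ATTRS and value:
                absolute = urljoin(self.base_url, value)
                # Skip javascript:, data:, mailto: and other non-web schemes.
                if urlparse(absolute).scheme in ("http", "https"):
                    self.urls.append(absolute)


def collect_urls(base_url, html):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Relative links are resolved against the page URL before the scheme check, so `/about` on `http://example.com/index.html` becomes `http://example.com/about`.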

FIG. 11 is a flowchart illustrating the malicious program distribution pattern detection operation of the MC distribution pattern analysis engine shown in FIGS. 8 and 9. FIG. 12 is a diagram showing an example of detecting malicious code and a malicious code distribution pattern.

Referring to FIGS. 11 and 12, the MC distribution pattern analysis engine 1322 receives HTML contents through the web agent 1310 and decodes, in real time, at least one of the URLs, HTML contents, distributed codes, and script codes generated in sequence by the operation of the web agent 1310. It then detects at least one of the inline frame (iframe) tag, embed tag, object tag, link tag, script tag, and JavaScript code used for distributing malicious code. Next, it refers to and compares at least one of the src, width, and height attribute values in the HTML contents to check whether a pattern used for distributing malicious code is present.

As shown in FIG. 12, the HTML contents or HTML codes used for distributing malicious code are embedded, in an encoded state, in execution codes such as URLs, the distributed codes generated when contents are provided, and script codes. Accordingly, the MC distribution pattern analysis engine 1322 decodes, in real time, these execution codes generated in sequence by the operation of the web agent 1310, and then checks for a malicious code distribution pattern by referring to at least one of the src, width, and height attribute values.
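The description does not say which encodings are peeled off. The sketch below assumes three common ones (URL percent-encoding, HTML character references, and JavaScript `String.fromCharCode(...)` calls with literal numeric arguments) and applies them repeatedly until the text stops changing, so that layered encodings are also unwrapped. `deobfuscate` and `max_rounds` are hypothetical names, and the decoded text is meant only as an analysis copy: percent-decoding can alter legitimate text that happens to contain `%`.

```python
import html
import re
from urllib.parse import unquote

# Matches String.fromCharCode(60,105,...) when every argument is a literal integer.
FROM_CHAR_CODE = re.compile(r"String\.fromCharCode\(\s*([0-9,\s]+)\)")


def _expand_from_char_code(text):
    """Replace each String.fromCharCode(...) call with the quoted string it would produce."""
    def repl(match):
        codes = [int(c) for c in match.group(1).split(",") if c.strip()]
        return '"' + "".join(chr(c) for c in codes) + '"'
    return FROM_CHAR_CODE.sub(repl, text)


def deobfuscate(text, max_rounds=10):
    """Peel encoding layers until the text stops changing or max_rounds is reached."""
    for _ in range(max_rounds):
        decoded = _expand_from_char_code(html.unescape(unquote(text)))
        if decoded == text:
            break
        text = decoded
    return text
```

For example, `%253Cscript%253E` is decoded in two rounds, first to `%3Cscript%3E` and then to `<script>`, after which the src/width/height check described above can run on the plain HTML.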

In addition, after decoding in real time the execution codes such as the URLs, distributed codes, and script codes generated when content is provided, the MC distribution pattern analysis engine 1322 compares them with the malicious program distribution pattern information and the malicious code access URL information provided by the database unit 1340 to check whether a pattern used for distributing malicious code is present.
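Comparing decoded code against the database unit 1340 could look roughly like this. The pattern information is modelled as a mapping from a pattern name to a compiled regular expression and the URL information as a set of hostnames; both the representation and the sample entries (`PATTERN_DB`, `URL_DB`) are assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Rough matcher for absolute HTTP/HTTPS URLs embedded in decoded code.
URL_IN_TEXT = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def match_against_db(decoded_code, pattern_db, url_db):
    """Return (matched pattern names, matched URLs) for one piece of decoded code.

    pattern_db: name -> compiled regex, standing in for the malicious program
    distribution pattern information.
    url_db: set of hostnames, standing in for the malicious code access URL information.
    """
    pattern_hits = [name for name, regex in pattern_db.items() if regex.search(decoded_code)]
    url_hits = [url for url in URL_IN_TEXT.findall(decoded_code)
                if urlparse(url).hostname in url_db]
    return pattern_hits, url_hits


# Hypothetical sample entries; a real pattern DB would be curated and far larger.
PATTERN_DB = {
    "hidden-iframe": re.compile(r"<iframe[^>]*\bwidth\s*=\s*[\"']?0\b", re.IGNORECASE),
    "eval-unescape": re.compile(r"eval\s*\(\s*unescape\s*\(", re.IGNORECASE),
}
URL_DB = {"evil.example.net"}
```

Keeping the pattern check and the URL check separate mirrors the two kinds of information the database unit supplies, so a page can be flagged by either one alone.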

The MC blocking unit 1330 decodes malicious codes, malicious code access URLs, malicious program distribution codes, and malicious program distribution patterns and, according to the detection result of the MC analysis unit 1320, blocks droppers, keyloggers, data leakage files, malicious codes, malicious code access URLs, malicious program distribution codes, and malicious program distribution patterns. It then generates a notification message and transmits it to the web providing server 1200, the zombie server 1400, and the hacker server 1500. To this end, the MC blocking unit 1330 includes three engines. The MC decoding processing engine 1331 decodes, in real time, the URLs generated in sequence by the operation of the web agent 1310, the distributed codes generated when HTML contents are provided, and the script codes; it detects droppers, keyloggers, data leakage files, malicious codes, malicious code access URLs, and malicious program distribution codes, and blocks the URLs associated with the detection result. The MC distribution pattern processing engine 1332 blocks, according to the malicious code distribution pattern check result of the MC distribution pattern analysis engine 1322, the URLs, the malicious code access URLs generated when HTML content is provided, the malicious program distribution codes, and the malicious program distribution patterns, and transmits a notification message to the web providing server 1200, the zombie server 1400, and the hacker server 1500. The MC distributed URL blocking engine 1333 blocks URLs and malicious code access URLs according to the result of the distribution pattern check and the malicious program distribution code detection result, and generates a notification message that it transmits to the web providing server 1200, the zombie server 1400, and the hacker server 1500.

As shown in FIG. 8, the database unit 1340 includes: a user information DB that stores operator information for operating the web providing server 1200 and login information of users accessing the web site or homepage operated by the web providing server 1200; an MC decoding information DB that stores, updates, and shares the decoding information of droppers, keyloggers, data leakage files, malicious program distribution scripts, and malicious code scripts; an MC distributed URL DB that stores, updates, and shares the URL information generated according to the detection results of malicious code access URLs and malicious program distribution codes; and an MC distribution pattern DB that stores, updates, and shares the malicious program distribution pattern information and the pattern information used for distributing malicious code.
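One possible layout for the four databases, sketched with Python's built-in `sqlite3` module. The description names the databases but not their schema, so every table and column name here is hypothetical.

```python
import sqlite3

# Hypothetical schema: one table per DB named in the description.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_info (
    account_id  TEXT PRIMARY KEY,    -- operator or site user
    role        TEXT NOT NULL,       -- 'operator' or 'user'
    login_info  TEXT
);
CREATE TABLE IF NOT EXISTS mc_decoding_info (
    id          INTEGER PRIMARY KEY,
    category    TEXT NOT NULL,       -- dropper, keylogger, data leakage file, ...
    signature   TEXT NOT NULL        -- decoded script fragment to match against
);
CREATE TABLE IF NOT EXISTS mc_distributed_url (
    url         TEXT PRIMARY KEY,
    category    TEXT NOT NULL,       -- which detection produced the URL
    detected_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS mc_distribution_pattern (
    name        TEXT PRIMARY KEY,
    pattern     TEXT NOT NULL        -- e.g. a regular expression
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_distributed_url(conn, url, category):
    # INSERT OR REPLACE keeps one row per URL and refreshes its category and timestamp.
    conn.execute(
        "INSERT OR REPLACE INTO mc_distributed_url (url, category) VALUES (?, ?)",
        (url, category),
    )
    conn.commit()
```

Because SQLite allows several connections to one database file, the MC analysis unit and the MC blocking unit could share the same file; the in-memory default used here is visible only to its own connection.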

FIG. 13 is a flowchart showing the malicious code decoding and detection operation of the MC decoding processing engine shown in FIGS. 8 and 9. FIG. 14 is a diagram illustrating an example of the malicious code decoding process and detection.

Referring to FIGS. 13 and 14, the MC decoding processing engine 1331 of the MC blocking unit 1330 loads into memory, in real time, the URLs generated in sequence by the operation of the web agent 1310, the scripts of the distributed codes generated when HTML contents are provided, and the scripts of malicious codes. It then sequentially parses the URLs, the distributed-code scripts, and the malicious-code scripts to construct the HTML DOM (Document Object Model) and the BOM (Browser Object Model).

Thereafter, the MC decoding processing engine 1331 compares the scripts in the HTML DOM and BOM with the decoding information of droppers, keyloggers, data leakage files, malicious program distribution scripts, and malicious code scripts stored in the MC decoding information DB, and thereby detects droppers, keyloggers, data leakage files, malicious codes, malicious code access URLs, and malicious program distribution codes. According to the detection result, it blocks entry into and connection to the associated URLs, the distributed codes generated when HTML contents are provided, and the script codes.
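The comparison with the MC decoding information DB can be pictured as per-category signature matching over scripts already loaded into memory. In this sketch the signatures are plain substrings compared after removing whitespace and lower-casing; the function names, the normalization rule, and the sample signatures are assumptions, and real decoding information could well take another form.

```python
import re


def _normalize(script):
    """Lower-case and drop all whitespace so formatting tricks do not hide a signature."""
    return re.sub(r"\s+", "", script).lower()


def classify_scripts(scripts, decoding_info):
    """Match each loaded script against per-category signatures.

    scripts: list of decoded script texts held in memory.
    decoding_info: category -> list of signature strings, a stand-in for the
    MC decoding information DB.
    Returns category -> indices (in order) of the scripts that matched.
    """
    normalized_sigs = {cat: [_normalize(s) for s in sigs] for cat, sigs in decoding_info.items()}
    hits = {}
    for index, script in enumerate(scripts):
        text = _normalize(script)
        for category, sigs in normalized_sigs.items():
            if any(sig and sig in text for sig in sigs):
                hits.setdefault(category, []).append(index)
    return hits


# Hypothetical sample decoding information.
DECODING_INFO = {
    "keylogger": ["document.onkeypress", "addEventListener('keydown'"],
    "dropper": ["ActiveXObject('WScript.Shell')"],
}
```

In this sketch, a non-empty result would correspond to the engine blocking entry into and connection to the associated URL and scripts.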

The MC distribution pattern processing engine 1332 compares the execution codes, such as the URLs generated in sequence by the operation of the web agent 1310, the distributed codes generated when contents are provided, and the script codes, against the malicious program distribution pattern information, the pattern information used for distributing malicious code, and at least one of the src, width, and height attribute values, to check whether a malicious code distribution pattern is present. According to the result of this check, it blocks the URLs, the malicious code access URLs generated when HTML content is provided, the malicious program distribution codes, and the malicious program distribution patterns, and generates a notification message that it transmits to the web providing server 1200, the zombie server 1400, and the hacker server 1500.

The MC distributed URL blocking engine 1333 blocks access to URLs and malicious code access URLs according to the malicious code distribution pattern check result of the MC distribution pattern processing engine 1332 and the malicious program distribution code detection result, and generates a notification message that it transmits to the web providing server 1200, the zombie server 1400, and the hacker server 1500.
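The blocking decision of the MC distributed URL blocking engine 1333 can be sketched as a lookup of each requested URL in a blocklist of exact URLs and hostnames, with a notification payload produced when access is refused. The JSON message format and all names are hypothetical, and delivering the message to the servers named above is left out.

```python
import json
from urllib.parse import urlsplit


def normalize_url(url):
    """Lower-case the scheme and host part and drop any #fragment, so lookups are consistent."""
    parts = urlsplit(url)  # urlsplit already lower-cases the scheme
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path or '/'}{query}"


def decide_access(url, blocked_urls, blocked_hosts):
    """Return (allowed, notification_json); notification_json is None when access is allowed."""
    normalized = normalize_url(url)
    host = urlsplit(url).hostname or ""
    if normalized in blocked_urls or host in blocked_hosts:
        notification = json.dumps(
            {"event": "mc_url_blocked", "url": normalized, "host": host}, sort_keys=True
        )
        return False, notification
    return True, None
```

Checking hostnames as well as exact URLs lets one detection block every path on a distribution host, at the cost of also blocking any clean pages that share that host.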

As described above, the malicious code distribution risk analysis system of the present invention detects and blocks, in advance, encoded malicious codes and malicious program distribution codes inserted into web sites, thereby preventing malicious code infection inside an organization. It can also prevent damage to corporate web sites and their users and improve corporate reliability and customer satisfaction.

In addition, synergy effects can be expected in connection with security management services, and providing corporations and public institutions with a secure Internet environment and stable web services can increase customer trust as well as direct and indirect profits.

The configuration and operation of the malicious code distribution risk analysis system using machine learning according to an embodiment of the present invention have been described above. Although the present invention has been described with respect to specific embodiments, various modifications can be made without departing from the scope of the present invention.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, and that various modifications and changes may be made by those skilled in the art.

It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

The malicious code distribution risk analysis system of the present invention can detect and respond to the abuse of a target web server as a malicious code distribution site through the insertion of malicious code distribution code into its web site. By checking whether code that distributes a malicious program has been inserted into the web site, it can detect the insertion before the malicious code is distributed and notify the system administrator or the control system. For a web site detected as a distribution site, it further analyzes whether malicious code is actually being distributed and whether that malicious code is activated. Based on these results, it derives a risk probability from variables such as the repeatability and recency of the distribution and the activation of the malicious code, and the accuracy of this risk calculation can be improved by introducing artificial intelligence technology. Providing a malicious code distribution detection service can generate revenue by improving service reliability. Securing new demand for cloud services can establish a base for expanding the cloud market and a business model capable of growth, and can help strengthen the competitiveness of small and medium-sized enterprises (SMEs) moving into industrial complexes, for example by reducing their IT costs and improving their productivity.

A malicious code distribution risk analysis system that can generate these various effects can be used throughout the security industry.

Claims

1. A malicious code distribution risk analysis system, comprising:

a web crawler unit that generates copies of the linked pages, up to a predetermined depth, of a URL accessed by a user and indexes the generated copies;

an HTML parsing unit that parses the HTML of the copies;

a malicious code distribution pattern analysis unit that analyzes a malicious code distribution pattern using the HTML parsed by the HTML parsing unit;

a malicious code distribution URL detection unit that detects a malicious code distribution URL using the malicious code distribution pattern analyzed by the malicious code distribution pattern analysis unit;

a malicious code DB unit that stores the contents of the malicious code distribution URL; and

a malicious code URL access blocking unit that blocks access by the user terminal to a URL when the URL is determined to be a malicious code distribution URL.
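The web crawler unit of claim 1 can be illustrated with a breadth-first traversal that stops at a predetermined depth. To keep the sketch self-contained and testable, page retrieval is an injected `fetch(url)` callable (an assumption standing in for real HTTP requests), and the index is simply a dictionary from URL to its depth and HTML copy; `crawl` and `_HrefParser` are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_depth):
    """Breadth-first copy of link pages up to max_depth hops from start_url.

    fetch(url) -> HTML string, or None if the page cannot be retrieved.
    Returns an index {url: (depth, html_copy)}.
    """
    index = {}
    queue = deque([(urldefrag(start_url)[0], 0)])
    while queue:
        url, depth = queue.popleft()
        if url in index:
            continue
        html = fetch(url)
        if html is None:
            continue
        index[url] = (depth, html)
        if depth == max_depth:
            continue  # do not follow links beyond the predetermined depth
        parser = _HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urldefrag(urljoin(url, href))[0]
            if urlparse(link).scheme in ("http", "https") and link not in index:
                queue.append((link, depth + 1))
    return index
```

Breadth-first order means each page is first reached at its smallest link depth, so a page linked both shallowly and deeply is indexed with the shallow depth.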
2. The system according to claim 1, further comprising:

a script obfuscation processing unit that decodes an obfuscated malicious code distribution pattern so that the malicious code distribution pattern can be analyzed.
3. The system according to claim 1, further comprising:

a DOM and BOM generating unit that generates a DOM or a BOM using the HTML parsed by the HTML parsing unit.
4. The system according to claim 1, further comprising:

a script engine unit that generates a scenario script according to the copies generated by the web crawler unit and performs operations according to the generated script.
5. The system according to claim 1, further comprising:

a risk analysis unit that analyzes the risk of malicious code distribution using at least one of the malicious code distribution pattern and the malicious code distribution URL.
6. The system according to claim 5, wherein the risk analysis unit comprises:

an RNN learning module that uses a recurrent neural network (RNN) to derive a risk probability from variables including the recency, repeatability, and activation of malicious code.
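Claim 6 names the inputs (recency, repeatability, activation) and the model family (an RNN) but not the architecture. The following is a minimal forward pass of a single-layer Elman RNN in plain Python, h_t = tanh(W_x x_t + W_h h_{t-1} + b_h) and p = sigmoid(w_out · h_T + b_out), assuming each observation of a URL is encoded as three values in [0, 1]. The encoding, the function name `rnn_risk_probability`, and the placeholder weights are all hypothetical, and training (for example by backpropagation through time) is not shown.

```python
import math


def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_risk_probability(sequence, w_x, w_h, b_h, w_out, b_out):
    """Forward pass of a single-layer Elman RNN ending in a sigmoid.

    sequence: observations of one URL, oldest first; each is
    [recency, repeatability, activation] scaled to [0, 1] (hypothetical encoding).
    Returns a risk probability in (0, 1).
    """
    hidden = [0.0] * len(b_h)
    for x in sequence:
        pre = [a + b + c for a, b, c in zip(_matvec(w_x, x), _matvec(w_h, hidden), b_h)]
        hidden = [math.tanh(v) for v in pre]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder weights for a 2-unit hidden state; a trained model would learn these.
W_X = [[1.5, 1.0, 2.0], [-0.5, 0.8, 1.2]]
W_H = [[0.5, 0.0], [0.0, 0.5]]
B_H = [0.0, 0.0]
W_OUT = [2.0, 1.5]
B_OUT = -1.5
```

With these placeholder weights, a URL that was recently, repeatedly, and actively distributing code scores higher than one with all-zero observations, but the numbers carry no meaning until the weights are learned from labelled history.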
7. The system according to claim 6, wherein the malicious code DB unit

stores the risk probability derived by the RNN learning module and updates the contents of the malicious code distribution URL to reflect the risk probability.