KR101767589B1

KR101767589B1 - Web address extraction system for checking malicious code and method thereof

Info

Publication number: KR101767589B1
Application number: KR1020150146240A
Authority: KR
Inventors: 김도형; 이형관
Original assignee: SK Planet Co., Ltd. (에스케이플래닛 주식회사)
Priority date: 2015-10-20
Filing date: 2015-10-20
Publication date: 2017-08-11
Also published as: KR20170046000A

Abstract

The present invention relates to a system and method for automatically extracting web addresses for malicious code checking. The method comprises: (a) collecting connection logs from a relay device and extracting web addresses from the collected connection logs; (b) extracting check target addresses by filtering duplicate addresses and allowed addresses out of the web addresses; and (c) extracting sub-addresses from the web pages of the extracted check target addresses and generating a check target address list that includes the check target addresses and the sub-addresses.

Description

WEB ADDRESS EXTRACTION SYSTEM FOR CHECKING MALICIOUS CODE AND METHOD THEREOF

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a system and method for automatically extracting web addresses for malicious code checking, and more particularly, to a system and method that collect connection logs from a relay device, such as a web proxy, that clients access to use web services, and automatically extract from those logs the web addresses to be checked for malicious code.

The web is very convenient and is used almost daily by nearly everyone in the world, but it is frequently exploited as a channel for spreading malicious code. If a website visited by a large number of users is exploited to spread malicious code, the damage may spread widely, so special attention is needed. Detecting and acting against malicious sites preemptively can minimize the spread of damage.

In recent years, attack techniques such as exploits of unknown vulnerabilities and detection-evasion techniques have evolved, so detection technology must be upgraded as well. Existing approaches include low-interaction web crawling methods that depend on signatures, and high-interaction behavior-based methods that cover a wider range and can detect unknown attacks but are slow.

A very large number of websites operate on the Internet, and when their sub-pages are taken into account, the number of URLs to be checked reaches millions or tens of millions.

However, companies without an enterprise-wide IT infrastructure management system compile the addresses (IPs, URLs) of the web sites subject to malicious code checking by hand, so some check targets were missed.

Prior Art 1: Korean Patent No. 1,200,906: Network-based high-performance harmful site blocking system and method

SUMMARY OF THE INVENTION The present invention has been made to solve the above problems, and an object of the present invention is to provide a system and method for automatically extracting web addresses for malicious code checking that collect check targets in real time from a relay device, such as a web proxy or an L7 switch, that clients access to use web services.

Another object of the present invention is to provide a system and method for automatically extracting web addresses for malicious code checking that detect web sites hiding malicious code by analyzing the web page source of each address to be checked and also checking the web pages of the sub-addresses linked from it.

Another object of the present invention is to provide a method for distinguishing waypoint (stopover) sites from distribution sites through website malware inspection.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.

According to an aspect of the present invention, there is provided a method of automatically extracting web addresses for malicious code checking, performed by a check target extraction device, the method comprising: (a) collecting access logs from a relay device and extracting web addresses from the collected access logs; (b) extracting check target addresses by filtering duplicate addresses and allowed addresses out of the web addresses; and (c) extracting sub-addresses from the web pages of the extracted check target addresses and generating a check target address list that includes the check target addresses and the sub-addresses.

The method of automatically extracting web addresses for malicious code checking may further comprise, after step (c), checking whether the addresses in the check target address list are reachable, and performing a malicious code check by accessing the reachable addresses.

Step (c) may include crawling the web page corresponding to each check target address and extracting the portions of the page source where links exist as sub-addresses, and generating a check target address list that includes the check target addresses and the sub-addresses.

According to another aspect of the present invention, there is provided a check target extraction device comprising: a collecting unit that collects connection logs from a relay device; and a check target extraction unit that extracts web addresses from the collected access logs, extracts check target addresses by filtering duplicate and allowed addresses out of them, extracts sub-addresses from the web pages of the extracted check target addresses, and generates a check target address list that includes the check target addresses and the sub-addresses.

The inspection object extraction unit may crawl a web page corresponding to the inspection target address and extract a portion of the web page source in which a link exists, as a lower address.

In addition, the inspection object extracting unit may collect a lower address by analyzing a header part of an HTTP request and an HTTP response generated when a web page corresponding to the inspection target address is visited.

According to another aspect of the present invention, there is provided a web address automatic extraction system for malicious code checking, comprising: a relay device that relays access requests to web sites and records access logs; and a check target extraction device that collects the access logs from the relay device, extracts web addresses from them, extracts check target addresses by filtering duplicate and allowed addresses out of the web addresses, extracts sub-addresses from the web pages of the check target addresses, and generates a check target address list that includes the check target addresses and the sub-addresses.

The relay device may be a web proxy server or an L7 switch.

The system for automatically extracting web addresses for malicious code checking may further comprise a malicious code checking device that receives the check target address list from the check target extraction device, checks whether the addresses in the list are reachable, and performs a malicious code check by accessing the reachable addresses.

The inspection target extraction device may crawl a web page corresponding to the inspection target address and extract a portion of the web page source in which a link exists, as a lower address.

Meanwhile, the above-mentioned 'web address automatic extraction system and method for malicious code checking' may be implemented as a program and recorded on a recording medium readable by an electronic device, or distributed through a program download management device.

According to the present invention, malicious code check targets are collected using a relay device, such as a web proxy or an L7 switch, that clients access to use web services, and by processing its logs according to the checking purpose, the URLs required for real-time detection can be extracted.

Also, by analyzing the web page source of the web address to be checked, the web page of the lower web address connected to the web page can be checked to detect the web site where the malicious code is hidden.

In addition, the malicious URLs within sites identified as malicious can be extracted through web site visits and malicious code discrimination.

The effects of the present invention are not limited to the above-mentioned effects, and various effects can be included within the scope of what is well known to a person skilled in the art from the following description.

FIG. 1 is a diagram illustrating a web address automatic extraction system for malicious code checking according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing a configuration of a check target extraction apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram schematically showing the configuration of a malicious code checking apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a method for automatically extracting web addresses for malicious code checking according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a method of automatically extracting web addresses for malicious code checking according to another embodiment of the present invention.
FIG. 6 is a diagram illustrating a malicious code checking method of a malicious code checking apparatus according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, a system and method for automatically extracting web addresses for malicious code checking according to the present invention will be described in detail with reference to the accompanying drawings. The embodiments are provided so that those skilled in the art can easily understand the technical spirit of the present invention, and the present invention is not limited thereto. In addition, because the drawings are schematic so that the embodiments can be described easily, what they show may differ from an actual implementation.

In the meantime, each constituent unit described below is only an example for implementing the present invention. Thus, in other implementations of the present invention, other components may be used without departing from the spirit and scope of the present invention.

In addition, each component may be implemented solely by hardware or software configuration, but may be implemented by a combination of various hardware and software configurations performing the same function. Also, two or more components may be implemented together by one hardware or software.

Also, the expression "comprising" is an "open" expression that merely denotes that such elements are present, and should not be understood to exclude additional elements.

FIG. 1 is a diagram illustrating a web address automatic extraction system for malicious code checking according to an embodiment of the present invention.

Referring to FIG. 1, a web address automatic extraction system for malicious code checking includes at least one client 100, a relay apparatus 200, a check target extraction apparatus 300, a malicious code check apparatus 400, a DNS server 500, and a service server 600. The DNS server 500 and the service server 600 are located in a service network (DMZ network). However, it is to be understood that the present invention is not limited thereto and that some configurations may be deleted or added as necessary.

The client 100 requests data from, and receives data from, the service server 600 through a communication network such as the Internet.

The client 100 may be implemented as an electronic device such as a smartphone, a tablet, a PC, a notebook computer, or a PDA, and may also be implemented as various other types of electronic devices.

The relay apparatus 200 receives requests from the client 100 for resources of the service server 600, and may be a web proxy or an L7 switch. The L7 switch is a device that recognizes the content transmitted from the client 100 and switches traffic based on it.

When the client 100 requests access to a web site, the relay apparatus 200 temporarily stores an access log corresponding to the request in the cache 210 and relays communication between the web site and the client. The access log may include client identification information, the URL of the accessed website, its IP address, and the like.

The relay apparatus 200 may be located anywhere on the network between the client 100 and the service server 600. The client 100 requests resources using the address of a remote destination server, and the relay apparatus 200 receives that address, connects to the destination server, and obtains the resources.

The inspection target extraction device 300 selects malicious code inspection target addresses using the access logs recorded in the relay device 200.

That is, the inspection target extraction apparatus 300 collects connection logs from the relay apparatus 200, extracts web addresses from the collected logs, and filters duplicate addresses and allowed addresses out of the extracted web addresses to create a check target address list. An allowed address is an address to which the firewall permits access; allowed addresses are filtered out using the public address database stored in the firewall. The resulting check target address list consists of website URLs or IP addresses.
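The filtering step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the extracted web addresses arrive as strings (full URLs or bare IPs) and that the firewall's public address database has been exported as a plain collection of allowed hosts (`allowed_hosts` is a hypothetical input). Duplicates are removed while keeping first-seen order.

```python
from urllib.parse import urlsplit


def build_check_target_list(web_addresses, allowed_hosts):
    """Deduplicate extracted web addresses and drop firewall-allowed ones.

    web_addresses: URLs or bare IPs extracted from relay-device access logs.
    allowed_hosts: hosts/IPs from a (hypothetical) export of the firewall's
    public address database.
    """
    allowed = {h.lower() for h in allowed_hosts}
    seen = set()
    targets = []
    for addr in web_addresses:
        addr = addr.strip()
        # Accept both full URLs and bare hosts/IPs; hostname is lowercased.
        host = urlsplit(addr if "://" in addr else "http://" + addr).hostname or ""
        if addr in seen or host in allowed:
            continue  # duplicate or allowed address: not a check target
        seen.add(addr)
        targets.append(addr)
    return targets
```

For example, `build_check_target_list(["http://a.com/x", "http://a.com/x", "http://safe.com/", "10.0.0.5"], ["safe.com"])` keeps only `"http://a.com/x"` and `"10.0.0.5"`.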

The check target extraction apparatus 300 transmits the check target address list to the malicious code checking apparatus 400 that performs the malicious code check. At this time, the inspection target extraction apparatus 300 can transmit the check target address list to the malicious code checking apparatus 400 using various protocols such as FTP, SNMP, and syslog.
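Syslog is one of the transports named above. As an illustration only, the list could be pushed with the Python standard library's `logging.handlers.SysLogHandler` over UDP. The one-address-per-message layout, the `check_target` prefix, and the logger name are assumptions, not details from the source.

```python
import logging
import logging.handlers


def send_check_list_syslog(addresses, host, port=514):
    """Send each check target address as one syslog message over UDP.

    The message layout ("check_target <address>") and logger name are
    hypothetical; a receiver would parse them back into a list.
    """
    logger = logging.getLogger("check_target_list")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not duplicate messages to the root logger
    handler = logging.handlers.SysLogHandler(address=(host, port))
    logger.addHandler(handler)
    try:
        for addr in addresses:
            logger.info("check_target %s", addr)
    finally:
        logger.removeHandler(handler)
        handler.close()
```

With the default facility (`LOG_USER`) and level `INFO`, each datagram starts with the priority tag `<14>`.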

As described above, the inspection target extraction apparatus 300 automatically collects the inspection target web address for malicious code inspection.

According to another embodiment of the present invention, the check target extraction apparatus 300 extracts sub-addresses from the web page of each address in the check target address list and generates a check target address list that includes the check target addresses and the sub-addresses.

That is, the check target extraction apparatus 300 crawls the web page of each check target URL in the list and extracts the portions of the page source where links exist. These link portions include the src attribute of script tags, the URL in an a href attribute, URLs contained in URL tags, the src attribute of img tags, and the like. Collecting sub-URLs through page source analysis in this way makes it possible to check sub-pages that are reached only through user actions such as link clicks or page navigation.
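A minimal sketch of the link-portion extraction, using Python's standard `html.parser`. It covers only the tag/attribute pairs spelled out above (script src, a href, img src); the "URL tag" case is not modeled because the text does not define it. Relative links are resolved against the page URL, which is an assumption about how sub-addresses would be normalized.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs treated as link portions (from the examples in the text).
LINK_ATTRS = {("script", "src"), ("a", "href"), ("img", "src")}


class LinkExtractor(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercased tag and attribute names.
        for name, value in attrs:
            if (tag, name) in LINK_ATTRS and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_sub_addresses(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    # Drop duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(parser.links))
```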

In addition, the inspection target extraction device 300 can extract sub-URLs down to a given link depth from the inspection target URL. Accordingly, the malicious code checking apparatus 400 can check for malicious behavior not only at the inspection target web address but also at each web address linked to it.
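Depth-limited extraction can be sketched as a breadth-first traversal. The sketch assumes a hypothetical `fetch_links(url)` hook that returns the links found on a page (for example, an HTTP fetch followed by link extraction); the depth semantics (start page at depth 0, pages at `max_depth` not expanded further) are an assumption.

```python
from collections import deque


def crawl_sub_addresses(start_url, fetch_links, max_depth):
    """Breadth-first collection of sub-URLs up to max_depth link hops.

    fetch_links(url) -> iterable of URLs linked from that page (hypothetical
    hook). Returns {url: depth}, with start_url at depth 0.
    """
    depth_of = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for link in fetch_links(url):
            if link not in depth_of:  # visit each URL once
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of
```

Breadth-first order records the shortest link distance for each URL, which is a natural reading of "depth connected to the inspection target URL".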

In addition, the inspection target extraction apparatus 300 may collect sub-URLs by analyzing the header portions of the HTTP requests and HTTP responses generated when a web page is visited. The relationship between two URLs can be determined from the referrer URL and the request URL (GET URL) information. The referrer (waypoint) URL is the page through which the request URL was reached, for example when the request URL is displayed in an iframe on that page or linked from it. A request URL may have several waypoint URLs, and conceptually each waypoint URL is an upper-level URL of the request URL.
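The referrer/request relationship can be turned into a parent-to-children map. This sketch assumes the captured traffic has already been reduced to `(referer_url, request_url)` pairs (the Referer value may be absent); how those pairs are captured is outside its scope.

```python
from collections import defaultdict


def build_url_hierarchy(requests):
    """Map each waypoint (Referer) URL to the request URLs reached from it.

    requests: iterable of (referer_url, request_url) pairs observed while
    visiting a check target page; referer_url may be None.
    """
    children = defaultdict(list)
    for referer, request_url in requests:
        # Requests without a Referer have no upper-level URL to attach to.
        if referer and request_url not in children[referer]:
            children[referer].append(request_url)
    return dict(children)
```

A request URL reached from several pages simply appears under each of its waypoint URLs.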

Accordingly, the check target extraction apparatus 300 may generate a check target address list including a lower address of the check target address.

A plurality of clients 100 connected to the Internet can access one or more target web sites via the Internet to obtain and freely use desired data. However, the content of some web sites that users visit unknowingly may contain malicious code and viruses. The inspection target extraction device 300 according to the present invention extracts the addresses of web sites that may be associated with such malicious code.

As described above, the inspection target extraction apparatus 300 collects malicious code inspection targets using the relay device 200, such as a web proxy or an L7 switch, that the client 100 accesses to use web services, and by processing the connection logs it can extract the URLs required for real-time detection.

In addition, the inspection target extraction device 300 can check a web page of a sub-web address connected to the web page through analysis of a web page source of the web address to be inspected, thereby detecting a web site where the malicious code is hidden.

Meanwhile, the inspection target extraction device 300 may be implemented as a single calculation device or in the form of an aggregate device in which two or more calculation devices are connected to each other. For example, the check target extraction apparatus 300 may be implemented as a single server or two or more servers connected together.

The malicious code checking apparatus 400 performs a malicious code check on the addresses to be checked in the check target address list sent from the check target extraction apparatus 300. At this time, the malicious code checking apparatus 400 confirms whether or not the web site to be inspected can be accessed, and performs the visit check only for the web sites confirmed to be accessible (alive).

To check quickly whether each website to be checked is reachable, the malicious code checking device 400 sends a DNS (domain name system) query and checks whether a response is received. If a DNS response is received, it sends a synchronization (SYN) signal to TCP port 80, and if an acknowledgment signal is received, it determines that a web service is provided on TCP port 80. The malicious code checking apparatus 400 can check whether multiple websites are reachable at the same time using multiple threads.
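A minimal sketch of the reachability check with the Python standard library. One substitution is stated plainly: instead of sending a bare SYN and watching for the SYN-ACK (which needs raw sockets), `socket.create_connection` completes the full TCP handshake; the outcome (the port answers or it does not) is the same. The timeout and worker count are assumed values.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_alive(host, port=80, timeout=2.0):
    """DNS lookup, then a TCP handshake to the web port."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)  # DNS step
    except socket.gaierror:
        return False  # no DNS answer: treat the site as not alive
    try:
        # Full handshake stands in for the SYN / SYN-ACK probe.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


def filter_alive(hosts, port=80, workers=32):
    """Check many sites concurrently and keep only the reachable ones."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda h: is_alive(h, port), hosts))
    return [h for h, ok in zip(hosts, flags) if ok]
```

Only the hosts returned by `filter_alive` would then go on to the browser-based visit check.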

When the malicious code checking apparatus 400 receives the check target address list, it accesses multiple websites to be checked simultaneously using multiple browsers. The list consists of the URLs of a large number of websites to be checked. The malicious code checking apparatus 400 runs as many browsers at once as its configuration allows and accesses the websites to be checked through them. For example, if 100 browsers can run at the same time, the malicious code checking apparatus 400 accesses the check target websites in the list in batches of 100.
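Splitting the list into units of the configured browser count is a simple batching step. The sketch below only forms the batches; launching and driving the browsers is left out. It requires Python 3.8 or later for the assignment expression.

```python
from itertools import islice


def in_batches(addresses, browsers_at_once):
    """Yield the check target list in units of browsers_at_once."""
    it = iter(addresses)
    # Each round takes the next browsers_at_once addresses until none remain.
    while batch := list(islice(it, browsers_at_once)):
        yield batch
```

With 250 addresses and 100 concurrent browsers, this gives three rounds of 100, 100, and 50 sites.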

On the other hand, if the check target address list is composed of the main address and the sub address, the malicious code checking device 400 visits the corresponding website according to the main address and the sub address.

If the web site to be checked is the main address, the malicious code checking device executes a predetermined number of multiple browsers and simultaneously visits each inspection target web site. For example, the malicious code checking apparatus 400 executes 30 multiple browsers and simultaneously visits 30 different inspection target web sites through each browser.

If the websites to be checked are sub-addresses, the malicious code checking apparatus 400 increases throughput by combining multiple browsers with a multi-frame visiting technique. For example, if 20 browsers are opened at the same time, each with 5 frames inserted, 100 (5 × 20) sites can be checked in a single pass.
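One way to realize multi-frame visiting is to generate a wrapper page per browser that embeds several target sites as iframes. This is an assumed realization (the text does not say how frames are inserted); the function name and the default of 5 frames per browser follow the example above.

```python
from html import escape


def multiframe_pages(sub_addresses, frames_per_browser=5):
    """Group sub-addresses into wrapper pages of frames_per_browser iframes.

    Each returned HTML string is meant to be loaded in one browser, so
    N browsers together cover N * frames_per_browser sites per round.
    """
    pages = []
    for i in range(0, len(sub_addresses), frames_per_browser):
        group = sub_addresses[i:i + frames_per_browser]
        frames = "\n".join(
            # Escape the URL so characters such as & and " stay valid HTML.
            f'<iframe src="{escape(url, quote=True)}"></iframe>' for url in group
        )
        pages.append(f"<!DOCTYPE html>\n<html><body>\n{frames}\n</body></html>")
    return pages
```

For 100 sub-addresses this yields 20 pages of 5 frames each, matching the 20-browser example.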

If no malicious code infection attempt is detected while using multiple browsers and multiple frames at the same time, the next group of inspection targets is visited. If an infection attempt is confirmed, the site responsible is traced. A tree search can be used to locate that site quickly with a minimum number of checks.
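The tree search can be sketched as recursive group splitting: visit a whole group at once, prune it if clean, otherwise split it and recurse. `infects(group)` is a hypothetical hook standing in for a combined multi-browser/multi-frame visit plus infection detection; the split factor of 2 is an assumed default.

```python
def locate_malicious(sites, infects, split=2):
    """Tree search for the sites responsible for an infection attempt.

    infects(group) -> bool visits every site in the group at once and
    reports whether an infection attempt was observed (hypothetical hook).
    Returns (malicious_sites, number_of_checks).
    """
    checks = 0
    found = []
    stack = [list(sites)]
    while stack:
        group = stack.pop()
        checks += 1
        if not infects(group):
            continue  # the whole subtree is clean: prune it
        if len(group) == 1:
            found.append(group[0])
            continue
        size = -(-len(group) // split)  # ceiling division
        stack.extend(group[i:i + size] for i in range(0, len(group), size))
    return found, checks
```

With 16 sites and a single malicious one, this binary split isolates it in 9 group visits instead of 16 individual ones.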

The malicious code checking apparatus 400 checks whether any of the websites being checked attempts to infect the system with malicious code. It can determine whether an attack that infects malicious code has occurred by correlating the file, process, and registry changes that occur after visiting a website to be inspected.

The malicious code checking apparatus 400 extracts a malicious site when a malicious code infection attempt is detected among a plurality of websites to be checked. At this time, the malicious code checking apparatus 400 extracts a malicious site from a plurality of inspection target web sites when the inspection range is narrowed to a predetermined ratio using a tree search.

When the malicious site is extracted, the malicious code checking apparatus 400 accesses the malicious site and tracks malicious URLs that distribute the malicious code. Here, the malicious code checking device 400 extracts connection URLs in which additional connections are made when the malicious sites are visited, blocks the extracted connection URLs one by one, and tracks the vulnerability attack URLs by revisiting the malicious sites.
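The block-and-revisit tracing can be sketched as a loop over the additionally contacted URLs. `visit(site, blocked)` is a hypothetical hook that revisits the site with the given URLs blocked (for example, through a proxy deny list) and reports whether an infection attempt is still observed. A URL whose blocking makes the infection disappear is treated as an attack URL.

```python
def trace_attack_urls(malicious_site, extra_urls, visit):
    """Identify which additionally contacted URLs deliver the attack.

    visit(site, blocked) -> bool revisits the site with the URLs in
    `blocked` blocked and returns True if infection is still observed
    (hypothetical hook).
    """
    attack_urls = []
    for url in extra_urls:
        # Block one connection URL at a time; if the infection disappears,
        # that URL is on the attack path.
        if not visit(malicious_site, {url}):
            attack_urls.append(url)
    return attack_urls
```

Blocking one URL at a time costs one revisit per extracted connection URL, which keeps attribution unambiguous.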

When a hacker exploits a vulnerability on an internal employee's PC through a method such as phishing, the malicious code checking device 400 monitors whether malicious code has entered the internal network and whether an internal PC is opening a reverse connection to the outside.

In addition, when a hacker attacks a vulnerability of a server providing an external (commercial) service, the malicious code checking apparatus 400 monitors for malicious code links and injection into the web server. Because the commercial network is configured differently from the internal network, the check targets, namely the changing set of web servers operated by a specific company or organization, can be identified and checked in real time using DNS and firewall ACLs.

Meanwhile, the malicious code checking apparatus 400 may be implemented as a single computing device or in the form of an aggregate device in which two or more computing devices are connected to each other. For example, the malicious code checking apparatus 400 may be implemented as a single server or two or more servers connected together.

FIG. 2 is a block diagram schematically showing a configuration of a check target extraction apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the check target extraction apparatus 300 includes a communication unit 310, a collecting unit 320, a check target extraction unit 330, a storage unit 340, and a control unit 350. However, it is to be understood that the present invention is not limited thereto and that some configurations may be deleted or added as necessary.

The communication unit 310 transmits and receives data to and from various electronic devices. Specifically, the communication unit 310 can be connected to the relay apparatus 200 and the malicious code checking apparatus 400 via a wired communication network, a wireless communication network, a local area network, and the like, and can transmit and receive various types of data over those connections. The communication unit 310 may include various wired or wireless communication modules and may transmit or receive data according to various wired or wireless communication standards. For example, the communication unit 310 may include communication modules conforming to standards from bodies such as ITU, IEEE, ISO, and IEC, and may also include other communication modules.

The collecting unit 320 collects access logs from the relay device through the communication unit 310. The access logs may include client identification information, the URL of the accessed website, its IP address, and the like. The collecting unit 320 may collect the access logs stored in the cache of the relay apparatus in real time or periodically.
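The source does not name a proxy product or log format. Assuming, for illustration only, that the relay is a Squid web proxy writing its native `access.log` format (time, elapsed, client, code/status, bytes, method, URL, user, hierarchy/peer, type), the fields the text mentions (client identification, accessed URL, server IP) could be pulled out like this:

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class AccessRecord(NamedTuple):
    client: str               # client identification (source IP)
    method: str
    url: str                  # accessed web address
    server_ip: Optional[str]  # from the hierarchy/peer field, if present


def parse_squid_native(line: str) -> Optional[AccessRecord]:
    """Parse one line of Squid's native access.log format (assumed format)."""
    fields = line.split()
    if len(fields) < 10:
        return None  # malformed or truncated line
    peer = fields[8].split("/", 1)
    server_ip = peer[1] if len(peer) == 2 and peer[1] != "-" else None
    return AccessRecord(fields[2], fields[5], fields[6], server_ip)


def web_addresses(lines: Iterable[str]) -> Iterator[str]:
    """Yield the accessed URL from every parseable log line."""
    for line in lines:
        rec = parse_squid_native(line)
        if rec:
            yield rec.url
```

An L7 switch or a differently configured proxy would need its own parser; only the output (a stream of web addresses) matters to the later filtering step.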

The check target extraction unit 330 extracts web addresses from the access logs collected by the collecting unit 320 and generates a check target address list by filtering duplicate addresses and allowed addresses out of the extracted web addresses. An allowed address is an address to which the firewall permits access; allowed addresses are filtered out using the public address database stored in the firewall. The resulting check target address list consists of website URLs or IP addresses.

According to another embodiment of the present invention, the check target extraction unit 330 extracts sub-addresses from the web page of each address in the check target address list and generates a check target address list that includes the check target addresses and the sub-addresses. That is, the check target extraction unit 330 crawls the web page of each check target URL in the list and extracts the portions of the page source where links exist. These link portions include the src attribute of script tags, the URL in an a href attribute, URLs contained in URL tags, the src attribute of img tags, and the like. Collecting sub-URLs through page source analysis in this way makes it possible to check sub-pages that are reached only through user actions such as link clicks or page navigation.

In addition, the check target extracting unit 330 can extract a sub-URL according to the depth connected with the check target URL. This makes it possible for the malicious code checking device to check the malicious behavior of each web address connected to the target web address in addition to the target web address.

In addition, the check target extraction unit 330 may collect sub-URLs by analyzing the header portions of the HTTP requests and HTTP responses generated when a web page is visited. The relationship between two URLs can be determined from the referrer URL and the request URL (GET URL) information. The waypoint (referrer) URL is the website visited before the request URL, and the request URL becomes a subordinate URL of the waypoint URL.

The inspection target extracting unit 330 may generate the check target address list including the lower address of the inspection target address.

Meanwhile, the collecting unit 320 and the check target extraction unit 330 may each be implemented by a processor or similar component that executes a program on a computing device. They may be implemented as physically independent components or as functionally separated parts of a single processor.

The storage unit 340 stores the data related to the operation of the check target extraction apparatus 300. For example, the storage unit 340 may store various data including an access log collected by the collection unit 320, a list of addresses to be checked generated by the check target extraction unit 330, and the like. The storage unit 340 may use a known storage medium, and may use any one or more of known storage media such as ROM, PROM, EPROM, EEPROM, RAM, and the like.

The control unit 350 controls the operation of various components of the inspection target extraction device 300, including the communication unit 310, the collecting unit 320, the inspection target extraction unit 330, and the storage unit 340.

The control unit 350 may include at least one computing unit, which may be a general-purpose central processing unit (CPU), a programmable device (CPLD, FPGA) implemented for a particular purpose, an application-specific integrated circuit (ASIC), or a microcontroller chip.

The check target extraction apparatus 300 may be realized as a single calculation apparatus or in the form of an aggregate apparatus in which two or more calculation apparatuses are connected to each other. For example, the check target extraction apparatus 300 may be implemented as a single server or two or more servers connected together.

FIG. 3 is a block diagram schematically showing the configuration of a malicious code checking apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the malicious code checking apparatus 400 includes a communication unit 410, a checking unit 420, a storage unit 430, and a control unit 440. However, it is to be understood that the present invention is not limited thereto and that some configurations may be deleted or added as necessary.

The communication unit 410 transmits and receives data to and from various electronic devices. Specifically, the communication unit 410 can connect the malicious code checking apparatus 400 and the inspection target extraction apparatus 300 through a wired communication network, a wireless communication network, a local area network, or the like, and can transmit and receive various types of data over those connections. The communication unit 410 may include various wired or wireless communication modules and may transmit or receive data according to various wired or wireless communication standards. For example, the communication unit 410 may include communication modules conforming to standards from bodies such as ITU, IEEE, ISO, and IEC, and may also include other communication modules.

The checking unit 420 performs a malicious code check on the addresses in the check target address list transmitted from the check target extraction apparatus. It first checks whether each website to be checked is reachable and performs the visit check only on websites confirmed to be reachable (alive). To check reachability quickly, the checking unit 420 sends a DNS (domain name system) query and checks whether a response is received. If a DNS response is received, it sends a synchronization (SYN) signal to TCP port 80, and if an acknowledgment signal is received, it determines that a web service is provided on TCP port 80. The checking unit 420 can check whether multiple websites are reachable at the same time using multiple threads.

When the check target address list is received, the checking unit 420 accesses multiple websites to be checked simultaneously using multiple browsers. The list consists of the URLs of a large number of websites to be checked. The checking unit 420 runs as many browsers at once as its configuration allows and accesses the websites to be checked through them. For example, if 100 browsers can run at the same time, the checking unit 420 accesses the check target websites in batches of 100.

On the other hand, if the check target address list is composed of the main address and the sub address, the checking unit 420 visits the corresponding web site according to the main address and the sub address.

If the Web site to be checked is the main address, the checking unit 420 executes a predetermined number of multiple browsers and simultaneously visits each check target Web site. For example, the checking unit 420 executes 30 multiple browsers and simultaneously visits 30 different inspection target web sites through each browser.

If the websites to be checked are sub-addresses, the checking unit 420 increases throughput by combining multiple browsers with the multi-frame visiting technique. For example, if 20 browsers are opened at the same time, each with 5 frames inserted, 100 (5 × 20) sites can be checked in a single pass.

The checking unit 420 then checks whether any of the visited websites attempts to infect the system with malicious code. To do so, the checking unit 420 can determine whether an infection attack has occurred by analyzing the correlation among the file, process, and registry events that occur after the inspection target websites are visited.
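The text does not specify the correlation rule. As a hypothetical illustration only, the sketch below flags an infection when a process started (directly or indirectly) by the browser writes a file that is later executed, or that a registry value then points to. The event model and the rule are assumptions; a real checker would collect these events from the operating system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    kind: str                        # "file_write", "process_start" or "registry_set"
    actor_pid: int                   # process that performed the action
    target: str                      # file path, launched image path, or registry value data
    child_pid: Optional[int] = None  # id of the new process (process_start only)


def infection_suspected(events: Iterable[Event], browser_pid: int) -> bool:
    """Hypothetical rule: the browser's process tree drops a file that is
    then executed or referenced by a registry value."""
    tree = {browser_pid}       # browser plus every process it (transitively) started
    dropped: set[str] = set()  # lower-cased paths written by the tree
    for ev in events:          # events are assumed to be in chronological order
        if ev.actor_pid not in tree:
            continue           # unrelated background activity is ignored
        if ev.kind == "file_write":
            dropped.add(ev.target.lower())
        elif ev.kind == "process_start":
            if ev.child_pid is not None:
                tree.add(ev.child_pid)
            if ev.target.lower() in dropped:
                return True    # a dropped file was executed
        elif ev.kind == "registry_set":
            value = ev.target.lower()
            if any(path in value for path in dropped):
                return True    # a registry entry points at a dropped file
    return False
```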

When a malicious code infection attempt is detected among the inspection target websites, the checking unit 420 extracts the malicious sites. To do so, the checking unit 420 uses a tree search that repeatedly narrows the inspection range by a predetermined ratio until the malicious sites among the inspection target websites are identified.
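The tree search is not detailed in the text. One reading is adaptive group testing: revisit a group, and if the infection recurs, split it into `parts` sub-ranges and recurse into the infected ones, so each level narrows the range by the ratio 1/`parts`. In the sketch below, `infected` is a hypothetical callback that stands for one revisit of a group of sites.

```python
from typing import Callable, Sequence


def find_malicious(sites: Sequence[str],
                   infected: Callable[[Sequence[str]], bool],
                   parts: int = 2) -> list[str]:
    """Narrow an infected group down to the individual sites that trigger infection."""
    if parts < 2:
        raise ValueError("parts must be at least 2")
    if not sites or not infected(sites):
        return []                   # clean sub-range: prune the whole branch
    if len(sites) == 1:
        return list(sites)          # leaf: a single malicious site
    step = -(-len(sites) // parts)  # ceiling division: size of each child range
    found: list[str] = []
    for start in range(0, len(sites), step):
        found.extend(find_malicious(sites[start:start + step], infected, parts))
    return found
```

When only a few sites in a large group are malicious, this needs far fewer revisits than checking every site individually.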

Once a malicious site is extracted, the checking unit 420 accesses the malicious site and traces the malicious URL that distributes the malicious code. Specifically, the checking unit 420 extracts the connection URLs to which additional connections are made when the malicious site is visited, blocks the extracted connection URLs one at a time, and revisits the malicious site each time to track down the vulnerability attack URL.
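A minimal sketch of the one-at-a-time blocking step, assuming a hypothetical `revisit` callback that revisits the malicious site with the given URLs blocked and reports whether the infection still occurs. Under this reading, a URL whose blocking alone stops the infection belongs to the attack chain.

```python
from typing import Callable, Iterable, Set


def trace_attack_urls(connection_urls: Iterable[str],
                      revisit: Callable[[Set[str]], bool]) -> list[str]:
    """Block each extra connection URL on its own, revisit, and keep the URLs
    whose blocking makes the infection disappear."""
    attack_urls = []
    for url in connection_urls:
        still_infected = revisit({url})  # hypothetical: revisit with only `url` blocked
        if not still_infected:
            attack_urls.append(url)
    return attack_urls
```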

The checking unit 420 may be implemented by a processor or the like that executes a program on a computing device. It may be implemented as a physically independent component, or in a functionally separated form within a single processor.

The storage unit 430 stores data related to the operation of the malicious code checking apparatus 400. In particular, the storage unit 430 may store various data, including the inspection target address list transmitted from the inspection target extraction apparatus. The storage unit 430 may use any one or more known storage media, such as ROM, PROM, EPROM, EEPROM, and RAM.

The control unit 440 controls the operation of various components of the malicious code checking apparatus 400 including the communication unit 410, the checking unit 420, and the storage unit 430.

The control unit 440 may include at least one computing device, which may be a general-purpose central processing unit (CPU), a programmable device (CPLD or FPGA) implemented for a particular purpose, an application-specific integrated circuit (ASIC), or a microcontroller chip.

The malicious code checking device 400 may be implemented as a single computing device or in the form of an aggregate device in which two or more computing devices are connected to each other. For example, the malicious code checking apparatus 400 may be implemented as a single server or two or more servers connected together.

FIG. 4 is a diagram illustrating a method for automatically extracting web addresses for malicious code checking according to an embodiment of the present invention. This is only one embodiment including preferred steps for achieving the object of the present invention; some steps may be modified, added, or deleted.

Referring to FIG. 4, the check target extraction apparatus collects connection logs from the relay device (S402), and extracts web addresses from the collected connection logs (S404).
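The log format of the relay device is not specified. Assuming, for illustration, a Squid-style native access log, in which the requested URL is the seventh whitespace-separated field, web addresses could be extracted as follows. HTTPS traffic tunneled with CONNECT only exposes `host:port` in this format, so the sketch skips it.

```python
from typing import Iterable, Iterator


def extract_urls(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield the requested URL from each access-log line (assumed Squid-style layout)."""
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # malformed or truncated entry
        url = fields[6]  # assumed field order: time elapsed client code/status bytes method URL ...
        if url.startswith(("http://", "https://")):
            yield url   # CONNECT entries (host:port only) are skipped
```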

Then, the check target extraction device filters out duplicated addresses and allowed addresses from the extracted web addresses (S406) and creates the check target address list (S408). In doing so, it also filters out the firewall's public IP addresses, and the resulting check target address list consists of website URLs or IP addresses.
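The filtering step might look like the sketch below. The allowed-domain list and the firewall public IP list are inputs whose contents are assumptions; here a URL counts as a duplicate only if the string matches exactly, and an allowed domain also covers its subdomains.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlsplit


def _is_allowed(host: str, allowed: set[str]) -> bool:
    # Matches an allowed domain itself and any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in allowed)


def _is_firewall_ip(host: str, firewall_ips: set) -> bool:
    try:
        return ipaddress.ip_address(host) in firewall_ips
    except ValueError:
        return False  # host is a domain name, not an IP literal


def build_check_list(urls: Iterable[str], allowed_domains: Iterable[str],
                     firewall_public_ips: Iterable[str]) -> list[str]:
    allowed = {d.lower().rstrip(".") for d in allowed_domains}
    firewall_ips = {ipaddress.ip_address(ip) for ip in firewall_public_ips}
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        if url in seen:
            continue  # duplicated address
        seen.add(url)
        host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lower-cased
        if not host or _is_allowed(host, allowed) or _is_firewall_ip(host, firewall_ips):
            continue  # allowed address or firewall public IP
        result.append(url)
    return result
```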

After step S408, the check target extraction device stores the check target address list and also transmits it to the malicious code checking device, which performs the malicious code check (S410). The list can be transmitted using various protocols such as FTP, SNMP, and syslog.
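As one example of the transmission protocols mentioned, Python's standard `logging.handlers.SysLogHandler` can send the list over syslog (UDP by default). Sending one message per address, and the logger name, host, and port used here, are assumptions; FTP and SNMP transfer are not shown.

```python
import logging
import logging.handlers
from typing import Iterable


def send_check_list(urls: Iterable[str], host: str = "127.0.0.1", port: int = 514) -> None:
    """Send each check target address as one syslog message over UDP."""
    logger = logging.getLogger("check_target_list")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these records out of the root logger
    handler = logging.handlers.SysLogHandler(address=(host, port))  # UDP by default
    logger.addHandler(handler)
    try:
        for url in urls:
            logger.info(url)
    finally:
        logger.removeHandler(handler)
        handler.close()
```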

The method for automatically extracting web addresses for malicious code checking according to an embodiment of the present invention may be implemented as a program. The program may be stored in a computer-readable recording medium, or distributed via a provisioning server.

FIG. 5 is a diagram illustrating a method for automatically extracting web addresses for malicious code checking according to another embodiment of the present invention. This is only one embodiment including preferred steps for achieving the object of the present invention; some steps may be modified, added, or deleted.

Referring to FIG. 5, the check target extraction apparatus collects connection logs from the relay device (S502), and extracts web addresses from the collected connection logs (S504).

Then, the inspection target extraction device filters out duplicated addresses and allowed addresses from the extracted web addresses (S506) and extracts the inspection target addresses (S508). In doing so, it also filters out the firewall's public IP addresses.

After step S508, the check target extraction device extracts lower (sub-) addresses from the web pages of the extracted check target addresses (S510), and creates a check target address list including the check target addresses and the lower addresses (S512). That is, the device crawls the web page of each inspection target URL in the list and extracts, as lower addresses, the portions of the web page source where links exist. These link portions include the src attribute of script tags, the URL in A href, URLs contained in URL tags, the src attribute of img tags, and the like. Collecting sub-URLs through web page source analysis in this way makes it possible to also check sub-pages that are reached only through user actions such as clicking a link or moving to another page.
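A minimal sketch of the link-extraction step using Python's built-in `html.parser`. It covers the `a href`, `script src`, and `img src` cases named above; the "URL tag" case is left out because the text does not say which tag it refers to. Relative links are resolved against the page URL, and non-HTTP links such as `javascript:` are dropped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# (tag, attribute) pairs treated as link locations in this sketch.
LINK_ATTRS = {("a", "href"), ("script", "src"), ("img", "src")}


class LinkExtractor(HTMLParser):
    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        for name, value in attrs:
            if (tag, name) not in LINK_ATTRS or not value:
                continue
            url = urljoin(self.base_url, value.strip())  # resolve relative links
            if url.startswith(("http://", "https://")) and url not in self._seen:
                self._seen.add(url)
                self.links.append(url)


def extract_sub_addresses(base_url: str, page_source: str) -> list[str]:
    """Return the distinct HTTP(S) links found in `page_source`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(page_source)
    parser.close()
    return parser.links
```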

In addition, the inspection target extraction device can extract sub-URLs up to a given link depth from the inspection target URL. This allows the malicious code checking device to check for malicious behavior not only at the target web address itself but also at each web address linked from it.
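Depth-limited extraction can be sketched as a breadth-first crawl. `get_links` is a hypothetical callback (for example, fetching a page and running a link extractor on it), and `max_depth` is the assumed depth limit.

```python
from collections import deque
from typing import Callable


def collect_by_depth(start_url: str,
                     get_links: Callable[[str], list[str]],
                     max_depth: int) -> dict[str, int]:
    """Breadth-first crawl returning each reachable URL with its link depth from start_url."""
    depth = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # do not expand pages already at the depth limit
        for link in get_links(url):
            if link not in depth:  # first discovery in BFS order is the shortest depth
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth
```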

The inspection target extraction apparatus can also collect sub-URLs by analyzing the headers of the HTTP requests and HTTP responses generated when a web page is visited. The relationship between two URLs can be established from the referrer URL (the Referer header) and the requested URL (GET URL) of each HTTP request. The referrer URL identifies the page visited before the requested URL, so the requested URL becomes a sub-URL of the referrer URL.
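The Referer value is carried in the HTTP request header, so the sketch below works from observed requests, given as hypothetical (requested URL, Referer) pairs, and maps each referrer URL to the URLs requested from it.

```python
from collections import defaultdict
from typing import Iterable, Optional


def sub_urls_from_referers(
        requests: Iterable[tuple[str, Optional[str]]]) -> dict[str, list[str]]:
    """Map each referrer URL to the URLs requested from it (its sub-URLs), in first-seen order."""
    children: defaultdict[str, list[str]] = defaultdict(list)
    for url, referer in requests:
        if not referer or referer == url:
            continue  # direct navigation or self-reference: no parent/child link
        if url not in children[referer]:
            children[referer].append(url)
    return dict(children)
```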

In this way, the inspection target extraction device can generate a check target address list including the sub address of the inspection target address.

Thereafter, the check target extraction apparatus stores the check target address list and also transmits it to the malicious code checking device, which performs the malicious code check (S514). The list can be transmitted using various protocols such as FTP, SNMP, and syslog.

FIG. 6 is a diagram illustrating a malicious code checking method of the malicious code checking apparatus according to an embodiment of the present invention. This is only one embodiment including preferred steps for achieving the object of the present invention; some steps may be modified, added, or deleted.

Referring to FIG. 6, when the malicious code checking device receives the check target address list from the check target extraction device (S602), it accesses the websites corresponding to the check target addresses (S604). Before visiting, the malicious code checking device confirms whether each website to be inspected is accessible, and performs the visit check only on websites confirmed to be alive. To make this confirmation quickly, it sends a DNS (domain name system) query and checks whether a response is received. If a DNS response is received, it sends a synchronization (SYN) signal to TCP port 80; if an acknowledgment signal is returned, it determines that a web service is being provided on TCP port 80. The malicious code checking apparatus can check many websites for accessibility at the same time by using multiple threads.

When the malicious code checking device receives the check target address list, it accesses multiple websites to be checked simultaneously using multiple browsers. Here, the list of websites to be inspected consists of the URLs of a large number of websites. The malicious code checking device launches browsers in batches whose size is the configured number of browsers that can run at the same time, and visits the websites to be checked through those browsers. For example, if 100 browsers can run at the same time, the malicious code checking device visits the websites in the check target address list in batches of 100.

On the other hand, if the address list to be checked consists of the main address and the sub address, the malicious code checking device visits the corresponding website according to the main address and the sub address.

If the website to be checked is a main address, the malicious code checking device launches a predetermined number of browsers and visits a different inspection target website in each browser simultaneously. For example, the malicious code checking device runs 30 browsers and visits 30 different inspection target websites at the same time, one per browser.

If the website to be checked is a sub-address, the malicious code checking device further increases throughput by using a multi-browser, multi-frame visiting technique. For example, if 20 browsers are opened at the same time, each with 5 frames inserted, 100 (5 × 20) sites can be checked in a single round.

After step S604, the malicious code checking device checks whether any of the visited websites attempts to infect the system with malicious code (S606). To do so, it can determine whether an infection attack has occurred by analyzing the correlation among the file, process, and registry events that occur after the inspection target websites are visited.

If a malicious code infection attempt is detected among the websites to be inspected, the malicious code checking device extracts the malicious sites (S608). To do so, it uses a tree search that repeatedly narrows the inspection range by a predetermined ratio until the malicious sites among the inspection target websites are identified.

Once a malicious site is extracted, the malicious code checking device accesses it and traces the malicious URL that distributes the malicious code (S610). Specifically, it extracts the access URLs to which additional connections are made when the malicious site is visited, blocks the extracted access URLs one at a time, and revisits the malicious site each time to track down the vulnerability attack URL.

The method of automatically extracting Web addresses for checking malicious code can be written as a program, and the codes and code segments constituting the program can be easily deduced by a programmer in the field. In addition, a program related to a method for automatically extracting web addresses for malicious code checking can be stored in an information storage medium (readable medium) that can be read by an electronic device, and can be read and executed by an electronic device.

Thus, those skilled in the art will appreciate that the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The above-described embodiments are therefore illustrative only and do not restrict the scope of the invention. Likewise, the flowcharts shown in the figures merely illustrate sequential steps for achieving the most desirable results in practicing the present invention; additional steps may be provided, or some steps may be deleted.

The technical features and implementations described herein may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures described herein and their structural equivalents, or in a combination of one or more of these. Implementations of the technical features described herein may also be implemented as computer program products, that is, as modules of computer program instructions encoded on a tangible program storage medium for execution by, or to control the operation of, a processing system.

The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of these.

In this specification, the term "apparatus" or "system" encompasses all apparatus, devices, and machines for processing data, including, for example, a processor, a computer, or multiple processors or computers. In addition to hardware, the processing system may include code that creates an execution environment for the computer program in question, such as code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of these.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

A computer program does not necessarily correspond to a file in a file system. It may be stored in a single file dedicated to the program in question, in multiple coordinated files (for example, files that store one or more modules, subprograms, or portions of code), or in a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document).

A computer program may be deployed to run on one computer, or on multiple computers located at one site or distributed across multiple sites and interconnected by a wired/wireless communication network.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks or external disks; and CD and DVD discs. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Implementations of the technical features described herein may include a back-end component such as a data server, a middleware component such as an application server, a front-end component such as a client computer having a graphical user interface, or any combination of one or more such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, such as a communication network.

Hereinafter, a more specific embodiment capable of implementing the configurations including the system described herein and the web address automatic extraction method for malicious code checking will be described in detail.

The system described herein and the method for automatically extracting web addresses for malicious code checking may be executed on a client device or server associated with a web-based storage system, or by one or more processors included in such a server, which execute computer software, program code, or instructions. The processor may be part of a computing platform such as a server, a client, a network infrastructure, a mobile computing platform, or a fixed computing platform, and may specifically be any type of computer or processing device capable of executing program instructions, code, and the like. The processor may further include a memory that stores the method, instructions, code, and programs for automatically extracting web addresses for malicious code checking; if it does not include such a memory, it may access a separate storage device, such as a CD-ROM, DVD, memory, hard disk, flash drive, RAM, ROM, or cache, in which the instructions, code, and programs are stored.

In addition, the system described herein and the method for automatically extracting web addresses for malicious code checking may be used, in part or in whole, through a server, a client, a gateway, a hub, a router, or a device that executes computer software on network hardware. The software may be executed on various types of servers, such as a file server, print server, domain server, Internet server, intranet server, host server, or distributed server, and these servers may access storage media, communication devices, ports, clients, and other servers via a wired/wireless network.

In addition, the method, commands, and code for automatically extracting web addresses for malicious code checking may also be executed by the server, and the other devices required to execute the method may be implemented as part of a hierarchical structure associated with the server.

In addition, the server can provide an interface to other devices, including, without limitation, clients, other servers, printers, database servers, print servers, file servers, communication servers, and distributed servers, and this interface can facilitate remote execution of programs.

Further, any device connected to the server via the interface may include at least one storage device capable of storing the method, commands, and code for automatically extracting web addresses for malicious code checking, and the central processor of the server may provide that device with the commands, code, and the like to be executed on it, to be stored in its storage device.

Meanwhile, the system described herein and the method for automatically extracting web addresses for malicious code checking may be used, in part or in whole, through a network infrastructure. The network infrastructure may include devices such as computing devices, servers, routers, hubs, firewalls, clients, personal computers, communication devices, and routing devices, as well as separate modules capable of performing their respective functions, and may further include storage media such as flash memory, buffers, stacks, RAM, and ROM. The method, commands, code, and the like for automatically extracting web addresses for malicious code checking may be executed and stored by any of the devices, modules, and storage media included in the network infrastructure, and the other devices needed to implement the method may likewise be implemented as part of the network infrastructure.

In addition, the system described herein and the method for automatically extracting web addresses for malicious code checking may be implemented in hardware, or in a combination of hardware and software suitable for a specific application. Here, the hardware includes both general-purpose computer devices, such as personal computers and mobile communication terminals, and enterprise-specific computer devices, and these computer devices may be implemented with a memory, a microprocessor, a microcontroller, a digital signal processor, an application-specific integrated circuit, a programmable gate array, or the like, or a combination thereof.

The computer software, instructions, code, and the like described above may be stored in or accessed through a readable device, including: computer components that hold digital data used for computing for a period of time, such as RAM and ROM; permanent storage such as semiconductor storage and optical discs; mass storage such as hard disks, tapes, and drums; optical storage such as CDs and DVDs; flash memory, floppy disks, magnetic tape, and paper tape; memory such as dynamic memory, static memory, and variable storage; and network-attached storage such as the cloud. Here, the commands and code may be written in data-oriented languages such as SQL and dBase; system languages such as C, Objective-C, C++, and assembly; architectural languages such as Java and .NET; or application languages such as PHP, Ruby, Perl, and Python. They are not limited to these, however, and may include any language well known to those skilled in the art.

In addition, "computer-readable media" as used herein includes all media that participate in providing instructions to a processor for program execution, including, but not limited to, non-volatile media such as data storage devices, optical disks, and magnetic disks; volatile media such as dynamic memory; and transmission media such as coaxial cables, copper wires, and optical fibers that transmit data.

Meanwhile, the components that implement the technical features of the present invention, as shown in the block diagrams and flowcharts of the accompanying drawings, represent logical boundaries between those components.

However, depending on the software or hardware embodiment, the depicted components and their functions may be implemented as stand-alone software modules, a monolithic software structure, code, services, or combinations thereof, and may be stored in a medium executable on a computer having a processor capable of executing stored program code, instructions, and the like, so that their functions are implemented. All such embodiments are to be regarded as within the scope of the present invention.

Accordingly, although the accompanying drawings and their description illustrate the technical features of the present invention, no specific arrangement of software for implementing those technical features should be inferred from them unless it is explicitly stated. That is, various embodiments as described above may exist, and some embodiments may be modified while retaining the same technical features as the present invention; these should also be considered within the scope of the present invention.

Also, although the drawings depict operations in a particular order, this is done to obtain the most desirable results, and it should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain cases, multitasking and parallel processing may be advantageous. In addition, the separation of the various system components in the above-described embodiments should not be understood as requiring such separation in all embodiments; the described program components and systems can generally be integrated together into a single software product or packaged into multiple software products.

As such, this specification is not intended to limit the invention to the precise form disclosed. While the present invention has been particularly shown and described with reference to exemplary embodiments, it will be apparent to those skilled in the art that many alternatives, modifications, and variations can be made without departing from the spirit and scope of the present invention as defined by the appended claims.

The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents are deemed to be included in the scope of the present invention.

The present invention provides a system and method for automatically extracting web addresses for malicious code checking. By using a relay device, such as a web proxy or an L7 switch, through which clients access web services, malicious code check targets can be collected in real time, and by processing the logs to suit the purpose of the check, the URLs needed for real-time detection can be extracted.

100: Client
200: Relay device
300: Inspection target extraction device
310, 410: Communication unit
320: Collection unit
330: Inspection object extraction unit
340, 430: Storage unit
350, 440: Control unit
400: Malicious code checking device
420: Checking unit
500: DNS server
600: service server

Claims

(a) collecting an access log from a relay device, and extracting web addresses from the collected access log;
(b) filtering the duplicated address and the allowed address from the extracted web address to extract an address to be checked; And
(c) extracting a lower address from the web page of the extracted inspection target address, and creating an inspection target address list including the inspection target address and the lower address;
A method for automatically extracting web addresses for malicious code checking, performed by an inspection target extraction device and comprising the above steps.

The method according to claim 1,
further comprising, after step (c):
checking whether the addresses to be checked in the check target address list are accessible, and accessing the addresses determined to be connectable to perform a malicious code check.

The method according to claim 1,
wherein step (c) comprises:
crawling a web page corresponding to the check target address and extracting, as a lower address, a portion of the web page source in which a link exists; and
generating an inspection target address list including the inspection target address and the extracted lower address.

A collection unit for collecting connection logs from the relay apparatus;
an inspection object extraction unit that extracts web addresses from the collected connection logs, extracts addresses to be checked by filtering duplicated addresses and allowed addresses from the extracted web addresses, extracts sub-addresses from the web pages of the addresses to be checked, and generates an inspection target address list including the addresses to be checked and the sub-addresses;
An inspection target extraction device comprising the above units.

The inspection target extraction device of claim 4,
wherein the inspection object extraction unit crawls a web page corresponding to the inspection target address and extracts, as a lower address, a portion of the web page source in which a link exists.

The inspection target extraction device of claim 4,
wherein the inspection object extraction unit collects the lower address by analyzing the headers of the HTTP request and HTTP response generated when a web page corresponding to the inspection target address is visited.

A relay device that stores a connection log in response to a connection request when a client requests a connection to a website; and
an inspection target extraction device that collects the connection log from the relay device, extracts web addresses from it, filters duplicated addresses and allowed addresses from the extracted web addresses to extract addresses to be checked, extracts sub-addresses from the web pages of the addresses to be checked, and generates a check target address list including the addresses to be checked and the sub-addresses;
A web address automatic extraction system for malicious code checking, comprising the above devices.

The system of claim 7,
wherein the relay device is a web proxy server or an L7 switch.

The system of claim 7,
further comprising a malicious code checking device that checks whether the addresses to be checked in the check target address list are accessible, and accesses the addresses determined to be connectable to perform a malicious code check.

The system of claim 7,
wherein the inspection target extraction device crawls a web page corresponding to the inspection target address and extracts, as a lower address, a portion of the web page source in which a link exists.