WO2018203775A2

WO2018203775A2 - System and method for checking web resources for the presence of malicious inserts

Info

Publication number: WO2018203775A2
Application number: PCT/RU2018/000274
Authority: WO
Inventors: Илья Самуилович РАБИНОВИЧ
Original assignee: Илья Самуилович РАБИНОВИЧ
Priority date: 2017-05-05
Filing date: 2018-04-26
Publication date: 2018-11-08
Also published as: RU2662391C1; WO2018203775A3

Abstract

The invention relates to systems and methods for ensuring the security of computer systems, and more particularly to systems and methods for checking protected web resources for the presence of malicious, potentially dangerous and undesirable inserts. A system for checking web resources for the presence of malicious inserts comprises a start address search unit connected to an interpreter capable of interpreting all directions of code execution at conditional branches, a decision-making unit, a local database of safe elements, a local database of malicious elements, a data receiving and transmitting unit, a code and data correction unit, a decomposition and composition unit, and a dynamic analysis unit. The interpreter is connected to the decision-making unit, which in turn is connected to the local database of safe elements, the local database of malicious elements, the data receiving and transmitting unit, and the code and data correction unit; the data receiving and transmitting unit is connected to the decomposition and composition unit, which is connected to the dynamic analysis unit.
A method for checking web resources for the presence of malicious inserts using the aforementioned system includes recursively parsing a web resource and analyzing associated sources of information, for example the logs of the web resource, and interpreting the code of the web resource while logging the course of interpretation. When a conditional branch command is interpreted, the interpretation data is duplicated and interpretation continues in all directions defined by the branch, yielding one or more web pages. Potentially malicious elements in these pages are detected by comparing all elements of potentially malicious types with those contained in the local database of safe elements and the local database of malicious elements. If unknown potentially malicious elements are detected, they are subjected to checking and dynamic analysis, after which each detected malicious element is matched to the code on the site that introduced it into the source data.

Description

System and method for checking web resources for malicious insertions

The present invention relates to systems and methods for securing computer systems, and more particularly to systems and methods for checking protected web resources for malicious, potentially dangerous, and unwanted inserts.

As the technology and infrastructure of computer networks improve, the volume and speed of data transmitted between devices on a network are growing rapidly. The transmitted data may also contain malicious inserts: programs and elements. A malicious program is a software product capable of penetrating a computing device without the knowledge or consent of its owner. "Malware" has become a common term for a general class of software that includes many malicious, intrusive, or otherwise interfering forms of software or machine code: viruses, worms, Trojan horses (Trojans), rootkits, spyware, adware, programs that exploit vulnerabilities in third-party software, and any other unwanted malicious software. Some types of malware collect personal information about the user and send it back to an information collection device. Other types can cause the computing device to malfunction or stop working altogether.

Methods and tools for identifying and removing malicious inserts with antivirus software are widely known. Traditional antivirus software uses predefined search sequences and analysis rules to find known malware.

Modern anti-virus scanners for malicious inserts on web resources (web scanners), whose agents run on the server and scan the files hosted there using signatures of known malicious inserts, are ineffective because they apply technologies developed for binary code to interpreted web files, which are plain text. Text is easily modified and obfuscated, so all modern web scanners produce a large number of false positives, which makes it difficult to identify malicious inserts in the files located on the server.

Thus, the disadvantages of antivirus software include the inability to detect new or unknown viruses.

Some existing technologies share certain features with the claimed invention; however, the prior art does not disclose the full combination of features of the claimed group of inventions that makes it possible to eliminate the above disadvantages of known web scanning technologies. One known method increases the security of user tools when browsing web pages (application RU 2008103005, publication date of application: 10.08.2009; Convention priority: 06/28/2005, US 11/167,235; PCT publication: WO 2007/000751 20070104). In this method, searching for web pages includes the steps of classifying web pages according to a security rank and, when a link to each web page is provided, presenting that security rank together with the link.

However, this method cannot identify the specific place inside the website where a malicious insert is located, since the protection system only checks links against an internal database and issues a verdict, displayed to the user, on whether the page is trusted.

A system and method of protection against malicious software based on fuzzy whitelisting is also known (application RU 2014121249, publication date of application: 12/10/2015; Convention priority: 02/11/2011 US 61/554,859; 12/06/2011 US 13/312,686; PCT publication: WO 2013/089576 20130620). The method includes: performing, in a client computer system, an initial malware scan of a plurality of target objects of the client computer system; in response to the initial scan preliminarily indicating that a target object is suspected of being malware, generating in the client computer system a plurality of target hashes of the target object, each target hash representing a separate code block of the target object and each code block containing a sequence of processor instructions of the target object; sending the plurality of target hashes from the client computer system to a server computer system connected to it over a global computer network; and receiving, by the client computer system from the server computer system, a server indicator showing whether the target object is malicious. The server computer system generates the server indicator by obtaining a plurality of reference hashes of a reference object for at least one target hash, the reference object being selected according to the target hash from a set of objects listed in a whitelist; if the set of target hashes is not identical to the set of reference hashes, determining a similarity index according to the number of hashes common to both sets; and, if the similarity index exceeds a predetermined threshold, marking the target object as non-harmful.
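To make the prior-art comparison concrete, the following Python sketch computes the similarity index described above as the number of code-block hashes shared by a target object and a whitelisted reference object. It is an illustration under assumptions, not the cited application's implementation: code blocks are given as raw bytes, SHA-256 stands in for the unspecified hash function, and an exact match of the two hash sets is treated as whitelisted.

```python
import hashlib


def block_hashes(code_blocks):
    """Hash every code block (a bytes sequence of instructions) separately."""
    return {hashlib.sha256(block).hexdigest() for block in code_blocks}


def is_non_harmful(target_blocks, reference_blocks, threshold):
    """Return True if the target object should be marked as non-harmful."""
    target = block_hashes(target_blocks)
    reference = block_hashes(reference_blocks)
    if target == reference:
        # Assumption: an exact match with a whitelisted object is accepted outright.
        return True
    # Similarity index: the number of hashes common to both sets.
    similarity = len(target & reference)
    return similarity > threshold
```

Because any change to a block's bytes changes its hash, this criterion judges an object by its bytes rather than by its behavior.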

However, this method implements a client-server scan for malicious objects in which an object is judged malicious by its hash rather than by its functionality; it amounts to a signature search, and an object that is not identified as malicious by its hash is considered harmless. This does not allow reliable web scanning.

The closest prior art to the claimed invention, taken as the prototype for both the system and the method, is the system and method for checking web resources for the presence of malicious components of patent RU 2446459 (published 03/27/2012).

This system contains: (a) a checklist compiler designed to extract web resource addresses from client applications and to determine the verification parameters for each web resource address, the checklist compiler being associated with a verification parameter database; (b) an identifier interception tool for extracting user identifiers from client applications, adding new user identifiers, and storing the user identifiers for the corresponding web resource address in the verification parameter database, the identifier interception tool being associated with that database; (c) a verification tool designed to log in to a web resource using the user identifiers and then check the web resource for malicious components, taking the verification parameters into account, the verification tool being associated with the verification parameter database; and (d) the verification parameter database, designed to store verification parameters for web resources and user identifiers.

The method for checking web resources for malicious components according to that patent first sets the parameters for checking the web resource:

a. a list of web resource addresses to be checked is compiled;
b. verification parameters corresponding to each web resource address are defined;
c. user identifiers for authorization on the web resource are determined.

The web resource is then checked based on the verification parameters and authorization data:

a. a connection is established with the web resource at its address;
b. authorization is performed on the web resource using the user identifiers corresponding to it;
c. the web resource is checked for malware and security risks.

However, this method cannot scan the files and databases located on the server; it scans only the data they generate and display to the user, so it cannot identify the specific places on the server where malicious code and data inserts are located. One obstacle to reliably identifying the code responsible for a malicious insert is that the behavior of the insert may depend on many parameters chosen by its developer, such as the IP address of the request, the request parameters, browser cookies (for example, so that the exploit is not injected into the page again), the browser language, the browser status bar, and so on.

The claimed invention solves this problem and provides more reliable detection of malicious code.

The main objective of the claimed invention is reliable and prompt detection of infected web resources by fully checking the code of the website and the data that this code delivers to the browser.

The technical result is increased reliability of detecting malicious inserts on sites, achieved by reducing the number of web scanner false positives through checking not only the code of the website itself but also all the data that the website delivers to the browser.

Unlike traditional scanning, in which the website code is searched for potentially dangerous fragments using predefined signatures and which produces a large number of false positives, the claimed invention scans not only the website code but also the data that this code delivers to the browser; because a larger volume of information is checked, more accurate verdicts are issued.

The technical result is achieved in that the system for checking a website for malicious inserts contains a start address search unit connected to an interpreter configured to interpret all directions of code execution at conditional branches, a decision-making unit, a local database of safe elements, a local database of malicious elements, a data reception and transmission unit, a code and data correction unit, a decomposition and composition unit, and a dynamic analysis unit. The interpreter is connected to the decision-making unit, which is connected to the local database of safe elements, the local database of malicious elements, the data reception and transmission unit, and the code and data correction unit; the data reception and transmission unit is connected to the decomposition and composition unit, which is connected to the dynamic analysis unit.

In one embodiment, the interpreter configured to interpret all directions of code execution at conditional branches contains an interpretation unit and an interpretation data memory unit connected in series, the memory unit being connected to an interpretation data analysis unit. The start address search unit is preferably configured to find all start addresses of the web pages on a web resource by recursively parsing the contents of its web pages or by analyzing entries in additional sources of such information, for example, server logs. The data reception and transmission unit is configured to transmit data files to the decomposition and composition unit and to receive data from its output. The data reception and transmission unit can additionally be connected to a global database of safe elements and a global database of malicious elements.

The code and data correction unit is preferably implemented as software installed on the server that can remove malicious code and data inserts detected by web scanning.

The system may further comprise a traditional anti-virus scanner of malicious code.

In that case, the decision-making unit can be configured to work together with the traditional anti-virus scanner, followed by joint analysis of the scanning and interpretation results.

The technical result is also achieved by a method of checking web resources for malicious inserts in which a web resource is recursively parsed and related information sources, for example web server logs, are analyzed; the code of the web resource is interpreted along all directions of code execution at each conditional branch, and the course of interpretation is logged. When a conditional branch command is interpreted, the interpretation data is duplicated and interpretation continues in all directions defined by the conditional branches. The resulting web pages are checked for malicious elements by comparing all elements of potentially harmful types contained in the output data with the local database of safe elements and the local database of malicious elements, and with the global database of safe elements and the global database of malicious elements. If unknown potentially harmful elements are identified, they are subjected to checking and dynamic analysis, after which a correspondence is established between each detected malicious element and the code on the site that introduced these elements into the source data.

It is preferable to analyze the code that inserted the malicious element and identify it completely, including its entries in the server databases. It is preferable to inform the owner of the web resource that malicious code is present on the server and needs to be removed and, if consent to delete it is obtained, to delete it from the server. Preferably, the data of potentially harmful elements is saved to a file and submitted for checking against a global database of known safe elements and a global database of malicious elements.

It is advisable to subject the data of potentially harmful elements to static and dynamic analysis.

If potentially harmful elements are present, it is preferable to decompose the document into a set of decomposed elements, create files from them (the number of files corresponding to the number of decomposed elements), and then compose one or more sets of files from the decomposed elements, which are subjected to dynamic analysis.

Composition produces a functional, incremental, or complete composition set of files from the decomposed elements.
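The decomposition and composition steps can be sketched in Python as below. This is a hypothetical illustration, not the procedure of the patent cited later in the description: it assumes that the potentially dangerous types are `<script>` blocks and `<iframe>` tags, and it models a complete set as one file with all elements, an incremental set as files holding the first 1, 2, ..., n elements, and a functional set as one file per element type (this last reading is an assumption, since the text does not define it).

```python
import re
from collections import defaultdict

# Assumed "potentially dangerous types": <script> blocks and <iframe> tags.
DANGEROUS = re.compile(
    r"<script\b[^>]*>.*?</script\s*>|<iframe\b[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def decompose(document):
    """Split a document into its potentially dangerous elements, in order."""
    return [m.group(0) for m in DANGEROUS.finditer(document)]


def tag_name(element):
    return re.match(r"<\s*(\w+)", element).group(1).lower()


def compose(elements, mode):
    """Return a list of element sets; each set becomes one composition file."""
    if mode == "complete":
        return [list(elements)]
    if mode == "incremental":
        return [elements[:k] for k in range(1, len(elements) + 1)]
    if mode == "functional":
        groups = defaultdict(list)      # one set per element type (assumption)
        for element in elements:
            groups[tag_name(element)].append(element)
        return list(groups.values())
    raise ValueError(f"unknown composition mode: {mode}")


def to_file_body(elements):
    """Wrap a set of elements in a minimal HTML shell for dynamic analysis."""
    return "<html><body>\n" + "\n".join(elements) + "\n</body></html>"
```

Each composed set, rendered with `to_file_body`, would then be written to its own file and handed to dynamic analysis.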

The drawing schematically shows a block diagram of a system with which the inventive method is implemented.

The system for checking a web resource for malicious inserts consists of a unit 1 for searching for start addresses in the server's web data, connected to the interpreter 2, which contains the interpretation unit 2.1 and the interpretation data memory unit 2.2, the latter connected to the interpretation data analysis unit 2.3. The interpretation unit 2.1 of interpreter 2 is connected to the decision-making unit 4, which is connected to the local database 5 of safe elements, the local database 3 of malicious elements, and the code and data correction unit 7, which in turn is connected to the notification unit 8. The data reception and transmission unit 6, also connected to the decision-making unit 4, is connected to the global database 9 of safe elements, the global database 11 of malicious elements, and the decomposition and composition unit 10, which is connected to the dynamic analysis unit 12.
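The wiring of the drawing can be written down as a small graph, which makes the data path from the start address search to dynamic analysis easy to trace. The Python sketch below uses only the connections listed in the description of the drawing; attaching unit 1 to the interpretation unit 2.1 (rather than to interpreter 2 as a whole) is an assumption, and the breadth-first `route` helper is purely illustrative.

```python
from collections import deque

UNITS = {
    "1": "start address search unit",
    "2.1": "interpretation unit",
    "2.2": "interpretation data memory unit",
    "2.3": "interpretation data analysis unit",
    "3": "local database of malicious elements",
    "4": "decision-making unit",
    "5": "local database of safe elements",
    "6": "data reception and transmission unit",
    "7": "code and data correction unit",
    "8": "notification unit",
    "9": "global database of safe elements",
    "10": "decomposition and composition unit",
    "11": "global database of malicious elements",
    "12": "dynamic analysis unit",
}

# Undirected connections as listed for the drawing.
LINKS = [
    ("1", "2.1"), ("2.1", "2.2"), ("2.2", "2.3"), ("2.1", "4"),
    ("4", "5"), ("4", "3"), ("4", "7"), ("7", "8"), ("4", "6"),
    ("6", "9"), ("6", "11"), ("6", "10"), ("10", "12"),
]


def adjacency():
    adj = {unit: set() for unit in UNITS}
    for a, b in LINKS:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def route(src, dst):
    """Shortest chain of units between src and dst (breadth-first search)."""
    adj = adjacency()
    prev = {src: None}
    queue = deque([src])
    while queue:
        unit = queue.popleft()
        if unit == dst:
            break
        for nxt in sorted(adj[unit]):
            if nxt not in prev:
                prev[nxt] = unit
                queue.append(nxt)
    if dst not in prev:
        return []
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]
```

For example, `route("1", "12")` follows the path that unknown elements take: start address search, interpretation, decision-making, data reception and transmission, decomposition and composition, and dynamic analysis.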

The inventive method is implemented with this system as follows. The start address search unit 1 searches the server's web data and builds a list of all available links present in the HTML pages the server serves, recursively parsing the web resource. It also analyzes server log files and other data sources about entry points, for example, URLs that were requested from the server and for which the server returned page data, for analysis of the web resource.
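A minimal Python sketch of the start address search follows, assuming a `fetch(url)` callable that returns a page's HTML (or `None`) and access-log lines in the common log format used by Apache and Nginx. Only `<a href>` links on the same host are followed, the recursive traversal is implemented with an explicit stack, and only log requests answered with status 200 are kept; all of these are illustrative choices, not requirements of the method.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Return the set of same-host page URLs reachable from start_url.

    fetch(url) is a hypothetical callable returning the page HTML, or None.
    """
    host = urlparse(start_url).netloc
    visited, pages, stack = set(), set(), [start_url]
    while stack:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        pages.add(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so each page is visited once.
            absolute = urljoin(url, href).split("#", 1)[0]
            if urlparse(absolute).netloc == host and absolute not in visited:
                stack.append(absolute)
    return pages


# Request line and status code of a common-log-format entry.
LOG_REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def urls_from_log(lines, base_url):
    """Entry points for which the server actually returned a page (status 200)."""
    found = set()
    for line in lines:
        match = LOG_REQUEST.search(line)
        if match and match.group(2) == "200":
            found.add(urljoin(base_url, match.group(1)))
    return found
```

The log pass matters because pages that no other page links to, including ones reachable only with particular query parameters, never show up in a link crawl.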

All received links to the web resource pages are passed to the interpretation unit 2.1 of interpreter 2 as an input parameter of the code interpretation function, after which the website code is interpreted sequentially by the interpretation unit 2.1. Each interpretation stream is assigned a logical identifier that is unique among all logical identifiers. The course of interpretation is recorded in the memory unit 2.2, an internal database, as data sets of the form "interpreted function + input parameters received by the function + data returned by the function", each uniquely associated with the logical identifier of the code interpretation stream, for example, ID 10. For example, when interpreting a site written in PHP, if the interpretation unit 2.1 encounters a construct of the form $query = sprintf("SELECT firstname FROM friends WHERE firstname='%s'", $firstname); then the sprintf function, its input parameters, and the resulting $query string are written to the interpretation data memory unit 2.2.

Interpretation in unit 2.1, with the interpretation data stored in memory unit 2.2, proceeds sequentially until a conditional branch operator (for example, "if") is encountered in the code of the web resource. At that point, the interpretation unit 2.1 suspends the original interpretation stream (in our example, ID 10), creates a new interpretation stream in the "paused" state with a new unique logical identifier (for example, ID 20), and, using the interpretation data memory unit 2.2, creates an identical copy of the execution data of stream ID 10 and assigns it the identifier of the new stream (ID 20). Thus, during interpretation an identical clone of the original interpretation stream is created, but with a different logical identifier. Next, the interpretation unit 2.1 emulates the execution of the conditional branch command to determine the branch targets of the interpretation stream both when the branch condition is met and when it is not. The original interpretation stream is then directed into the code as if the branch condition were met, and the cloned stream as if the condition were not met. Both interpretation streams are then switched from the "paused" state to the active state and continue interpreting the code.
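The forking scheme described above can be illustrated with a toy interpreter in Python. The three-instruction language (`echo`, `if`, `goto`), the ID sequence 10, 20, 30, ..., and the `max_steps` guard against loops are all hypothetical. The sketch shows only that every `if` clones the stream's log and output under a new ID, that the original stream takes the branch where the condition holds and the clone the branch where it does not, and that the condition itself therefore never has to be evaluated.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Stream:
    tid: int                                    # logical identifier of the stream
    pc: int = 0                                 # index of the next instruction
    log: list = field(default_factory=list)     # (function, inputs, result) records
    output: list = field(default_factory=list)  # data the code would send to the browser


def interpret_all_paths(program, max_steps=1000):
    """Interpret every branch of `program`; return the finished streams."""
    ids = itertools.count(10, 10)               # hypothetical ID scheme: 10, 20, 30, ...
    ready, finished = [Stream(next(ids))], []
    while ready:
        stream = ready.pop()
        steps = 0
        while stream.pc < len(program) and steps < max_steps:
            steps += 1
            op = program[stream.pc]
            if op[0] == "echo":                 # ("echo", text)
                stream.log.append(("echo", (op[1],), op[1]))
                stream.output.append(op[1])
                stream.pc += 1
            elif op[0] == "goto":               # ("goto", target)
                stream.pc = op[1]
            elif op[0] == "if":                 # ("if", condition, target_if_false)
                # Clone the stream's state under a new ID instead of evaluating the condition.
                clone = Stream(next(ids), op[2], list(stream.log), list(stream.output))
                stream.log.append(("if", (op[1],), True))
                clone.log.append(("if", (op[1],), False))
                stream.pc += 1                  # original stream: condition holds
                ready.append(clone)             # clone: condition fails, resumes later
            else:
                raise ValueError(f"unknown instruction: {op!r}")
        finished.append(stream)
    return finished
```

A linear interpreter that evaluated the condition for one particular request would produce only one page per condition; exploring both branches is what exposes an insert hidden behind, say, a cookie check. Since the number of streams can double at every branch, a practical implementation needs some bound on the number of streams.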
Creating two interpretation streams thus splits interpretation of the website code whenever the interpreter encounters a conditional branch. The reason is that the interpreter must check every part of the code on the website: malicious inserts are often protected by being placed behind conditions, so that ordinary linear interpretation never reaches them and the insert can be missed. After the split, therefore, the starting point of the new interpretation stream is set so that interpretation begins at the very first line of code following the condition check that selects between the branches. As a result of this sequence of operations, the code is interpreted along all branches created by conditional branch commands. When interpretation is complete, interpreter 2 generates one or more web pages from the output of the interpretation unit 2.1 and passes them to the decision-making unit 4. The decision-making unit 4 checks the data received from unit 2.1 of interpreter 2, for example web pages, XML documents, or other structured data, by static analysis, first against the local database 5 of safe elements and then against the local database 3 of malicious elements. If any element of a potentially dangerous type is not unambiguously identified after this static analysis, the decision-making unit 4 updates the local database of known safe elements by receiving new data through the data reception and transmission unit 6 from the global database 9 of known safe elements.
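The static check performed by the decision-making unit 4 might look like the Python sketch below. The description does not say how elements are matched against the databases, so identifying an element by the SHA-256 hash of its whitespace-normalized text is an assumption, as is modelling each database as a set of such fingerprints and refreshing a local set wholesale from its global counterpart on a miss.

```python
import hashlib


def fingerprint(element):
    """Assumed element identity: SHA-256 of the element text with whitespace collapsed."""
    normalized = " ".join(element.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify(elements, local_safe, local_malicious, global_safe, global_malicious):
    """Return {element: "safe" | "malicious" | "unknown"}.

    The databases are sets of fingerprints. On any miss, the local sets are
    refreshed from the global ones and the elements are checked again.
    """
    def lookup(element):
        fp = fingerprint(element)
        if fp in local_safe:          # safe elements are checked first,
            return "safe"
        if fp in local_malicious:     # then malicious ones
            return "malicious"
        return "unknown"

    verdicts = {e: lookup(e) for e in elements}
    if "unknown" in verdicts.values():
        local_safe |= global_safe             # in-place update of the caller's sets
        local_malicious |= global_malicious
        verdicts = {e: lookup(e) for e in elements}
    return verdicts
```

Elements still reported as "unknown" after the refresh are the ones that would be passed on through unit 6 for decomposition and dynamic analysis.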
The local database is updated whenever an unknown element of a potentially dangerous type is found, because the element has probably already been identified elsewhere during scanning and is already in the global database, while its description has not yet reached this particular local database. Similarly, when elements of potentially malicious types are detected, the local database of malicious elements is updated with new data from the global database 11 of malicious elements. After these actions, the data is checked again in the decision-making unit 4, so that potentially harmful elements found by interpreter 2 are not analyzed again if they are already present in the global databases; this saves resources of the claimed system and increases its performance. If the data received from interpreter 2 still contains unknown elements of potentially dangerous types, it is passed to the data reception and transmission unit 6. At this stage, in one embodiment of the claimed system, the decision-making unit 4 can work together with a traditional malware scanner (not shown in the drawing) that relies on signatures and heuristics, mainly signatures based on regular expressions; the results obtained along both paths are then compared to identify malicious correlations between the source code and the data this code produces more reliably. This embodiment can help reduce the number of false positives of the system. The data reception and transmission unit 6 passes the received files containing unknown elements of potentially dangerous types to the decomposition and composition unit 10. Unit 10 decomposes the received document by breaking it into many separate elements, including the unknown elements of potentially harmful types, from which a set of files with decomposed elements is created.
After the received document has been decomposed into its individual constituent elements, a set of files is composed on a functional, incremental, or complete basis. The decomposition and composition operations are carried out as described in Russian Federation patent No. 2613535 (published March 16, 2017).

Thus, unit 10 decomposes the document with unknown components into a plurality of decomposed elements. Files are created from them, their number corresponding to the number of decomposed elements, and one or more sets of files are then composed from the decomposed elements and passed to the dynamic analysis unit 12 for a dynamic check; that is, unit 12 monitors deviations from the normal behavior of the client program caused by the actions of malicious elements. Dynamic analysis aims to trigger the malicious code contained in a file within the time allotted to dynamic analysis. When an exploit fires, this fact should be recorded, and the execution of the client software is identified as abnormal. To achieve this, the dynamic analysis unit can be implemented as software installed on the system server, based on a program for working with the corresponding document type that has or emulates vulnerabilities related to interpreting that document type; its task is to identify anomalies in the execution or behavior of the client software.

Alternatively, the dynamic analysis unit 12 can be special software installed on the system server that partially or completely emulates the execution of the client software that interprets the corresponding file type.

The program for working with the corresponding document type, installed on the system server as part of the dynamic analysis unit, works with a particular file type and interprets its internal structure so as to trigger malicious code in the file if such code is present. Abnormal execution of program code is characterized by the instruction paths of the client software deviating from those intended by its developers, so that malicious code contained in some form in a file from a composition set can be executed just like the client program's own code. Abnormal behavior of client software is characterized by its use of operating system facilities to perform operations that are not required to interpret the internal structure of the file from the composition set and that therefore go beyond the predefined set of allowed actions and operating system calls for this client software.
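The abnormal-behavior criterion can be sketched as a comparison of the operating system calls observed during analysis against a predefined allow-list for the client program. In the Python sketch below, the call names, the `ALLOWED_CALLS` table, and the trace format are hypothetical, and only the behavioral criterion is modelled; detecting deviations of the instruction execution path would require instrumentation that the sketch does not attempt.

```python
# Hypothetical allow-list: the OS calls a document viewer needs to render a file.
ALLOWED_CALLS = {
    "html": {"open_read", "read", "mmap", "close", "draw"},
}


def abnormal_calls(doc_type, trace):
    """Return the (call, argument) pairs in the trace that are not allowed for doc_type."""
    allowed = ALLOWED_CALLS[doc_type]
    return [(call, arg) for call, arg in trace if call not in allowed]


def verdict(doc_type, trace):
    """Mark a composition file malicious if rendering it made any disallowed call."""
    return "malicious" if abnormal_calls(doc_type, trace) else "safe"
```

Keeping the allow-list per document type mirrors the idea that what counts as normal depends on which client program interprets the file.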

After decomposition, composition, and dynamic analysis, which separate safe elements from malicious ones, the dynamic analysis unit 12 sends the verdict for each composition file to the decomposition and composition unit 10, which then issues a verdict for each unknown element of a potentially harmful type; based on these verdicts, the global database 9 of known safe elements and the global database 11 of known malicious elements are updated. In addition, the decomposition and composition unit 10 sends a command through the data reception and transmission unit 6 to the decision-making unit 4 to update the local database 5 of known safe elements with new data from the global database 9 of known safe elements; the local database 3 of malicious elements is then updated with new data from the global database 11 of malicious elements, after which the data received from interpreter 2 is checked again.
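One way verdicts for composition files can be turned into verdicts for individual elements is to use incremental sets: if the set of the first k elements is clean and the set of the first k+1 is flagged, element k+1 is the trigger. The Python sketch below applies this repeatedly. It assumes that each malicious element triggers on its own and that `is_abnormal` (a hypothetical stand-in for dynamic analysis unit 12) is deterministic, which real exploits need not satisfy.

```python
def localize(elements, is_abnormal):
    """Split elements into (safe, malicious) using incremental composition sets.

    is_abnormal(element_set) -> bool stands in for dynamic analysis of one
    composition file built from element_set.
    """
    safe, malicious = [], []
    remaining = list(elements)
    while remaining:
        for k in range(1, len(remaining) + 1):
            if is_abnormal(remaining[:k]):
                # The first k-1 elements were clean together; element k is the trigger.
                safe.extend(remaining[:k - 1])
                malicious.append(remaining[k - 1])
                remaining = remaining[k:]   # elements after it are still undecided
                break
        else:
            # No incremental set was flagged: everything left is safe.
            safe.extend(remaining)
            remaining = []
    return safe, malicious
```

The resulting lists are what would be written to the global databases of safe and malicious elements. The worst case needs a number of analyses quadratic in the number of elements, which is acceptable only because each document usually contains few unknown elements.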

If one or more malicious elements are detected, the decision-making unit 4 sends a request for each of them to the interpretation data analysis unit 2.3, which identifies the code that inserted these elements into the source document and then analyzes the relationships of this code with the rest of the web resource's code and its databases in order to identify other possible parts of the malicious code and malicious entries in the web resource's database. In this way the analysis establishes exactly which code inserted the malicious element into the data sent to the server's output, where this code takes its data from, what kind of data it is, how the code is bound, and what other code on the site it is connected to; that is, a reverse analysis is performed using the database 2.2, in which the interpretation unit 2.1 records all information about the course of interpretation.
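The reverse analysis over the interpretation log of memory unit 2.2 can be sketched as follows. The `Record` layout (a source location plus the "function + inputs + result" triple from the description) and the rule that a record depends on an earlier one when one of its inputs equals that record's result are assumptions made for illustration; a real implementation would track data flow far more precisely.

```python
from dataclasses import dataclass


@dataclass
class Record:
    location: str   # hypothetical "directory/file:line" of the interpreted call
    function: str   # interpreted function
    inputs: tuple   # parameters received by the function
    result: str     # data coming out of the function


def trace_back(log, element):
    """Return the records (newest first) through which `element` reached the output."""
    chain, wanted = [], None
    for record in reversed(log):
        if wanted is None:
            # First, find the last call whose output contains the malicious element.
            if element in record.result:
                chain.append(record)
                wanted = {str(x) for x in record.inputs}
        elif record.result in wanted:
            # Then follow its inputs back to the calls that produced them.
            chain.append(record)
            wanted |= {str(x) for x in record.inputs}
    return chain
```

For a log in which a value read from the database is echoed into the page, the chain leads from the echo statement back to the query, which in turn identifies the database entry to inspect.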

Then, by static analysis of the dependencies and relationships between this specific part of the code and other parts, the interpretation data analysis unit 2.3 of interpreter 2 identifies all malicious code responsible for embedding malicious elements in the pages displayed to the user in the browser. The data about these parts of the code and pointers to them in the form "directory name / file name / line number", together with the identifiers of database entries identified as malicious, are passed to the decision-making unit 4, which forwards them to the code and data correction unit 7. Unit 7 performs additional analysis of the detected malicious code to identify its relationships with other parts of the code and website data, including entries in server databases, more accurately and completely. Information about the detected malicious code then goes to the notification unit 8, which informs the administrator or owner of the web resource that malicious code is present on the server and asks for confirmation to remove it. If confirmation is received, all detected malicious code is deleted from the server by the code and data correction unit 7, and the corresponding files or parts of files are erased. The information in this description consists only of examples that do not limit the scope of the present invention as defined in the claims. One skilled in the art will recognize that other embodiments of the present invention may exist within its nature and scope. The inventive system and method for checking web resources for malicious inserts can be widely used to secure web resources, since they reliably detect malicious inserts on sites while reducing web scanner false positives by checking not only the code of the website itself but also all the data that the site delivers to the browser.
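The final removal step, gated on the owner's confirmation, might look like this Python sketch. It assumes pointers arrive as (file path, 1-based line number) pairs and that a hypothetical `confirm` callback stands in for notification unit 8. Deleting whole lines is a deliberate simplification: the correction unit 7 may need to edit parts of lines or database entries instead.

```python
from collections import defaultdict
from pathlib import Path


def remove_malicious_lines(pointers, confirm):
    """Delete the given lines, but only after the owner confirms.

    pointers: iterable of (path, 1-based line number) pairs.
    confirm:  hypothetical callback standing in for notification unit 8; it
              receives {path: line numbers} and returns True to proceed.
    Returns the number of lines removed.
    """
    plan = defaultdict(set)
    for path, line_no in pointers:
        plan[Path(path)].add(line_no)
    if not confirm(dict(plan)):
        return 0
    removed = 0
    for path, numbers in plan.items():
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        kept = [line for i, line in enumerate(lines, start=1) if i not in numbers]
        removed += len(lines) - len(kept)
        path.write_text("".join(kept), encoding="utf-8")
    return removed
```

Grouping the pointers by file before asking for confirmation lets the owner review the whole removal plan at once, and each file is rewritten only once.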

Claims


1. A system for checking web resources for malicious insertions, containing a start address search unit connected to an interpreter configured to interpret all directions of code execution at conditional branches, a decision-making unit, a local database of safe elements, a local database of malicious elements, a data reception and transmission unit, a code and data correction unit, a decomposition and composition unit, and a dynamic analysis unit, the interpreter being connected to the decision-making unit, which is connected to the local database of safe elements, the local database of malicious elements, the data reception and transmission unit, and the code and data correction unit, and the data reception and transmission unit being connected to the decomposition and composition unit, which is connected to the dynamic analysis unit.

2. The system according to claim 1, characterized in that the interpreter is made in the form of an interpretation unit and an interpretation data memory unit connected in series, the latter being connected to the interpretation data analysis unit.

3. The system according to claim 1, characterized in that the data reception and transmission unit is connected to a global database of safe elements and a global database of malicious elements.

4. The system of claim 1, wherein the start address search unit is configured to search for all start addresses of web pages located on the web resource by recursively parsing the contents of the web pages of the web resource.

5. The system of claim 1, wherein the start address search unit is configured to search for all start addresses of web pages located on a web resource by analyzing entries in server logs.

6. The system according to claim 1, in which the code and data correction unit is made in the form of software installed on the server with the ability to remove malicious code elements and data detected as a result of web scanning.

7. The system according to claim 1, characterized in that it contains an anti-virus malicious code scanner.

8. The system according to claim 1, characterized in that the decision-making unit is configured to work together with an anti-virus malicious code scanner, followed by a joint analysis of the scanning and interpretation results.

9. The system according to claim 1, characterized in that the code and data correction unit is made in the form of software installed on the server, capable of removing malicious code and data inserts detected as a result of web scanning.

10. A method for checking web resources for malicious inserts using the system according to claim 1, comprising recursively parsing a web resource and analyzing related sources of information, for example web resource logs, and interpreting the web resource code while logging the interpretation process, wherein, when a conditional branch command is interpreted, the interpretation process data is duplicated and code interpretation continues in all directions defined by the conditional branches, producing one or more web pages in which potentially harmful elements are identified by comparing all elements of potentially harmful types with those contained in the local database of safe elements and the local database of malicious elements; if unknown potentially harmful elements are detected, they are checked and dynamically analyzed, after which a correspondence is established between each detected malicious element and the code on the site that introduced that element into the source data.

11. The method according to claim 10, characterized in that the code that introduced the malicious element is identified, including entries in the databases of the web resource.

12. The method according to claim 10, characterized in that the owner of the web resource is further informed of the presence of malicious code on the server and offered removal of the malicious code, and, if consent to delete the malicious code is obtained, the detected malicious code is deleted from the server.

13. The method according to claim 10, characterized in that the generated data containing unknown elements of potentially harmful types are stored in a file and transmitted for verification against a global database of known safe elements and a global database of malicious elements.

14. The method according to claim 10, characterized in that, when unknown elements of potentially harmful types are present, they are checked by decomposing the document into a plurality of decomposed elements from which files are created, the number of files corresponding to the number of decomposed elements, after which one or more sets of files are composed from the decomposed elements and subjected to dynamic analysis.

16. The method according to claim 10, characterized in that the composition produces a functional, incremental, or complete compositional set of files from the decomposed elements.
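Claims 4 and 5 describe two ways of finding start addresses: recursively parsing the site's pages and reading entries in server logs. A minimal Python sketch of both follows, using only the standard library. It assumes that pages are obtained through a caller-supplied `fetch` function (returning `None` when a page is unavailable), that only links on the same host count as start addresses of the web resource, and that logs use the Common or Combined Log Format; all function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects href values of <a> tags from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_start_addresses(root: str, fetch: Callable[[str], Optional[str]]) -> set[str]:
    """Follow same-host links from root (recursive traversal written with a stack)."""
    host = urlparse(root).netloc
    seen: set[str] = set()
    stack = [root]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # unavailable page: keep the address, follow nothing
            continue
        collector = _LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                stack.append(absolute)
    return seen


# Request line of a Common/Combined Log Format entry, e.g. "GET /page.php HTTP/1.1"
_REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def addresses_from_log(lines: Iterable[str], root: str) -> set[str]:
    """Turn requested paths found in server log lines into absolute addresses."""
    found: set[str] = set()
    for line in lines:
        match = _REQUEST.search(line)
        if match:
            path, _fragment = urldefrag(match.group(1))
            found.add(urljoin(root, path))
    return found
```

The log-based search may surface pages that no other page links to, so under these assumptions the two sources complement each other.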
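The key step of claim 10, duplicating the interpretation data at each conditional branch and continuing along every direction, can be illustrated on a toy instruction set. In the Python sketch below, the instruction set (`emit`, `if`, `goto`, `halt`), the `max_steps` bound, and the treatment of `<script src>` values as the only potentially harmful elements are all assumptions made for illustration; a real interpreter would operate on the site's actual server-side code. Each completed path yields one output page together with its step log, and the script sources found in all pages are then sorted into safe, malicious, and unknown groups using sets that stand in for the local databases.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy instruction set (an assumption for illustration):
#   ("emit", html)   append html to the page being generated
#   ("if", target)   condition unknown statically: fall through AND jump to target
#   ("goto", target) unconditional jump
#   ("halt",)        stop this path


@dataclass
class PathState:
    pc: int = 0
    output: list[str] = field(default_factory=list)
    trace: list[int] = field(default_factory=list)  # log of interpreted steps


def interpret_all_paths(program, max_steps: int = 1000):
    """Return (page, trace) for every execution path through the program."""
    results = []
    pending = [PathState()]
    while pending:
        state = pending.pop()
        while state.pc < len(program) and len(state.trace) < max_steps:
            op = program[state.pc]
            state.trace.append(state.pc)
            if op[0] == "emit":
                state.output.append(op[1])
                state.pc += 1
            elif op[0] == "if":
                # Duplicate the interpretation data; the copy takes the branch.
                pending.append(PathState(op[1], list(state.output), list(state.trace)))
                state.pc += 1
            elif op[0] == "goto":
                state.pc = op[1]
            elif op[0] == "halt":
                break
            else:
                raise ValueError(f"unknown instruction: {op!r}")
        results.append(("".join(state.output), state.trace))
    return results


_SCRIPT_SRC = re.compile(r'<script\b[^>]*\bsrc="([^"]+)"', re.IGNORECASE)


def classify_scripts(pages, safe: set[str], malicious: set[str]) -> dict[str, set[str]]:
    """Sort script sources into safe / malicious / unknown (unknown -> further checks)."""
    groups: dict[str, set[str]] = {"safe": set(), "malicious": set(), "unknown": set()}
    for page in pages:
        for src in _SCRIPT_SRC.findall(page):
            if src in malicious:
                groups["malicious"].add(src)
            elif src in safe:
                groups["safe"].add(src)
            else:
                groups["unknown"].add(src)
    return groups
```

Because every branch doubles the number of pending paths, the path count can grow exponentially with the number of branches; a practical implementation would likely need bounds like `max_steps` or some merging of equivalent states.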
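Claims 14 and 16 describe decomposing a document into elements, writing one file per element, and composing sets of files for dynamic analysis. The Python sketch below assumes, for illustration only, that the elements are `<script>` blocks and the markup between them; it implements incremental composition (growing prefixes of the element list) and complete composition (all elements), which is one possible reading of those terms. Functional composition is not modeled because the claims do not define it, and the dynamic analysis itself (for example, loading each composed file in an instrumented browser) is outside the sketch. The function names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a document is decomposed into <script> blocks and the markup between them.
_SCRIPT_BLOCK = re.compile(r"(<script\b.*?</script>)", re.IGNORECASE | re.DOTALL)


def decompose(document: str) -> list[str]:
    """Split the document; the capturing group keeps the script blocks as elements."""
    return [chunk for chunk in _SCRIPT_BLOCK.split(document) if chunk]


def write_elements(elements: list[str], directory: Path) -> list[Path]:
    """Create one file per decomposed element (file count == element count)."""
    paths = []
    for index, element in enumerate(elements):
        path = directory / f"element_{index:04d}.html"
        path.write_text(element, encoding="utf-8")
        paths.append(path)
    return paths


def compose(elements: list[str], mode: str) -> list[str]:
    """Build composed documents: 'incremental' prefixes or one 'complete' document."""
    if mode == "incremental":
        return ["".join(elements[:count]) for count in range(1, len(elements) + 1)]
    if mode == "complete":
        return ["".join(elements)]
    raise ValueError(f"unsupported composition mode: {mode!r}")


if __name__ == "__main__":
    doc = "<p>hi</p><script>a()</script><p>bye</p>"
    elements = decompose(doc)
    with tempfile.TemporaryDirectory() as tmp:
        files = write_elements(elements, Path(tmp))
        print(len(files), "element files")
    for composed in compose(elements, "incremental"):
        print(composed)  # each would be handed to dynamic analysis (not shown)
```

Under this reading, incremental sets could help attribute behavior observed during dynamic analysis to the first element whose addition triggers it.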