CN107276986B

CN107276986B - Method, device and system for protecting website through machine learning

Info

Publication number: CN107276986B
Application number: CN201710346955.6A
Authority: CN
Inventors: 王茁
Original assignee: 中云网安科技(北京)有限公司
Current assignee: Zhongyun Wangan Technology Co ltd
Priority date: 2017-05-17
Filing date: 2017-05-17
Publication date: 2020-12-18
Anticipated expiration: 2037-05-17
Also published as: CN107276986A

Abstract

The invention relates to a method for protecting a website through machine learning, which constructs a website security model through machine learning to protect the website, and comprises the following steps: a website protection device establishes a reverse proxy connection with a website to receive access requests for the website, and sends out alarm information according to an access request; it is judged whether the access request corresponding to the alarm information is to be released, and if so, a release rule is added; and a website security model of the specific website is established according to the release rules, and the website protection device intercepts threatening access according to the website security model. The technical scheme of the invention can protect Web-based information systems such as education, e-commerce and banking websites. The method can effectively prevent unknown attacks, reduce the number of maintainers and the maintenance cost, and reduce enterprise cost.

Description

Method, device and system for protecting website through machine learning

Technical Field

The invention belongs to the field of network security, and particularly relates to a method for protecting a website through machine learning.

Background

With the rapid development of the internet, website security has become a prominent problem: website applications often carry core business functions and store large amounts of valuable data. A traditional website firewall protects a website by means of a feature library and cannot defend against unknown threats. Attacks against web servers and databases are increasing, such as attacks exploiting database SQL injection vulnerabilities or attacks against server ports. Due to the diversity and complexity of websites, the content of a website cannot be learned comprehensively by manual means.

In the prior art, protection of a website is mainly based on matching against a blacklist feature library, and access requests that do not match the blacklist are passed through. However, this method lags behind attackers, because the blacklist feature library is often updated only after an attack has occurred. Since unknown attacks do not match the features in the feature library, they are often passed through, leaving a potential safety hazard. Moreover, websites contain a variety of content elements, users access different content elements in different ways, different content elements carry different potential safety hazards, and the access characteristics of each website are different, so the one-size-fits-all protection of a blacklist feature library is not suited to the particular protection needs of different websites.

In addition, the prior art also adopts self-learning and whitelist technologies: a self-learning technique based on statistical methods analyzes user behavior and the HTTP request parameters of specified URLs, and assists an administrator in constructing a normal business model to form whitelist rules. However, this method requires manual judgment and item-by-item checking, which is time-consuming, labor-intensive and highly error-prone. If the website adds new services, manual analysis is again required to construct new rules. The result is low efficiency and a heavy demand on administrator working time.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the defects of the prior art and provide an efficient, intelligent method for protecting websites from unknown attacks. Because each website has different characteristics and users have different access habits for each website, the method quickly establishes a website access model for each website by accessing that website through machine learning, which avoids extra labor cost and maintenance time, reduces the deployment cost of security protection and saves construction time.

Another problem to be solved by the present invention is how to establish a new website access model based on the original website access model after upgrading the website service or expanding the content elements.

In order to solve the technical problems, the invention adopts the technical scheme that:

a method for protecting a website through machine learning, which builds a website access model to protect the website through machine learning, comprises the following steps:

S1: establishing a reverse proxy connection with a website to receive an access request to the website;

S2: sending out alarm information according to the access request;

S3: judging whether the access request corresponding to the alarm information is released, and if so, adding a release rule;

S4: repeating steps S2 and S3 until all the services of the website have been accessed, and learning the release rules to establish a website security model.

Further, the method also comprises the following steps:

S5: judging whether the access request conforms to the website security model, and if so, allowing the access to the website.

Further, after S4, establishing the website security model, the server intercepts or releases the threatened access according to the release rule of the website security model.

Further, after the website is updated, the method further includes:

S6: repeating S2 and S3 until all updates to the website have been accessed, learning the release rules and updating the website security model.

Further, it is judged whether an access request conforms to the website security model, and if not, the access request is intercepted.

Further, it is judged whether the access request conforms to the website security model, and if not, the access request is intercepted and alarm information is sent out.

Further, in S3 an administrator determines whether the access request corresponding to the alarm information is released.

Further, the administrator is a program or a human being with management authority, preferably a human being with management authority.

Further, the step S1 of establishing the reverse proxy connection with the website specifically includes:

reading a website home page;

and accessing all services of the website through the home page of the website.

Furthermore, all the services of the website at least comprise hyperlinks, documents, cookies, forms, pictures, videos and login requests.

Further, all access requests are presumed to be threatening access requests.

Further, the threatening access requests are specifically all access requests other than access to the website homepage and to the links in its webpages.

Further, the release rule described in S3 is a correct access rule defined by an administrator, preferably a release rule in the form of an automatically added regular expression.

Further, different access rules are established according to different accessed web page contents, and the website security model in S4 is composed of release rules for different website contents of a specific website.

Further, the website protection device can be set to a passive mode or an active mode. In the passive mode the device does not intercept access requests; it only sends out alarm information for threatening access requests and writes a release rule according to the administrator's judgment on the alarm information. In the active mode the device not only sends out alarm information for threatening access requests, but also intercepts access requests that do not conform to the website security model.

Further, the device can be switched between the passive mode and the active mode.

Further, the switching between the passive mode and the active mode can be performed automatically or manually.

Preferably, the switching between the passive mode and the active mode is performed by the administrator.
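By way of a non-limiting illustration, the behaviour of the two modes described above might be sketched as follows. The names `Mode` and `handle_request`, and the `conforms` callback standing in for the website security model, are hypothetical and are not taken from the specification.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Mode(Enum):
    PASSIVE = "passive"  # learning: alert only, never intercept
    ACTIVE = "active"    # protecting: alert and intercept


def handle_request(
    mode: Mode,
    url: str,
    conforms: Callable[[str], bool],
    alerts: List[str],
) -> Tuple[bool, bool]:
    """Return (forwarded, alerted) for one access request.

    `conforms` is an assumed stand-in for the security model check.
    """
    if conforms(url):
        return True, False      # conforms to the model: released silently
    alerts.append(url)          # both modes send out alarm information
    if mode is Mode.PASSIVE:
        return True, True       # passive mode: alert, but do not intercept
    return False, True          # active mode: alert and intercept
```

In this sketch, switching modes is simply a matter of passing a different `Mode` value, which could be done by a human administrator or by a program.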

The invention relates to a website protection device for protecting website security through machine learning, which comprises an access request receiving module, an alarm prompting module, a judging module and a machine learning module, wherein:

the access request receiving module is used for establishing reverse proxy connection with the website so as to receive an access request to the website;

the alarm prompt module is used for sending out alarm information according to the access request;

the judging module is used for judging whether the access request corresponding to the alarm information is released, and if so, a releasing rule is added;

after the alarm prompt module and the judgment module establish the release rules for all the services of the website, the machine learning module is used for learning the release rules to establish a website security model.

Further, the machine learning module judges whether the access request conforms to the website security model, and if so, the access to the website is allowed.

Further, after the alarm prompt module and the judgment module establish a release rule for all updates of the website, the machine learning module learns the release rule to update the website security model.

Further, the establishing of the reverse proxy connection between the access request receiving module and the website specifically includes:

reading a website home page;

and accessing all services of the website through the home page of the website.

Further, the alarm prompt module sends out alarm information to all access requests which are supposed to be threatened.

Further, the alarm prompt module sends alarm information to other access requests except for the access to the website homepage and the links in the webpage.

The invention relates to a system for protecting website security through machine learning, which comprises an access request end, a website protection device, a website server end and an administrator end, wherein:

the website protection device establishes reverse proxy connection with a website server to receive an access request from an access request terminal to the website server;

the website protection device sends out alarm information according to the access request;

the administrator side judges whether the access request corresponding to the alarm information is released, if so, a release rule is added to the website protection device;

the website protection device learns all services of a website at a website server side, adds a release rule and learns the release rule to establish a website security model to the website protection device;

the website protection device judges whether an access request from the access request terminal to the website server terminal conforms to the website security model, and if so, the website server terminal is allowed to be accessed.

Further, after the website is updated, the website protection device sends out alarm information according to the access request from the access request terminal to the website server terminal;

and the website protection device learns the release rule to update the website security model until all updates of the website are accessed.

Further, the establishing of the reverse proxy connection between the website protection device and the website server specifically includes:

reading a website home page; and accessing all services of the website through the home page of the website.

The method effectively addresses the fact that each website has different content and that users have different access habits for it. Unlike the one-size-fits-all interception strategy of a firewall, which cannot update itself or learn the content of different websites in time, the website protection device adopting this technical scheme can rapidly learn the content and structure of different websites, thereby building up the content distribution and structural characteristics of each website, and can establish secure access rules for the access habits associated with different website content from a large number of trial accesses by users. After a certain number of secure access rules have been accumulated, a website security model can be established for each website in a targeted manner, so that threatening accesses are intercepted according to the website security model and the accuracy and comprehensiveness of interception are greatly improved.

Manual judgment is needed to assist machine learning only when the website security model is first built. Once the secure access architecture for the website has been built, the website protection device is switched from the passive mode to the active mode, and secure access protection for the different architectures and content of the website is carried out without manual operation. Interception efficiency is greatly improved and manual deployment time is reduced.

Moreover, once a new service or a new architecture is added to the website, the website protection device is switched back to the passive mode, passively or actively. Secure access rules are formulated for threatening accesses to the new service or architecture on the basis of the existing website secure access architecture, and the new rules are then written into the original architecture, which greatly reduces the machine learning time. After the new secure access rules have been added, the device is switched back to the active mode, automatically or manually, and suspicious website accesses are intercepted using the new website secure access architecture as the standard.

After adopting the technical scheme, compared with the prior art, the invention has the following beneficial effects:

1. The deployment time of the website protection device is shortened and the manual working time needed to deploy it is reduced, while the accuracy of intercepting threatening access to the website is improved.

2. The invention can greatly reduce the post-maintenance investment of the deployed website protection device, and the cost and labor cost of enterprises or units for post-updating and maintaining the website protection device are reduced because the post-updating is simpler and quicker.

3. The technical scheme of the invention can protect Web-based information systems such as education, e-commerce and banking websites. Because each website has its own specific and independent website secure access architecture and protection model, the accuracy of preventing unknown attacks is effectively improved, the range of unknown attacks that can be prevented is expanded, and unknown attacks are better intercepted without hindering normal access.

The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and together with the description serve to explain the invention without unduly limiting it. The drawings in the following description obviously show only some embodiments, and a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:

FIG. 1 is a flowchart illustrating an embodiment of a method for protecting a website through machine learning according to the present invention;

FIG. 2 is a flowchart illustrating another embodiment of a method for protecting a web site through machine learning according to the present invention;

FIG. 3 is a schematic diagram of a website protection device for protecting a website through machine learning according to the present invention;

FIG. 4 is a diagram of a system for protecting a website through machine learning according to the present invention.

It should be noted that the drawings and the description are not intended to limit the scope of the inventive concept in any way, but to illustrate it for a person skilled in the art with reference to specific embodiments.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and the following embodiments are used for illustrating the present invention and are not intended to limit the scope of the present invention.

Example one

As shown in FIG. 1, the method for protecting a website by machine learning according to this embodiment includes the following steps:

S101, switching to a passive mode, and establishing a reverse proxy connection with a website to receive an access request to the website;

The website protection device is switched to the passive mode and is arranged between the website server side and the user access side. It receives an access request from the user side, forwards the request to the website server side, and returns the data returned by the server side to the requesting user side. The website protection device uses the homepage of the website, namely the root directory, as the entrance for machine learning.
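As a simplified, non-limiting sketch of this placement, the device can be pictured as an object that records each request and relays it unchanged. No real network code is shown; the `Backend` callable is a hypothetical stand-in for the website server side, and the class name `PassiveProxy` is likewise an assumption made for illustration.

```python
from typing import Callable, List

# Hypothetical stand-in for the website server side: path -> response body.
Backend = Callable[[str], str]


class PassiveProxy:
    """Sits between the access side and the server side and observes traffic."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.observed: List[str] = []   # requests seen, kept for learning

    def handle(self, path: str) -> str:
        self.observed.append(path)      # record the request for later learning
        return self.backend(path)       # forward it and relay the server's result
```

Because every request passes through `handle`, the same object is a natural place to attach the alarm and interception logic of the later steps.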

S102, sending alarm information to an administrator according to the threatening access request;

The website protection device protects the website in reverse proxy mode and monitors the access patterns and habits of a large number of users as they access the different webpages and content of the website; at this time the website protection device is in the passive mode. When a threatening or suspected abnormal website access is detected, the website protection device notifies the administrator of the suspect access by means of an alarm. The biggest difference from the traditional protection method is that a traditional firewall considers all access requests to be legal unless they match the threat feature library, whereas the present invention assumes that all access requests are illegal and that only requests for the website homepage and the entry points linked from it are legal. The machine learning engine assumes that a request for the entry point is secure and learns the content returned for that request, including forms, links and the like; everything other than the learned links is treated as not trusted.

For example, the homepage of a bank website contains hyperlinks, pictures, documents, forms and the like. The website protection device first accesses the homepage in the manner of a normal user to acquire this content information and builds an overall framework for the access structure of the homepage. The website protection device then accesses each page at the address given by each hyperlink, learns each accessed page, and builds a framework for each page, until all the pages of the website, whether organized in a flat or tree-shaped physical structure or in a logical structure, have been accessed. Access requests to such a webpage are deemed secure, access to the links in the webpage is also deemed secure, and all other access requests are deemed threatening. For example, access to webpage content of the website other than the homepage and its links, and access requests for pictures, documents, forms and the like, are considered threatening and unsafe, and an alarm message is sent to the administrator in response.
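The learning of links and forms from a returned page can be illustrated, purely as an assumption-laden sketch, with the Python standard library HTML parser. The helper names `LinkCollector`, `learn_page` and `is_presumed_safe` and the host `bank.example` are hypothetical; the specification does not prescribe any particular parsing technique.

```python
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect hyperlink targets and form actions from one returned page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.learned: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.learned.add(urljoin(self.base_url, attr["href"]))
        elif tag == "form" and attr.get("action"):
            self.learned.add(urljoin(self.base_url, attr["action"]))


def learn_page(base_url: str, html: str) -> Set[str]:
    """Return the entry page plus every link and form target found in it."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.learned | {base_url}


def is_presumed_safe(url: str, learned: Set[str]) -> bool:
    """Only the entry page and its learned links are presumed safe."""
    return url in learned
```

Under these assumptions, a request for any address outside the learned set would be the kind of request for which alarm information is sent to the administrator.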

S103, judging whether the access request corresponding to the alarm information is released or not by an administrator, and if so, adding a release rule;

After logging in to the administrator interface, or after seeing the threatening access reported by mail, the administrator judges the threatening access and decides whether to permit or block it: dangerous accesses are blocked, harmless ones are allowed to pass, and the judgment result is fed back to the website protection device. This assists the machine learning of the website protection device, which uses the administrator's judgment as a reference. The administrator is not limited to a human administrator, and may be artificial intelligence or software and hardware with judgment capability. Furthermore, a human administrator does not need extensive technical background, as long as he or she can judge whether to release a threatening access request and assist the website protection device in establishing access rules.
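Since the administrator may be a human or a program, the judgment step can be modelled, as one possible sketch, by a callable that returns the decision. The `Alert` and `RuleStore` types below are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Alert:
    method: str
    url: str


@dataclass
class RuleStore:
    release_rules: List[Alert] = field(default_factory=list)

    def review(self, alert: Alert, approve: Callable[[Alert], bool]) -> bool:
        """Ask the administrator (human or program) to judge one alert."""
        released = approve(alert)
        if released:
            # S103: a released request becomes a release rule
            self.release_rules.append(alert)
        return released
```

A human-facing deployment might implement `approve` as a prompt in an administrator interface, while an automated one might pass a function; the sketch makes no assumption about which is used.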

S104, repeating S102 and S103 until all the services of the website are accessed, and learning the release rule to establish a website security model.

Adding release rules according to the judgments made on threatening access requests is the process by which the website protection device learns the website's services. Steps S102 and S103 are repeated continuously: starting from the website homepage as the entrance, the website protection device learns all the services of the website by machine learning, and a release rule is added for each service. These rules are added to the website protection device, for example in the form of regular expressions, and are learned by the machine to establish a security model for the website.
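The repetition of S102 and S103 until every service has been accessed might be sketched as a simple loop. This is a minimal illustration under the assumption that the set of services is known in advance (for example from crawling the homepage); the function name `build_release_rules` and its parameters are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def build_release_rules(
    services: Set[str],
    traffic: Iterable[str],
    approve: Callable[[str], bool],
) -> List[str]:
    """Repeat alert and judgment until every known service has been accessed."""
    pending = set(services)
    rules: List[str] = []
    for path in traffic:
        if not pending:
            break                   # all services of the website were accessed
        if path in rules:
            continue                # already released: no new alert is needed
        if approve(path):           # S103: the administrator judges the alert
            rules.append(path)      # released, so a release rule is added
        pending.discard(path)       # this service has now been accessed
    return rules
```

In this sketch the rules are plain paths; in practice they could equally be regular expressions, as the description suggests.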

S105, switching to an active mode, judging whether the access request conforms to the website safety model, and if so, allowing the website to be accessed.

The website protection device is switched to an active mode, in the active mode, the access request is judged according to the established security model, and if the access request meets the security model, the access request is released according to the access rule established in the security model; and if the access request does not meet the security model, intercepting the access request.

For example, the security model of the website is established based on the access release rules for the different content elements of the website, and once it is complete, the security model contains corresponding access rules for all content elements in the website. If, for instance, the access rule for the user login page of a bank website differs from the access rule for the page for purchasing financial products, the website security model stores different access rules for the different webpage content, and the website protection device releases or intercepts each access to the website according to the applicable rule.
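A model holding different rules for different content elements could, as one hedged possibility, be represented as a mapping from page path to the rule for that page. The paths `/login` and `/products`, the rule contents, and the names `BANK_MODEL` and `decide` are all invented for this sketch.

```python
import re
from typing import Dict, Pattern, Tuple
from urllib.parse import urlsplit

# page path -> (allowed method, regex the query string must fully match)
SecurityModel = Dict[str, Tuple[str, Pattern[str]]]

# Hypothetical rules for two different content elements of a bank website.
BANK_MODEL: SecurityModel = {
    "/login": ("POST", re.compile(r"")),            # login: POST, no query string
    "/products": ("GET", re.compile(r"id=\d+")),    # products: GET with numeric id
}


def decide(model: SecurityModel, method: str, url: str) -> str:
    """Release a request only if the rule for its page accepts it."""
    parts = urlsplit(url)
    rule = model.get(parts.path)
    if rule is None:
        return "intercept"          # no rule was learned for this page
    allowed_method, query_rule = rule
    if method == allowed_method and query_rule.fullmatch(parts.query):
        return "release"
    return "intercept"
```

Keeping one rule per page means that a request shape that is normal for one content element is not automatically accepted for another.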

As another example, the most common type of database vulnerability is the SQL injection vulnerability, which occurs not only at the Web side but also in custom or standard-library stored procedures, functions and triggers of the database. An SQL injection vulnerability in the database itself is a greater threat to the database than an injection vulnerability at the Web side.

In SQL injection, a user mixes program commands into input data. The most direct example is an attacker passing his or her own SQL code to an application through user input on a normal Web page so that unauthorized SQL code is executed, with the aim of modifying, stealing or destroying database information. An SQL injection attack may even help attackers bypass the user authentication mechanism, allowing them full control of the database on the remote server. If an application uses user-entered data to construct dynamic SQL statements for accessing a database, it may be subject to SQL injection attacks. SQL injection is also likely to occur if the code uses stored procedures that lack reasonable restrictions on user input.

In a typical SQL injection, the normal request input is http://foo/rss.aspx?keyword=lucky, and the offensive request input is http://foo/rss.aspx?keyword'). For a website with an SQL injection vulnerability, database error information may be exposed in the browser, and a hacker can dig out more information from it. In the learning phase, the machine learns the correct input, that is, the normal request to http://foo/rss.aspx, and establishes a security model. Because the administrator does not release requests like the offensive one, the machine does not learn it. After the device switches to protection mode, requests that do not match the model, such as the offensive request above, are blocked, thereby protecting against the SQL injection vulnerability.

The website protection device can learn a normal access mode without threat through learning, and further prevent the access with threat from establishing connection with the server through auditing.
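The SQL injection example above can be sketched with a regular-expression release rule derived from one approved request. The generalisation heuristic below (widening each parameter value to digits or word characters) is an assumption made for illustration and is not stated in the specification, as are the function names and the exact request strings used.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlsplit


def release_rule_for(approved: str) -> str:
    """Generalise one approved request (path plus query) into a regex rule.

    Hypothetical heuristic: the path is kept literally and each parameter
    value is widened to digits or word characters only.
    """
    parts = urlsplit(approved)
    pattern = re.escape(parts.path)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if params:
        pieces = [
            re.escape(name) + "=" + (r"\d+" if value.isdigit() else r"\w+")
            for name, value in params
        ]
        pattern += r"\?" + "&".join(pieces)
    return pattern


def is_released(rules: List[str], request: str) -> bool:
    """Active mode check: the whole request must match some release rule."""
    return any(re.fullmatch(rule, request) for rule in rules)
```

Under this heuristic, a rule learned from `/rss.aspx?keyword=lucky` also releases other plain keywords, while a request whose value contains a quote or parenthesis matches no rule and would be intercepted.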

The method for protecting a website through machine learning in this embodiment shortens the deployment time of the website protection device, reduces the manual working time needed to deploy it, and improves the accuracy of intercepting threatening access to the website. The technical scheme of the embodiment can protect Web-based information systems such as education, e-commerce and banking websites. Because each website has its own specific and independent website secure access architecture and protection model, the accuracy of preventing unknown attacks is effectively improved, the range of unknown attacks that can be prevented is expanded, and unknown attacks are better intercepted without interfering with normal access.

Example two

The method for protecting the website through machine learning provided by the embodiment is used for rapidly and accurately updating the website security model after the website is updated. The method specifically comprises the following steps:

S201, switching to a passive mode, and establishing a reverse proxy connection with a website to receive an access request to the website;

S202, sending alarm information to an administrator according to the threatening access request;

S203, judging, by an administrator, whether the access request corresponding to the alarm information is released, and if so, adding a release rule;

S204, repeating S202 and S203 until all the services of the website are accessed, and learning the release rule to establish a website security model.

S205, switching to an active mode, judging whether the access request conforms to the website security model, and if so, allowing the access to the website.

When the website protection device is switched to the active mode, it judges access requests according to the security model, and if an access request conforms to the website security model, access to the website is allowed. Further, if the access request does not conform to the website security model, the access request is intercepted; alternatively, the access request is intercepted and alarm information is sent out.

For access requests that do not conform to the website security model, the website protection device can be set either to intercept only, or to both intercept and send out alarm information. If such a request is intercepted and alarm information is sent out, an access rule can then be added to the security model according to steps S202 to S204.

S206, after the website is updated, switching back to the passive mode and repeating S202 and S203 until all the updates of the website are accessed, and learning the release rule to update the website security model.

In this embodiment, on the basis of the first embodiment, when the content of the website is updated or added to, the administrator may switch the website protection device to the passive mode again in order to update the website security model. The administrator may switch between the passive mode and the active mode at any frequency, measured in days, months or years, or in hours, minutes or seconds. If the administrator is human, switching may be done at intervals; if the administrator is artificial intelligence, software or other software and hardware, the switching frequency may vary.

For example, a website protection device has completed a website access model for a bank website, but the bank needs to expand its credit card business, designs new webpages for applying for and querying credit cards, and needs to bring them online on the original website. The newly online webpages may contain webpage modules that have not been used before, together with the vulnerabilities they bring. To prevent these vulnerabilities from being exploited, the website protection device needs to update the existing website access model. The administrator switches the website protection device to the passive mode again, builds a network access architecture for the newly added part of the website according to steps S201 to S204, and combines it with the former architecture to complete the network access architecture for the new website. After the website security model is complete, the device is switched to the active mode and uses the new network access architecture to intercept accesses judged not to conform to the new website security model.
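Combining the rules learned for the newly added pages with the former model can be sketched as a merge that leaves the existing rules untouched. The representation of the model as a mapping from page path to rule strings, the example paths, and the name `merge_models` are assumptions made for this illustration only.

```python
from typing import Dict, List

Model = Dict[str, List[str]]  # page path -> release rules (regex strings)


def merge_models(existing: Model, learned_update: Model) -> Model:
    """Return a new model: existing rules are kept, newly learned rules added."""
    merged = {path: list(rules) for path, rules in existing.items()}
    for path, rules in learned_update.items():
        target = merged.setdefault(path, [])
        for rule in rules:
            if rule not in target:      # avoid duplicating a known rule
                target.append(rule)
    return merged
```

Because only the updated part of the website has to be relearned, the existing rules do not need to be rebuilt, which reflects the reduced learning time described in this embodiment.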

The method for protecting the website through machine learning in the embodiment can greatly reduce the post-maintenance investment of the deployed website protection device, and the cost and labor cost of enterprises or units for post-updating and maintaining the website protection device are reduced because the post-updating is simpler and quicker.

Example three

As shown in fig. 3, a website protection apparatus 300 using the above-mentioned method for protecting a website by machine learning includes a module 301 for receiving an access request, a module 302 for prompting an alarm, a module 303 for determining, and a module 304 for machine learning;

the access request receiving module 301 is configured to establish a reverse proxy connection with a website; the establishing of the reverse proxy connection specifically includes that the access request receiving module 301 reads a website homepage, and accesses all services of a website through the website homepage to receive an access request to the website;

the alarm prompt module 302 is used for sending out alarm information for access requests, all of which are presumed to be threatening; the threatening access requests are all access requests other than access to the website homepage and to the links in its webpages;

after the alarm prompting module 302 and the judging module 303 establish a release rule for all the services of the website, the machine learning module 304 is configured to learn the release rule to establish a website security model;

the machine learning module 304 determines whether the access request conforms to the website security model, and if so, allows the website to be accessed;

after the website is updated and the alarm prompt module 302 and the judgment module 303 have established release rules for all updates of the website, the machine learning module 304 learns the release rules to update the website security model;

the machine learning module 304 evaluates each access request against the updated website security model and allows access if the request conforms to it.
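The division of work among modules 301 to 304 can be sketched as follows. This is only an illustrative decomposition under stated assumptions: the class names are hypothetical, the links of the home page are supplied as plain data rather than read over a reverse proxy, and the "learning" step is reduced to taking the union of the known links and the release rules, since the embodiment does not fix a particular learning algorithm.

```python
from typing import Callable, List, Set


class AccessRequestReceiver:
    """Module 301 (sketch): knows the home page and the links found on it."""
    def __init__(self, home_page: str, links: Set[str]):
        self.known: Set[str] = {home_page} | set(links)


class AlarmPrompter:
    """Module 302 (sketch): alarms on every request outside the home page and its links."""
    def __init__(self, receiver: AccessRequestReceiver):
        self.receiver = receiver
        self.alarms: List[str] = []

    def check(self, path: str) -> bool:
        if path in self.receiver.known:
            return False
        self.alarms.append(path)
        return True


class Judge:
    """Module 303 (sketch): decides whether an alarmed request is released."""
    def __init__(self, decide: Callable[[str], bool]):
        self.decide = decide
        self.release_rules: Set[str] = set()

    def review(self, path: str) -> None:
        if self.decide(path):
            self.release_rules.add(path)


class MachineLearner:
    """Module 304 (sketch): builds the model and checks requests against it."""
    def __init__(self) -> None:
        self.model: Set[str] = set()

    def learn(self, receiver: AccessRequestReceiver, judge: Judge) -> None:
        # Assumption: the model is the known links plus the release rules.
        self.model = receiver.known | judge.release_rules

    def conforms(self, path: str) -> bool:
        return path in self.model


receiver = AccessRequestReceiver("/", {"/news", "/contact"})
prompter = AlarmPrompter(receiver)
judge = Judge(decide=lambda p: p.startswith("/api/"))  # hypothetical release policy
learner = MachineLearner()

for path in ["/news", "/api/search", "/etc/passwd"]:
    if prompter.check(path):
        judge.review(path)
learner.learn(receiver, judge)
```

In this run the alarm prompt module alarms on `/api/search` and `/etc/passwd`, the judgment module releases only the former, and the resulting model admits `/news` and `/api/search` but not `/etc/passwd`.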

The website protection apparatus 300 in this embodiment is a website protection apparatus for protecting website security through machine learning in the present invention, and may be software installed on a website server side, or hardware arranged between the website server side and an access request side, or other software and hardware capable of completing the above processes.

EXAMPLE IV

As shown in fig. 4, a system 400 for protecting website security through machine learning according to the present invention includes an access request terminal 404, a website protection device 402, a website server terminal 401, and an administrator terminal 403. The website protection device 402 establishes a reverse proxy connection with the website server terminal 401 and, by reading the data returned for the website home page by the website server terminal 401, accesses all services of the website through the home page, so as to receive access requests from the access request terminal 404 to the website server terminal 401;

the website protection device 402 sends out alarm information according to the access request;

the administrator terminal 403 determines whether to release the access request corresponding to the alarm information, and if so, adds a release rule to the website protection device 402;

once all services of the website server terminal 401 have been accessed and the corresponding release rules added, the website protection device 402 learns the release rules to establish a website security model in the website protection device 402;

the website protection device 402 determines whether the access request from the access request terminal 404 to the website server terminal 401 conforms to the website security model, and if so, allows the website server terminal 401 to be accessed;

after the website is updated, the website protection device 402 sends out warning information according to the access request from the access request terminal 404 to the website server terminal 401;

once all updates to the website have been accessed, the website protection device 402 learns the release rules to update the website security model;

the website protection device 402 determines whether the access request from the access request terminal 404 to the website server terminal 401 conforms to the updated website security model, and if so, allows access to the website server terminal 401.
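Reading the data returned for the home page and collecting the links it contains could, for instance, be done as in the sketch below. It uses only the Python standard library (`html.parser` and `urllib.parse`); the function name `known_paths`, the host `bank.example`, and the decision to keep only same-host links are assumptions made for illustration and are not taken from the embodiment.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the paths of same-host links found in <a href="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        target = urlparse(urljoin(self.base_url, href))
        if target.netloc == self.host:        # ignore links to other hosts
            self.paths.add(target.path or "/")


def known_paths(base_url: str, home_page_html: str) -> set:
    """Return the home page path plus every same-host path it links to."""
    collector = LinkCollector(base_url)
    collector.feed(home_page_html)
    home_path = urlparse(base_url).path or "/"
    return {home_path} | collector.paths


# Hypothetical return data of a home page.
html = """
<html><body>
  <a href="/login">Log in</a>
  <a href="products/cards.html">Credit cards</a>
  <a href="https://other.example/ad">Partner</a>
  <a href="#top">Back to top</a>
</body></html>
"""
paths = known_paths("https://bank.example/", html)
```

Requests to paths in this set correspond to access to the home page and its links; under the scheme described above, any other request would trigger alarm information.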

In this embodiment, the website server terminal 401 is a web server capable of storing website content data and of transmitting and exchanging data. The website protection device 402 is the website protection device of the present invention for protecting website security through machine learning; it may be software installed on the website server terminal 401 or hardware placed between the website server terminal 401 and the access request terminal 404. The administrator terminal 403 may be an artificial intelligence, software, a human, or any other administrator capable of deciding on access requests. The access request terminal 404 is an intelligent terminal, a personal computer, or another server.
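Because the administrator may be software, an artificial intelligence, or a human, the website protection device only needs a uniform way to ask for a release decision. One possible way to express this is sketched below with `typing.Protocol`; the interface name `Administrator`, its `release` method, and both example implementations are hypothetical and serve only to illustrate the idea of interchangeable decision-makers.

```python
from typing import Dict, Iterable, List, Protocol, Set, Tuple

Request = Tuple[str, str]  # (HTTP method, path)


class Administrator(Protocol):
    """Anything able to decide whether an alarmed request is released."""
    def release(self, method: str, path: str) -> bool: ...


class RuleBasedAdministrator:
    """Software administrator: releases paths under approved prefixes."""
    def __init__(self, approved_prefixes: Iterable[str]):
        self.approved_prefixes = tuple(approved_prefixes)

    def release(self, method: str, path: str) -> bool:
        return path.startswith(self.approved_prefixes)


class ScriptedAdministrator:
    """Stands in for a human operator by replaying recorded decisions."""
    def __init__(self, decisions: Dict[Request, bool]):
        self.decisions = decisions

    def release(self, method: str, path: str) -> bool:
        # Requests without a recorded decision are not released.
        return self.decisions.get((method, path), False)


def collect_release_rules(admin: Administrator, alarms: List[Request]) -> Set[Request]:
    """Ask the administrator about each alarmed request; keep the released ones."""
    return {(m, p) for m, p in alarms if admin.release(m, p)}


alarms = [("GET", "/card/apply"), ("POST", "/card/apply"), ("GET", "/../etc/passwd")]
software = RuleBasedAdministrator(["/card/"])
human = ScriptedAdministrator({("GET", "/card/apply"): True})

rules_software = collect_release_rules(software, alarms)
rules_human = collect_release_rules(human, alarms)
```

Either administrator can be passed to `collect_release_rules` unchanged, which mirrors the statement that any administrator capable of deciding on access requests may be used.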

Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for protecting website security through machine learning, characterized in that the method comprises:

S1, switching to a passive mode, and establishing a reverse proxy connection with the website to receive access requests to the website;

S2, sending out alarm information according to the access request, without intercepting the access request in the passive mode;

S3, judging whether the access request corresponding to the alarm information is released, and if so, adding a release rule;

S4, repeating S2 and S3 until all services of the website have been accessed, and learning the release rules to establish a website security model;

and S5, switching to an active mode, judging whether the access request conforms to the website security model, and if so, allowing access to the website.

2. The method of claim 1, wherein when the website is updated, the method further comprises:

S6, repeating S2 and S3 until all updates of the website have been accessed, and learning the release rules to update the website security model.

3. The method of claim 2, wherein it is determined whether the access request conforms to the website security model, and if not, the access request is intercepted.

4. A method for protecting website security through machine learning as claimed in any one of claims 1 to 3, wherein it is determined whether the access request conforms to the website security model, and if not, the access request is intercepted and an alarm message is issued.

5. The method according to claim 1, wherein the step of establishing a reverse proxy connection with the website in step S1 specifically comprises:

reading a website home page;

and accessing all services of the website through the home page of the website.

6. A method for protecting website security through machine learning as claimed in any one of claims 1 to 3, wherein the access requests are presumed to be threatening access requests.

7. A method for protecting website security through machine learning according to any one of claims 1 to 3, wherein the access request is specifically an access request other than access to the website homepage and the links in the webpage.

8. An apparatus for applying the method for protecting website security through machine learning of any one of claims 1-3, wherein the apparatus comprises a module for receiving access request, a module for prompting alarm, a module for determining, and a machine learning module, wherein:

9. A system for applying the method for protecting website security through machine learning according to any one of claims 1-3, wherein the system comprises an access request side, a website protection device, a website server side and an administrator side, wherein: the website protection device establishes reverse proxy connection with a website server to receive an access request from an access request terminal to the website server;