CN107547555B

CN107547555B - Website security monitoring method and device

Info

Publication number: CN107547555B
Application number: CN201710812031.0A
Authority: CN
Inventors: 张乐平; 张博; 李海峰; 侯磊
Original assignee: Beijing Deepctrl Co ltd
Current assignee: Beijing Deepctrl Co ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2021-04-16
Anticipated expiration: 2037-09-11
Also published as: CN107547555A

Abstract

The invention provides a website security monitoring method and device. The method comprises: extracting webpage content information from the webpages corresponding to each web address (URL) of a target website, wherein the webpage content information comprises at least one of video information, picture information or text information; determining, for each webpage, whether its webpage content information contains bad information; when bad information exists in the webpage content information of any webpage, determining that the target website is an abnormal website; sending prompt information to a management terminal corresponding to the abnormal website so that the corresponding manager can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage containing bad information, to prevent user terminals from continuing to access the abnormal website or webpage. The invention monitors the webpage content of the target website in real time and issues an early warning promptly once bad information is identified in the webpage content, thereby providing all-round security protection for the target website.

Description

Website security monitoring method and device

Technical Field

The invention relates to the technical field of information security, in particular to a website security monitoring method and device.

Background

At present, with the continuing spread of internet technology, a website has become standard for governments, enterprises, associations and other organizations, alongside commercial websites for news, entertainment, forums and the like. While websites enrich the ways in which society communicates, they also introduce network security risks. For example: in 2015, the official website of a middle school in Dazhou, Sichuan displayed pornographic advertisements; in 2015, a local government's official website turned into a pornographic website after being attacked by hackers following a long period without maintenance; in 2015, the website of a city's urban management bureau displayed pornographic links, which was attributed to a hacker attack; in 2016, a company's website was maliciously attacked by a competitor and flooded with pornographic content. Cases in which a website is attacked by hackers or infected by viruses and then presents bad information such as pornography, gambling, reactionary content or violence are not uncommon. Such incidents have a harmful social impact, seriously damage the public image of the organization that operates the website, and cause it economic loss, so website security has drawn growing public attention in recent years.

Currently, the related art provides a website security monitoring method that mainly includes: first, scanning the website for vulnerabilities with antivirus software, detecting whether any webpage in the website has vulnerabilities, has had a Trojan implanted, has been tampered with, and so on; and then reminding the website administrator to repair and harden the website in time, thereby ensuring the safe operation of the website. For example, the website security detection service platform provided by 360 corporation is http://webscan.360.cn/, and its main detection items are vulnerability detection, vulnerability repair, backdoor detection and removal, and the like.

In the process of implementing the invention, the inventors found that at least the following problem exists in the related art: the website security monitoring method in the related art mainly scans for viruses or Trojans that have invaded a website. Because viruses and Trojans are of many kinds and are updated and mutate extremely quickly, while virus library updates often lag behind, attacks on the website cannot be fundamentally defended against in time, and the website therefore cannot be protected in an all-round manner.

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide a method and an apparatus for monitoring website security, so as to implement omnibearing security protection on a target website.

In a first aspect, an embodiment of the present invention provides a website security monitoring method, where the method includes:

extracting webpage content information from the webpages corresponding to each web address (URL) of a target website, wherein the webpage content information comprises at least one of video information, picture information or text information;

respectively determining whether the webpage content information in each webpage contains bad information;

when bad information exists in the webpage content information in any webpage, determining that the target website is an abnormal website;

sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage with bad information to prevent the user terminal from continuously accessing the abnormal website or the abnormal webpage.

With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where when the web page content information includes picture information, the determining whether the web page content information in each web page includes bad information includes:

sensitive image area extraction processing is carried out on the picture information to be identified, and a plurality of sensitive image areas in the picture information are obtained;

calculating the bad information matching degree of each sensitive image area by using a first bad information identification model, and calculating the bad information existence probability of the picture information according to each bad information matching degree;

and when the existence probability of the bad information is greater than a first preset threshold value, determining that the bad information exists in the picture information.
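
The picture decision above can be illustrated with a minimal sketch. It assumes the per-area matching degrees have already been produced by the first bad information identification model and are passed in as plain numbers. The text does not state how the matching degrees are combined into a picture-level probability, so taking the maximum is an assumption made here for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def picture_bad_probability(region_scores: Sequence[float]) -> float:
    """Combine per-area matching degrees into one picture-level probability.

    Assumption: the maximum is used, so a single strongly matching
    sensitive image area is enough to raise the picture's probability.
    """
    return max(region_scores, default=0.0)


def picture_has_bad_info(region_scores: Sequence[float], first_threshold: float) -> bool:
    """Return True when the probability is strictly greater than the first preset threshold."""
    return picture_bad_probability(region_scores) > first_threshold
```

A picture with no extracted sensitive image areas yields a probability of 0.0 under this sketch and is therefore never flagged.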

With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where when the web page content information includes video information, the determining whether the web page content information in each web page includes bad information includes:

performing frame processing on the video information to be identified to obtain a plurality of video frames;

performing sensitive area extraction processing on each video frame one by one to obtain a plurality of sensitive areas in the video frame;

calculating the bad information matching degree of each sensitive area by using a first bad information identification model, and calculating the bad information existence probability of the video frame according to each bad information matching degree;

when the existence probability of the bad information is larger than a second preset threshold value, determining that the bad information exists in the video frame;

and when any video frame has bad information, determining that the video information has the bad information.

With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where when the web page content information includes text information, the determining whether the web page content information in each web page includes bad information includes:

sentence division processing is carried out on the text information to be recognized, and a plurality of independent sentences are obtained;

performing keyword segmentation processing on a plurality of independent sentences, and analyzing the dependency relationship among a plurality of keywords in each sentence;

according to the dependency relationship, carrying out negative emotion algorithm identification on each keyword or the combination of a plurality of keywords, and judging whether each sentence has negative emotion;

based on a pre-constructed sensitive vocabulary library, carrying out sensitive vocabulary identification on each keyword, and judging whether each sentence contains sensitive vocabulary;

if any statement has negative emotion and contains sensitive words, calculating the existence probability of bad information of the text information;

and when the existence probability of the bad information is greater than a third preset threshold value, determining that the bad information exists in the text information.

With reference to any one of the first aspect to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where when the web content information includes at least two of three types of information, namely video information, picture information, and text information, the determining whether there is bad information in the web content information in each web page respectively includes:

respectively calculating the bad information existence probability corresponding to each type of bad information in each type of information, wherein the bad information comprises at least one of the following: pornography, reactionary content, violence;

judging whether the bad information existence probability corresponding to each type of bad information in each type of information is greater than the preset threshold corresponding to that type of bad information in that type of information;

and if the existence probability of any bad information is greater than the corresponding preset threshold value, determining that the bad information exists in the webpage content information.

With reference to the fourth possible implementation manner of the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the method further includes:

if the existence probability of all the bad information is not greater than the corresponding preset threshold value, calculating the comprehensive bad information existence probability corresponding to various types of bad information according to the existence probability of the bad information corresponding to each type of bad information in various types of information;

judging whether the existence probability of each comprehensive bad information is greater than a comprehensive preset threshold corresponding to the corresponding type of bad information;

and if the existence probability of any one piece of comprehensive bad information is greater than the corresponding comprehensive preset threshold value, determining that bad information exists in the webpage content information.
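
The two-stage decision described in the fourth and fifth implementation manners can be sketched as follows. The probabilities and thresholds are assumed to be supplied as dictionaries keyed by (information type, bad information category). The text does not give the formula for the comprehensive probability in this section, so the plain average over information types used below is an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (information type, bad-information category)


def page_has_bad_info(
    probs: Dict[Key, float],
    thresholds: Dict[Key, float],
    combined_thresholds: Dict[str, float],
) -> bool:
    """Two-stage check over several information types on one webpage."""
    # Stage 1: any single probability above its own preset threshold.
    if any(p > thresholds[key] for key, p in probs.items()):
        return True

    # Stage 2: group probabilities by bad-information category and compare
    # a comprehensive value (assumed here to be the mean) with the
    # comprehensive preset threshold of that category.
    by_category: Dict[str, List[float]] = {}
    for (_info_type, category), p in probs.items():
        by_category.setdefault(category, []).append(p)
    return any(
        sum(values) / len(values) > combined_thresholds[category]
        for category, values in by_category.items()
    )
```

The second stage is what lets a page be flagged when several information types are each moderately suspicious but none exceeds its individual threshold.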

With reference to any one of the first aspect to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where the sending of the prompt message to the management terminal corresponding to the abnormal website includes:

searching an information sending mode corresponding to the management terminal according to the management terminal, wherein the information sending mode comprises any one of the following modes: short message, WeChat, QQ, email, or telephone;

and sending the prompt information to the management terminal according to the information sending mode.
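
As a minimal sketch of the lookup-then-send step above, the following assumes a table mapping each management terminal to its information sending mode and a table of sender callables, one per mode. The real delivery channels (short message, WeChat, QQ, email, telephone) are not modeled; the sender functions and all names are hypothetical placeholders.

```python
from typing import Callable, Dict

Sender = Callable[[str, str], None]  # (terminal id, message) -> None


def send_prompt(
    terminal_id: str,
    message: str,
    modes: Dict[str, str],
    senders: Dict[str, Sender],
) -> str:
    """Look up the sending mode for a terminal and deliver the prompt with it."""
    mode = modes[terminal_id]           # e.g. "email" or "short message"
    senders[mode](terminal_id, message)  # delegate to the channel-specific sender
    return mode
```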

With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where the method further includes:

judging whether the target website is invaded and tampered or not according to the webpage content information;

if so, sending prompt information to a management terminal corresponding to the target website to remind a manager corresponding to the management terminal.

In a second aspect, an embodiment of the present invention further provides a website security monitoring apparatus, where the apparatus includes:

the information capturing module is used for extracting webpage content information from the webpages corresponding to each web address (URL) of a target website, wherein the webpage content information comprises at least one of video information, picture information or text information;

the bad information identification module is used for respectively determining whether the webpage content information in each webpage contains bad information;

the abnormal website determining module is used for determining the target website as an abnormal website when bad information exists in the webpage content information in any webpage;

the prompt information sending module is used for sending prompt information to the management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage with bad information to prevent the user terminal from continuously accessing the abnormal website or the abnormal webpage.

With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the apparatus further includes:

the intrusion tampering determination module is used for judging whether the target website is tampered by intrusion according to the webpage content information;

and the prompt information sending module is further used for sending prompt information to a management terminal corresponding to the target website when the target website is determined to be invaded and tampered so as to remind a manager corresponding to the management terminal.

In the website security monitoring method and device provided by the embodiments of the invention, the method comprises: extracting webpage content information from the webpages corresponding to each web address (URL) of a target website, wherein the webpage content information comprises at least one of video information, picture information or text information; respectively determining whether the webpage content information in each webpage contains bad information; when bad information exists in the webpage content information of any webpage, determining that the target website is an abnormal website; sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage containing bad information, to prevent user terminals from continuing to access the abnormal website or the abnormal webpage. According to the embodiments of the invention, the webpage content of the target website is monitored in real time and an early warning is issued promptly once bad information is identified in the webpage content, thereby providing comprehensive security protection for the target website.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a schematic flow chart illustrating a website security monitoring method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram illustrating a website security monitoring apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic view illustrating a specific application scenario of a website security monitoring apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The website security monitoring method in the related art mainly scans for viruses or Trojans that have invaded the website. Because viruses and Trojans are of many kinds, the various vulnerabilities in a website cannot be fundamentally detected and eliminated, and the website therefore cannot be comprehensively protected. On this basis, embodiments of the present invention provide a method and an apparatus for monitoring website security, which are described below by way of embodiments.

As shown in fig. 1, an embodiment of the present invention provides a website security monitoring method, which includes steps S102 to S108, and specifically includes the following steps:

step S102: extracting webpage content information from the webpages corresponding to each web address (URL) of a target website, wherein the target website is the website to be monitored; the webpage content information may contain any one of video information, picture information and text information, a combination of any two of them, or all three; if the webpage content information contains no useful information, the target website is considered to be a normal website;

step S104: respectively determining whether the webpage content information in each webpage contains bad information, wherein the bad information comprises pornography, gambling, reactionary content, violence and other user-defined sensitive content;

step S106: when the webpage content information of any webpage contains bad information, determining the target website to be an abnormal website;

step S108: sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage with bad information to prevent the user terminal from continuously accessing the abnormal website or the abnormal webpage.
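
Steps S102 to S108 can be summarized in a short sketch. It assumes that page extraction (S102) has already produced one record per webpage, and it takes the bad information check and the notification as injected callables, since their implementations are described separately. The `PageContent` record and the function names are hypothetical, and the alternative of closing the access channel or deleting the page is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageContent:
    """Content extracted from one webpage (the output of step S102)."""
    url: str
    texts: List[str] = field(default_factory=list)
    pictures: List[bytes] = field(default_factory=list)
    videos: List[bytes] = field(default_factory=list)


def monitor_site(
    pages: List[PageContent],
    contains_bad_info: Callable[[PageContent], bool],
    notify: Callable[[List[str]], None],
) -> List[str]:
    """Run steps S104 to S108 over already-extracted pages.

    Returns the URLs of abnormal pages; an empty list means the site is normal.
    """
    # S104: check each webpage independently.
    abnormal = [page.url for page in pages if contains_bad_info(page)]
    # S106: one abnormal webpage is enough to mark the whole site abnormal.
    if abnormal:
        # S108: send prompt information to the management terminal.
        notify(abnormal)
    return abnormal
```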

Specifically, bad information may intrude into the webpage content of a webpage at any time. To identify a target website containing bad information promptly, so that an abnormality warning can be issued quickly and the harm caused by the spread of bad information can be prevented, webpage content information can be extracted from the webpages corresponding to each web address (URL) of the target website at a preset time interval, and bad information identification is then performed on the extracted webpage content information; that is, the webpage content information in each webpage is extracted by cyclic scanning and periodic capture. Further, because capture needs to be faster when there are many webpages, the webpage content information may be extracted using parallel threads or by having a server matrix access a plurality of webpages simultaneously. The preset time interval can be set according to actual requirements, so that all webpage content of the website can be monitored around the clock without blind spots, thereby ensuring the information security of the website.
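
The periodic, parallel capture described above might look like the following sketch, which uses a thread pool from the Python standard library. The `fetch` callable that downloads one page is injected and hypothetical, as are the function names and the `rounds` parameter (added only so the loop can be stopped in an example).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def capture_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch all pages in parallel threads and return {url: content}."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with url_list.
        return dict(zip(url_list, pool.map(fetch, url_list)))


def scan_periodically(
    scan_once: Callable[[], None],
    interval_seconds: float,
    rounds: Optional[int] = None,
) -> None:
    """Call scan_once at a preset interval; rounds=None repeats indefinitely."""
    done = 0
    while rounds is None or done < rounds:
        scan_once()
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_seconds)
```

Threads are a reasonable fit here because page capture is dominated by network waiting rather than computation.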

Under the condition that the bad information exists in the webpage content, the following exception handling modes can be adopted, and the exception handling method specifically comprises the following steps:

(1) sending prompt information to the management terminal corresponding to the abnormal website so that the manager corresponding to the management terminal can maintain the abnormal website, thereby providing a real-time abnormality warning and automatically notifying the department and manager responsible for the abnormal website; specifically, after receiving the prompt information, the manager can directly close the abnormal website in the case of a company website, or directly delete the abnormal post in the case of a forum website;

(2) if the website is provided with a corresponding access interface, an access channel of an abnormal website can be automatically closed or an abnormal webpage with bad information is deleted, so that the user terminal is prevented from continuously accessing the abnormal website or the abnormal webpage;

(3) sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the corresponding abnormal website; meanwhile, an access channel of the abnormal website is directly closed or the abnormal webpage with bad information is deleted, so that the user terminal is prevented from continuously accessing the abnormal website or the abnormal webpage.

In the embodiment provided by the invention, the webpage content of the target website is monitored in real time and an early warning is issued promptly once bad information is identified in the webpage content, thereby providing all-round security protection for the target website. The website security monitoring method provided by the embodiment of the invention can be applied to any website that malicious actors are likely to attack with viruses or Trojans in order to spread bad information, such as the portal websites of governments, institutions and schools, the websites of enterprises and public institutions, social network celebrity pages, open forum message boards and the like; it can also be applied to monitoring websites on which bad information is spread deliberately by people. Because the method inspects the webpage content that the website actually publishes, an abnormal website containing bad information can be identified quickly and accurately as long as the bad information appears in the webpage content, regardless of the attack technique used.

Further, considering that bad information identification may be slower than webpage content capture, after step S102 (extracting webpage content information from the webpages corresponding to each web address (URL) of the target website), the method further comprises:

caching the extracted webpage content information in each webpage, periodically acquiring the stored webpage content information, and transmitting the webpage content information to a bad information identification module, wherein the bad information identification module judges whether the webpage content information contains bad information or not through a bad information identification model.
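
The caching step decouples fast capture from slower identification. A minimal sketch, assuming an in-memory queue as the cache and an injected `is_bad` callable standing in for the bad information identification module; both choices and all names are assumptions, since the text does not specify the cache implementation.

```python
import queue
from typing import Callable, List, Tuple


def drain_cache(
    cache: "queue.Queue[Tuple[str, str]]",
    is_bad: Callable[[str], bool],
    batch_size: int = 100,
) -> List[str]:
    """Take up to batch_size cached (url, content) items and return flagged URLs.

    Intended to be called periodically by the identification side while the
    capture side keeps putting new items into the same queue.
    """
    flagged: List[str] = []
    for _ in range(batch_size):
        try:
            url, content = cache.get_nowait()
        except queue.Empty:
            break  # cache is empty for now; wait for the next period
        if is_bad(content):
            flagged.append(url)
    return flagged
```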

Further, the webpage content information to be identified may include any one of video information, picture information and text information, a combination of any two of them, or all three, and the bad information identification manner differs for the different types of information included in the webpage content information. Specifically:

(a) when the web page content information includes picture information, the determining whether there is bad information in the web page content information in each web page respectively specifically includes:

performing sensitive image area extraction processing on the picture information to be identified to obtain a plurality of sensitive image areas in the picture information; specifically, performing transformation and filtering operations on the picture information to remove irrelevant content and extract potential image areas and content, then performing shape analysis, mathematical morphology and feature filtering processing on the extracted potential image areas and content, and determining the plurality of sensitive image areas to be identified in each piece of picture information;

(b) When the web page content information includes video information, the determining whether there is bad information in the web page content information in each web page respectively specifically includes:

performing framing processing on the video information to be identified to obtain a plurality of video frames, wherein identification of bad information can be performed by taking the video frames as a unit;

performing sensitive area extraction processing on each video frame one by one to obtain a plurality of sensitive areas in the video frame, specifically, performing transformation and filtering operation on the video frame to remove irrelevant content, extracting potential image areas and content, performing shape analysis, mathematical morphology and characteristic filtering processing on the extracted potential image areas and content, and respectively determining a plurality of sensitive areas to be identified in each video frame;

and when the bad information exists in any one of the video frames, determining that the bad information exists in the video information.

In the embodiment provided by the invention, a pre-trained bad information identification model with a multi-hidden-layer structure is used to identify bad information. A deep learning model based on a multi-layer neural network is well suited to analyzing and identifying big data (massive numbers of images) and speeds up the identification of bad information in webpage content, so that whether bad information exists in the webpage content of the target website can be determined promptly, and an early warning can be issued quickly once bad information is identified.

Specifically, the bad information identification model for identifying whether bad information exists in the web page content information included in the web page needs to be constructed in advance; the bad information identification model is obtained by training in a special high-performance training server, so that the bad information identification model can be directly used for identifying the bad information of the extracted webpage content information in the following process, wherein the bad information identification model is constructed in the following mode:

performing deep neural network training on selected bad information samples using a deep learning method to obtain the bad information identification model, wherein the bad information identification model is a machine learning model with multiple hidden layers; massive numbers of images and videos containing bad information are collected as the bad information samples and used to train the model and optimize its feature parameters, and identification precision improves continuously during deep learning, so that the accuracy of classification or prediction is ultimately improved.

Further, a given video frame may be misidentified as containing bad information. When the webpage content information includes video information, bad video frames usually appear consecutively over a number of frames, and normal video frames usually appear consecutively as well. To avoid determining a normal website to be abnormal because of a single misjudged video frame, and thus to avoid affecting the normal operation of the target website, the step of determining that bad information exists in the video information when bad information exists in any video frame specifically includes: when a video frame is identified, for the first time, as a video frame containing bad information, adding 1 to the occurrence count of bad video frames;

judging whether the currently accumulated occurrence count of bad video frames is greater than a bad count threshold;

and if so, determining that bad information exists in the video information.
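
The counting rule above can be sketched as follows, assuming each video frame has already been assigned a bad information existence probability. The parameter names are hypothetical; the comparison with the second preset threshold and the strict "greater than" test on the accumulated count follow the text.

```python
from typing import Iterable


def video_has_bad_info(
    frame_probs: Iterable[float],
    second_threshold: float,
    bad_count_threshold: int,
) -> bool:
    """Flag a video only after enough frames have been identified as bad."""
    bad_frames = 0
    for prob in frame_probs:
        if prob > second_threshold:
            bad_frames += 1  # one more frame identified as containing bad information
            if bad_frames > bad_count_threshold:
                return True  # stop early; remaining frames need not be checked
    return False
```

With a bad count threshold of 2, a single misidentified frame cannot cause the video, and hence the website, to be reported as abnormal.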

In the embodiment provided by the invention, allowance is made for a video frame being misidentified as containing bad information. Because bad video frames usually appear consecutively over a number of frames when the webpage content information includes video information, and normal video frames usually appear consecutively as well, this way of determining bad information makes it possible to determine quickly and accurately whether the webpage content information contains bad information, and avoids determining a normal website to be abnormal because of a single misjudged video frame, thereby avoiding any effect on the normal operation of the target website.

(c) When the web page content information includes text information, the determining whether there is bad information in the web page content information in each web page respectively specifically includes:

performing sentence division processing on the extracted text information to be recognized to obtain a plurality of independent sentences; specifically, splitting the extracted text information into a plurality of independent sentences according to the punctuation marks, spaces and the like in the text information;

performing keyword segmentation processing on the plurality of independent sentences and analyzing the dependency relationship among the keywords in each sentence; specifically, segmenting each complete sentence into a plurality of independent keywords, and determining the dependency relationship from grammatical relations among the keywords such as subject and predicate;

according to the dependency relationship, carrying out negative emotion algorithm identification on each keyword or a combination of a plurality of keywords, and judging whether each sentence has negative emotion;

if any statement has negative emotion and contains sensitive words, calculating the existence probability of bad information of the text information, specifically, calculating the existence probability of bad information of the text information according to the degree that each statement has negative emotion and the degree that each statement contains sensitive words;

and when the existence probability of the bad information is greater than a third preset threshold value, determining that the bad information exists in the text information to be recognized.
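
The text-processing steps above can be sketched in Python under strong simplifying assumptions. The two word lists are hypothetical placeholders, regular-expression tokenization stands in for keyword segmentation, dependency analysis is omitted, and the probability is taken to be the fraction of sentences that are both negative and sensitive. None of these choices is specified by the patent.

```python
import re
from typing import List

NEGATIVE_WORDS = {"hate", "destroy"}      # hypothetical negative-emotion lexicon
SENSITIVE_WORDS = {"gambling", "weapon"}  # hypothetical sensitive-word list


def split_sentences(text: str) -> List[str]:
    """Split text into sentences on common punctuation marks."""
    parts = re.split(r"[.!?;。！？；]+", text)
    return [p.strip() for p in parts if p.strip()]


def keywords(sentence: str) -> List[str]:
    """Simple word tokenization, standing in for keyword segmentation."""
    return re.findall(r"\w+", sentence.lower())


def text_bad_probability(text: str) -> float:
    """Fraction of sentences that are negative AND contain a sensitive word."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = 0
    for sentence in sentences:
        words = set(keywords(sentence))
        if words & NEGATIVE_WORDS and words & SENSITIVE_WORDS:
            flagged += 1
    return flagged / len(sentences)


def text_has_bad_information(text: str, third_threshold: float) -> bool:
    """Compare the probability with the third preset threshold."""
    return text_bad_probability(text) > third_threshold
```

A production system would replace the lexicons with a trained sentiment model and a maintained sensitive-word dictionary, and would weight each sentence by degree rather than counting it as simply flagged or not.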

(d) When the web page content information includes at least two of the three types of information, namely video information, picture information and text information, the determining whether the web page content information in each web page includes bad information specifically includes:

and if the existence probability of any bad information is greater than the corresponding preset threshold value, determining that bad information exists in the webpage content information.

Table 1 gives the preset threshold corresponding to each type of bad information in each type of information:

TABLE 1

Parameter description: in Table 1, K₂₂ denotes the preset threshold corresponding to reactionary content in the video information;

Table 2 gives the bad information existence probability corresponding to each type of bad information in each type of information:

TABLE 2

Parameter description: in Table 2, P₂₂ denotes the bad information existence probability of reactionary content in the video information;

that is, it is judged separately whether P₂₂ is greater than K₂₂, whether P₂₃ is greater than K₂₃, and so on; in other words, each P_ij is compared with the K_ij that has the same subscripts, where the first subscript identifies the type of bad information and the second identifies the type of information (picture, video or text). P₁₁, P₂₁ and P₃₁ are calculated in the same way as the bad information existence probability in (a); P₁₂, P₂₂ and P₃₂ are calculated in the same way as the bad information existence probability in (b); and P₁₃, P₂₃ and P₃₃ are calculated in the same way as the bad information existence probability in (c);

it should be noted that the types of bad information are not limited to pornography, reactionary content and violence; other types of bad information may also be included, and bad information may be classified according to actual needs.

Specifically, a scoring mechanism is set up for the discrimination results of the picture information, the video information and the text information, with separate scores for each type of bad information such as pornography, reactionary content and violence. The picture information, the video information and the text information in a given webpage are each scored for their pornography, reactionary and violence attributes, that is, P₁₁, P₂₁, P₃₁, P₁₂, P₂₂, P₃₂, P₁₃, P₂₃ and P₃₃ are determined respectively. An alarm threshold score is set for the picture information, the video information and the text information for each type of bad information, that is, K₁₁, K₂₁, K₃₁, K₁₂, K₂₂, K₃₂, K₁₃, K₂₃ and K₃₃ are set respectively. If the analysis score for any type of bad information exceeds the corresponding alarm threshold, further processing is triggered, thereby achieving rapid early warning.
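
The comparison of each P_ij with its K_ij can be illustrated with a minimal Python sketch. The dictionary layout keyed by (bad information type, information type), the function name `exceeded_cells` and any numbers used with it are illustrative assumptions; they are not values from Table 1 or Table 2.

```python
from typing import Dict, List, Tuple

# A cell is identified by (bad information type, information type),
# e.g. ("reactionary", "video") corresponds to P22 / K22 in the text.
Cell = Tuple[str, str]


def exceeded_cells(P: Dict[Cell, float], K: Dict[Cell, float]) -> List[Cell]:
    """Return every cell whose score P_ij is greater than its threshold K_ij."""
    return [cell for cell, score in P.items() if score > K[cell]]


def triggers_alarm(P: Dict[Cell, float], K: Dict[Cell, float]) -> bool:
    """Further processing is triggered if any single cell exceeds its threshold."""
    return len(exceeded_cells(P, K)) > 0
```

Keeping the thresholds in a separate mapping lets each monitored website use its own K values without changing the scoring code.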

Further, it may happen that no bad information existence probability exceeds its corresponding preset threshold and yet bad information is spread across the webpage content information. For example, the picture information, the video information and the text information may each contain a small amount of pornographic bad information: the existence probability of pornographic bad information in each of the three is not greater than its corresponding preset threshold, but the three portions taken together may exceed the allowable range. In order to determine accurately whether bad information exists in the webpage when each type of bad information is dispersed across the types of information, and thereby improve the accuracy of identifying bad information in the webpage content information, the method further comprises the following steps:

if none of the bad information existence probabilities is greater than its corresponding preset threshold, calculating a comprehensive bad information existence probability for each type of bad information from the existence probabilities of that type of bad information in all types of information, that is, calculating the comprehensive bad information existence probability of pornography, of reactionary content and of violence in the webpage content information;

judging whether each comprehensive bad information existence probability is greater than the comprehensive preset threshold corresponding to that type of bad information;

and if any comprehensive bad information existence probability is greater than its corresponding comprehensive preset threshold, determining that bad information exists in the webpage content information.

Specifically, for each type of bad information such as pornography, reactionary content and violence, the correlation among the picture information, the video information and the text information in a given webpage is analyzed and correlation weighting is applied. For example, the pornography attributes of the picture information, the video information and the text information in the extracted webpage content information are scored separately as P₁₁, P₁₂ and P₁₃. The total pornography attribute score of the webpage, i.e., the comprehensive bad information existence probability of pornography, is then S = f(P₁₁, P₁₂, P₁₃), where f is a correlation weighting function whose purpose is to let P₁₁, P₁₂ and P₁₃ all contribute to the total score S. A common simplified formula is f(P₁₁, P₁₂, P₁₃) = P₁₁ + P₁₂ + P₁₃; a customized correlation weighting function can also be defined for a particular target website to calculate the comprehensive bad information existence probability for each type of bad information.
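
The two-stage decision for one type of bad information (per-type thresholds first, then the comprehensive score) can be sketched as follows. The optional `weights` argument is an assumed way of expressing a customized weighting function; with no weights the function reduces to the simplified sum given in the text. All names and numbers are illustrative.

```python
from typing import Optional, Sequence


def composite_score(scores: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """S = f(P11, P12, P13); with no weights this is the plain sum."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))


def page_has_bad_information(scores: Sequence[float],
                             thresholds: Sequence[float],
                             composite_threshold: float,
                             weights: Optional[Sequence[float]] = None) -> bool:
    """Check individual thresholds first, then fall back to the composite."""
    if any(p > k for p, k in zip(scores, thresholds)):
        return True
    return composite_score(scores, weights) > composite_threshold
```

For example, picture, video and text scores of 0.3 each stay below individual thresholds of 0.5, but their sum exceeds a comprehensive threshold of 0.8, so the page is still flagged.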

Further, in order to enable a manager to select a sending mode of the prompt information according to actual needs, the sending of the prompt information to the management terminal corresponding to the abnormal website includes:

looking up the information transmission mode configured for the management terminal, wherein the information transmission mode comprises any one of the following: short message, WeChat, QQ, email, or telephone;

Further, considering that the webpage content in the target website may be tampered by intrusion, based on this, the method further includes:

if so, sending prompt information to the management terminal corresponding to the target website to remind the manager corresponding to the management terminal.

Specifically, when bad information that would not normally appear on the target website is found in its webpage content, this indicates that the target website has been intruded upon and tampered with; for example, pornographic pictures appear on a government website.

The website security monitoring method provided by the invention comprises the following steps: extracting webpage content information from the webpages corresponding to the web addresses of a target website, wherein the webpage content information comprises at least one of video information, picture information or text information; determining, for each webpage, whether its webpage content information contains bad information; when bad information exists in the webpage content information of any webpage, determining that the target website is an abnormal website; sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage containing bad information to prevent user terminals from continuing to access the abnormal website or the abnormal webpage. According to the embodiment of the invention, the webpage content of the target website is monitored in real time, and an early warning is issued quickly when bad information is found in the webpage content, thereby providing comprehensive security protection for the target website.

An embodiment of the present invention further provides a website security monitoring apparatus, as shown in fig. 2, the apparatus includes:

the information crawling module 202 is configured to extract webpage content information from the webpages corresponding to the web addresses of the target website, where the webpage content information includes: at least one of video information, picture information or text information;

a bad information identification module 204, configured to determine whether there is bad information in the content information of the web pages in each of the web pages, respectively;

an abnormal website determining module 206, configured to determine that the target website is an abnormal website when there is bad information in the web page content information in any one of the web pages;

a prompt information sending module 208, configured to send a prompt information to a management terminal corresponding to the abnormal website, so that a manager corresponding to the management terminal maintains the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage with bad information to prevent the user terminal from continuously accessing the abnormal website or the abnormal webpage.

Further, the above apparatus further comprises:

the prompt information sending module 208 is further configured to send prompt information to a management terminal corresponding to the target website when it is determined that the target website is tampered with by intrusion, so as to remind a manager corresponding to the management terminal.

In the embodiment provided by the invention, the webpage content of the target website is monitored in real time, and an early warning is issued quickly when bad information is found in the webpage content, thereby providing all-round security protection for the target website. The website security monitoring device provided by the embodiment of the invention can be applied to any website that is easily attacked by lawbreakers through viruses or trojans in order to spread bad information, such as portal websites of governments, institutions and schools, websites of enterprises and public institutions, celebrity pages on social networks, and open forum message boards. It can also be applied to monitoring websites on which bad information is spread deliberately by people. Because the device inspects the webpage content that a website actually publishes, it can quickly and accurately identify an abnormal website as long as the bad information shows up in the webpage content, whatever attack technique the lawbreakers use.

Website security monitoring can be implemented in the following two forms:

(1) service form: a user-customized service of the Software as a Service (SaaS) type, in which the system hardware may be based on servers, cloud platforms, workstations or other types of systems, and website content security protection is provided for a plurality of target websites;

(2) device form: the security of target websites is monitored by a plurality of independent website security monitoring devices; a user can deploy the independent devices freely, and each website security monitoring device provides website content security protection for a single user;

specifically, a website security monitoring device in device form may take one of the following forms: a general-purpose computer architecture with an added high-performance GPU (Graphics Processing Unit) graphics card, whose GPU computing capability carries the bad information identification of the computing end; or an embedded GPU chip developed by NVIDIA or other companies, such as the Tegra K1 or Tegra X1; or a programmable logic device (FPGA), which identifies bad information in the webpage content using its parallel computing capability.

In addition, in the embodiment provided by the invention, an independent website security monitoring device is provided and arranged between the server of the target website to be monitored and the management terminal. The device identifies bad information in the webpage content information of the extracted webpages in real time and issues an early warning quickly when bad information is found, thereby providing all-round security protection for the target website and improving its manageability. The device is also convenient to use and plug-and-play, which simplifies its subsequent overhaul, maintenance and upgrading, and it does not occupy the resources of the server of the monitored target website while monitoring webpage content.

Further, the extracted webpage content information may have a relatively complex picture structure and picture material, and identifying bad information with a bad information identification model with a multi-hidden-layer structure, obtained by training based on a deep learning method, places high demands on the data processing volume and data processing speed of the processor. Therefore, in order to handle all kinds of complex webpage content information while improving the data processing speed of the processor and the accuracy of bad information identification, the processor comprises: a processor based on the cooperative work of a GPU and a CPU, or a programmable logic device (FPGA).

Because the bad information identification model with a multi-hidden-layer structure is obtained by training based on a deep learning method, it places high demands on data processing volume and data processing speed. On the one hand, the model is trained using massive amounts of bad content; on the other hand, in application, the trained bad information identification model with its configured parameters is used to analyze specific webpage content information;

(1) in the training process of the bad information identification model, the embodiment provided by the invention configures a high-performance GPU graphics card in a dedicated high-performance training server and uses the parallel computing capability of the GPU graphics card to speed up model training;

(2) in the application process of the bad information identification model, in order to cope with harsh application environments, the embodiment provided by the invention selects an NVIDIA industrial-grade processor based on the cooperative work of a GPU and a CPU to accelerate real-time calculation and analysis; alternatively, the model processing program can be implemented in a programmable logic device (FPGA), whose parallel computing characteristics provide strong real-time computing capability.

Specifically, an FPGA (field-programmable gate array, a type of programmable logic device) may be used to identify bad information in the extracted webpage content information. The working state of the FPGA is set by a program stored in its on-chip RAM, so the on-chip RAM must be programmed before operation. Different programming methods can be used depending on the configuration mode. On power-up, the FPGA chip reads the data in an off-chip configuration chip (usually FLASH, EPROM or the like) into the on-chip programming RAM, and once configuration is complete the FPGA enters its working state. After power is removed, the FPGA reverts to a blank chip and its internal logic relations disappear, so the FPGA can be reused repeatedly. Programming the FPGA does not require a dedicated FPGA programmer; a general-purpose FLASH, EPROM or PROM programmer is sufficient. When the FPGA function needs to be modified, only the programming data in the off-chip configuration chip needs to be updated. The same FPGA can therefore produce different circuit functions with different programming data, which makes the FPGA very flexible to use.

In the embodiment provided by the invention, bad information is identified using a bad information identification model with a multi-hidden-layer structure obtained by training based on a deep learning method, which places high demands on the data processing volume and data processing speed of the processor. A hybrid high-performance processor is therefore built on multi-core CPU flow control and GPU parallel acceleration, or a programmable logic device (FPGA) is selected as the core component that implements the bad information identification process. This strengthens the tolerance of the website security monitoring device to temperature, humidity and vibration as well as its electromagnetic compatibility, making the device better suited to harsh environments. Where the webpage content information to be identified has a relatively complex picture structure and picture material, this approach effectively improves the speed and accuracy of bad information identification and avoids environmental interference with the processor during data processing.

Further, since bad information identification may be slower than webpage content extraction, the processor is further configured to cache the extracted webpage content information, periodically fetch the cached webpage content information, and pass it to the bad information identification module in the processor.
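
The caching arrangement described above can be sketched with a standard-library queue that decouples extraction from identification. The function names, the batch size and the `identify` callback are assumptions made for illustration; the patent does not specify the cache structure.

```python
import queue
from typing import Callable, List


def drain_batch(cache: queue.Queue, max_items: int) -> List[str]:
    """Take up to max_items cached page contents without blocking."""
    batch: List[str] = []
    while len(batch) < max_items:
        try:
            batch.append(cache.get_nowait())
        except queue.Empty:
            break
    return batch


def run_identification_cycle(cache: queue.Queue,
                             identify: Callable[[str], bool],
                             max_items: int = 10) -> List[bool]:
    """One periodic cycle: fetch cached content and pass it to the identifier."""
    return [identify(item) for item in drain_batch(cache, max_items)]
```

In a running system the crawler would call `cache.put(...)` as pages arrive, while a timer or worker thread would call `run_identification_cycle` at a fixed interval.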

The above website security monitoring device mainly includes three parts: a front end, a back end and a computing end, specifically as follows:

(1) front end: its main purpose is to provide the user-facing web service with basic functions such as product introduction, user registration and login, payment management and status display;

(2) back end: it comprises two main applications, namely a back-end management page and a task management system;

the back-end management page is mainly used by system maintenance personnel to display and configure internal information, and provides functions such as user management, status display and alarm triggering;

the task management system is responsible for actually carrying out website monitoring and is used for task allocation, monitoring and management, verification triggering, report management, cluster monitoring and the like; the functions of the functional modules are as follows:

task allocation mainly distributes all websites to be monitored among the different computing-end servers so that the load of the computing-end servers is balanced; the task management system assigns tasks to computing-end servers in units of websites, and one computing-end server may be responsible for monitoring and analyzing several websites at the same time; each computing-end server periodically reports its CPU occupancy, memory occupancy and bandwidth occupancy to the task management system, which dynamically adjusts the number of websites each computing-end server is responsible for monitoring according to that server's CPU, memory and bandwidth load;

monitoring and management mainly monitors the working state of every component of the system and performs the necessary system management of the front end, the back end and the computing end;

verification triggering mainly receives, in real time, the scanning and analysis results that the computing end produces for the websites it is responsible for, makes a comprehensive judgment of the security state of each monitored website, and, when a problem is found, triggers intervention mechanisms including alarms by short message, email and the like; if a direct intervention interface is provided, the website concerned can be closed directly;

report management mainly summarizes the results of the various detection items for each monitored website to form internal and external detection reports;

cluster monitoring is mainly needed when the system is large in scale: the back end and the computing end then need to be built as server clusters, and the clusters need to be managed in a unified manner;

(3) computing end: the platform on which bad information identification is actually implemented and run, operating on an independent server cluster equipped with GPUs; the main functions of the computing end include webpage capture and storage, video discrimination, picture discrimination and text discrimination; the computing end analyzes the captured content and identifies possible bad information in it, including pornography, gambling, reactionary content, violence and other sensitive content customized by users;

it should be noted that, in the practical application process, the back end and the computing end software may be combined together.
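
The load-based task allocation described for the task management system can be sketched as a greedy assignment. The load formula (the highest of the three reported occupancy rates plus a fixed estimated cost per assigned website) and all names are assumptions made for illustration; the patent does not state how the three load figures are combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputeServer:
    name: str
    cpu: float        # reported CPU occupancy, 0.0 to 1.0
    memory: float     # reported memory occupancy, 0.0 to 1.0
    bandwidth: float  # reported bandwidth occupancy, 0.0 to 1.0
    websites: List[str] = field(default_factory=list)

    def load(self, per_site_cost: float) -> float:
        """Assumed load: busiest resource plus a cost per newly assigned site."""
        busiest = max(self.cpu, self.memory, self.bandwidth)
        return busiest + per_site_cost * len(self.websites)


def allocate(websites: List[str], servers: List[ComputeServer],
             per_site_cost: float = 0.05) -> Dict[str, List[str]]:
    """Assign each website to the currently least-loaded server."""
    for site in websites:
        target = min(servers, key=lambda s: s.load(per_site_cost))
        target.websites.append(site)
    return {s.name: list(s.websites) for s in servers}
```

Re-running `allocate` whenever servers report fresh occupancy figures would approximate the dynamic adjustment described in the text.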

In addition, an embodiment of the present invention further provides a website security monitoring system, including: a server of a target website to be monitored, the website security monitoring device and the management terminal shown in fig. 2;

the website security monitoring device is connected with the server of the target website and with the management terminal;

the website security monitoring device is used for extracting webpage content information from the webpages corresponding to the web addresses of a target website, wherein the webpage content information comprises: at least one of video information, picture information or text information; determining, for each webpage, whether its webpage content information contains bad information; when bad information exists in the webpage content information of any webpage, determining that the target website is an abnormal website; sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage containing bad information to prevent user terminals from continuing to access the abnormal website or the abnormal webpage.

Further, fig. 3 shows a specific application scenario of the website security monitoring device. The device can be applied to any website that is easily attacked by lawbreakers through viruses or trojans in order to spread bad information, such as portal websites of governments, institutions and schools, websites of enterprises and public institutions, celebrity pages on social networks, and open forum message boards. It can also be applied to monitoring websites on which bad information is spread deliberately by people. Because the device inspects the webpage content that a website actually publishes, it can quickly and accurately identify an abnormal website as long as the bad information shows up in the webpage content, whatever attack technique the lawbreakers use. In addition, different service objects have different protection priorities (judgment thresholds for pornography, violence, reactionary content and the like) and different protection modes, depending mainly on their requirements. For example, a government website is more sensitive to reactionary content and an education website is more sensitive to pornographic content, and the relevant parameters of the bad information identification model in the computing end can be set according to actual requirements.

The website security monitoring device provided by the embodiment of the invention may be specific hardware on equipment, or software or firmware installed on equipment. The device provided by the embodiment of the invention has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, where the device embodiments do not mention something, reference may be made to the corresponding content in the method embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions and not to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A website security monitoring method is characterized by comprising the following steps:

sending prompt information to a management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or closing an access channel of the abnormal website or deleting the abnormal webpage with bad information to prevent the user terminal from continuously accessing the abnormal website or the abnormal webpage;

when the web page content information includes the picture information, the determining whether the web page content information in each web page has bad information respectively includes:

calculating the bad information matching degree of each sensitive image area by using a bad information identification model, and calculating the bad information existence probability of the picture information according to each bad information matching degree; the bad information identification model is constructed in the following mode: deep neural network training is carried out on the selected bad information samples by utilizing a deep learning method to obtain a bad information identification model, the bad information identification model is a multi-hidden-layer machine learning model, massive images and videos containing bad information are collected to serve as the bad information samples, and the bad information samples are used for training the bad information identification model to optimize characteristic parameters so as to continuously improve identification precision in the deep learning process;

when the existence probability of the bad information is larger than a first preset threshold value, determining that the bad information exists in the picture information;

when the webpage content information includes text information, the determining of whether bad information exists in the webpage content information of each webpage comprises:

when the existence probability of the bad information is larger than a third preset threshold value, determining that the bad information exists in the text information;

when the webpage content information includes video information, the determining of whether bad information exists in the webpage content information of each webpage comprises:

calculating the bad information matching degree of each sensitive area by using a bad information identification model, and calculating the bad information existence probability of the video frame according to each bad information matching degree;

when bad information exists in any video frame, determining that the bad information exists in the video information;

wherein determining that bad information exists in the video information when bad information exists in any video frame specifically comprises: when a video frame is identified for the first time as containing bad information, adding 1 to the occurrence count of bad video frames;

if yes, determining that bad information exists in the video information;

when the webpage content information includes at least two of the three types of information, namely video information, picture information, or text information, the determining of whether bad information exists in the webpage content information of each webpage comprises:

if any of the bad information existence probabilities is greater than its corresponding preset threshold value, determining that bad information exists in the webpage content information;

2. The method according to claim 1, wherein the sending of the prompt message to the management terminal corresponding to the abnormal website includes:

3. The method of claim 1, further comprising:

4. A website security monitoring device, the device comprising:

a prompt information sending module, configured to send prompt information to the management terminal corresponding to the abnormal website so that a manager corresponding to the management terminal can maintain the abnormal website; and/or to close an access channel of the abnormal website or delete the abnormal webpage containing bad information, to prevent the user terminal from continuing to access the abnormal website or the abnormal webpage;

when the webpage content information comprises picture information, the bad information identification module is used for determining whether the webpage content information contains bad information according to the following steps:

when the existence probability of the bad information is larger than a preset threshold value, determining that the bad information exists in the picture information;

when the webpage content information comprises text information, the bad information identification module is used for determining whether bad information exists in the webpage content information in each webpage according to the following steps:

when the webpage content information comprises video information, the bad information identification module is used for determining whether bad information exists in the webpage content information in the webpage according to the following steps:

the bad information identification module is specifically configured to determine that bad information exists in the video information according to the following steps: when a video frame is identified for the first time as containing bad information, adding 1 to the occurrence count of bad video frames;

if yes, determining that bad information exists in the video information;

when the webpage content information comprises at least two items of three types of information, namely video information, picture information or text information, the bad information identification module is used for determining whether bad information exists in the webpage content information in the webpage according to the following steps:

5. The apparatus of claim 4, further comprising: