CN116015772A

CN116015772A - Malicious website processing method, device, equipment and storage medium

Info

Publication number: CN116015772A
Application number: CN202211590350.9A
Authority: CN
Inventors: 王晓伟; 马庆贺; 高磊; 杨真
Original assignee: Shenzhen Secxun Technology Co ltd
Current assignee: Shenzhen Secxun Technology Co ltd
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2023-04-25

Abstract

The invention relates to the technical field of data security and discloses a method, a device, equipment and a storage medium for processing a malicious website. The method comprises the following steps: obtaining a webpage text, a webpage tag sequence and a webpage screenshot of a malicious website to be processed; extracting feature information of the webpage screenshot and identifying the feature information through a target webpage image classification model to obtain the category of the webpage screenshot; identifying the webpage text through a target text classification model, and identifying the webpage tag sequence through a target tag sequence classification model; determining the category of the malicious website to be processed according to the identification results and the category of the webpage screenshot; and determining a target website processing strategy according to the category of the malicious website to be processed, and processing the malicious website according to the target website processing strategy. Because each malicious website is processed with a strategy selected according to its category, the efficiency and accuracy of processing malicious websites can be effectively improved.

Description

Malicious website processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of data security technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing a malicious website.

Background

The internet brings convenience but also harm, such as fraud. As internet technology keeps changing, elderly users who are increasingly active online and teenagers who spend long periods online suffer financial and psychological losses from fraud. One form of fraud is carried out through malicious websites, such as fake loans, order brushing, pig-butchering scams and impersonation of police, procuratorate and court officials. Merely recognizing malicious websites is far from sufficient; how they are handled is equally important. A common existing approach is the firewall, which intercepts malicious websites and forbids access to them. However, operators of malicious websites can build new malicious websites designed around the firewall's working principle, so the firewall's effectiveness drops rapidly and the efficiency and accuracy of processing malicious websites remain low.

The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.

Disclosure of Invention

The main object of the invention is to provide a malicious website processing method, device, equipment and storage medium, so as to solve the technical problem of low efficiency and accuracy in processing malicious websites in the prior art.

In order to achieve the above object, the present invention provides a method for processing a malicious website, where the method for processing a malicious website includes the following steps:

obtaining a webpage text, a webpage tag sequence and a webpage screenshot according to a malicious website to be processed;

extracting characteristic information of the webpage screenshot, and identifying the characteristic information through a target webpage image classification model to obtain the class of the webpage screenshot;

when the content of the webpage text is less than a preset text content threshold and the content of the webpage tag sequence is less than a preset tag sequence threshold, identifying the webpage text through a target text classification model, and identifying the webpage tag sequence through a target tag sequence classification model;

determining the category of the malicious website to be processed according to the identification result and the category of the webpage screenshot;

determining a target website processing strategy according to the category of the malicious website to be processed, and processing the malicious website to be processed according to the target website processing strategy.

Optionally, the obtaining the webpage text, the webpage tag sequence and the webpage screenshot according to the malicious website to be processed includes:

acquiring a malicious website to be processed, and accessing the malicious website to be processed on a virtual machine through a target HTTP GET command to obtain the malicious website source code and the malicious website content;

parsing the malicious website source code to obtain webpage text and webpage tag data;

obtaining a corresponding webpage tag sequence according to the webpage tag data;

and taking a screenshot of the malicious website content through a target automated browser to obtain a webpage screenshot.

Optionally, the extracting feature information of the webpage screenshot, identifying the feature information through a target webpage image classification model, and obtaining the class of the webpage screenshot includes:

detecting the webpage screenshot to obtain a webpage screenshot shape;

adjusting the shape of the webpage screenshot to a preset fixed image shape;

obtaining a corresponding screenshot pixel value according to the webpage screenshot after the shape adjustment;

when the screenshot pixel values lie within a preset pixel interval, calculating the mean of the screenshot pixel values to obtain a current screenshot pixel mean, and calculating their variance to obtain a current screenshot pixel variance;

and when the current screenshot pixel mean equals a preset mean threshold and the current screenshot pixel variance equals a preset variance threshold, extracting feature information of the webpage screenshot, and identifying the feature information through a target webpage image classification model to obtain the class of the webpage screenshot.

Optionally, when the current screenshot pixel mean equals the preset mean threshold and the current screenshot pixel variance equals the preset variance threshold, the extracting feature information of the webpage screenshot and identifying the feature information through the target webpage image classification model to obtain the class of the webpage screenshot includes:

when the current screenshot pixel mean equals the preset mean threshold and the current screenshot pixel variance equals the preset variance threshold, extracting features of the webpage screenshot through a ResNet network to obtain feature information at each scale;

fusing the feature information of each scale through a MaxPooling network layer to obtain multi-scale feature information;

identifying the multi-scale feature information through a fully connected layer of the target webpage image classification model to obtain the probability that the webpage screenshot belongs to each category;

and extracting the maximum of these probability values, and taking the category corresponding to the maximum probability value as the class of the webpage screenshot.

Optionally, when the content of the web page text is less than a preset text content threshold and the content of the web page tag sequence is less than a preset tag sequence threshold, identifying the web page text through a target text classification model, and identifying the web page tag sequence through a target tag sequence classification model includes:

detecting the webpage text, and obtaining corresponding text content according to a webpage text detection result;

detecting the webpage tag sequence, and obtaining corresponding tag sequence content according to a tag sequence detection result;

when the content of the webpage text is less than a preset text content threshold, determining whether the content of the webpage tag sequence is less than a preset tag sequence threshold;

and when the content of the webpage tag sequence is less than a preset tag sequence threshold value, identifying the webpage text through a target text classification model, and identifying the webpage tag sequence through the target tag sequence classification model.

Optionally, when the content of the web page tag sequence is less than a preset tag sequence threshold, identifying the web page text through a target text classification model, and identifying the web page tag sequence through a target tag sequence classification model includes:

when the content of the webpage tag sequence is less than the preset tag sequence threshold, performing lexical analysis on the webpage text to obtain individual words;

counting the occurrence frequency of each word, and selecting the words whose frequency is greater than a preset frequency threshold;

constructing a vocabulary from the selected words, and constructing a word embedding matrix according to the vocabulary;

obtaining a word vector list according to the webpage tag sequence and the word embedding matrix;

looking up the word vectors corresponding to the webpage text in the word embedding matrix, and identifying the word vectors through the target text classification model;

aggregating the word vectors and the word vector list through a global pooling layer and a weighted connection layer to obtain target word vector features;

and identifying the target word vector features through a target tag sequence classification model.
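The vocabulary and embedding steps above can be illustrated with a minimal sketch. It assumes the text has already been split into tokens, reserves two hypothetical special rows (`<pad>`, `<unk>`) that the patent does not mention, uses a random embedding initialisation, and reads the "weighted connection layer" as a simple convex combination of the pooled text and tag vectors; all function names are illustrative, not the patent's implementation.

```python
import random
from collections import Counter


def build_vocab(tokens, min_freq):
    """Keep words whose frequency is strictly greater than min_freq.

    Index 0 (<pad>) and 1 (<unk>) are hypothetical special entries.
    """
    counts = Counter(tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    # Descending frequency, then alphabetical, for a deterministic order.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq > min_freq:
            vocab[word] = len(vocab)
    return vocab


def build_embedding_matrix(vocab, dim, seed=0):
    """Randomly initialised embedding matrix: one row of `dim` floats per entry."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(len(vocab))]


def lookup(tokens, vocab, matrix):
    """Map tokens to word vectors, falling back to the <unk> row."""
    return [matrix[vocab.get(t, vocab["<unk>"])] for t in tokens]


def global_average_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_fuse(text_vec, tag_vec, w_text=0.5):
    """Hypothetical weighted connection: convex combination of two pooled vectors."""
    return [w_text * a + (1 - w_text) * b for a, b in zip(text_vec, tag_vec)]
```

For tag names to map to real rows rather than `<unk>`, one option (again an assumption) is to build the vocabulary over text tokens and tag names together.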

Optionally, the determining a target website processing policy according to the category of the malicious website to be processed, and processing the malicious website to be processed according to the target website processing policy includes:

selecting a target website processing strategy from a target malicious website processing strategy set according to the category of the malicious website to be processed;

intercepting the malicious website to be processed, and acquiring a uniform resource locator of the malicious website to be processed;

obtaining a corresponding uniform resource locator segment according to the domain name information of the uniform resource locator;

inserting a blocking character at a preset position of the uniform resource locator segment according to the target website processing strategy, and calculating a hash value of the uniform resource locator segment;

and storing the hash value of the uniform resource locator segment in a malicious website blockchain.

In addition, in order to achieve the above object, the present invention further provides a processing device for a malicious website, where the processing device for a malicious website includes:

the acquisition module is used for acquiring a webpage text, a webpage tag sequence and a webpage screenshot according to the malicious website to be processed;

the extraction module is used for extracting the characteristic information of the webpage screenshot, and identifying the characteristic information through a target webpage image classification model to obtain the class of the webpage screenshot;

the identification module is used for identifying the webpage text through the target text classification model and identifying the webpage tag sequence through the target tag sequence classification model when the content of the webpage text is less than a preset text content threshold and the content of the webpage tag sequence is less than a preset tag sequence threshold;

the determining module is used for determining the category of the malicious website to be processed according to the identification result and the category of the webpage screenshot;

and the processing module is used for determining a target website processing strategy according to the category of the malicious website to be processed and processing the malicious website to be processed according to the target website processing strategy.

In addition, to achieve the above object, the present invention further provides malicious website processing equipment, comprising a memory, a processor, and a malicious website processing program stored in the memory and executable on the processor, the program being configured to implement the malicious website processing method described above.

In addition, to achieve the above object, the present invention further provides a storage medium storing a malicious website processing program which, when executed by a processor, implements the malicious website processing method described above.

With the malicious website processing method of the invention, a webpage text, a webpage tag sequence and a webpage screenshot are obtained from the malicious website to be processed; feature information of the webpage screenshot is extracted and identified through a target webpage image classification model to obtain the class of the webpage screenshot; when the content of the webpage text is less than a preset text content threshold and the content of the webpage tag sequence is less than a preset tag sequence threshold, the webpage text is identified through a target text classification model and the webpage tag sequence through a target tag sequence classification model; the category of the malicious website to be processed is determined according to the identification results and the class of the webpage screenshot; and a target website processing strategy is determined according to that category and used to process the malicious website. Because each malicious website is processed with a strategy selected according to its category, the efficiency and accuracy of processing malicious websites can be effectively improved.

Drawings

FIG. 1 is a schematic structural diagram of a malicious website processing device in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a first embodiment of a method for processing a malicious website according to the present invention;

FIG. 3 is a flowchart illustrating a second embodiment of a method for processing a malicious website according to the present invention;

FIG. 4 is a schematic functional module diagram of a first embodiment of a malicious website processing apparatus according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a malicious website processing device in a hardware operating environment according to an embodiment of the present invention.

As shown in FIG. 1, the malicious website processing device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation of the processing device of malicious web sites, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a storage medium, may include an operating system, a network communication module, a user interface module, and a processing program of a malicious web site.

In the malicious website processing device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a workstation of the network integration platform, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be disposed in the malicious website processing device, which invokes, through the processor 1001, the malicious website processing program stored in the memory 1005 and executes the malicious website processing method provided by the embodiments of the present invention.

Based on the above hardware structure, embodiments of the malicious website processing method of the present invention are proposed.

Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a method for processing a malicious website according to the present invention.

In a first embodiment, the method for processing a malicious website includes the following steps:

Step S10: obtaining a webpage text, a webpage tag sequence and a webpage screenshot according to the malicious website to be processed.

It should be noted that the execution body of this embodiment is the malicious website processing device; it may also be another device capable of implementing the same or similar functions, such as a website processor, which is not limited in this embodiment. In this embodiment, the website processor is taken as an example.

It should be understood that the webpage text refers to the text content of the webpage generated by accessing the malicious website to be processed on a virtual machine; the webpage tag sequence refers to the sequence of tags of the generated webpage, where a webpage tag may be an HTML tag; and the webpage screenshot refers to a screenshot of the webpage content, including but not limited to webpage text and webpage pictures.

Further, step S10 includes: acquiring a malicious website to be processed, and accessing the malicious website to be processed on a virtual machine through a target HTTP GET command to obtain the malicious website source code and malicious website content; parsing the malicious website source code to obtain webpage text and webpage tag data; obtaining a corresponding webpage tag sequence according to the webpage tag data; and taking a screenshot of the malicious website content through a target automated browser to obtain a webpage screenshot.

It can be understood that, after the malicious website to be processed is obtained, in order to prevent the malicious website from attacking or intruding into the device, this embodiment accesses it on a virtual machine through a target HTTP GET command to obtain the malicious website source code and malicious website content. The malicious website source code refers to the source code of the webpage corresponding to the malicious website to be processed, and the webpage tag data refers to the tag data of that webpage; in the source code, the tags sit at both ends of the content they enclose. The source code is parsed to obtain the webpage text and webpage tag data, a corresponding webpage tag sequence is obtained from the webpage tag data, and a screenshot of the malicious website content is then taken through a target automated browser, which may be a browser driven by Selenium, to obtain the webpage screenshot.
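The parsing part of step S10 can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, visible text is taken to be everything outside `<script>` and `<style>`, and the tag sequence is taken to be the start-tag names in document order. The fetch helper is a plain HTTP GET and, as the text stresses, should only ever run inside an isolated virtual machine.

```python
import urllib.request
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects visible text and the sequence of start-tag names from HTML source."""

    SKIP = {"script", "style"}  # text inside these tags is not visible page text

    def __init__(self):
        super().__init__()
        self.tags = []        # webpage tag sequence, in document order
        self.text_parts = []  # webpage text fragments
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(source):
    """Return (webpage_text, tag_sequence) for an HTML source string."""
    parser = PageParser()
    parser.feed(source)
    parser.close()
    return " ".join(parser.text_parts), parser.tags


def fetch_source(url, timeout=10):
    """Plain HTTP GET of the page source; intended to run inside an isolated VM."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The screenshot itself could be taken with a Selenium WebDriver's `save_screenshot` method after loading the same URL; that part is omitted here because it needs a browser and driver installed.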

Step S20: extracting feature information of the webpage screenshot, and identifying the feature information through a target webpage image classification model to obtain the class of the webpage screenshot.

It can be understood that the feature information refers to information that uniquely identifies a webpage screenshot, such as a webpage screenshot identification field. The target webpage image classification model is a model for classifying webpage images; it is obtained by transfer learning, that is, by fine-tuning a model pre-trained on the ImageNet dataset with a webpage screenshot dataset. Compared with a general image classification model, the target webpage image classification model is deeper and its internal residual blocks are joined by skip connections, which alleviates the vanishing-gradient problem caused by the increased depth.
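The skip-connection idea mentioned above can be shown with a deliberately tiny residual block. This sketch uses plain vectors and dense weights instead of the convolutions a real ResNet uses, so it illustrates the principle only; the names are hypothetical.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Basic residual block: y = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x).

    The "+ x" term is the skip connection: even when W1 and W2 are near zero,
    the block still passes x through, so gradients keep a direct path backwards.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([a + b for a, b in zip(fx, x)])
```

With all-zero weights the block reduces to `relu(x)`, which is why stacking many such blocks does not destroy the signal the way stacking plain layers can.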

Further, step S20 includes: detecting the webpage screenshot to obtain the webpage screenshot shape; adjusting the shape of the webpage screenshot to a preset fixed image shape; obtaining the corresponding screenshot pixel values from the reshaped webpage screenshot; when the screenshot pixel values lie within a preset pixel interval, calculating their mean to obtain a current screenshot pixel mean and their variance to obtain a current screenshot pixel variance; and when the current screenshot pixel mean equals a preset mean threshold and the current screenshot pixel variance equals a preset variance threshold, extracting feature information of the webpage screenshot, and identifying the feature information through a target webpage image classification model to obtain the class of the webpage screenshot.

It should be understood that, after the shape of the webpage screenshot is obtained, it is adjusted to the preset fixed image shape. It is then determined whether the screenshot pixel values of the reshaped screenshot lie within the preset pixel interval [0, 1]; if not, the pixel values are scaled proportionally into that interval. The current screenshot pixel mean and the current screenshot pixel variance are then calculated, and it is determined whether the mean equals the preset mean threshold and the variance equals the preset variance threshold, where the preset mean threshold is 0 and the preset variance threshold is 1. If not, the pixel values are normalized so that the mean becomes 0 and the variance becomes 1, after which the class of the webpage screenshot is identified through the target webpage image classification model.
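The preprocessing described above (fixed shape, scaling into [0, 1], then standardising to mean 0 and variance 1) can be sketched as follows. Assumptions: a single-channel grayscale image given as a list of rows, 8-bit pixel values, nearest-neighbour resizing, and per-image statistics; the patent does not specify the resizing method or whether statistics are per image or per dataset.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image given as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def preprocess(img, out_h, out_w, max_value=255.0):
    """Resize, scale into [0, 1] if needed, then standardise to mean 0, variance 1."""
    resized = resize_nearest(img, out_h, out_w)
    flat = [float(p) for row in resized for p in row]
    # Assumes 8-bit input: values above 1 are scaled proportionally into [0, 1].
    if max(flat) > 1.0:
        flat = [p / max_value for p in flat]
    n = len(flat)
    mean = sum(flat) / n
    var = sum((p - mean) ** 2 for p in flat) / n
    # Guard against a blank (constant) screenshot, whose variance is ~0.
    std = var ** 0.5 if var > 1e-12 else 1.0
    norm = [(p - mean) / std for p in flat]
    return [norm[r * out_w:(r + 1) * out_w] for r in range(out_h)]
```

Production pipelines built on ImageNet-pretrained models often standardise per channel with dataset-wide statistics instead; the per-image version here simply mirrors the mean-0, variance-1 targets stated in the text.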

Further, when the current screenshot pixel mean equals the preset mean threshold and the current screenshot pixel variance equals the preset variance threshold, extracting feature information of the webpage screenshot and identifying it through the target webpage image classification model to obtain the class of the webpage screenshot includes: extracting features of the webpage screenshot through a ResNet network to obtain feature information at each scale; fusing the feature information of each scale through a MaxPooling network layer to obtain multi-scale feature information; identifying the multi-scale feature information through a fully connected layer of the target webpage image classification model to obtain the probability that the webpage screenshot belongs to each category; and extracting the maximum of these probability values and taking the corresponding category as the class of the webpage screenshot.

It will be appreciated that, after a webpage screenshot satisfying the conditions is obtained, features are extracted through a ResNet network comprising network layers of different scales. Referring to FIG. 3, the ResNet network includes, but is not limited to, the layers (7×7 conv, 64, /2), (3×3 conv, 64), (3×3 conv, 128, /2), (3×3 conv, 128), (3×3 conv, 256, /2) and (3×3 conv, 512). The feature information of each scale extracted by the ResNet network is fused into multi-scale feature information through a MaxPooling network layer; the multi-scale feature information is then identified through the fully connected layer of the target webpage image classification model, which outputs the probability that the webpage screenshot belongs to each class; and the class with the highest probability is taken as the class of the webpage screenshot. For example, if the probability of class 1 is 60% and the probability of class 2 is 95%, class 2 is taken as the class of the webpage screenshot.
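The fusion and classification stage can be illustrated with a small sketch. It assumes that "fusing through a MaxPooling layer" means global max pooling of every channel at every scale followed by concatenation, that the fully connected layer is followed by a softmax, and that the class labels are hypothetical examples. The feature maps are passed in directly; the ResNet backbone that would produce them is not reproduced.

```python
import math


def global_max_pool(feature_map):
    """Reduce one 2-D feature map (list of rows) to its maximum activation."""
    return max(max(row) for row in feature_map)


def fuse_scales(stages):
    """stages: one entry per network stage, each a list of 2-D channel maps.

    Every channel is max-pooled to a scalar and the results are concatenated
    into a single multi-scale feature vector.
    """
    return [global_max_pool(channel) for stage in stages for channel in stage]


def fully_connected(weights, bias, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(stages, weights, bias, labels):
    """Return (label with the highest probability, all class probabilities)."""
    probs = softmax(fully_connected(weights, bias, fuse_scales(stages)))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

Because the outputs pass through a softmax they sum to 1, so the argmax step is exactly "take the class with the maximum probability value".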

Step S30, when the content of the webpage text is less than a preset text content threshold and the content of the webpage tag sequence is less than a preset tag sequence threshold, identifying the webpage text through a target text classification model, and identifying the webpage tag sequence through a target tag sequence classification model.

It should be understood that, after the webpage text and the webpage tag sequence are obtained, it is determined whether the content of the webpage text is less than the preset text content threshold and the content of the webpage tag sequence is less than the preset tag sequence threshold. If so, the webpage text and the webpage tag sequence contain too little content; in this case the webpage text is identified through the target text classification model and the webpage tag sequence through the target tag sequence classification model. Both models are trained with the TextCNN deep learning algorithm, and the target text classification model is trained on text categories such as pornography, lottery, loans, order brushing, ETC fraud, impersonation of police, procuratorate and court officials, and normal (legitimate) text.
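Since both text-side models are described as TextCNN models, a minimal forward pass may help show what TextCNN computes: convolution filters of different widths slide over the sequence of word vectors, each filter is max-pooled over time, and a fully connected layer with softmax produces class probabilities. This is a sketch with hand-picked weights; training, dropout and batching are omitted, and the function names are hypothetical.

```python
import math


def conv1d_max(embeddings, kernel, bias):
    """One TextCNN filter: slide over windows of len(kernel) word vectors,
    apply ReLU, then max-over-time pooling."""
    k = len(kernel)
    scores = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        s = sum(w * x for krow, vec in zip(kernel, window) for w, x in zip(krow, vec))
        scores.append(max(0.0, s + bias))
    return max(scores) if scores else 0.0  # text shorter than the filter -> 0


def textcnn_forward(embeddings, filters, fc_weights, fc_bias):
    """filters: list of (kernel, bias); each kernel is a k x dim list of lists."""
    features = [conv1d_max(embeddings, kern, b) for kern, b in filters]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(fc_weights, fc_bias)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In the patent's setting the output classes would correspond to the text categories listed above; the two-class example in the test is only illustrative.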

Step S40: determining the category of the malicious website to be processed according to the identification results and the class of the webpage screenshot.

It can be understood that, after the identification results for the webpage text and the webpage tag sequence are obtained, the category of the malicious website to be processed is determined by considering these results together with the class of the webpage screenshot.
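The text does not state how the three identification results are combined. One common option is late fusion by weighted averaging, sketched below purely as an assumption: it presumes all three models output probabilities over the same category list, and the equal default weights are illustrative.

```python
def fuse_predictions(prob_sets, weights=None):
    """Weighted average of per-model probability vectors over the same categories.

    Returns (index of the winning category, fused probability vector).
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total = sum(weights)
    n = len(prob_sets[0])
    fused = [sum(w * p[i] for w, p in zip(weights, prob_sets)) / total for i in range(n)]
    best = max(range(n), key=fused.__getitem__)
    return best, fused
```

Other reasonable rules, such as trusting the screenshot model unless both text models agree, would fit the description equally well.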

Step S50: determining a target website processing strategy according to the category of the malicious website to be processed, and processing the malicious website to be processed according to the target website processing strategy.

It should be understood that the target website processing strategy refers to a strategy for processing malicious websites. Because different categories of malicious websites require different processing strategies, once the category of the malicious website to be processed is obtained, the most appropriate website processing strategy is determined according to that category, and the malicious website is then processed through the target website processing strategy.

Further, step S50 includes: selecting a target website processing strategy from a target malicious website processing strategy set according to the category of the malicious website to be processed; intercepting the malicious website to be processed, and acquiring its uniform resource locator; obtaining a corresponding uniform resource locator segment according to the domain name information of the uniform resource locator; inserting a blocking character at a preset position of the uniform resource locator segment according to the target website processing strategy, and calculating a hash value of the uniform resource locator segment; and storing the hash value of the uniform resource locator segment in a malicious website blockchain.

It can be understood that, after the target website processing strategy best suited to the category of the malicious website to be processed is selected, the malicious website is intercepted, that is, it is not accessed. A corresponding uniform resource locator segment is then obtained from the domain name information of the website's uniform resource locator, and a blocking character is inserted at a preset position of the segment so that the whole malicious website is rendered invalid. The hash value of the uniform resource locator segment is then stored in the malicious website blockchain, so that when other users encounter the same malicious website a malicious-site warning pops up automatically, preventing their devices from being harmed by it.
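Step S50 can be sketched end to end under several labelled assumptions: the strategy set is a hypothetical dictionary from category to blocking character and insertion position, the "URL segment" is taken to be the hostname, SHA-256 is used as the hash, and a minimal append-only hash chain stands in for the malicious website blockchain (no consensus, networking or persistence).

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical strategy set: category -> which blocking character to insert where.
STRATEGIES = {
    "gambling": {"block_char": "#", "position": 0},
    "loan": {"block_char": "!", "position": 0},
}
DEFAULT_STRATEGY = {"block_char": "#", "position": 0}


def url_segment(url):
    """Domain-name segment of the URL; urlsplit lower-cases the hostname."""
    return urlsplit(url).hostname or ""


def block_segment(segment, strategy):
    """Insert the strategy's blocking character at its preset position."""
    pos = strategy["position"]
    return segment[:pos] + strategy["block_char"] + segment[pos:]


class HashChain:
    """Minimal append-only hash chain standing in for the malicious website blockchain."""

    def __init__(self):
        genesis_prev = "0" * 64
        self.blocks = [{"prev": genesis_prev, "data": "genesis",
                        "hash": self._h(genesis_prev, "genesis")}]

    @staticmethod
    def _h(prev, data):
        return hashlib.sha256((prev + data).encode("utf-8")).hexdigest()

    def append(self, data):
        prev = self.blocks[-1]["hash"]
        self.blocks.append({"prev": prev, "data": data, "hash": self._h(prev, data)})

    def contains(self, data):
        return any(b["data"] == data for b in self.blocks[1:])

    def verify(self):
        """Each block must link to its predecessor and hash to its stored value."""
        return all(b["prev"] == a["hash"] and b["hash"] == self._h(b["prev"], b["data"])
                   for a, b in zip(self.blocks, self.blocks[1:]))


def process(url, category, chain):
    """Select a strategy, block the domain segment, hash it and record the hash."""
    strategy = STRATEGIES.get(category, DEFAULT_STRATEGY)
    blocked = block_segment(url_segment(url), strategy)
    digest = hashlib.sha256(blocked.encode("utf-8")).hexdigest()
    chain.append(digest)
    return blocked, digest
```

Chaining each record to the previous hash is what makes later tampering detectable by `verify`, which is the property the text relies on when it stores the hashes in a blockchain.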

In this embodiment, a webpage text, a webpage tag sequence and a webpage screenshot are obtained from the malicious website to be processed; feature information of the webpage screenshot is extracted and identified through a target webpage image classification model to obtain the class of the webpage screenshot; when the content of the webpage text is less than a preset text content threshold and the content of the webpage tag sequence is less than a preset tag sequence threshold, the webpage text is identified through a target text classification model and the webpage tag sequence through a target tag sequence classification model; the category of the malicious website to be processed is determined according to the identification results and the class of the webpage screenshot; and a target website processing strategy is determined according to that category and used to process the malicious website. Because each malicious website is processed with a strategy selected according to its category, the efficiency and accuracy of processing malicious websites can be effectively improved.

In an embodiment, as shown in fig. 3, a second embodiment of the method for processing a malicious website according to the present invention is provided on the basis of the first embodiment, in which step S30 includes:

Step S301, detecting the web page text, and obtaining corresponding text content according to the web page text detection result.

It should be understood that the text content refers to the content of the web page text, including but not limited to web page text and web page pictures. Specifically, after the web page text is obtained, it is detected to obtain the corresponding text content.

Step S302, detecting the webpage tag sequence, and obtaining corresponding tag sequence content according to a tag sequence detection result.

It can be understood that the tag sequence content refers to the content of the web page tag sequence, specifically, after the web page tag sequence is obtained, the web page tag sequence is detected to obtain the corresponding tag sequence content.

Step S303, when the content of the web page text is less than a preset text content threshold, determining whether the content of the web page tag sequence is less than a preset tag sequence threshold.

It should be understood that, after the content of the web page text is obtained, it is determined whether that content is less than the preset text content threshold; if so, it is further determined whether the content of the web page tag sequence is less than the preset tag sequence threshold.

Step S304, when the content of the web page tag sequence is less than the preset tag sequence threshold, identifying the web page text through a target text classification model, and identifying the web page tag sequence through the target tag sequence classification model.

It can be understood that, when the content of the web page tag sequence is determined to be less than the preset tag sequence threshold, both the web page tag sequence and the web page text contain too little content; in this case, the web page text is identified by the target text classification model, and the web page tag sequence is identified by the target tag sequence classification model.
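Steps S301 to S304 amount to a two-stage threshold check. The Python sketch below is illustrative only: it assumes the text content is measured in characters of visible text and the tag sequence content as the number of start tags, and the threshold values (200 and 50) are placeholders, since the source gives neither units nor values. The helper names `detect` and `use_text_and_tag_models` are hypothetical, and the simplified parser does not filter out text inside `<script>` or `<style>` elements.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical thresholds: the source does not give values or units.
TEXT_CONTENT_THRESHOLD = 200  # assumed unit: characters of visible text
TAG_SEQUENCE_THRESHOLD = 50   # assumed unit: number of start tags


class _PageParser(HTMLParser):
    """Collects start-tag names (the tag sequence) and non-empty text runs."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def detect(html: str) -> Tuple[str, List[str]]:
    """S301/S302: detect the web page text content and tag sequence content."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.tags


def use_text_and_tag_models(text: str, tags: List[str],
                            text_threshold: int = TEXT_CONTENT_THRESHOLD,
                            tag_threshold: int = TAG_SEQUENCE_THRESHOLD) -> bool:
    """S303/S304: True when both contents fall below their preset thresholds."""
    if len(text) >= text_threshold:   # S303: check the text content first
        return False
    return len(tags) < tag_threshold  # S304: then check the tag sequence content
```

For a short page such as `<html><body><h1>Loan</h1><p>Fast cash</p></body></html>`, `detect` yields the text `"Loan Fast cash"` and the tags `["html", "body", "h1", "p"]`, both below the placeholder thresholds, so the text and tag sequence classification models would be used.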

Further, step S304 includes: when the content of the web page tag sequence is less than the preset tag sequence threshold, performing lexical analysis (word segmentation) on the web page text to obtain individual words; counting the occurrence frequency of each word, and selecting the words whose frequency is greater than a preset frequency threshold; constructing a vocabulary from the selected words, and constructing a word embedding matrix from the vocabulary; obtaining a word vector list from the web page tag sequence and the word embedding matrix; querying the word vectors corresponding to the web page text in the word embedding matrix, and identifying the word vectors through the target text classification model; fusing the word vectors and the word vector list through a global pooling layer and a weight connection layer to obtain target word vector features; and identifying the target word vector features through the target tag sequence classification model.

It should be understood that, when the content of the web page tag sequence is determined to be less than the preset tag sequence threshold, the web page text is segmented into words and the occurrence frequency of each word is counted. Each word whose frequency is greater than the preset frequency threshold is added to the vocabulary, and a word embedding matrix is constructed from the vocabulary. The word embedding matrix can be queried for the word vector of any word, each dimension of which characterizes one feature of that word. The word vectors are identified by the target text classification model; the word vectors and the word vector list are then fused through the global pooling layer and the weight connection layer, and the fused target word vector features are identified by the target tag sequence classification model.
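The word-level processing in step S304 can be sketched in Python as follows. This is a minimal illustration under several assumptions not stated in the source: a regular-expression tokenizer stands in for the word segmentation step, the vocabulary is built from the text words and tag names together so that both can be looked up in one embedding matrix, the embedding matrix is randomly initialised rather than trained, global average pooling is used for the global pooling layer, and the weight connection layer is modelled as a fixed weighted sum. All function names are hypothetical, and the classification models themselves are not shown.

```python
import random
import re
from collections import Counter
from typing import Dict, List

UNK = "<unk>"  # row reserved for out-of-vocabulary tokens (assumption)


def tokenize(text: str) -> List[str]:
    """Simple regex word split standing in for the unspecified segmentation step."""
    return re.findall(r"\w+", text.lower())


def build_vocab(tokens: List[str], freq_threshold: int) -> Dict[str, int]:
    """Keep tokens whose count is strictly greater than the preset frequency threshold."""
    counts = Counter(tokens)
    kept = sorted(w for w, c in counts.items() if c > freq_threshold)
    return {w: i for i, w in enumerate([UNK] + kept)}


def build_embedding(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Randomly initialised word embedding matrix, one row per vocabulary entry."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def lookup(tokens: List[str], vocab: Dict[str, int],
           emb: List[List[float]]) -> List[List[float]]:
    """Query the embedding matrix for each token, falling back to the UNK row."""
    return [emb[vocab.get(t, vocab[UNK])] for t in tokens]


def global_avg_pool(vectors: List[List[float]]) -> List[float]:
    """Average over the sequence axis, giving one fixed-length vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_fuse(a: List[float], b: List[float],
                  w_a: float = 0.5, w_b: float = 0.5) -> List[float]:
    """Weighted combination of the pooled text and tag vectors."""
    return [w_a * x + w_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    text_tokens = tokenize("Fast loan, fast loan, apply now: loan")
    tag_tokens = ["html", "body", "div", "div", "form"]
    # Assumption: one shared vocabulary over text words and tag names.
    vocab = build_vocab(text_tokens + tag_tokens, freq_threshold=1)
    emb = build_embedding(len(vocab), dim=4)
    text_vec = global_avg_pool(lookup(text_tokens, vocab, emb))
    tag_vec = global_avg_pool(lookup(tag_tokens, vocab, emb))
    feature = weighted_fuse(text_vec, tag_vec)
    print(vocab)         # {'<unk>': 0, 'div': 1, 'fast': 2, 'loan': 3}
    print(len(feature))  # 4
```

With a frequency threshold of 1, only tokens seen at least twice (`fast`, `loan`, `div`) enter the vocabulary; everything else maps to the `<unk>` row. The resulting fixed-length `feature` vector is what a downstream classifier would consume.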

In this embodiment, the webpage text and the webpage tag sequence are detected separately to obtain the corresponding text content and tag sequence content. When the content of the webpage text is less than the preset text content threshold, it is further determined whether the content of the webpage tag sequence is less than the preset tag sequence threshold; if so, the webpage text is identified by the target text classification model and the webpage tag sequence is identified by the target tag sequence classification model. In this way, the accuracy of identifying the webpage text and the webpage tag sequence can be effectively improved.

In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores a processing program of the malicious website, and the processing program of the malicious website realizes the steps of the processing method of the malicious website when being executed by a processor.

Because the storage medium adopts all the technical solutions of the above embodiments, it has at least all the beneficial effects brought by those technical solutions, which are not repeated here.

In addition, referring to fig. 4, an embodiment of the present invention further provides a processing apparatus for a malicious website, where the processing apparatus for a malicious website includes:

The obtaining module 10 is configured to obtain a webpage text, a webpage tag sequence and a webpage screenshot according to a malicious website to be processed.

The extracting module 20 is configured to extract the feature information of the webpage screenshot and identify the feature information through a target webpage image classification model to obtain the category of the webpage screenshot.

The identifying module 30 is configured to identify the web page text through a target text classification model and identify the web page tag sequence through a target tag sequence classification model when the content of the web page text is less than a preset text content threshold and the content of the web page tag sequence is less than a preset tag sequence threshold.

The determining module 40 is configured to determine the category of the malicious website to be processed according to the identification results and the category of the webpage screenshot.

The processing module 50 is configured to determine a target website processing policy according to the category of the malicious website to be processed, and process the malicious website to be processed according to the target website processing policy.
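How the five modules fit together can be shown with a small Python sketch in which each module is an injected callable. The class name, the placeholder thresholds, the majority-vote rule used for the determining module (the screenshot category wins ties) and the category-to-policy dictionary are all hypothetical; the patent does not specify how the identification results and the screenshot category are combined, nor what the policies are.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class MaliciousSiteProcessor:
    obtain: Callable[[str], Tuple[str, List[str], bytes]]  # obtaining module 10
    classify_screenshot: Callable[[bytes], str]            # extracting module 20
    classify_text: Callable[[str], str]                    # identifying module 30 (text)
    classify_tags: Callable[[List[str]], str]              # identifying module 30 (tags)
    policies: Dict[str, str]                               # lookup used by processing module 50
    text_threshold: int = 200                              # placeholder threshold
    tag_threshold: int = 50                                # placeholder threshold

    def determine_category(self, labels: List[str]) -> str:
        """Determining module 40: majority vote; the earliest label (the screenshot) wins ties."""
        counts = Counter(labels)
        best = max(counts.values())
        return next(label for label in labels if counts[label] == best)

    def process(self, url: str) -> Tuple[str, str]:
        """Return (category, processing policy) for the website at `url`."""
        text, tags, screenshot = self.obtain(url)
        labels = [self.classify_screenshot(screenshot)]
        if len(text) < self.text_threshold and len(tags) < self.tag_threshold:
            labels.append(self.classify_text(text))
            labels.append(self.classify_tags(tags))
        category = self.determine_category(labels)
        return category, self.policies.get(category, "default")
```

Injecting the modules as callables keeps the orchestration independent of any particular model; for instance, a screenshot classified as `"gambling"` but text and tags both classified as `"loan"` would yield the category `"loan"` under this hypothetical voting rule.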

It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.

In addition, technical details not described in detail in this embodiment may refer to the method for processing a malicious website provided in any embodiment of the present invention, which is not described herein again.

Other embodiments of the malicious website processing apparatus or the implementation method thereof according to the present invention may refer to the above method embodiments, and are not repeated herein.

Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, or optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, an integrated platform workstation, a network device, etc.) to perform the methods described in the embodiments of the present invention.

The foregoing description covers only preferred embodiments of the present invention and is not intended to limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application thereof in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A method for processing a malicious website, characterized by comprising the following steps:

2. The method for processing a malicious website as set forth in claim 1, wherein the obtaining a web page text, a web page tag sequence and a web page screenshot according to the malicious website to be processed includes:

3. The method for processing a malicious website according to claim 1, wherein the extracting the feature information of the web page screenshot, and identifying the feature information through a target web page image classification model to obtain the category of the web page screenshot comprises:

detecting the webpage screenshot to obtain a webpage screenshot shape;

4. The method for processing a malicious website according to claim 3, wherein the step of, when the current screenshot pixel mean is a preset mean threshold and the current screenshot pixel variance is a preset variance threshold, extracting feature information of the webpage screenshot, and identifying the feature information through a target webpage image classification model to obtain a category of the webpage screenshot, includes:

5. The method for processing a malicious website according to claim 1, wherein the step of, when the content of the web page text is less than a preset text content threshold and the content of the web page tag sequence is less than a preset tag sequence threshold, identifying the web page text by a target text classification model, and identifying the web page tag sequence by a target tag sequence classification model, comprises:

6. The method for processing a malicious website according to claim 5, wherein the step of, when the content of the web page tag sequence is less than a preset tag sequence threshold, identifying the web page text by the target text classification model, and identifying the web page tag sequence by the target tag sequence classification model, comprises:

7. The method for processing a malicious website according to any one of claims 1 to 6, wherein determining a target website processing policy according to the category of the malicious website to be processed, and processing the malicious website to be processed according to the target website processing policy, includes:

8. A malicious website processing device, characterized by comprising:

9. Malicious website processing equipment, comprising: a memory, a processor, and a malicious website processing program stored on the memory and executable on the processor, the malicious website processing program being configured to implement the method for processing a malicious website according to any one of claims 1 to 7.

10. A storage medium, wherein a malicious website processing program is stored on the storage medium, and when the malicious website processing program is executed by a processor, the method for processing a malicious website according to any one of claims 1 to 7 is implemented.