CN112597828A

CN112597828A - Training method and device of webpage recognition model and webpage recognition method

Info

Publication number: CN112597828A
Application number: CN202011447056.3A
Authority: CN
Inventors: 周余钱
Original assignee: JD Digital Technology Holdings Co Ltd
Current assignee: JD Digital Technology Holdings Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2021-04-02

Abstract

The embodiments of the present application provide a training method and device for a web page recognition model, a web page recognition method, an electronic device and a storage medium. The method comprises: collecting a sample set; extracting features from each training image in the sample set to obtain the image feature of each training image; and repeating the following steps until a mature web page recognition model is obtained: constructing a weight matrix corresponding to each image feature according to a convolutional neural network model, and adjusting parameters of the convolutional neural network model according to the weight matrix. The convolutional neural network model is trained on training images of normal web pages and training images of abnormal web pages to generate a web page recognition model. This avoids the low efficiency of manual identification in the related art, makes web page recognition more intelligent, and improves recognition accuracy.

Description

Training method and device of webpage recognition model and webpage recognition method

Technical Field

The embodiments of the present application relate to the field of Internet technology, and in particular to a training method and device for a web page recognition model, a web page recognition method, an electronic device and a storage medium.

Background

With the development of Internet and terminal technologies, browsing web pages in a browser to obtain online information has become an everyday activity for users.

When a script or similar resource of a web page contains an error, the browser may fail to load the complete web page, so that the web page opened by the user carries error information, or an error page is displayed instead.

In the process of implementing the present application, the inventor found at least the following problem in the prior art: abnormal web pages are identified manually, which is influenced by subjective human factors and may result in low reliability and low efficiency of identification.

Disclosure of Invention

The embodiments of the present application provide a training method and device for a web page recognition model, a web page recognition method, an electronic device and a storage medium, to solve the problem of low accuracy in web page recognition.

In a first aspect, an embodiment of the present application provides a method for training a web page recognition model, where the method includes:

acquiring a sample set, wherein the sample set comprises: training images of normal web pages and training images of abnormal web pages;

performing feature extraction on each training image in the sample set to obtain the image feature of each training image;

repeating the following steps until a mature web page identification model is obtained: constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model, and adjusting parameters of the convolutional neural network model according to the weight matrix; each weight in the weight matrix represents the probability that the training image corresponding to the weight is an abnormal web page;

the webpage identification model is used for identifying a normal webpage and an abnormal webpage.

In this embodiment, the convolutional neural network model is trained on training images of normal web pages and training images of abnormal web pages to generate a web page identification model. This avoids the low efficiency and low accuracy of the manual identification used in the related art, makes web page identification more intelligent, and improves identification accuracy and reliability.

In some embodiments, constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model includes:

for each image feature, determining the probability that the training image corresponding to the image feature is an abnormal web page;

assigning weights to the image features of each training image according to the probabilities;

and constructing the weight matrix according to the weight corresponding to each image feature.

In this embodiment, weights are assigned based on the probabilities and the weight matrix is constructed from the weights, so that the weight matrix is closely associated with the type of web page (normal or abnormal), which improves the reliability and accuracy of training.
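The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the linear-plus-sigmoid scorer is a hypothetical stand-in for the convolutional neural network's judgement, each weight is simply set equal to its probability (one possible monotone assignment), and the weight matrix is assumed to be diagonal with one entry per training image. All function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def abnormal_probability(feature, coef, bias=0.0):
    """Hypothetical scorer standing in for the CNN: linear score squashed to (0, 1)."""
    return sigmoid(sum(c * x for c, x in zip(coef, feature)) + bias)


def build_weight_matrix(features, coef, bias=0.0):
    """Return an assumed diagonal weight matrix, one weight per training image."""
    # Step 1: probability that each training image is an abnormal web page.
    probs = [abnormal_probability(f, coef, bias) for f in features]
    # Step 2: assign weights; here the weight simply equals the probability.
    weights = probs
    # Step 3: place the weights on the diagonal of an n x n matrix.
    n = len(weights)
    return [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


W = build_weight_matrix([[0.0, 0.0], [2.0, 2.0]], coef=[1.0, 1.0])
```

The diagonal layout is chosen only so that each weight stays tied to its own training image; the embodiment does not fix the shape of the matrix.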

In some embodiments, each training image has first category information indicating whether the training image is a normal web page or an abnormal web page; constructing the weight matrix according to the weight corresponding to each image feature includes:

based on the first category information of each training image, adjusting the weight corresponding to the image feature of each training image to obtain the adjusted weight of the image feature of each training image;

and obtaining the weight matrix according to each adjusted weight.

In this embodiment, the weight of each training image is adjusted using its first category information, which improves the reliability of the weight matrix and, in turn, the accuracy of the trained web page recognition model.
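The embodiment does not specify how the first category information adjusts the weights. One simple possibility, shown here purely as an assumption, is a convex blend that pulls each weight toward its label (1 for an abnormal web page, 0 for a normal one); the function names and the blending factor `alpha` are hypothetical.

```python
def adjust_weights(weights, labels, alpha=0.5):
    """Blend each weight toward its first category label.

    labels: 1 = abnormal web page, 0 = normal web page (assumed encoding).
    alpha: hypothetical blending factor in [0, 1]; 0 leaves the weights unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(weights) != len(labels):
        raise ValueError("weights and labels must have the same length")
    return [(1.0 - alpha) * w + alpha * y for w, y in zip(weights, labels)]


def diagonal_matrix(values):
    """Arrange the adjusted weights as a diagonal weight matrix."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


adjusted = adjust_weights([0.2, 0.9], labels=[1, 0])
W = diagonal_matrix(adjusted)
```

Here an image labelled abnormal with a low weight (0.2) is pulled up to 0.6, and an image labelled normal with a high weight (0.9) is pulled down to 0.45.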

In some embodiments, adjusting parameters of the convolutional neural network model according to the weight matrix includes:

adjusting the coefficients of each convolutional layer in the convolutional neural network model according to the weight matrix, wherein the coefficients of the convolutional layers are used, in combination with the weight matrix, to determine whether each training image is a normal web page or an abnormal web page.

In some embodiments, the sample set further includes verification images of normal web pages and verification images of abnormal web pages; adjusting parameters of the convolutional neural network model according to the weight matrix includes:

adjusting parameters of the convolutional neural network model according to the weight matrix, and determining, based on the adjusted convolutional neural network model, a verification result for each verification image in the sample set, wherein each verification result indicates whether the corresponding verification image is a normal web page or an abnormal web page;

and adjusting the parameters of the adjusted convolutional neural network model according to each verification result.

In this embodiment, a verification step is introduced: the adjusted convolutional neural network model is verified with verification images (including verification images of normal web pages and verification images of abnormal web pages) to obtain the web page identification model, which further improves the accuracy and reliability of the web page identification model.

In some embodiments, each verification image has second category information indicating whether the verification image is a normal web page or an abnormal web page; adjusting parameters of the adjusted convolutional neural network model according to each verification result includes:

adjusting parameters of the adjusted convolutional neural network model based on each verification result and the second category information of the corresponding verification image.
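A common way to act on verification results, assumed here rather than taken from the embodiment, is to score them against the second category information and stop adjusting once verification accuracy no longer improves (early stopping). The function names, the 0/1 label encoding and the `patience` parameter are hypothetical.

```python
def verification_accuracy(results, labels):
    """Fraction of verification results that match the second category information.

    Both lists use the assumed encoding 1 = abnormal web page, 0 = normal web page.
    """
    if not labels or len(results) != len(labels):
        raise ValueError("results and labels must be non-empty and of equal length")
    return sum(r == y for r, y in zip(results, labels)) / len(labels)


def should_stop(history, patience=2):
    """True once the last `patience` rounds have not beaten the earlier best accuracy."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return max(history[-patience:]) <= best_before


acc = verification_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

In a training loop, `acc` would be appended to a history list after each adjustment round, and the parameters from the best round kept once `should_stop` returns True.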

In some embodiments, the training images in the sample set have the same pixel dimensions.

In this embodiment, training the convolutional neural network model with training images of the same pixel dimensions improves training efficiency and training reliability.

In a second aspect, an embodiment of the present application provides a method for web page identification, where the method includes:

acquiring a webpage to be identified;

identifying the webpage to be identified based on a pre-trained webpage identification model to obtain an identification result; the webpage identification model is obtained by constructing a weight matrix corresponding to each training image according to a preset convolutional neural network model and adjusting parameters of the convolutional neural network model according to the weight matrix, wherein each weight in the weight matrix represents the probability that the training image corresponding to each weight is an abnormal webpage, and the identification result is a normal webpage or an abnormal webpage.

In this embodiment, the web page to be recognized is recognized with the web page recognition model, which avoids the low efficiency and low reliability of manually identifying web pages in the related art and improves the efficiency and accuracy of web page recognition.

In some embodiments, identifying the web page to be identified based on a pre-trained web page identification model to obtain an identification result includes:

extracting features of the web page to be identified based on the web page identification model to obtain the image feature corresponding to the web page to be identified, and determining the identification result based on that image feature.

In some embodiments, determining the recognition result based on the image feature corresponding to the web page to be recognized includes:

determining the identification result based on the image feature corresponding to the web page to be identified and the coefficients of each convolutional layer of the web page identification model.
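Assuming the coefficients of the convolutional layers have already been reduced to a single linear scoring head (a simplification: a real CNN applies them layer by layer), the final decision can be sketched as a score compared with a threshold, following the convention in which a larger value means a more likely abnormal web page. The names `recognize`, `coef`, `bias` and the 0.5 threshold are hypothetical.

```python
import math


def recognize(feature, coef, bias=0.0, threshold=0.5):
    """Return ("abnormal" | "normal", score) for one web page's image feature."""
    z = sum(c * x for c, x in zip(coef, feature)) + bias
    # Stable logistic: score is the estimated probability of an abnormal web page.
    score = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
    return ("abnormal" if score >= threshold else "normal"), score


label, score = recognize([1.0, 2.0], coef=[0.5, 0.5], bias=-1.0)
```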

In some embodiments, after obtaining the web page to be identified, the method further comprises:

determining the pixel dimensions of the web page to be identified;

and if the pixel dimensions of the web page to be identified differ from those of the training images, adjusting the web page to be identified to the same pixel dimensions as the training images.
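Reading "pixels" as the image's pixel dimensions (height x width), the adjustment can be sketched with nearest-neighbour resampling. The resampling method is an assumed choice (a production system would more likely call an image library), the function name is hypothetical, and images are represented as plain lists of pixel rows.

```python
def resize_nearest(pixels, target_h, target_w):
    """Resize a 2-D list of pixel values to target_h x target_w by nearest neighbour."""
    src_h, src_w = len(pixels), len(pixels[0])
    if (src_h, src_w) == (target_h, target_w):
        return [row[:] for row in pixels]  # already matches the training images
    return [
        [pixels[r * src_h // target_h][c * src_w // target_w] for c in range(target_w)]
        for r in range(target_h)
    ]


page = [[1, 2], [3, 4]]                # a 2 x 2 screenshot of the page to be identified
resized = resize_nearest(page, 4, 4)   # training images assumed to be 4 x 4
```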

In a third aspect, an embodiment of the present application further provides a device for training a web page recognition model, where the device includes:

an acquisition module to acquire a sample set, wherein the sample set comprises: training images of normal web pages and training images of abnormal web pages;

a feature extraction module, configured to perform feature extraction on each training image in the sample set to obtain the image feature of each training image;

a training module, configured to repeat the following steps until a mature web page recognition model is obtained: constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model, and adjusting parameters of the convolutional neural network model according to the weight matrix; each weight in the weight matrix represents the probability that the training image corresponding to the weight is an abnormal web page.

In some embodiments, the training module is configured to: for each image feature, determine the probability that the training image corresponding to the image feature is an abnormal web page; assign a weight to the image feature of each training image according to the probability; and construct the weight matrix according to the weight corresponding to each image feature.

In some embodiments, each training image has first category information indicating whether the training image is a normal web page or an abnormal web page; the training module is configured to adjust the weight corresponding to the image feature of each training image based on the first category information of that training image to obtain an adjusted weight, and obtain the weight matrix according to the adjusted weights.

In some embodiments, the training module is configured to adjust the coefficients of each convolutional layer in the convolutional neural network model according to the weight matrix, wherein the coefficients of the convolutional layers are used, in combination with the weight matrix, to determine whether each training image is a normal web page or an abnormal web page.

In some embodiments, the sample set further includes verification images of normal web pages and verification images of abnormal web pages; the training module is configured to adjust parameters of the convolutional neural network model according to the weight matrix, determine, based on the adjusted model, a verification result for each verification image in the sample set (each verification result indicating whether the corresponding verification image is a normal web page or an abnormal web page), and adjust the parameters of the adjusted convolutional neural network model according to each verification result.

In some embodiments, each verification image has second category information indicating whether the verification image is a normal web page or an abnormal web page; the training module is configured to adjust the parameters of the adjusted convolutional neural network model based on each verification result and the second category information of the corresponding verification image.

In a fourth aspect, an embodiment of the present application provides an apparatus for web page identification, where the apparatus includes:

the acquisition module is used for acquiring a webpage to be identified;

the identification module is used for identifying the webpage to be identified based on a pre-trained webpage identification model to obtain an identification result; the webpage identification model is obtained by constructing a weight matrix corresponding to each training image according to a preset convolutional neural network model and adjusting parameters of the convolutional neural network model according to the weight matrix, wherein each weight in the weight matrix represents the probability that the training image corresponding to each weight is an abnormal webpage, and the identification result is a normal webpage or an abnormal webpage.

In some embodiments, the identification module is configured to perform feature extraction on the to-be-identified web page based on the web page identification model to obtain an image feature corresponding to the to-be-identified web page, and determine the identification result based on the image feature corresponding to the to-be-identified web page.

In some embodiments, the identification module is configured to determine the identification result based on image features corresponding to the web page to be identified and coefficients of each convolution layer of the web page identification model.

In some embodiments, the apparatus further comprises:

the determining module is configured to determine the pixel dimensions of the web page to be identified;

and the adjusting module is configured to adjust the web page to be identified to the same pixel dimensions as the training images if its pixel dimensions differ from those of the training images.

In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;

the memory is configured to store instructions executable by the processor;

wherein the processor is configured to perform the method of training a web page recognition model as described in the first aspect above; or,

the processor is configured to perform the method of web page identification as described in the second aspect above.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for training a web page recognition model according to the first aspect; or,

the computer executable instructions, when executed by a processor, are for implementing a method of web page identification as described in the second aspect above.

According to the training method and device for the web page recognition model, the web page recognition method, the electronic device and the storage medium provided by the embodiments of the present application, the convolutional neural network model is trained on training images of normal web pages and training images of abnormal web pages to generate a web page recognition model for recognizing web pages.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a schematic flowchart illustrating a method for training a web page recognition model according to an embodiment of the present application;

FIG. 2 is a schematic diagram illustrating a method for training a web page recognition model according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a method for training a web page recognition model according to another embodiment of the present application;

FIG. 4 is a flowchart illustrating a method for identifying a web page according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an application scenario of a method for web page identification according to an embodiment of the present application;

FIG. 6 is a flowchart illustrating a method for webpage identification according to another embodiment of the present application;

FIG. 7 is a schematic diagram of an apparatus for training a web page recognition model according to an embodiment of the present application;

FIG. 8 is a diagram illustrating an apparatus for web page recognition according to an embodiment of the present application;

FIG. 9 is a diagram illustrating an apparatus for web page recognition according to another embodiment of the present application;

FIG. 10 is a schematic diagram of an electronic device according to an embodiment of the present application.

Specific embodiments of the present disclosure are shown in the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terms referred to in the embodiments of the present application are explained as follows:

Web page: a basic element of a website and a platform that carries various website applications. Web pages usually use image files to provide pictures and are read through a web browser.

Normal web page: a web page that can be opened normally when the network is working normally.

Abnormal web page: a web page that cannot be opened normally in the case of a network anomaly, for example a web page that reports an error, specifically a web page containing error information such as "web page deleted" or "error".

Image feature: one or more of the color features, pixel features, texture features, shape features, and spatial relationship features of an image.

Fig. 1 is a flowchart illustrating a method for training a web page recognition model according to an embodiment of the present application.

As shown in fig. 1, the method includes:

S101: acquiring a sample set, wherein the sample set comprises: training images of normal web pages, and training images of abnormal web pages.

For example, the execution subject of the embodiment may be a training device of a web page recognition model (hereinafter, referred to as a training device for short), and the training device may be a computer, a server (which may be a cloud server or a local server), a terminal device, a processor, a chip, and the like.

It is worth mentioning that the number of training images can be set based on requirements, historical records, experiments, and the like.

For example, when the number is set based on requirements, the principle may be: a relatively large number of training images is set when relatively high accuracy is required, and a relatively small number is set when relatively low accuracy is acceptable.

The training images include training images of normal web pages and training images of abnormal web pages. The number of training images of normal web pages may be the same as or different from the number of training images of abnormal web pages, and the ratio between them may be set based on the characteristics and requirements of the website, which is not limited in this embodiment.

For example, with reference to the schematic diagram shown in fig. 2, the sample set includes the training images shown in fig. 2, and the training images include training images of N normal web pages and training images of M abnormal web pages, where M and N are integers greater than 1, and the sizes of M and N are not limited in this embodiment.
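The embodiments above also place verification images of normal and abnormal web pages in the sample set. A minimal sketch of assembling N normal and M abnormal images and holding out a verification share from each class follows; the stratified split, the 20 % default fraction, the 0/1 label encoding and the function name are assumptions for illustration, not part of the embodiment.

```python
import random


def split_sample_set(normal, abnormal, val_fraction=0.2, seed=0):
    """Label and shuffle both classes, then hold out val_fraction of each as verification images.

    Labels use the assumed encoding 0 = normal web page, 1 = abnormal web page.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label, items in ((0, normal), (1, abnormal)):
        items = list(items)
        rng.shuffle(items)
        n_val = int(len(items) * val_fraction)
        val.extend((x, label) for x in items[:n_val])
        train.extend((x, label) for x in items[n_val:])
    return train, val


# N = 10 normal and M = 5 abnormal placeholder image identifiers.
train, val = split_sample_set(range(10), range(100, 105))
```

Splitting each class separately keeps the normal-to-abnormal ratio of the verification images close to that of the training images, whatever ratio was chosen for the website.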

S102: performing feature extraction on each training image in the sample set to obtain the image feature of each training image.

As described above, an image feature refers to one or more of the color features, pixel features, texture features, shape features, and spatial relationship features of an image. Therefore, in this embodiment, the training device may extract one or more of these features from each training image to obtain the image feature of that training image.
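As an illustration only, the sketch below extracts two of the listed feature kinds from an RGB image: a color feature (mean red, green and blue) and a crude pixel feature (a normalized brightness histogram). The choice of features, the `bins` parameter, the function name and the image format (rows of `(r, g, b)` tuples with values 0 to 255) are assumptions.

```python
def extract_features(image, bins=4):
    """Return [mean_r, mean_g, mean_b] followed by a normalized brightness histogram."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    # Color feature: average value of each channel.
    mean_rgb = [sum(px[c] for px in pixels) / n for c in range(3)]
    # Pixel feature: share of pixels falling in each brightness bin.
    hist = [0] * bins
    for r, g, b in pixels:
        gray = (r + g + b) // 3
        hist[min(gray * bins // 256, bins - 1)] += 1
    return mean_rgb + [h / n for h in hist]


features = extract_features([[(255, 255, 255), (0, 0, 0)]])
```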

For example, combining the schematic diagram shown in fig. 2 and the above example, the training apparatus may perform feature extraction on each training image to obtain an image feature of each of the (M + N) training images, that is, to obtain (M + N) image features, such as image feature 1 to image feature (M + N) shown in fig. 2.

S103: repeating the following steps until a mature webpage identification model is obtained: constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model, and adjusting parameters of the convolutional neural network model according to the weight matrix; each weight in the weight matrix represents the probability that the training image corresponding to each weight is an abnormal webpage.

The webpage identification model is used for identifying normal webpages and abnormal webpages.

It should be noted that this embodiment does not limit the parameters of the convolutional neural network model, such as the number of channels and the number of convolution kernels.

In combination with the schematic diagram shown in fig. 2 and the above example, a convolutional neural network model may be provided in the training apparatus, and the convolutional neural network model may be trained based on each image feature.

In some embodiments, features of each training image may also be extracted by a convolutional neural network model. For example, the convolutional neural network model may include an input layer, and the input layer may be used to receive training images (including training images of normal web pages and training images of abnormal web pages), and the convolutional neural network model may include a feature extraction layer for feature extraction of the training images transmitted by the input layer.
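A feature extraction layer of the kind described can be sketched without any framework: each kernel is slid over a grayscale image (no padding, stride 1), the resulting feature map is passed through ReLU and averaged into one number, giving one feature per kernel. The kernels and function names are hypothetical; as in most CNN libraries, the "convolution" here is implemented as cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1); cross-correlation as in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def feature_layer(image, kernels):
    """One feature per kernel: ReLU of the feature map, then global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        values = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(values) / len(values))
    return features


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
feats = feature_layer(img, [[[1, 1], [1, 1]], [[1, 0], [0, -1]]])
```

The first kernel sums 2 x 2 neighbourhoods (mean 20.0 here); the second responds to diagonal differences, which are all negative on this image, so ReLU zeroes them out.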

Based on the above analysis, an embodiment of the present application provides a method for training a web page recognition model, the method including: acquiring a sample set that includes training images of normal web pages and training images of abnormal web pages; performing feature extraction on each training image in the sample set to obtain the image feature of each training image; and repeating the following steps until a mature web page identification model is obtained: constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model, and adjusting parameters of the convolutional neural network model according to the weight matrix. In this embodiment, the convolutional neural network model is trained on training images of normal web pages and training images of abnormal web pages to generate a web page identification model for identifying normal and abnormal web pages. This avoids the low reliability of manual identification in the related art, improves identification reliability, and makes web page identification more intelligent.

Fig. 3 is a flowchart illustrating a method for training a web page recognition model according to another embodiment of the present application.

As shown in fig. 3, the method includes:

S201: acquiring a sample set, wherein the sample set comprises: training images of normal web pages, and training images of abnormal web pages.

For example, the description about S201 may refer to S101, and is not described herein again.

In some embodiments, the training images in the sample set have the same pixel dimensions; that is, the training images of normal web pages have the same pixel dimensions as the training images of abnormal web pages.

It should be noted that, in this embodiment, selecting training images of normal web pages and training images of abnormal web pages with the same pixel dimensions makes the subsequently extracted image features more comparable, which saves subsequent training time and resources, improves training efficiency, and improves the reliability of the trained web page recognition model.

In some embodiments, a separate web page recognition model may be trained for each website, personalizing the models and improving recognition reliability when they are later used to recognize web pages. For example, for website A, a web page recognition model for recognizing the web pages of website A may be trained, so as to identify normal web pages and/or abnormal web pages among the web pages of website A.

In other embodiments, based on the similarity between the web pages of different websites, one shared web page recognition model may be trained for websites whose similarity is greater than a preset similarity threshold, which saves training resources and improves the flexibility of the web page recognition model.
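The embodiment does not say how website similarity is measured. Purely as an assumption, the sketch represents each website by a hypothetical set of page-structure tokens, uses Jaccard similarity, and greedily groups websites whose similarity to a group's first member is greater than the threshold, so that one model could be trained per group. The token sets, function names and the 0.6 threshold are all illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_websites(signatures, threshold=0.6):
    """Greedy grouping: a site joins the first group whose representative is similar enough."""
    groups = []  # list of (representative signature, member sites)
    for site, sig in signatures.items():
        for rep, members in groups:
            if jaccard(rep, sig) > threshold:  # "greater than" the preset threshold
                members.append(site)
                break
        else:
            groups.append((set(sig), [site]))
    return [members for _, members in groups]


sites = {
    "site_a": {"header", "nav", "footer"},
    "site_b": {"header", "nav", "footer", "ad"},
    "site_c": {"canvas", "game"},
}
groups = group_websites(sites)
```

Greedy grouping depends on the order in which websites are visited; a proper clustering algorithm would be an alternative when that matters.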

S202: performing feature extraction on each training image in the sample set to obtain the image feature of each training image.

For example, the description about S202 may refer to S102, which is not described herein.

S203: constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model.

In some embodiments, S203 may include the steps of:

Step 1: for each image feature, determining the probability that the training image corresponding to the image feature is an abnormal web page.

As analyzed above, an image feature refers to one or more of the color features, pixel features, texture features, shape features, and spatial relationship features of an image. Therefore, in this embodiment, after the image feature of each training image is determined, each image feature may be analyzed to obtain the probability that the corresponding training image is an abnormal web page.

For example, if the image feature is a pixel feature of a training image, the training apparatus may determine whether the training image includes error information corresponding to an abnormal web page, such as "web page deleted" or "error" based on the pixel feature.

Step 2: weights are assigned to the image features of each training image according to the probabilities.

In one example, a higher probability is assigned a larger weight. In this case, when the web page to be identified is recognized with the web page recognition model, a larger value of the recognition result indicates a higher likelihood that the web page is abnormal, and the recognition result may accordingly be that the web page to be identified is an abnormal web page.

In another example, a lower probability is assigned a larger weight. In this case, when the web page to be identified is recognized with the web page recognition model, a smaller value of the recognition result indicates a higher likelihood that the web page is abnormal, and the recognition result may accordingly be that the web page to be identified is an abnormal web page.
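The two examples differ only in the direction of the mapping from probability to weight, and the decision rule must flip with it. A minimal sketch, assuming for simplicity that the weight itself serves as the value of the recognition result and using a hypothetical threshold of 0.5; the function names are illustrative.

```python
def assign_weight(prob, increasing=True):
    """Map an abnormal-web-page probability to a weight in either convention."""
    return prob if increasing else 1.0 - prob


def is_abnormal(value, threshold=0.5, increasing=True):
    """Large values mean abnormal under the increasing convention, small values otherwise."""
    return value >= threshold if increasing else value <= threshold


results = {
    (p, inc): is_abnormal(assign_weight(p, inc), increasing=inc)
    for p in (0.9, 0.2)
    for inc in (True, False)
}
```

Both conventions classify a page with probability 0.9 as abnormal and one with probability 0.2 as normal; only the direction of the comparison changes.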

Step 3: constructing the weight matrix according to the weight corresponding to each image feature.

In some embodiments, each training image has first category information indicating whether the training image is a normal web page or an abnormal web page, and step 3 may include: adjusting the weight corresponding to the image feature of each training image based on the first category information of that training image to obtain an adjusted weight, and obtaining the weight matrix according to the adjusted weights.

For example, the first category information of each training image may be determined by the training device, and each training image may be labeled; or other devices can determine the first class information of each training image and label each training image; the first category information of each training image may also be determined in a manual manner, and labeling processing may be performed on each training image, and the like, which is not limited in this embodiment.

S204: Adjust the coefficients of each convolutional layer in the convolutional neural network model according to the weight matrix, where the coefficients of the convolutional layers are used, in combination with the weight matrix, to determine whether each training image is a normal web page or an abnormal web page.

That is, adjusting the convolutional neural network model may essentially amount to adjusting the coefficients of each of its convolutional layers, specifically based on the weight matrix.

In some embodiments, step S204 may include: determining a training value based on the weight matrix and the coefficients of the convolutional layers; determining a loss value between the training value and a preset calibration value (determined based on the first category information); determining, based on the loss value, the amplitude by which the coefficients of each convolutional layer are adjusted; and adjusting the coefficients of each convolutional layer by that amplitude.
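The sequence in S204 (training value, loss value, amplitude, adjustment) mirrors an ordinary gradient-descent step. The sketch below assumes a single linear layer with logistic output as a stand-in for the convolutional layers, binary cross-entropy as the loss against calibration values (1 = abnormal, 0 = normal), and learning rate times mean gradient as the amplitude; none of these choices comes from the application.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def training_step(coeffs: List[float], features: List[List[float]], weights: List[float],
                  calib: List[int], lr: float = 0.1) -> Tuple[List[float], float]:
    """One update of the stand-in coefficients.

    Returns (new_coeffs, loss_before_update).
    """
    n = len(features)
    grad = [0.0] * len(coeffs)
    loss = 0.0
    for x, w, y in zip(features, weights, calib):
        # Training value: weight-scaled linear score squashed into (0, 1).
        p = sigmoid(w * sum(c * xi for c, xi in zip(coeffs, x)))
        q = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp only to keep log() finite
        loss += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
        # Gradient of binary cross-entropy w.r.t. coeff j when z = w * <c, x>.
        for j, xj in enumerate(x):
            grad[j] += (p - y) * w * xj
    loss /= n
    # Amplitude of the adjustment = learning rate * mean gradient.
    new_coeffs = [c - lr * g / n for c, g in zip(coeffs, grad)]
    return new_coeffs, loss
```

Calling `training_step` repeatedly with the same data should drive the loss down; in a real convolutional network the gradient would come from backpropagation through every layer rather than this closed form.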

S205: Determine, based on the adjusted convolutional neural network model, a verification result for each verification image in the sample set.

The sample set further includes verification images of normal web pages and verification images of abnormal web pages. Each verification result indicates whether the corresponding verification image is a normal web page or an abnormal web page.

Similarly, this embodiment does not limit the number of verification images, nor the ratio between verification images of normal web pages and verification images of abnormal web pages.

By introducing verification of the adjusted convolutional neural network model, this embodiment can improve the accuracy and reliability of the resulting web page identification model.

For example, this step may include: performing feature extraction on each verification image based on the adjusted convolutional neural network model to obtain the image features of each verification image, and determining the verification result based on those image features and the coefficients of each convolutional layer of the adjusted model.
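As an illustration of S205, the sketch below uses a hypothetical logistic stand-in for the adjusted model: the adjusted coefficients score each verification image's feature vector, and a hypothetical 0.5 threshold turns the score into a verification result.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def verification_results(coeffs: List[float], features: List[List[float]],
                         threshold: float = 0.5) -> List[str]:
    """Label each verification image as "normal" or "abnormal".

    `coeffs` are the adjusted coefficients of the hypothetical stand-in model;
    `features` holds one extracted feature vector per verification image.
    """
    results = []
    for x in features:
        p = sigmoid(sum(c * xi for c, xi in zip(coeffs, x)))  # abnormal-page probability
        results.append("abnormal" if p >= threshold else "normal")
    return results
```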

S206: Adjust the parameters of the adjusted convolutional neural network model according to each verification result.

In some embodiments, S206 may include: adjusting the parameters of the adjusted convolutional neural network model based on each verification result and the second category information of the corresponding verification image.

Each verification image has second category information indicating whether the verification image is a normal web page or an abnormal web page.

Similarly, this step may include: comparing each verification result with the calibration value of the corresponding verification image (determined based on the second category information) to obtain a loss value, determining, based on the loss values, the amplitude by which the parameters of the adjusted convolutional neural network model are adjusted, and adjusting the coefficients of each convolutional layer by that amplitude.
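The comparison in S206 can be sketched as a mean binary cross-entropy between predicted abnormal-page probabilities and calibration values (1 = abnormal web page, 0 = normal web page), with an adjustment amplitude that grows with the loss up to a cap. Both the loss function and the amplitude rule are assumptions for illustration only.

```python
import math
from typing import List


def verification_loss(probs: List[float], calib: List[int]) -> float:
    """Mean binary cross-entropy between predicted abnormal-page probabilities
    and calibration values derived from the second category information."""
    if len(probs) != len(calib) or not probs:
        raise ValueError("need one calibration value per verification result")
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    for p, y in zip(probs, calib):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def adjustment_amplitude(loss: float, base_rate: float = 0.01, max_rate: float = 0.1) -> float:
    """Hypothetical rule: a larger verification loss gives a larger, capped adjustment."""
    return min(base_rate * loss, max_rate)
```

Capping the amplitude keeps a single badly scored verification batch from moving the coefficients too far in one round.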

Fig. 4 is a flowchart illustrating a method for web page identification according to an embodiment of the present application.

As shown in fig. 4, the method includes:

S301: Acquire the web page to be identified.

For example, the executing entity of this embodiment may be a web page recognition device, such as a computer, a server (a cloud server or a local server), a terminal device, a processor, or a chip.

It should be noted that the web page recognition device and the training device may be the same device or different devices; this embodiment does not limit this.

In one example, the web page recognition device may capture the web page output by the browser at a preset time interval; the captured web page is the web page to be identified.

In another example, the web page recognition device may determine when to capture each web page based on the time at which the browser outputs that web page; the captured web page is the web page to be identified.

In another example, the web page recognition device may capture a web page based on feedback reporting that web page as abnormal; the captured web page is the web page to be identified.

In yet another example, the web page recognition device may receive a trigger instruction from a staff member and acquire the web page to be identified based on that instruction.

It should be understood that the above examples are merely illustrative, and the triggering manners they describe are not to be construed as limiting how the web page to be identified is acquired.
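The first trigger, capturing at a preset time interval, can be sketched as a simple loop. The `capture` and `sleep` callables are hypothetical hooks (for example, a screenshot routine and `time.sleep`); injecting them keeps the sketch testable without a browser.

```python
from typing import Callable, List


def capture_at_interval(capture: Callable[[], bytes], interval_s: float, rounds: int,
                        sleep: Callable[[float], None]) -> List[bytes]:
    """Capture the browser's current page `rounds` times, waiting
    `interval_s` seconds between consecutive captures."""
    if interval_s <= 0 or rounds < 1:
        raise ValueError("interval must be positive and rounds at least 1")
    pages: List[bytes] = []
    for i in range(rounds):
        pages.append(capture())      # each capture is a web page to be identified
        if i < rounds - 1:
            sleep(interval_s)        # wait for the preset time interval
    return pages
```

A production version would more likely run on a scheduler and hand each capture straight to the identification step instead of collecting them in a list.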

For example, the web page identification method of this embodiment may be applied to the application scenario shown in fig. 5. As shown in fig. 5, a browser may be installed on the user terminal 100 (shown in fig. 5 as a laptop), and the user 200 may browse web pages through that browser. The web page recognition device 300 (shown in fig. 5 as a server) may acquire the web page to be identified using at least one of the above methods.

It should be noted that, in the related art, when a page browsed by a user is abnormal (e.g., the web page displayed on the laptop in fig. 5), the user may send a message through the user terminal to the background server of the website hosting that page to report the error. After staff obtain the error report through the background server, they may analyze the cause of the error and take other actions.

Alternatively, staff of the background server may check, periodically or aperiodically, whether each web page is abnormal.

However, these related-art approaches are prone to missed or erroneous inspections and are inefficient.

Through creative effort, the inventor of the present application arrived at the inventive concept of this embodiment: identifying web pages with a web page identification model, thereby improving the accuracy and reliability of identification.

S302: Identify the web page to be identified based on the pre-trained web page identification model to obtain an identification result.

The web page identification model is obtained by constructing, according to a preset convolutional neural network model, a weight matrix corresponding to the training images and adjusting the parameters of the convolutional neural network model according to the weight matrix. Each weight in the weight matrix represents the probability that its corresponding training image is an abnormal web page, and the identification result indicates whether the web page is a normal web page or an abnormal web page.

For a specific training method of the web page recognition model, reference may be made to the above example, for example, the method shown in any one of fig. 1 to fig. 3, which is not described herein again.

In this embodiment, the web page identification model trained by the training device may be embedded in the web page recognition device to identify the web page to be identified, that is, to determine whether it is a normal web page or an abnormal web page.

It is worth noting that, by identifying the web page to be identified with the web page identification model, this embodiment avoids the low efficiency and low accuracy of manual identification in the related art, thereby improving the efficiency, accuracy, and degree of automation of web page identification.

Fig. 6 is a flowchart illustrating a method for web page identification according to another embodiment of the present application.

As shown in fig. 6, the method includes:

S401: Acquire the web page to be identified.

For example, for a description of S401, reference may be made to S301; details are not repeated here.

S402: Determine the pixels of the web page to be identified.

S403: If the pixels of the web page to be identified differ from those of the training images, adjust them to be the same as the pixels of the training images.

As analyzed above, training the convolutional neural network model with training images that have the same pixels can improve training efficiency, as well as the accuracy and reliability with which the resulting web page identification model identifies web pages. Therefore, in this embodiment, the pixels of the web page to be identified are determined before it is identified by the web page identification model; when they differ from those of the training images, they are adjusted to match, which improves the accuracy and reliability of the identification result.

Exemplarily, S403 may include: judging whether the pixels of the web page to be identified are the same as those of the training images; if not, adjusting the pixels of the web page to be identified so that they match those of the training images; if so, proceeding directly to the subsequent step of identifying the web page based on the web page identification model.
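A minimal sketch of S403 on a grayscale image stored as a list of rows. It assumes nearest-neighbour resampling, which the application does not specify; in practice a library resizer such as Pillow's `Image.resize` would usually be used instead.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, one list per row


def pixel_size(img: Image) -> Tuple[int, int]:
    """Return (height, width) of the image."""
    return (len(img), len(img[0]) if img else 0)


def match_training_size(img: Image, target: Tuple[int, int]) -> Image:
    """Keep the page if its size already equals the training size,
    otherwise resample it with nearest-neighbour interpolation."""
    if pixel_size(img) == target:
        return img                   # same pixels: go straight to identification
    src_h, src_w = pixel_size(img)
    dst_h, dst_w = target
    if src_h == 0 or src_w == 0 or dst_h <= 0 or dst_w <= 0:
        raise ValueError("source and target sizes must be non-empty")
    # Map each target pixel back to the nearest source pixel (integer scaling).
    return [[img[i * src_h // dst_h][j * src_w // dst_w] for j in range(dst_w)]
            for i in range(dst_h)]
```

Nearest-neighbour resampling keeps the sketch dependency-free; smoother interpolation (bilinear, area) generally preserves small text in page screenshots better.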

It should be noted that, in other embodiments, the web page recognition device may use the pixels of the training images as the capture size when acquiring the web page to be identified; for example, taking the pixels of the training images as a reference, it may capture an image with the same pixels from the web page and use that image as the web page to be identified.

S404: Perform feature extraction on the web page to be identified based on the web page identification model to obtain the image features corresponding to the web page to be identified.

For S404, reference may be made to the feature extraction principle of the training device in the above examples; details are not repeated here.

S405: Determine the identification result based on the image features corresponding to the web page to be identified.

In some embodiments, S405 may include: determining the identification result based on the image features corresponding to the web page to be identified and the coefficients of each convolutional layer of the web page identification model.
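To make S404 and S405 concrete, the sketch below extracts one feature per convolution kernel (valid cross-correlation, ReLU, global average pooling) and compares a weighted sum of the features with a threshold. The kernels, the linear head, the bias, and the threshold are hypothetical; the sketch follows the first weighting convention above, in which a larger value means a more likely abnormal page.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("kernel is larger than the image")
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def extract_features(img: Matrix, kernels: List[Matrix]) -> List[float]:
    """S404 sketch: one feature per kernel via convolution, ReLU, global average pooling."""
    feats = []
    for k in kernels:
        activations = [max(0.0, v) for row in conv2d_valid(img, k) for v in row]
        feats.append(sum(activations) / len(activations))
    return feats


def identify(img: Matrix, kernels: List[Matrix], head: List[float],
             bias: float = 0.0, threshold: float = 0.0) -> str:
    """S405 sketch: a score above the threshold is read as an abnormal page."""
    score = sum(h * f for h, f in zip(head, extract_features(img, kernels))) + bias
    return "abnormal" if score > threshold else "normal"
```

A trained model would stack several convolutional layers and learn the kernels and head; the single layer here only shows how the convolution coefficients and the extracted features combine into a result.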

Fig. 7 is a schematic diagram of a device for training a web page recognition model according to an embodiment of the present application.

As shown in fig. 7, the apparatus includes:

an acquisition module 11 configured to acquire a sample set, wherein the sample set includes: training images of normal web pages and training images of abnormal web pages;

a feature extraction module 12, configured to perform feature extraction on each training image in the sample set to obtain an image feature of each training image;

a training module 13, configured to repeat the following steps until a mature web page recognition model is obtained: constructing, according to a preset convolutional neural network model, a weight matrix corresponding to the image features, and adjusting the parameters of the convolutional neural network model according to the weight matrix, where each weight in the weight matrix represents the probability that the corresponding training image is an abnormal web page.

In some embodiments, the training module 13 is configured to: for each image feature, determine the probability that the corresponding training image is an abnormal web page; assign a weight to the image feature of each training image according to that probability; and construct the weight matrix from the weights corresponding to the image features.

In some embodiments, each training image has first category information indicating whether the training image is a normal web page or an abnormal web page; the training module 13 is configured to adjust the weight corresponding to the image feature of each training image based on its first category information to obtain an adjusted weight, and to obtain the weight matrix from the adjusted weights.

In some embodiments, the training module 13 is configured to adjust coefficients of each convolutional layer in the convolutional neural network model according to the weight matrix, where the coefficients of the convolutional layer are used to determine that each training image is a normal web page or an abnormal web page in combination with the weight matrix.

In some embodiments, the sample set further includes verification images of normal web pages and verification images of abnormal web pages; the training module 13 is configured to adjust the parameters of the convolutional neural network model according to the weight matrix; determine, based on the adjusted model, a verification result for each verification image in the sample set, where each verification result indicates whether the corresponding verification image is a normal web page or an abnormal web page; and adjust the parameters of the adjusted model according to each verification result.

In some embodiments, each verification image has second category information characterizing that the verification image is a normal web page or an abnormal web page; the training module 13 is configured to adjust parameters of the adjusted convolutional neural network model based on each verification result and the second class information of the verification image corresponding to each verification result.
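The "repeat until mature" loop run by the training module 13 can be sketched as follows. The maturity criterion used here (stop once the verification loss has failed to improve by at least `tol` for `patience` consecutive rounds) is an assumption, since the application does not define when the model counts as mature; `step` and `verify` are hypothetical callables standing in for one training round and for the verification loss.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")  # any model representation


def train_until_mature(model: M, step: Callable[[M], M], verify: Callable[[M], float],
                       tol: float = 1e-4, patience: int = 3,
                       max_rounds: int = 1000) -> Tuple[M, float, int]:
    """Repeat training rounds until the verification loss stops improving.

    Returns (best_model, best_verification_loss, rounds_run).
    """
    best_model, best_loss = model, verify(model)
    stale = 0
    for round_no in range(1, max_rounds + 1):
        model = step(model)      # build the weight matrix and adjust parameters
        loss = verify(model)     # loss on the verification images
        if loss < best_loss - tol:
            best_model, best_loss, stale = model, loss, 0
        else:
            stale += 1
            if stale >= patience:
                return best_model, best_loss, round_no
    return best_model, best_loss, max_rounds
```

Returning the best model rather than the last one guards against the final rounds having made the model slightly worse on the verification images.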

Fig. 8 is a schematic diagram of an apparatus for web page recognition according to an embodiment of the present application.

As shown in fig. 8, the apparatus includes:

the acquiring module 21 is used for acquiring a webpage to be identified;

the identification module 22 is configured to identify the web page to be identified based on a pre-trained web page identification model to obtain an identification result; the webpage identification model is obtained by constructing a weight matrix corresponding to each training image according to a preset convolutional neural network model and adjusting parameters of the convolutional neural network model according to the weight matrix, wherein each weight in the weight matrix represents the probability that the training image corresponding to each weight is an abnormal webpage, and the identification result is a normal webpage or an abnormal webpage.

In some embodiments, the identification module 22 is configured to perform feature extraction on the web page to be identified based on the web page identification model, obtain an image feature corresponding to the web page to be identified, and determine the identification result based on the image feature corresponding to the web page to be identified.

In some embodiments, the identification module 22 is configured to determine the identification result based on the image features corresponding to the web page to be identified and the coefficients of the respective convolution layers of the web page identification model.

Fig. 9 is a schematic diagram of an apparatus for web page recognition according to another embodiment of the present application.

As shown in fig. 9, the apparatus further includes:

a determining module 23, configured to determine pixels of a web page to be identified;

an adjusting module 24, configured to adjust the pixels of the to-be-identified web page to be the same as the pixels of the training images if the pixels of the to-be-identified web page are different from the pixels of the training images.
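The cooperation of modules 21 to 24 can be sketched as a small class. Each module is modelled as an injected callable or a method; the class name, field names, and string results are hypothetical and only show the order in which the modules act.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]


@dataclass
class WebPageRecognitionApparatus:
    acquire: Callable[[], Image]                        # acquiring module 21
    resize: Callable[[Image, Tuple[int, int]], Image]   # adjusting module 24
    classify: Callable[[Image], str]                    # identification module 22
    training_size: Tuple[int, int]                      # (height, width) of training images

    def determine_pixels(self, page: Image) -> Tuple[int, int]:
        """Determining module 23: (height, width) of the page."""
        return (len(page), len(page[0]) if page else 0)

    def recognize(self) -> str:
        page = self.acquire()
        # Only resize when the page pixels differ from the training pixels.
        if self.determine_pixels(page) != self.training_size:
            page = self.resize(page, self.training_size)
        return self.classify(page)
```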

According to another aspect of the embodiments of the present application, an electronic device and a readable storage medium are also provided.

Referring to fig. 10, fig. 10 is a schematic view of an electronic device according to an embodiment of the disclosure.

Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.

As shown in fig. 10, the electronic device includes: one or more processors 101, a memory 102, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as needed. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 10 takes one processor 101 as an example.

Memory 102 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for training the web page recognition model and the method for web page recognition provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method of training a web page recognition model and the method of web page recognition provided by the present application.

Memory 102, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor 101 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 102, that is, implementing the training method of the web page recognition model and the method of web page recognition in the above method embodiments.

The memory 102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 102 may optionally include memory located remotely from processor 101, which may be connected to an electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device may further include: an input device 103 and an output device 104. The processor 101, the memory 102, the input device 103, and the output device 104 may be connected by a bus or other means, and the bus connection is exemplified in fig. 10.

The input device 103, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. The output device 104 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of training a web page recognition model, the method comprising:

2. The method of claim 1, wherein constructing a weight matrix corresponding to each image feature according to a preset convolutional neural network model comprises:

3. The method of claim 2, wherein each training image has first category information, the first category information characterizes the training image as a normal web page or an abnormal web page; and constructing the weight matrix according to the weight corresponding to each image feature comprises:

and obtaining the weight matrix according to each adjusted weight.

4. The method of claim 1, wherein adjusting parameters of the convolutional neural network model according to the weight matrix comprises:

5. The method of any of claims 1-4, wherein the sample set further comprises: verification images of normal web pages and verification images of abnormal web pages; and adjusting parameters of the convolutional neural network model according to the weight matrix comprises:

6. The method of claim 5, wherein each verification image has second category information characterizing that the verification image is a normal web page or an abnormal web page; and adjusting parameters of the adjusted convolutional neural network model according to each verification result comprises:

7. The method of any of claims 1 to 4, wherein the pixels of the training images in the sample set are the same.

8. A method of web page identification, the method comprising:

acquiring a webpage to be identified;

9. The method of claim 8, wherein identifying the webpage to be identified based on a pre-trained webpage identification model to obtain an identification result comprises:

10. The method of claim 9, wherein determining the recognition result based on the image feature corresponding to the web page to be recognized comprises:

11. The method according to any one of claims 8 to 10, wherein after acquiring the web page to be identified, the method further comprises:

determining pixels of a webpage to be identified;

12. An apparatus for training a web page recognition model, the apparatus comprising:

13. An apparatus for web page identification, the apparatus comprising:

the acquisition module is used for acquiring a webpage to be identified;

14. An electronic device, comprising: a memory, a processor;

the memory is configured to store instructions executable by the processor;

wherein the processor is configured to perform the training method of the web page recognition model according to any one of claims 1 to 7; or,

the processor is configured to perform a method of web page identification as claimed in any of claims 8 to 11.

15. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method of training a web page recognition model according to any one of claims 1 to 7; or,

the computer executable instructions are for implementing a method of web page identification as claimed in any one of claims 8 to 11 when executed by a processor.