CN113222022A - Webpage classification identification method and device - Google Patents

Webpage classification identification method and device

Info

Publication number
CN113222022A
CN113222022A (application CN202110522326.0A)
Authority
CN
China
Prior art keywords
text
feature vector
image
target page
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110522326.0A
Other languages
Chinese (zh)
Inventor
颜林 (Yan Lin)
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110522326.0A priority Critical patent/CN113222022A/en
Publication of CN113222022A publication Critical patent/CN113222022A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
        • G06F — ELECTRIC DIGITAL DATA PROCESSING
            • G06F18/24 — Pattern recognition; analysing; classification techniques
            • G06F16/353 — Information retrieval of unstructured textual data; clustering; classification into predefined classes
            • G06F16/958 — Retrieval from the web; organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
        • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
            • G06N3/045 — Neural networks; architecture; combinations of networks
            • G06N3/08 — Neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

One or more embodiments of the present specification provide a method, an apparatus, an electronic device, and a machine-readable storage medium for webpage classification recognition, where the method includes: generating a corresponding image feature vector based on the image features extracted from the target page; acquiring a text feature vector corresponding to the text extracted from the target page; and inputting the image feature vector and the text feature vector as input data into a pre-trained Bert model for classification calculation to obtain a classification result corresponding to the target page.

Description

Webpage classification identification method and device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular, to a method and an apparatus for webpage classification and identification, an electronic device, and a machine-readable storage medium.
Background
As online and offline business become increasingly integrated, more and more offline merchants access online internet platforms to sell goods or provide services and thereby reach more customers. When a merchant requests access, the internet platform can apply risk control to the requesting merchant in order to avoid the risk that merchants involved in illegal or sensitive industries would pose to the platform.
In practice, the internet platform can identify the industry in which a merchant requesting access operates, and apply risk control based on the result of that industry identification. For example, the server may first use a crawler to read text data from the merchant webpage that the merchant provides to users, and then input the crawled text into a classification model trained by multi-label text classification, so as to identify the industry classification corresponding to the merchant webpage.
Disclosure of Invention
The application provides a webpage classification and identification method, which comprises the following steps:
generating a corresponding image feature vector based on the image features extracted from the target page;
acquiring a text feature vector corresponding to the text extracted from the target page;
and inputting the image feature vector and the text feature vector as input data into a pre-trained Bert model for classification calculation to obtain a classification result corresponding to the target page.
Optionally, the generating a corresponding image feature vector based on the image feature extracted from the target page includes:
and extracting image features from the target page based on a ResNet model, and generating corresponding image feature vectors based on the image features.
Optionally, before extracting image features from the target page based on the ResNet model, the method further includes:
generating a rendering image corresponding to the target webpage based on a Headless Browser technology;
the extracting image features from the target page based on the ResNet model comprises the following steps:
and extracting image features from the rendered image based on a ResNet model.
Optionally, before extracting image features from the target page based on the ResNet model, the method further includes:
acquiring the image loaded in the target page through a crawler program;
the extracting image features from the target page based on the ResNet model comprises the following steps:
and extracting image features from the images loaded in the target page collected by the crawler program based on a ResNet model.
Optionally, the feature dimensions of the image feature vector and the text feature vector are the same;
before the image feature vector and the text feature vector are used as input data and input into a pre-trained Bert model for classification calculation, the method further includes:
and performing dimensionality reduction on the generated image feature vector to obtain an image feature vector with the same feature dimensionality as the text feature vector.
Optionally, the obtaining a text feature vector corresponding to a text extracted from the target page includes:
and inputting the text extracted from the target page into a pre-trained Bert model, performing embedding processing by an embedding layer of the Bert model, and acquiring a text feature vector which is output by the embedding layer of the Bert model and corresponds to text characters in the text.
Optionally, the inputting the pre-trained Bert model to perform classification calculation by using the image feature vector and the text feature vector as input data includes:
splicing the image feature vector and the text feature vector to generate a multi-modal vector;
and inputting the multi-modal vector as the input data into a pre-trained Bert model for classification calculation.
Optionally, the using the multi-modal vector as the input data and inputting a pre-trained Bert model for classification calculation includes:
inputting the multi-modal vector as the input data into a pre-trained Bert model, and carrying out coding processing by a coding layer of the Bert model;
and continuously inputting the encoding processing result aiming at the multi-modal vector output by the encoding layer of the Bert model into the classification layer of the Bert model for classification calculation.
The application also provides a webpage classification recognition device, the device includes:
the image processing unit is used for generating a corresponding image feature vector based on the image features extracted from the target page;
the text processing unit is used for acquiring a text feature vector corresponding to the text extracted from the target page;
and the classification unit is used for inputting the image characteristic vector and the text characteristic vector as input data into a pre-trained Bert model for classification calculation to obtain a classification result corresponding to the target page.
Optionally, the image processing unit is specifically configured to:
and extracting image features from the target page based on a ResNet model, and generating corresponding image feature vectors based on the image features.
Optionally, the image processing unit is further configured to:
generating a rendering image corresponding to the target webpage based on a Headless Browser technology;
the image processing unit is specifically configured to:
and extracting image features from the rendered image based on a ResNet model.
Optionally, the image processing unit is further configured to:
acquiring the image loaded in the target page through a crawler program;
the image processing unit is specifically configured to:
and extracting image features from the images loaded in the target page collected by the crawler program based on a ResNet model.
Optionally, the feature dimensions of the image feature vector and the text feature vector are the same;
the image processing unit is further configured to:
and performing dimensionality reduction on the generated image feature vector to obtain an image feature vector with the same feature dimensionality as the text feature vector.
Optionally, the text processing unit is specifically configured to:
and inputting the text extracted from the target page into a pre-trained Bert model, performing embedding processing by an embedding layer of the Bert model, and obtaining a text feature vector which is output by the embedding layer of the Bert model and corresponds to text characters in the text.
Optionally, the classification unit is specifically configured to:
splicing the image feature vector and the text feature vector to generate a multi-modal vector;
and inputting the multi-modal vector as the input data into a pre-trained Bert model for classification calculation.
Optionally, the classification unit is specifically configured to:
inputting the multi-modal vector as the input data into a pre-trained Bert model, and carrying out coding processing by a coding layer of the Bert model;
and continuously inputting the encoding processing result aiming at the multi-modal vector output by the encoding layer of the Bert model into the classification layer of the Bert model for classification calculation.
The application also provides an electronic device, which comprises a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are mutually connected through the bus;
the memory stores machine-readable instructions, and the processor executes the method by calling the machine-readable instructions.
The present application also provides a machine-readable storage medium having stored thereon machine-readable instructions which, when invoked and executed by a processor, implement the above-described method.
By the embodiment, the image feature vector corresponding to the image feature extracted from the target page and the text feature vector corresponding to the text extracted from the target page are used as input data, and the pre-trained Bert model is input for classification calculation, so that for the Bert model, the input data can be expanded from a single text feature vector to multi-modal input data comprising the image feature vector and the text feature vector, the text feature and the image feature in the target page can be combined for classification calculation, and the accuracy of classification calculation of the Bert model is improved.
Drawings
FIG. 1 is a diagram illustrating a Bert model in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method for web page category identification in accordance with an exemplary embodiment;
FIG. 3 is a diagram illustrating a web page category identification method, according to an exemplary embodiment;
FIG. 4 is a schematic structural diagram of an electronic device in which a web page classification and identification apparatus is located according to an exemplary embodiment;
fig. 5 is a block diagram illustrating an apparatus for web page classification recognition according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
For example, in the case of merchant industry identification, offline merchants typically need to provide merchant web pages to users when accessing internet platforms. The internet platform can identify a merchant's industry by identifying the industry classification corresponding to the merchant webpage, and then apply risk control to merchants accessing the platform based on the identification result.
For example, during implementation, the server may obtain a URL address of a merchant page that needs to be identified by a merchant industry, extract text data from the merchant page corresponding to the URL address through a crawler program, and further input the extracted text data into a pre-trained classification model to obtain an industry classification result output by the classification model and corresponding to the merchant page.
It should be noted that, in the application scenario shown above, this specification does not limit the specific manner in which the server acquires the merchant page that requires merchant industry identification. For example, a merchant industry identification interface may be provided so that a user can input the URL address of a merchant page requiring identification, and the merchant webpage corresponding to that URL address can then be obtained. As another example, when the server detects that a merchant has initiated an access request to the platform, it may acquire the merchant's webpage and automatically perform merchant industry identification on it.
In practical applications, when performing merchant industry identification on a merchant page, the classification model may generally adopt a BERT (Bidirectional Encoder Representations from Transformers) model, referred to in this specification as the Bert model. The Bert model is a deep learning model that can perform text classification based on word vectors.
In order to make those skilled in the art better understand the technical solution in the embodiment of the present specification, the following briefly describes the related art of the Bert model related to the embodiment of the present specification.
Referring to fig. 1, fig. 1 is a schematic diagram of a Bert model according to an exemplary embodiment. The Bert model, in general, may include an embedding layer, an encoding layer, and a classification layer.
It should be noted that the embedding layer, the encoding layer, and the classification layer are not necessarily physically distinct modules; they can be understood as virtual layers partitioned from the Bert model according to its actual computing functions. The naming of each layer in the Bert model is not particularly limited in this specification; for example, the embedding layer may also be referred to as the Embedding layer, and the encoding layer may also be referred to as the Transformer encoder layer.
In practical application, a text may be input into the Bert model and embedded by the embedding layer of the Bert model; the text feature vector (also referred to as a word vector) corresponding to each text character in the text, as output by the embedding layer, may then be obtained.
Specifically, the embedding layer of the Bert model may tokenize the input text, splitting it into a number of text characters (tokens). On the one hand, a [CLS] flag is added before all the split text characters as an identifier representing the semantics of the whole text; on the other hand, a [SEP] flag is appended after each sentence (segment) obtained from the split as a separator between different sentences. Further, word embedding (token embedding), sentence embedding (segment embedding), and position embedding may be performed on each split text character, so as to obtain the text feature vector corresponding to each text character.
For example, for any text character split from the text: token embedding may be performed on the character to obtain a first text feature sub-vector; segment embedding may be performed according to the sentence to which the character belongs to obtain a second text feature sub-vector; and position embedding may be performed according to the character's position in the sentence to obtain a third text feature sub-vector. The three sub-vectors may then be added element-wise, and the resulting vector determined as the text feature vector corresponding to that text character.
Note that when segment embedding is performed, different sentences can be identified by A, B, C or the like, or 0, 1, 2, or the like. In addition, for the text feature vectors corresponding to each text character in the text output by the Bert model, the feature dimensions of the text feature vectors are generally the same.
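The per-character sum of token, segment, and position embeddings described above can be sketched minimally as follows. The tiny vocabulary, dimensions, and randomly initialised tables are invented purely for illustration (a real Bert-base model uses a 768-dimensional hidden size and learned tables):

```python
# Toy sketch of Bert-style input embeddings: for each token, the token,
# segment, and position embeddings are summed element-wise.
# All tables and sizes here are invented for illustration.
import random

random.seed(0)
DIM = 4  # real Bert-base uses 768

def make_table(n, dim=DIM):
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

token_emb = make_table(10)     # vocabulary of 10 toy tokens
segment_emb = make_table(2)    # sentences A (0) and B (1)
position_emb = make_table(8)   # up to 8 positions

def embed(token_ids, segment_ids):
    """Return one summed text feature vector per token."""
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        v = [token_emb[tok][i] + segment_emb[seg][i] + position_emb[pos][i]
             for i in range(DIM)]
        vectors.append(v)
    return vectors

vecs = embed([1, 4, 7], [0, 0, 1])  # 3 tokens, each a DIM-dim vector
```

Note that each output vector has the same feature dimension, matching the statement below that the text feature vectors share a common dimensionality.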
Further, in practical applications, the text feature vector may be input into the Bert model, and the coding layer of the Bert model performs coding processing, so as to obtain a coding processing result for the text feature vector output by the coding layer of the Bert model.
Specifically, the text feature vector output by the embedding layer of the Bert model may be continuously input to the encoding layer of the Bert model for encoding, and an encoding processing result (which may also be referred to as a semantic vector corresponding to the text character) for the text feature vector output by the encoding layer of the Bert model may be obtained. For a specific process of performing the encoding process, please refer to the related art, which is not described herein.
Further, in practical application, the text feature vector or the semantic vector may be input into the Bert model, and classification calculation is performed by the classification layer of the Bert model, so as to obtain a classification result output by the classification layer of the Bert model.
Specifically, the text feature vector or the encoding processing result may be input into the Bert model for classification calculation, that is, the text feature vector or the encoding processing result may be used as an argument of the classification function for classification calculation; and obtaining a corresponding classification result output by the classification layer of the Bert model, that is, calculating a dependent variable corresponding to the independent variable of the classification function.
During training of the classification function, the output may include every classification result together with its probability value; during prediction, the output may be the classification result with the highest probability value. It should be noted that a person skilled in the art may select different classification functions for the classification layer of the Bert model as needed, which is not limited in this specification.
For example, the classification function may be a softmax classifier, which may be denoted f(x) = softmax(Wx + b), where x may be the encoding result output by the encoding layer of the Bert model; W may be a C × D matrix, D being the feature dimension of the encoding result and C the total number of classification results; b may be a C-dimensional vector; and f(x) may be the classification result output by the classification layer of the Bert model.
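The classification layer just described, a linear map followed by softmax normalisation, can be sketched with toy dimensions (D = 4, C = 3 are arbitrary values chosen for illustration):

```python
# Sketch of a softmax classification layer: logits = Wx + b, then
# softmax turns the logits into class probabilities. Toy dimensions.
import math

D, C = 4, 3  # feature dimension of the encoding result, number of classes

W = [[0.1 * (i + j) for j in range(D)] for i in range(C)]  # C x D matrix
b = [0.0, 0.1, -0.1]                                       # C-dim bias
x = [0.5, -0.2, 0.3, 0.1]                                  # encoder output

def classify(x):
    logits = [sum(W[c][d] * x[d] for d in range(D)) + b[c] for c in range(C)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

probs = classify(x)                            # one probability per class
best = max(range(C), key=lambda c: probs[c])   # prediction: argmax class
```

During training, the full `probs` vector would feed the loss; at prediction time only `best`, the highest-probability class, is reported, mirroring the distinction drawn above.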
Therefore, in the embodiment shown above, the text extracted from the merchant webpage may be used as input data and fed into the pre-trained Bert model for classification calculation, so as to obtain a classification result corresponding to the merchant webpage. However, because the input data of the Bert model is usually only text data, the resulting merchant industry classification may be inaccurate.
In view of this, the present specification aims to provide a technical solution for expanding input data of a Bert model, using an image feature vector and a text feature vector corresponding to a target page as input data, and inputting the Bert model to perform classification calculation to implement classification and identification for the target web page.
When the method is realized, the server side can generate a corresponding image feature vector based on the image features extracted from the target page; text feature vectors corresponding to texts extracted from the target page can be acquired; further, the image feature vector and the text feature vector may be used as input data, and a pre-trained Bert model is input for performing classification calculation to obtain a classification result corresponding to the target page.
Therefore, in the technical solution in this specification, since the image feature vector corresponding to the image feature extracted from the target page and the text feature vector corresponding to the text extracted from the target page can both be used as input data, a pre-trained Bert model is input for classification calculation; therefore, for the Bert model, the input data can be expanded from a single text feature vector to multi-modal input data containing an image feature vector and a text feature vector, so that the text feature and the image feature in the target page can be combined to perform classification calculation, and the accuracy of classification calculation performed by the Bert model is improved.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for classifying and identifying web pages according to an exemplary embodiment, where the method performs the following steps:
step 202: generating a corresponding image feature vector based on the image features extracted from the target page;
step 204: acquiring a text feature vector corresponding to the text extracted from the target page;
step 206: and inputting the image feature vector and the text feature vector as input data into a pre-trained Bert model for classification calculation to obtain a classification result corresponding to the target page.
In this specification, the target page may include a page to be classified and identified.
For example, the target page may specifically include a merchant page to be subject to merchant industry identification.
It should be noted that, regarding the specific type of the target page, the specification is not limited; in practical application, besides performing merchant industry identification on merchant pages, the webpage identification method can also be applied to other scenes in which webpages need to be classified and identified, and the target pages can also include other pages to be classified and identified.
In this specification, in order to improve the accuracy of the classification calculation performed by the Bert model, the basis of the classification calculation performed by the Bert model may be expanded from a single text feature to an image feature and a text feature corresponding to the target page; that is, the input data of the sort calculation performed by the Bert model can be expanded from a single text feature vector to a text feature vector and an image feature vector.
In the implementation process, image features can be extracted from the target page to generate corresponding image feature vectors, and text features can be extracted from the target page to generate corresponding text feature vectors; the generated image feature vectors and text feature vectors are then input together into the Bert model as input data for classification calculation, yielding the classification result corresponding to the target page.
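The overall flow just described can be sketched end to end with stand-in feature extractors. Every function below (`extract_image_features`, `extract_text_features`, `bert_classify`) is a hypothetical placeholder for the models discussed in the following sections, not a real model call:

```python
# End-to-end sketch of steps 202-206 with stand-in extractors.
# All three helper functions are hypothetical placeholders.

def extract_image_features(page):
    return [0.1, 0.2, 0.3]          # stands in for a ResNet feature vector

def extract_text_features(page):
    return [0.4, 0.5, 0.6]          # stands in for Bert embedding output

def bert_classify(multimodal_vector):
    # stands in for the Bert encoding + classification layers
    return "retail" if sum(multimodal_vector) > 1.0 else "other"

def classify_page(page):
    image_vec = extract_image_features(page)   # step 202
    text_vec = extract_text_features(page)     # step 204
    multimodal = image_vec + text_vec          # splice into one input
    return bert_classify(multimodal)           # step 206

result = classify_page("https://example.com/merchant")
```

The splice in `classify_page` corresponds to the multi-modal vector generation covered later; only its shape matters here.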
The webpage classification and identification method is described below in three parts: generating image feature vectors, obtaining text feature vectors, and performing classification calculation with the Bert model.
It should be noted that, the sequence of generating the image feature vector and obtaining the text feature vector is not particularly limited in this specification; that is, the execution order of step 202 and step 204 may be interchanged.
(1) Generating image feature vectors
Since the Bert model generally cannot perform classification calculation on directly input images, before the image features extracted from the target page are input into the Bert model for classification calculation, corresponding image feature vectors may be generated based on the image features extracted from the target page.
In this specification, when generating a corresponding image feature vector based on image features extracted from the target page, image features may be specifically extracted from the target page by a pre-trained image feature extraction model, and an image feature vector corresponding to the extracted image features may be generated.
It should be noted that, when extracting image features from the target page and generating corresponding image feature vectors, the specific type of the image feature extraction model used is not particularly limited in this specification; in practical applications, those skilled in the art can select different types of image feature extraction models according to requirements.
In practical applications, as the depth of a deep learning model increases, its accuracy generally rises until it reaches a maximum, after which it degrades sharply — the "degradation" phenomenon. The ResNet (residual neural network) model adds shortcut branches carrying a linear (identity) transformation alongside the nonlinear transformations of the deep model, seeking a balance between the two and thereby alleviating the training difficulty caused by large model depth.
In one embodiment shown, the image feature extraction model may be a ResNet model; in implementation, image features may be extracted from the target page based on the ResNet model, and corresponding image feature vectors may be generated based on the image features.
For example, image features may be extracted from a merchant page to be subject to merchant industry recognition based on a pre-trained ResNet model, and an image feature vector corresponding to the extracted image features may be generated.
In the above illustrated embodiment, since the image feature extraction model is usually a deep learning model with a large depth, the problem of difficulty in model training due to the large depth of the image feature extraction model can be overcome by extracting image features from the target page by using the ResNet model and generating corresponding image feature vectors based on the image features.
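One practical consequence, relevant to the dimension-matching step mentioned in the claims, is that a globally pooled ResNet-50 feature is typically 2048-dimensional while Bert-base text vectors are 768-dimensional, so a linear projection can align them. The sketch below uses a random (untrained) projection matrix purely to show the shape change; in a real system the projection would be learned:

```python
# Sketch of reducing an image feature vector to the text feature
# dimension. The 2048 -> 768 sizes assume a ResNet-50 pooled feature
# and Bert-base; the random projection stands in for a learned one.
import random

random.seed(1)
IMG_DIM, TXT_DIM = 2048, 768

image_feature = [random.gauss(0, 1) for _ in range(IMG_DIM)]
projection = [[random.gauss(0, 1 / IMG_DIM ** 0.5) for _ in range(IMG_DIM)]
              for _ in range(TXT_DIM)]

def project(vec):
    """Linear dimensionality reduction: a TXT_DIM x IMG_DIM matrix times vec."""
    return [sum(row[i] * vec[i] for i in range(IMG_DIM)) for row in projection]

reduced = project(image_feature)  # now the same dimension as a text vector
```

After this step the image feature vector and the text feature vectors share a feature dimension and can be spliced into one input sequence.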
In this specification, when extracting an image feature from the target page, image information corresponding to the target page may be specifically acquired, and the image feature corresponding to the target page may be extracted from the acquired image information; or preprocessing the acquired image information corresponding to the target page, and then extracting the image features corresponding to the target page from the preprocessed image.
A specific implementation manner of acquiring the image information corresponding to the target page is not particularly limited in this specification; in practical application, a person skilled in the art can select different implementation manners to acquire the image information corresponding to the target page according to requirements.
In one embodiment shown, the image information corresponding to the target page may include an image loaded in the target page; when the image features are extracted from the target page, the images loaded in the target page are collected through a crawler program, and then the image features are extracted from the images loaded in the target page collected by the crawler program based on a ResNet model.
For example, a crawler program may collect a plurality of images loaded in the merchant page, and then extract image features from the collected images based on a pre-trained ResNet model.
In the above-described embodiment, since the images acquired by the crawler program cannot be directly input into the Bert model for classification calculation, image features may be extracted from the acquired images based on the ResNet model and corresponding image feature vectors may be generated; the generated image feature vectors corresponding to the images loaded in the target page may then be input into the Bert model for classification calculation.
In another embodiment shown, the image information corresponding to the target page may include a rendered image of the target page; when extracting image features from the target page, a rendered image corresponding to the target page may be generated based on a Headless Browser technology, and then image features may be extracted from the rendered image based on a ResNet model.
Wherein the rendered image may be understood as an image consistent with the style in which the target page is presented to the user when opened in a browser.
In practical application, based on the Headless Browser technology, the target page can be automatically rendered, and the rendered image corresponding to the target page can be generated by capturing a screenshot, printing the page DOM, and the like; a Headless Browser is a browser that can run from the command line, and a person skilled in the art can write code to control a Headless Browser to automatically perform various tasks.
For example, a merchant page may be rendered based on the Headless Browser technology to generate a rendered image corresponding to the merchant page, and then image features may be extracted from the generated rendered image based on a pre-trained ResNet model.
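As an illustrative sketch only (assuming headless Chrome is available on the command line; the function name and flag values are not part of this specification), the rendered-image generation described above could be driven as follows:

```python
def headless_screenshot_cmd(url, out_path, width=1280, height=2000):
    """Build a command line that drives headless Chrome to render a page
    and save the rendered image as a screenshot file."""
    return [
        "chrome",
        "--headless",
        "--disable-gpu",
        f"--window-size={width},{height}",
        f"--screenshot={out_path}",
        url,
    ]

cmd = headless_screenshot_cmd("https://example.com", "page.png")
# The list could then be executed, e.g. subprocess.run(cmd, check=True),
# producing page.png as the rendered image of the target page.
```

The same rendering step could equally be performed with other headless-browser drivers; the command-line approach above is only one option.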
In the above illustrated embodiment, on the one hand, since the generated rendered image cannot be directly input into the Bert model for classification calculation, image features may be extracted from the generated rendered image based on the ResNet model and corresponding image feature vectors may be generated; the generated image feature vectors corresponding to the rendered image of the target page may then be input into the Bert model for classification calculation. On the other hand, for target pages such as mobile H5 sites, applets, and dynamic web pages that mostly adopt the AJAX (Asynchronous JavaScript And XML) technology, the frame information of the target page cannot be acquired through a crawler program, whereas the rendered image of the target page includes both the frame information of the target page and the images loaded in the target page. Therefore, extracting image features from the rendered image of the target page can help the Bert model learn layout features corresponding to the frame information of the target page; when the image feature vectors corresponding to these image features are input into the Bert model, the Bert model can perform classification calculation by combining the image features corresponding to the images loaded in the target page with the image features corresponding to the frame information of the target page, which further improves the accuracy of the classification calculation performed by the Bert model.
The specific mode of preprocessing the image information is not particularly limited in this specification; for example, those skilled in the art may perform image cleansing operations such as size transformation, position transformation, and clipping on an image loaded in a target page acquired by a crawler program or a generated rendered image corresponding to the target page as needed.
In this specification, before the generated image feature vector and the text feature vector are input to the Bert model as input data and classified and calculated, the feature dimension of the generated image feature vector may be reduced by performing a dimension reduction process on the generated image feature vector.
For a specific way of performing the dimension reduction processing on the image feature vector, no particular limitation is imposed in this specification, and for a specific implementation process of the dimension reduction processing, please refer to related art, which is not described herein again.
In practical applications, those skilled in the art may select different implementations to perform the dimension reduction processing on the image feature vector according to requirements; for example, the image feature vector may be subjected to dimension reduction through a pooling operation or a linear transformation.
In one embodiment shown, the feature dimensions of the generated image feature vector may be reduced to be the same as the feature dimensions of the generated text feature vector; in implementation, the generated image feature vector may be subjected to a dimension reduction process to obtain an image feature vector having the same feature dimension as the text feature vector.
For example, before the generated image feature vector is input into the Bert model for classification calculation, the image feature vector may be subjected to dimension reduction processing through a pooling operation to obtain an image feature vector with the same feature dimension as that of the generated text feature vector.
For another example, before the generated image feature vector is input into the Bert model for classification calculation, the image feature vector may be subjected to dimension reduction processing through a pooling operation to obtain a plurality of corresponding 2048-dimensional image feature vectors; the 2048-dimensional image feature vectors may then be converted, through a linear transformation, into image feature vectors with the same feature dimension as that of the generated text feature vectors.
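A minimal numeric sketch of this two-step dimension reduction, using NumPy with illustrative sizes (a 2048-channel 7x7 feature map as a typical ResNet backbone output, and 768 as a stand-in for the text feature dimension; the weight matrix here is random, standing in for a learned linear transformation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ResNet backbone output: a 2048-channel, 7x7 feature map.
feature_map = rng.standard_normal((2048, 7, 7))

# Step 1: global average pooling reduces each channel to a single value,
# yielding a 2048-dimensional image feature vector.
pooled = feature_map.mean(axis=(1, 2))      # shape: (2048,)

# Step 2: a linear transformation maps the 2048-dim vector to the
# (illustrative) 768 dimensions of the text feature vectors, so that
# image and text vectors share the same feature dimension.
W = rng.standard_normal((768, 2048)) * 0.01
image_feature_vector = W @ pooled           # shape: (768,)
```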
It should be noted that, in the above illustrated embodiment, for the Bert model, similar encoding processing may be performed on image feature vectors and text feature vectors with the same feature dimension, which is beneficial to improving the efficiency of performing webpage classification and identification on the target page by the Bert model.
(2) Obtaining text feature vectors
In addition to extracting image features from the target page and generating the corresponding image feature vectors, before inputting data into the Bert model for classification calculation, the text feature vectors corresponding to the text extracted from the target page also need to be acquired; the generated image feature vectors and the acquired text feature vectors can then be input together into the Bert model as input data for classification calculation.
In this specification, when a text feature vector corresponding to a text extracted from the target page is obtained, a text feature may be specifically extracted from the target page through a pre-trained text feature extraction model, and a text feature vector corresponding to the extracted text feature is generated.
It should be noted that, when extracting text features from the target page and generating corresponding text feature vectors, the specific type of the text feature extraction model used is not particularly limited in this specification; in practical application, a person skilled in the art can select different types of NLP (natural language processing) models and perform pre-training according to requirements to perform feature extraction on the text in the target page and generate a text feature vector corresponding to the extracted text features.
In one embodiment shown, the NLP model may be a Bert model; during implementation, the text extracted from the target page may be input into the Bert model, the embedding layer of the Bert model performs embedding processing, and a text feature vector corresponding to a text character in the text output by the embedding layer of the Bert model may be obtained.
For example, a text extracted from a merchant page may be input into a Bert model, and an embedding layer of the Bert model performs embedding processing, specifically, the Bert model may perform segmentation of the text, perform word embedding (token embedding), sentence embedding (segment embedding), and position embedding (position embedding) on each text character (token) segmented from the text, and may acquire a text feature vector corresponding to the text output by the embedding layer of the Bert model.
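The embedding step described above can be sketched numerically as follows; the vocabulary size, sequence length, and embedding dimension are toy values for illustration, not the actual Bert parameters, and the embedding tables here are random stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size, max_len, dim = 100, 16, 8            # toy sizes, not Bert's real ones
token_emb = rng.standard_normal((vocab_size, dim))
segment_emb = rng.standard_normal((2, dim))       # segment A / segment B
position_emb = rng.standard_normal((max_len, dim))

token_ids = [5, 17, 42]                           # token ids of a segmented text
segment_ids = [0, 0, 0]                           # all text tokens in segment A

# Bert's embedding layer sums word, sentence (segment), and position
# embeddings element-wise, producing one feature vector per token.
text_feature_vectors = np.stack([
    token_emb[t] + segment_emb[s] + position_emb[pos]
    for pos, (t, s) in enumerate(zip(token_ids, segment_ids))
])
```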
It should be noted that, in the above illustrated embodiment, the text extracted from the target page is input into the Bert model for embedding processing, and a text feature vector corresponding to a text character in the text can be obtained without introducing other NLP models.
In this specification, when extracting text features from the target page, text information corresponding to the target page may be specifically acquired, and text features corresponding to the target page may be extracted from the acquired text information; or preprocessing the acquired text information corresponding to the target page, and then extracting text features corresponding to the target page from the preprocessed text.
The specific implementation manner of acquiring the text information corresponding to the target page is not particularly limited in this specification; in practical application, a person skilled in the art can select different implementation manners to obtain text information corresponding to the target page according to requirements.
For example, when extracting text features from the target page, a crawler program may be used to collect texts loaded in a merchant page to be classified and identified, and then text features corresponding to the merchant page may be extracted from the collected texts.
The specific mode of preprocessing the text information is not particularly limited in this specification; for example, a person skilled in the art may remove HTML tag information in the collected text as needed to obtain text related to the page content of the target page; for another example, information with low semantic relevance, such as punctuation marks and stop words, may be filtered out of the text.
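A minimal sketch of such preprocessing (the regular expressions and the stop-word list are illustrative assumptions, not part of this specification; a real implementation would use a proper HTML parser and a language-appropriate stop-word list):

```python
import re

STOP_WORDS = {"the", "a", "of", "and"}   # illustrative stop-word list

def clean_page_text(html: str) -> str:
    """Remove HTML tag information, then filter punctuation and stop words."""
    text = re.sub(r"<[^>]+>", " ", html)   # drop HTML tags
    text = re.sub(r"[^\w\s]", " ", text)   # drop punctuation marks
    tokens = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(tokens)

print(clean_page_text("<p>The price of fresh fruit!</p>"))   # price fresh fruit
```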
Before the text feature vector and the image feature vector are input into the Bert model as input data for classification calculation, in order to distinguish the text feature vector from the image feature vector in the input data, different labels can be used to distinguish them when the sentence embedding (segment embedding) processing is performed.
For example, in one implementation, when the sentence embedding process is performed, the text feature vector may be identified by a and the image feature vector may be identified by B.
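This labeling scheme can be sketched as follows (the helper function is hypothetical, introduced only for illustration of the A/B segment labels):

```python
def build_segment_labels(num_text_vectors: int, num_image_vectors: int):
    """Label text feature vectors as segment 'A' and image feature
    vectors as segment 'B', mirroring sentence (segment) embedding."""
    return ["A"] * num_text_vectors + ["B"] * num_image_vectors

labels = build_segment_labels(3, 2)
print(labels)   # ['A', 'A', 'A', 'B', 'B']
```

In an actual Bert implementation the labels would be the integer segment ids looked up in the segment embedding table; letters are used here only for readability.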
(3) Performing classification calculation on the Bert model
In this specification, after the image features are extracted from the target page and the corresponding image feature vector is generated, and the text features are extracted from the target page and the corresponding text feature vector is generated, the generated image feature vector and the generated text feature vector may be used as input data, and may be input into the Bert model together for classification calculation, so as to obtain a classification result corresponding to the target page.
In practical application, when the Bert model is trained, a plurality of classification results corresponding to target pages to be classified and identified can be preset; in the process of predicting the classification result of the target page by using the Bert model, an image feature vector generated based on image features extracted from the target page and a text feature vector generated based on text features extracted from the target page may be used as input data, a pre-trained Bert model is input for classification calculation, and the classification result with the maximum calculated probability value is determined as the classification result corresponding to the target web page.
For example, there are N business categories preset by the user, namely category 1, category 2, … … and category N; when performing merchant industry identification on a merchant webpage, the generated image feature vector corresponding to the image feature extracted from the merchant page and the acquired text feature vector corresponding to the text feature extracted from the merchant page may be used as input data, and input into the Bert model together for classification calculation, so that a merchant industry classification result corresponding to the merchant page may be obtained as classification 1.
It should be noted that, regarding the specific type of the classification result output by the Bert model and the total number of the classification results, a person skilled in the art may flexibly configure the classification result according to the requirement, and the description is not limited herein; for example, in one implementation, the Bert model may output 6 industry classification results corresponding to merchant pages, which are respectively clothing, food, digital, fresh, medicine, and others.
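The selection of the classification result with the maximum probability value can be sketched as follows, assuming the six illustrative industry categories above (the logit values are made up for the example; a real classification layer would produce them from the encoded multi-modal vector):

```python
import numpy as np

CATEGORIES = ["clothing", "food", "digital", "fresh", "medicine", "others"]

def classify(logits):
    """Softmax over the classification layer's logits; the category with
    the maximum probability value is the classification result."""
    exp = np.exp(logits - np.max(logits))   # shift for numerical stability
    probs = exp / exp.sum()
    return CATEGORIES[int(np.argmax(probs))]

print(classify([0.1, 2.5, 0.3, 0.0, -1.0, 0.2]))   # food
```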
In practical application, the obtained text feature vector may be in a matrix form, and the generated image feature vector may also be in a matrix form; before inputting the text feature vector and the image feature vector into the Bert model for classification calculation, the image feature vector and the text feature vector may be spliced to obtain a matrix including the image feature vector and the text feature vector, that is, a multi-modal vector corresponding to the image feature vector and the text feature vector is generated.
In one embodiment shown, the image feature vectors and the text feature vectors can be input into the Bert model in the form of multi-modal vectors for classification calculation; when the method is implemented, the generated image feature vector and the obtained text feature vector can be spliced to generate a multi-modal vector, and then the multi-modal vector obtained by splicing is used as input data and input into the Bert model for classification calculation.
For example, based on image features extracted from a merchant page, a corresponding image feature vector is generated as a matrix 1, and a text feature vector corresponding to a text extracted from the merchant page is acquired as a matrix 2, and the matrix 1 and the matrix 2 can be spliced to obtain a matrix 3, that is, a multi-modal vector corresponding to the image feature vector and the text feature vector is generated as a matrix 3; further, the generated multi-modal vector can be used as input data and input into the Bert model for classification calculation.
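The splicing of matrix 1 and matrix 2 into matrix 3 can be sketched with NumPy as follows (the matrix sizes and contents are illustrative; both matrices must share the same feature dimension, as discussed in the dimension-reduction step above):

```python
import numpy as np

dim = 8                               # illustrative shared feature dimension
matrix_1 = np.ones((2, dim))          # image feature vectors, one per row
matrix_2 = np.zeros((3, dim))         # text feature vectors, one per row

# Splicing the two matrices row-wise yields matrix 3, the multi-modal
# vector that is then input into the Bert model as input data.
matrix_3 = np.concatenate([matrix_1, matrix_2], axis=0)
```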
In practical application, when the multi-modal vector is input into the Bert model as input data for classification calculation, the Bert model may first perform coding processing on the multi-modal vector and then perform classification calculation; in implementation, the multi-modal vector is input into the Bert model as input data, the coding layer of the Bert model performs coding processing, the coding processing result for the multi-modal vector output by the coding layer is then input into the classification layer of the Bert model for classification calculation, and the classification result corresponding to the target page output by the classification layer of the Bert model is obtained.
For example, the generated multi-modal vector is a matrix 3, and the matrix 3 may be first used as input data, and input to the coding layer of the Bert model for coding, so as to obtain a coding processing result for the matrix 3, which is output by the coding layer of the Bert model, as a matrix 3'; and continuously inputting the matrix 3' obtained by splicing into the classification layer of the Bert model for classification calculation, and obtaining a classification result which is output by the classification layer of the Bert model and corresponds to the merchant page as classification 1.
In practical application, when the Bert model is trained, supervised training can be performed on a classification function of the Bert model; in implementation, a preset number of training samples may be obtained first, where the training samples may include a sample page and an actual classification result corresponding to the sample page; and carrying out supervised training on the classification function of the Bert model according to a preset optimization target based on the training sample.
Wherein the optimization objective may include: for any one of the training samples, the classification result with the highest matching degree for the sample page matches the actual classification result corresponding to the sample page. In order to perform supervised training of the classification function of the Bert model according to the preset optimization target, when the classification function is supervised trained based on the training samples, whether the classification function reaches the optimization target, that is, whether the training of the classification function is completed, can be determined by judging whether the cross entropy loss function corresponding to the classification function converges.
For example, if the cross entropy loss function converges, it may be determined that the training of the classification function of the Bert model is completed; if the cross entropy loss function is not converged, it can be determined that the classification function of the Bert model is not trained, so that supervised training can be continuously performed on the classification function.
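A minimal sketch of the cross entropy loss and a simple convergence check (the tolerance, window, and loss values are illustrative assumptions; real training would compute the loss over batches of training samples and typically also monitor a validation set):

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy loss for one sample given predicted probabilities."""
    return -float(np.log(probs[label]))

def converged(loss_history, tol=1e-3, window=3):
    """Consider training complete when the loss change over the last
    `window` steps falls below `tol`."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return max(recent) - min(recent) < tol

losses = [2.0, 1.2, 0.8, 0.7001, 0.7000, 0.6999, 0.69985]
print(converged(losses))   # True
```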
According to the technical scheme, the image feature vectors corresponding to the image features extracted from the target page and the text feature vectors corresponding to the texts extracted from the target page are used as input data, and the pre-trained Bert model is input for classification calculation, so that for the Bert model, the input data can be expanded from a single text feature vector to multi-modal input data comprising the image feature vectors and the text feature vectors, the text features and the image features in the target page can be combined for classification calculation, and the accuracy of classification calculation of the Bert model is improved.
In order to enable those skilled in the art to better understand the technical solution in the embodiment of the present disclosure, the following embodiment exemplarily describes the webpage classification and identification method by taking an example of extracting image feature vectors from a target page by using a ResNet model, extracting text feature vectors from a target page by using a Bert model, and performing classification calculation on input data by using the Bert model.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a web page classification recognition method according to an exemplary embodiment.
As shown in fig. 3, when performing webpage classification and identification on a target page, text and images may be extracted from the target page.
For example, images loaded in a merchant page to be classified and identified may be collected by a crawler program, or the merchant page may be rendered based on the Headless Browser technology to generate a rendered image corresponding to the merchant page; and the text loaded in the merchant page may be acquired through a crawler program.
Further, the text extracted from the target page may be input into the Bert model, and the embedding layer of the Bert model performs embedding processing to obtain the text feature vectors output by the embedding layer of the Bert model and corresponding to the text characters in the text; and image features may be extracted from the image of the target page based on a ResNet model, and corresponding image feature vectors may be generated based on the image features.
For example, a text extracted from a merchant page may be input into the Bert model, and the embedding layer of the Bert model performs embedding processing to obtain the text feature vectors, denoted here as T1 and T2, output by the embedding layer of the Bert model and corresponding to the text.
Moreover, the image extracted from the merchant page may be input into a ResNet model, image features may be extracted from the merchant page, and image feature vectors corresponding to the extracted image features may be generated. If the images collected by the crawler program are input into the ResNet model, the image feature vectors, denoted here as I1 and I2, output by the ResNet model and corresponding to the images loaded in the merchant page can be obtained; if the rendered image of the merchant page is input into the ResNet model, the image feature vectors I1 and I2 corresponding to the images loaded in the merchant page can be obtained, together with an image feature vector F1 corresponding to the frame information of the merchant page.
Further, after the text feature vector and the image feature vector are generated, the generated image feature vector and the acquired text feature vector may be spliced to generate a multi-modal vector.
For example, if the generated text feature vectors are T1 and T2 and the generated image feature vectors are I1 and I2, the two can be spliced to generate the corresponding multi-modal vector M.
Further, the multi-modal vector obtained by splicing may be used as input data, the Bert model is input, the coding layer of the Bert model is used for coding, and a coding processing result for the multi-modal vector output by the coding layer of the Bert model is obtained; and continuously inputting the coding processing result into the classification layer of the Bert model for classification calculation to obtain a classification result which is output by the classification layer of the Bert model and corresponds to the target page.
For example, the concatenated multi-modal vector M may be input into the Bert model as input data, and the coding layer of the Bert model performs coding processing to obtain the coding processing result M' output by the coding layer of the Bert model for the multi-modal vector M; further, the coding processing result M' may be input into the classification layer of the Bert model for classification calculation to obtain the classification result output by the classification layer of the Bert model, that is, the industry classification corresponding to the merchant page.
Corresponding to the embodiment of the webpage classification and identification method, the specification also provides an embodiment of a webpage classification and identification device.
Referring to fig. 4, fig. 4 is a hardware structure diagram of an electronic device where a web page classification recognition apparatus is located according to an exemplary embodiment. At the hardware level, the device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile memory 410, and of course may also include hardware required for other services. One or more embodiments of the present description may be implemented in software, for example by the processor 402 reading a corresponding computer program from the non-volatile memory 410 into the memory 408 and then running it. Of course, besides a software implementation, the one or more embodiments in this specification do not exclude other implementations, such as logic devices or combinations of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or logic devices.
Referring to fig. 5, fig. 5 is a block diagram illustrating an exemplary embodiment of a web page classification recognition apparatus. The webpage classification and identification device can be applied to the electronic equipment shown in fig. 4 to realize the technical scheme of the specification. Wherein, the webpage classification identifying device may include:
an image processing unit 501, configured to generate a corresponding image feature vector based on an image feature extracted from a target page;
a text processing unit 502, configured to obtain a text feature vector corresponding to a text extracted from the target page;
and the classification unit 503 is configured to input the pre-trained Bert model to perform classification calculation by using the image feature vector and the text feature vector as input data, so as to obtain a classification result corresponding to the target page.
In this embodiment, the image processing unit 501 is specifically configured to:
and extracting image features from the target page based on a ResNet model, and generating corresponding image feature vectors based on the image features.
In this embodiment, the image processing unit 501 is further configured to:
generating a rendering image corresponding to the target webpage based on a Headless Browser technology;
the image processing unit 501 is specifically configured to:
and extracting image features from the rendered image based on a ResNet model.
In this embodiment, the image processing unit 501 is further configured to:
acquiring the image loaded in the target page through a crawler program;
the image processing unit 501 is specifically configured to:
and extracting image features from the images loaded in the target page collected by the crawler program based on a ResNet model.
In this embodiment, the feature dimensions of the image feature vector and the text feature vector are the same;
the image processing unit 501 is further configured to:
and performing dimensionality reduction on the generated image feature vector to obtain an image feature vector with the same feature dimensionality as the text feature vector.
In this embodiment, the text processing unit 502 is specifically configured to:
and inputting the text extracted from the target page into a pre-trained Bert model, performing embedding processing by an embedding layer of the Bert model, and acquiring a text feature vector which is output by the embedding layer of the Bert model and corresponds to text characters in the text.
In this embodiment, the classifying unit 503 is specifically configured to:
splicing the image feature vector and the text feature vector to generate a multi-modal vector;
and inputting the multi-modal vector as the input data into a pre-trained Bert model for classification calculation.
In this embodiment, the classifying unit 503 is specifically configured to:
inputting the multi-modal vector as the input data into a pre-trained Bert model, and carrying out coding processing by a coding layer of the Bert model;
and continuously inputting the encoding processing result aiming at the multi-modal vector output by the encoding layer of the Bert model into the classification layer of the Bert model for classification calculation.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are only illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "when," "upon," or "in response to a determination," depending on the context.
The foregoing describes only preferred embodiments of the one or more embodiments of the present disclosure and is not intended to limit their scope; any modification, equivalent substitution, or improvement made within the spirit and principles of the one or more embodiments of the present disclosure shall fall within their scope of protection.

Claims (11)

1. A webpage classification identification method is characterized by comprising the following steps:
generating a corresponding image feature vector based on the image features extracted from the target page;
acquiring a text feature vector corresponding to the text extracted from the target page;
and inputting the image feature vector and the text feature vector as input data into a pre-trained Bert model for classification calculation to obtain a classification result corresponding to the target page.
2. The method of claim 1, wherein generating a corresponding image feature vector based on the image features extracted from the target page comprises:
and extracting image features from the target page based on a ResNet model, and generating corresponding image feature vectors based on the image features.
3. The method according to claim 2, wherein before extracting image features from the target page based on the ResNet model, the method further comprises:
generating a rendered image corresponding to the target page based on a Headless Browser technique;
the extracting image features from the target page based on the ResNet model comprises the following steps:
and extracting image features from the rendered image based on a ResNet model.
4. The method according to claim 2, wherein before extracting image features from the target page based on the ResNet model, the method further comprises:
acquiring the image loaded in the target page through a crawler program;
the extracting image features from the target page based on the ResNet model comprises the following steps:
and extracting image features from the images loaded in the target page collected by the crawler program based on a ResNet model.
5. The method of claim 1, wherein the feature dimension of the image feature vector is the same as that of the text feature vector;
before the image feature vector and the text feature vector are input as input data into a pre-trained Bert model for classification calculation, the method further comprises:
performing dimensionality reduction on the generated image feature vector to obtain an image feature vector having the same feature dimension as the text feature vector.
6. The method according to claim 1, wherein the acquiring a text feature vector corresponding to the text extracted from the target page comprises:
inputting the text extracted from the target page into a pre-trained Bert model, performing embedding processing through the embedding layer of the Bert model, and acquiring the text feature vector, output by the embedding layer of the Bert model, corresponding to the text characters in the text.
7. The method according to claim 1, wherein the inputting the image feature vector and the text feature vector as input data into a pre-trained Bert model for classification calculation comprises:
splicing the image feature vector and the text feature vector to generate a multi-modal vector;
and inputting the multi-modal vector as the input data into a pre-trained Bert model for classification calculation.
8. The method of claim 7, wherein inputting the multi-modal vector as the input data into a pre-trained Bert model for classification calculation comprises:
inputting the multi-modal vector as the input data into a pre-trained Bert model, and performing encoding processing through the encoding layer of the Bert model;
inputting the encoding result for the multi-modal vector, output by the encoding layer of the Bert model, into the classification layer of the Bert model for classification calculation.
9. An apparatus for classifying and identifying web pages, the apparatus comprising:
the image processing unit is used for generating a corresponding image feature vector based on the image features extracted from the target page;
the text processing unit is used for acquiring a text feature vector corresponding to the text extracted from the target page;
and the classification unit is used for inputting the image characteristic vector and the text characteristic vector as input data into a pre-trained Bert model for classification calculation to obtain a classification result corresponding to the target page.
10. An electronic device is characterized by comprising a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are connected with each other through the bus;
the memory stores machine-readable instructions, and the processor executes the method of any one of claims 1 to 8 by calling the machine-readable instructions.
11. A machine-readable storage medium having stored thereon machine-readable instructions which, when invoked and executed by a processor, carry out the method of any of claims 1 to 8.
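The pipeline recited in claims 1, 5, and 7 — reducing the image feature vector to the feature dimension of the text feature vectors, splicing the two into a multi-modal vector, and passing the result to a classifier — can be sketched in a few lines. This is an illustrative NumPy sketch only, not the patented implementation: the dimensions (2048-d pooled ResNet features, 768-d Bert embeddings, a 16-token text) and the random linear projection are assumptions standing in for the real models.

```python
import numpy as np

# Assumed dimensions: pooled ResNet features are 2048-d and Bert token
# embeddings are 768-d (common defaults, not mandated by the claims).
IMG_DIM, TXT_DIM, SEQ_LEN = 2048, 768, 16

rng = np.random.default_rng(0)

# Placeholders for the real model outputs.
image_feature = rng.standard_normal(IMG_DIM)             # from a ResNet backbone
text_features = rng.standard_normal((SEQ_LEN, TXT_DIM))  # from Bert's embedding layer

# Claim 5: dimensionality reduction of the image feature vector so it
# matches the text feature dimension.  A learned linear projection is one
# common choice; these weights are random stand-ins.
W_proj = rng.standard_normal((IMG_DIM, TXT_DIM)) / np.sqrt(IMG_DIM)
image_token = image_feature @ W_proj                     # shape (768,)

# Claim 7: splice (concatenate) the projected image vector and the text
# feature vectors into one multi-modal input sequence for the Bert encoder.
multimodal = np.vstack([image_token[None, :], text_features])

print(multimodal.shape)  # (17, 768)
```

In a full implementation the `multimodal` sequence would be fed to the Bert encoding layer (e.g. via a precomputed-embeddings input) and its output passed to the classification layer, as recited in claim 8.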
CN202110522326.0A 2021-05-13 2021-05-13 Webpage classification identification method and device Pending CN113222022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110522326.0A CN113222022A (en) 2021-05-13 2021-05-13 Webpage classification identification method and device

Publications (1)

Publication Number Publication Date
CN113222022A true CN113222022A (en) 2021-08-06

Family

ID=77095321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110522326.0A Pending CN113222022A (en) 2021-05-13 2021-05-13 Webpage classification identification method and device

Country Status (1)

Country Link
CN (1) CN113222022A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816005A (en) * 2019-01-18 2019-05-28 北京智游网安科技有限公司 Application program trade classification method, storage medium and terminal based on CNN
CN110781925A (en) * 2019-09-29 2020-02-11 支付宝(杭州)信息技术有限公司 Software page classification method and device, electronic equipment and storage medium
CN111401416A (en) * 2020-03-05 2020-07-10 支付宝(杭州)信息技术有限公司 Abnormal website identification method and device and abnormal countermeasure identification method
CN111488953A (en) * 2020-06-28 2020-08-04 浙江网新恒天软件有限公司 Method for rapidly classifying webpage topics based on HTM L source code characteristics
CN111563551A (en) * 2020-04-30 2020-08-21 支付宝(杭州)信息技术有限公司 Multi-mode information fusion method and device and electronic equipment
CN111581510A (en) * 2020-05-07 2020-08-25 腾讯科技(深圳)有限公司 Shared content processing method and device, computer equipment and storage medium
CN111652622A (en) * 2020-05-26 2020-09-11 支付宝(杭州)信息技术有限公司 Risk website identification method and device and electronic equipment
CN112214707A (en) * 2020-09-30 2021-01-12 支付宝(杭州)信息技术有限公司 Webpage content characterization method, classification method, device and equipment
CN112633380A (en) * 2020-12-24 2021-04-09 北京百度网讯科技有限公司 Interest point feature extraction method and device, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Amit Gupta et al., "Ensemble approach for web page classification", Multimedia Tools and Applications *
Di Qi et al., "ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data", arXiv:2001.07966v2 [cs.CV] *
Jianfei Yu et al., "Adapting BERT for Target-Oriented Multimodal Sentiment Classification", Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) *
Stuart Miller et al., "Multi-Modal Classification Using Images and Text", SMU Data Science Review *
Xinyu Wang et al., "Building a Bridge: A Method for Image-Text Sarcasm Detection Without Pretraining on Image-Text Data", Proceedings of the First International Workshop on Natural Language Processing Beyond Text *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429106A (en) * 2021-12-29 2022-05-03 北京百度网讯科技有限公司 Page information processing method and device, electronic equipment and storage medium
CN114429106B (en) * 2021-12-29 2023-04-07 北京百度网讯科技有限公司 Page information processing method and device, electronic equipment and storage medium
CN114662033A (en) * 2022-04-06 2022-06-24 昆明信息港传媒有限责任公司 Multi-modal harmful link recognition based on text and image
CN114662033B (en) * 2022-04-06 2024-05-03 昆明信息港传媒有限责任公司 Multi-mode harmful link identification based on text and image
CN115221523A (en) * 2022-09-20 2022-10-21 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment
CN115221523B (en) * 2022-09-20 2022-12-27 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment

Similar Documents

Publication Publication Date Title
Ay Karakuş et al. Evaluating deep learning models for sentiment classification
CN113222022A (en) Webpage classification identification method and device
CN109344406B (en) Part-of-speech tagging method and device and electronic equipment
CN113011186B (en) Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
EP3926531A1 (en) Method and system for visio-linguistic understanding using contextual language model reasoners
CN113901320A (en) Scene service recommendation method, device, equipment and storage medium
JP6462970B1 (en) Classification device, classification method, generation method, classification program, and generation program
CN112667782A (en) Text classification method, device, equipment and storage medium
CN112395412A (en) Text classification method, device and computer readable medium
CN114398881A (en) Transaction information identification method, system and medium based on graph neural network
CN113255328A (en) Language model training method and application method
CN112560504A (en) Method, electronic equipment and computer readable medium for extracting information in form document
Shekar et al. Optical character recognition and neural machine translation using deep learning techniques
CN111612284B (en) Data processing method, device and equipment
CN115374259A (en) Question and answer data mining method and device and electronic equipment
CN113723077A (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN117349402A (en) Emotion cause pair identification method and system based on machine reading understanding
CN113821629A (en) Text classification method and comment emotion analysis method and device
CN110879832A (en) Target text detection method, model training method, device and equipment
CN112256841B (en) Text matching and countermeasure text recognition method, device and equipment
CN114969253A (en) Market subject and policy matching method and device, computing device and medium
Kumar et al. Domain adaptation based technique for image emotion recognition using image captions
CN113869068A (en) Scene service recommendation method, device, equipment and storage medium
CN114021064A (en) Website classification method, device, equipment and storage medium
KR102215259B1 (en) Method of analyzing relationships of words or documents by subject and device implementing the same

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210806