CN113486175A - Text classification method, text classification device, computer equipment and storage medium - Google Patents

Text classification method, text classification device, computer equipment and storage medium

Info

Publication number
CN113486175A
CN113486175A
Authority
CN
China
Prior art keywords
matrix
text
classification
target text
network
Prior art date
Legal status
Granted
Application number
CN202110776201.0A
Other languages
Chinese (zh)
Other versions
CN113486175B (en)
Inventor
吴晓东 (Wu Xiaodong)
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202110776201.0A
Publication of CN113486175A
Application granted
Publication of CN113486175B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Abstract

The application is applicable to the technical field of natural language processing and discloses a text classification method, a text classification device, computer equipment and a storage medium. The text classification method comprises: extracting text features from a character sequence obtained by dividing a target text to obtain a first matrix, where the first matrix represents the text features of the target text; inputting the first matrix and a second matrix obtained from the label information of the target text into a label attention network for feature fusion to obtain a third matrix; and inputting the third matrix into a classification convolutional network for classification to obtain a classification result of the target text. Because the method fuses the text feature vectors of the target text with its label features, classification with the fused features weighs both the actual content of the target text and the parts it shares with, or differs from, other texts, which improves the accuracy of text classification.

Description

Text classification method, text classification device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text classification method, a text classification device, a computer device, and a storage medium.
Background
Text classification is one of the most basic tasks in the field of Natural Language Processing (NLP), and classification accuracy is one of the important criteria for evaluating a text classification method. Accuracy can be improved through steps such as character segmentation, data cleaning, feature extraction, model building and corpus training. At present, text classification is usually performed with a convolutional neural network model. A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited coverage range; it performs excellently in image processing, and when applied to text classification it requires text feature extraction of the input text before classification.
Existing text classification algorithms only perform feature coding on the characters or words of the input text and then classify the text based on the resulting feature vectors. Because only this single kind of text information is considered, the quality of feature extraction from the input text is low, which affects the overall accuracy of text classification.
Disclosure of Invention
The embodiments of the present application provide a text classification method, a text classification device, computer equipment and a storage medium, aiming to solve the problem that existing text classification methods consider only a single kind of text information when extracting features from a target text, so that the feature extraction quality, and hence the classification accuracy, is low.
In a first aspect, an embodiment of the present application provides a text classification method, including:
and performing text feature extraction on a character sequence obtained by dividing the target text to obtain a first matrix, wherein the first matrix is used for representing the text features of the target text.
And inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix.
And inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
In a second aspect, an embodiment of the present application provides a text classification apparatus, including:
the feature extraction module is used for performing text feature extraction on a character sequence obtained by dividing a target text to obtain a first matrix, and the first matrix is used for representing text features of the target text.
And the feature fusion module is used for inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix.
And the text classification module is used for inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the text classification method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the text classification method.
The text classification method, the text classification device, the computer equipment and the storage medium provided by the application perform text feature extraction on a character sequence obtained by dividing a target text to obtain a first matrix, where the first matrix represents the text features of the target text; input the first matrix and a second matrix obtained from the label information of the target text into a label attention network for feature fusion to obtain a third matrix; and input the third matrix into a classification convolutional network for classification to obtain a classification result of the target text. The text features describe the associations among the words of the target text and are extracted from its actual content, i.e. they relate to what the target text actually says; the label features are the vectorization of the target text's label information, and the label indicates what the target text shares with, or how it differs from, other texts. By fusing the text feature vectors of the target text with its label features, classification with the fused features weighs both the actual content of the target text and its similarities to and differences from other texts, which improves the accuracy of text classification.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of an application environment of a text classification method according to an embodiment of the present application;
FIG. 2 is a flow chart of an implementation of a text classification method in an embodiment of the present application;
FIG. 3 is a flowchart of step S10 of the text classification method in an embodiment of the present application;
FIG. 4 is a flowchart of step S20 of the text classification method in an embodiment of the present application;
FIG. 5 is a schematic diagram of a label attention network structure of a text classification method in an embodiment of the present application;
FIG. 6 is a flowchart of step S30 of the text classification method in an embodiment of the present application;
FIG. 7 is a functional block diagram of a text classification apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The text classification method provided by this embodiment can be applied to the application environment shown in fig. 1, in which a client communicates with a server. The client includes, but is not limited to, personal computers, notebook computers, smart phones, tablet computers and portable wearable devices; the server can be implemented by an independent server or by a server cluster formed by multiple servers.
The text classification method provided by the embodiment can be executed on the server. For example, a user sends a target text to be classified to a server through a client, the server executes the text classification method provided by the embodiment based on the target text to be classified, so as to obtain a classification result for classifying the target text, and finally, the classification result is sent to the client.
In some scenarios other than fig. 1, the client may also execute the text classification method, obtain a classification result of the target text by executing the text classification method provided in this embodiment directly according to the determined target text, and then send the classification result of the target text to the server for storage.
In an embodiment, as shown in fig. 2, a text classification method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps S10-S30:
s10: and performing text feature extraction on a character sequence obtained by dividing the target text to obtain a first matrix, wherein the first matrix is used for representing the text features of the target text.
At S10, the target text is Chinese text comprising Chinese characters and punctuation marks. The character sequence is the character set formed by splitting the Chinese characters and punctuation marks of the target text into independent characters and separating them with spaces or other marks. In a natural language processing task, the first consideration is how a word is represented in a computer, and there are generally two ways: discrete representation and distributed representation. Traditional rule-based or statistics-based natural language processing methods treat a word as an atomic symbol; this is called discrete representation, and it represents each word as a long vector. The dimension of the vector is the size of the vocabulary; exactly one dimension of the vector has the value 1, the others are 0, and that dimension identifies the current word. For example, the discrete representation of the word "apple" might be: [0, 0, 0, 1, 0, 0, 0, 0, 0]. Discrete representation is equivalent to assigning an id to each word, but it cannot show the relationships between words. Distributed representation represents words as continuous, dense vectors of fixed length, so that the similarity relations between words are embodied in the corresponding word vectors; existing word vector generation methods include generation based on Word2Vec, on ELMo and on BERT, among others. In this embodiment, the characters in the character sequence are represented in a distributed manner. Because the character sequence is obtained from the content of the target text, the associations between words in the target text are also embodied by the distributed word vectors of the characters, and all the word vectors generated from the characters of the character sequence form the first matrix, so the first matrix represents the feature information of the target text.
In the text classification task, segmentation is the most basic step for a target text. English takes the word as its unit, with words separated by spaces; Chinese takes the character as its unit, and the characters of a sentence must be connected to express a complete meaning. For example, the English sentence "I am a student" is, in Chinese, "我是一个学生". A computer can know that "student" is a word simply from the spacing, but it cannot easily understand that the two characters "学" and "生" together represent the single word "学生" (student). For the sentence "我是一个学生", word segmentation gives: "我_是_一个_学生". In this embodiment, the division of the target text differs from common word segmentation: it means splitting the Chinese characters and punctuation marks of the target text into independent characters, so that dividing "我是一个学生" gives "我_是_一_个_学_生", and any punctuation mark in the target text is likewise a character to be processed. The characters are independent of each other, all the independent characters form the character sequence, the order of the characters in the character sequence is their order in the target text, and the length of the character sequence is the number of characters in the target text.
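As an illustrative sketch of this character-level division (Python is assumed here; the embodiment prescribes no particular implementation):

```python
# A minimal sketch of the character-level division described above, assuming
# Python: every Chinese character and punctuation mark of the target text
# becomes one independent element of the character sequence.

def divide_into_characters(target_text: str) -> list[str]:
    """Split a target text into its character sequence, punctuation included."""
    return list(target_text)

character_sequence = divide_into_characters("我是一个学生。")
print(character_sequence)       # ['我', '是', '一', '个', '学', '生', '。']
print(len(character_sequence))  # 7, the length of the character sequence
```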
S20: and inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix.
In S20, the label information of the target text is single-label multi-classification information; that is, based on a certain feature of the target text, target texts are classified into different categories according to differences in that feature. For example, if the label is color, target texts can be classified into blue, yellow, red and other categories according to the color feature. The second matrix is obtained from the label information of the target text: the size of the matrix is determined by the preset number of label classes and the dimension of the embedded vector in the BERT model, and the matrix is further processed to obtain the second matrix. Because the second matrix is obtained from the label information of the target text, it represents the label features of the target text.
Existing text classification algorithms only perform feature coding on the words of the input text before classifying it; the label information of the text is not considered, and the feature association between the input text and its label information is ignored. In this embodiment, the first matrix and the second matrix are input into a label attention network, and the text features and label features of the target text are fused to obtain the third matrix. The label attention network is built on the Attention Mechanism used in neural networks. The attention mechanism derives from research on human vision: in cognitive science, because of bottlenecks in information processing, humans selectively attend to a part of all available information while ignoring the rest; when reading, for example, one generally attends to and processes only a small number of the words to be read. In this embodiment, the label attention network constructs a new attention mechanism and fuses the text features of the target text with its label features, which strengthens the feature association between the target text content and the label information and improves the quality of feature extraction on the target text.
S30: and inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
In S30, the classification convolutional network is obtained from a convolutional neural network, which comprises multiple convolutional layers, pooling layers, fully connected layers and the like. The classification convolutional network of this embodiment connects multiple convolutional and pooling layers and adjusts the size of the convolution kernel in each convolutional layer; it performs feature extraction on the third matrix and then inputs the result to the fully connected layer for the final classification, thereby obtaining the classification result of the target text.
In the embodiment, text feature extraction is performed on a character sequence obtained by dividing a target text to obtain a first matrix, and the first matrix is used for representing text features of the target text; inputting the first matrix and a second matrix obtained based on label information of a target text into a label attention network for feature fusion to obtain a third matrix; and inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text. Therefore, the extraction of the text features and the label features of the target text is realized, the text features and the label features are fused, the quality of text feature extraction is improved, and the classification accuracy of text classification input into a classification convolutional network is improved.
Fig. 3 shows a flowchart of step S10 of the text classification method of the present application. As shown in fig. 3, as one embodiment, step S10 includes steps S11 to S13, specifically:
s11: and dividing the target text into characters to obtain a character sequence.
In S11, the character sequence includes Chinese characters and punctuation marks; each character in the character sequence is independent of the others, the characters keep the order they have in the target text, and the length of the character sequence is the number of characters.
S12: and inputting the character sequence into a BERT model to obtain an initial embedded vector corresponding to each character in the character sequence.
In the NLP domain, the BERT model is based on the Transformer encoder and can be used for feature extraction. BERT (Bidirectional Encoder Representations from Transformers) is a large-scale pre-trained language model built on Transformer encoders. BERT adopts a pre-training approach divided into two stages: in the first stage, a bidirectional Transformer model is pre-trained with the MLM (masked language model) and NSP (next sentence prediction) strategies; in the second stage, fine-tuning is applied to downstream tasks. Compared with an LSTM model, the Transformer has no sequence-length limitation problem and captures contextual features better, and compared with a one-way training mode it captures context more comprehensively.
S13: and forming a matrix by using the initial embedded vectors corresponding to all the characters to obtain the first matrix.
In S13, after the character sequence is input into the BERT model to obtain an initial embedded vector corresponding to each character in the character sequence, all the initial embedded vectors are arranged according to the arrangement order of the corresponding character in the target text, and a first matrix is obtained according to a preset batch size, where the first matrix is a three-dimensional matrix, and the three dimensions of the first matrix are determined by the batch size, the length of the character sequence, and the dimension of the initial embedded vector, and in this embodiment, the dimension of the initial embedded vector is fixed to 768 dimensions based on the BERT model.
In the embodiment, the character sequence is obtained after the target text is divided into the characters, and then the character sequence is input into the BERT model for feature extraction to obtain the first matrix, so that the first matrix represents the text features of the target text, and the text features are vectorized to facilitate the subsequent mathematical computation.
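The following is a minimal sketch of steps S11 to S13, assuming the Hugging Face transformers package and the pre-trained bert-base-chinese checkpoint (both assumptions for illustration; the embodiment only requires a BERT model whose initial embedded vectors are 768-dimensional). Note that the tokenizer adds the [CLS] and [SEP] special tokens around the character sequence.

```python
# Hypothetical sketch of S11-S13: character sequence -> BERT -> first matrix.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

texts = ["我是一个学生。"]                         # a batch of target texts
inputs = tokenizer(texts, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# The first matrix: (batch size, character-sequence length, 768).
first_matrix = outputs.last_hidden_state
print(first_matrix.shape)  # torch.Size([1, 9, 768]) -- 7 characters + [CLS]/[SEP]
```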
Fig. 4 shows a flowchart of step S20 of the text classification method of the present application. As shown in fig. 4, as one embodiment, step S20 includes steps S21 to S24, specifically:
s21: and determining an initial label matrix according to the label information of the target text, and performing random initialization on the initial label matrix to obtain a second matrix.
In S21, the label information refers to single-label multi-classification information. For example, the feature "color" is used as a label, and the multiple classes are "red", "yellow" and "blue"; correspondingly, multi-label multi-classification adds further labels on top of the single label, such as "shape", classifying additionally into "circular", "square" and "triangular" according to the shape feature. The initial label matrix is determined from the label information of the target text; specifically, the size of the initial label matrix is determined by the number of classes of the single label and the dimension of the initial embedded vector in the BERT model. The initial label matrix is then randomly initialized to obtain the second matrix, where random initialization means generating the entries of the initial label matrix as random numbers drawn from a Gaussian distribution.
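A minimal sketch of this step, assuming PyTorch, where torch.randn draws Gaussian-distributed values as the embodiment describes:

```python
# Hypothetical sketch of S21: initial label matrix -> second matrix.
import torch

num_label_classes = 3   # e.g. the single label "color" with classes red/yellow/blue
embedding_dim = 768     # dimension of the BERT initial embedded vector

# Random initialization: Gaussian-distributed entries form the second matrix.
second_matrix = torch.randn(num_label_classes, embedding_dim)
print(second_matrix.shape)  # torch.Size([3, 768])
```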
S22: and respectively carrying out normalization processing on the first matrix and the second matrix to obtain a first normalization matrix corresponding to the first matrix and a second normalization matrix corresponding to the second matrix.
Normalization is a common data preprocessing means in mathematical statistics. In machine learning, normalization usually maps each dimension of a data vector into the interval (0,1) or (-1,1), or maps some norm of the data vector to 1. Normalization has two benefits. First, it eliminates the influence of units, converting data with units into standard unit-free data: an adult's height may be 150-200 cm and weight 50-90 kg; height is in centimetres and weight in kilograms, and because data of different dimensions have different units, the raw data cannot be substituted directly into machine learning for processing. Mapping all the data uniformly into the interval (0,1) puts every value range on the same footing. Second, it improves the convergence speed of a machine learning model. Without normalization, suppose the input vector received by the model has only two dimensions x1 and x2, where x1 ranges over 0-2000 and x2 over 0-3; during gradient descent the loss contours then form a very flat ellipse, the gradient tends to zigzag perpendicular to the contour lines, the iterative computation is heavy and the number of iterations large, so the model converges slowly.
In this embodiment, the first matrix and the second matrix are normalized by the L2 norm, and each vector in the matrices is processed separately. For example, if a vector X is L2-normalized to obtain X2 and another vector Y is L2-normalized to obtain Y2, the Euclidean distance and the cosine similarity of X2 and Y2 become equivalent; thus after L2-norm normalization, the Euclidean distances and cosine similarities of a group of vectors are equivalent. A great advantage is that once the Euclidean distance of a group of L2-normalized vectors has been computed, the cosine similarity follows directly from a formula in O(1) time. Furthermore, some machine learning packages provide only Euclidean-distance computation and no cosine similarity; for example, the KMeans clustering package of sklearn can only cluster data under the Euclidean distance. In the NLP field, the similarity of many words or documents is defined as the cosine similarity of their vectors, so calling sklearn's KMeans directly cannot perform the clustering. The word vectors of the words, or the text vectors of the documents, therefore need L2-norm normalization first; because the Euclidean distance and the cosine similarity are equivalent after L2-norm normalization, the clustering can then be done with sklearn's KMeans.
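The equivalence can be verified in one line of algebra. For two L2-normalized vectors x and y, so that \( \lVert x \rVert = \lVert y \rVert = 1 \):

\[
\lVert x - y \rVert^{2} \;=\; \lVert x \rVert^{2} + \lVert y \rVert^{2} - 2\,x \cdot y \;=\; 2 - 2\cos(x, y),
\]

so the Euclidean distance \( \lVert x - y \rVert = \sqrt{2 - 2\cos(x, y)} \) is a monotonically decreasing function of the cosine similarity, which is exactly the equivalence used above.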
In this embodiment, the first matrix and the second matrix are respectively normalized by a norm of L2 to obtain a first normalized matrix corresponding to the first matrix and a second normalized matrix corresponding to the second matrix, and after normalization, the dimensions of the matrices are not changed.
S23: and carrying out matrix multiplication on the first normalization matrix and the second normalization matrix to obtain a preprocessing matrix.
For two-dimensional matrices, matrix multiplication can be performed only when the number of columns of the first factor equals the number of rows of the second. Similarly, in this embodiment the first normalization matrix and the second normalization matrix are matrix-multiplied, defined as the first normalization matrix multiplying the second normalization matrix, i.e. the first normalization matrix comes first and the second normalization matrix second, yielding the preprocessing matrix. The preprocessing matrix is three-dimensional, and its three dimensions are determined by the batch size, the character-sequence length and the number of single-label classes respectively.
S24: and performing feature extraction on the preprocessing matrix to obtain an attention matrix, and performing matrix dot multiplication on the attention matrix and the first matrix to obtain a third matrix.
In S24, the extracting features of the preprocessing matrix to obtain an attention matrix includes: performing convolution pooling on the preprocessing matrix to obtain a matrix to be activated; and processing the matrix to be activated by using an activation function to obtain the attention matrix.
Performing convolution and pooling calculations on a matrix is a common operation in machine learning and is not described again here. By performing convolution on the preprocessing matrix with convolution kernels of different sizes, the matrix features, i.e. the text features of the target text, can be extracted. After further text features are obtained through convolution, pooling reduces the number of parameters while retaining the main features, i.e. it reduces the dimensionality, which reduces the amount of calculation and prevents overfitting; common pooling includes maximum pooling and mean pooling. The activation function increases the nonlinear expressive power of the model: applied between linear inputs and outputs, it introduces nonlinear factors and lets the neural network fit nonlinear models. Commonly used activation functions include the Sigmoid, Tanh and softmax functions. This embodiment uses the softmax function, which is common in classification processing, as the activation function, mapping the output of the convolution pooling into the interval (0,1). After the preprocessing matrix is convolved and pooled, the matrix to be activated is obtained; processing it with the activation function gives the attention matrix, and matrix dot multiplication of the attention matrix with the first matrix gives the third matrix. Convolving and pooling the preprocessing matrix extracts the features of each class, so the label-class dimension of the three-dimensional matrix becomes 1; the activation function does not change the matrix dimensions, so the three dimensions of the attention matrix are the batch size, the character-sequence length and 1.
Fig. 5 shows a schematic diagram of the label attention network structure of the text classification method of the present application. As shown in fig. 5, as an example, the first input layer and the second input layer of the label attention network receive the first matrix and the second matrix respectively, and the normalization layer of the label attention network normalizes them. Specifically, L2-norm normalization may be performed to obtain the first normalization matrix and the second normalization matrix corresponding to the first matrix and the second matrix. The first fusion layer of the label attention network then performs the first matrix fusion on the first normalization matrix and the second normalization matrix, i.e. matrix multiplication, to obtain the preprocessing matrix. The preprocessing matrix is next convolved, pooled and processed by the activation function to obtain the attention matrix. Finally, the second fusion layer performs the second matrix fusion on the attention matrix and the first matrix from the first input layer, i.e. matrix dot multiplication, to obtain the third matrix at the output layer.
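A runnable sketch of the label attention network follows, assuming PyTorch; the exact convolution pooling configuration is not specified in the embodiment, so a single 1 × 1 convolution that collapses the label-class dimension to 1 stands in for it.

```python
# Hypothetical sketch of S22-S24 / fig. 5: normalization, first fusion,
# convolution + activation, second fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttention(nn.Module):
    def __init__(self, num_label_classes: int):
        super().__init__()
        # Collapses the label-class dimension K to 1 (the matrix to be activated).
        self.conv = nn.Conv1d(num_label_classes, 1, kernel_size=1)

    def forward(self, first_matrix: torch.Tensor, second_matrix: torch.Tensor):
        # first_matrix: (B, L, 768); second_matrix: (K, 768)
        x = F.normalize(first_matrix, p=2, dim=-1)    # first normalization matrix
        c = F.normalize(second_matrix, p=2, dim=-1)   # second normalization matrix
        pre = torch.matmul(x, c.t())                  # preprocessing matrix: (B, L, K)
        act = self.conv(pre.transpose(1, 2))          # matrix to be activated: (B, 1, L)
        attn = F.softmax(act.transpose(1, 2), dim=1)  # attention matrix: (B, L, 1)
        return attn * first_matrix                    # third matrix: (B, L, 768)

attention_net = LabelAttention(num_label_classes=3)
third_matrix = attention_net(torch.randn(2, 32, 768), torch.randn(3, 768))
print(third_matrix.shape)  # torch.Size([2, 32, 768])
```

The sketch reproduces the shapes described above: the preprocessing matrix is (batch size, sequence length, number of classes), the attention matrix is (batch size, sequence length, 1), and the third matrix has the same shape as the first matrix.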
In this embodiment, a second matrix is obtained based on the label information of the target text and represents that label information; the first matrix and the second matrix are then input into the label attention network for further processing, and the text features and label features of the target text are fused through matrix multiplication, normalization, convolution, pooling and other operations, improving the quality of text feature extraction.
Fig. 6 shows a flowchart of step S30 of the text classification method of the present application. As shown in fig. 6, as one embodiment, step S30 includes steps S31 to S32, specifically:
s31: and carrying out sampling convolution on the third matrix according to a preset receptive field in the classification convolution network to obtain a fourth matrix.
In a convolutional neural network, the Receptive Field is the size of the region of the original input that a point on the feature map output by each layer is mapped from, where the feature map is usually expressed in matrix form. Input content outside a neuron's receptive field does not affect the value of that neuron, so it is necessary to ensure that the receptive field covers all the relevant input area. In engineering practice, the receptive field of a network is controlled by adjusting parameters such as the depth of the network and the convolution kernels of the convolutions.
In S31, the performing sampling convolution on the third matrix according to the preset receptive field in the classification convolutional network to obtain a fourth matrix includes: performing sampling convolution on the third matrix through N sampling convolution layers in the classification convolution network to obtain a fourth matrix; wherein N is an integer greater than 1, and each of the sampled convolutional layers has a different receptive field. Specifically, the sampling convolution layer performs sampling convolution on the third matrix, including sequentially performing convolution, batch normalization and maximum pooling on the third matrix. In the N sampling convolution layers, the convolution kernel size of each sampling convolution layer is configured to be sequentially decreased according to the order of performing sampling convolution on the third matrix.
Batch Normalization is a deep neural network training technique that not only accelerates model convergence but also alleviates, to a certain extent, the gradient dispersion problem in deep networks, making deep models easier and more stable to train. As the name implies, batch normalization normalizes each batch of data: it computes the mean and variance of the data {x1, x2, ..., xn} of a batch during training and normalizes with them. In particular, before batch normalization appeared, normalization was typically performed only at the data input layer, whereas batch normalization can be applied to the input data or to the output of any layer in the middle of the network. The common pooling methods are max pooling and mean pooling, and the error of feature extraction in deep learning mainly comes from two sources: the limited neighborhood size increases the variance of the estimated values, and parameter errors of the convolutional layer bias the estimated mean; mean pooling reduces the first error and max pooling reduces the second.
Specifically, in this embodiment the number of sampling convolutional layers is N, where N is an integer greater than 1. A sampling convolutional layer performs a convolution calculation on the third matrix and then batch normalization and maximum pooling to obtain an intermediate matrix; the intermediate matrix may be input to the fully connected layer as the fourth matrix for classification, or input to the next sampling convolutional layer as an intermediate result for convolution, batch normalization and maximum pooling again. That is, the sequence of convolution, batch normalization and maximum pooling applied to the third matrix may be repeated several times, and because the convolution kernel size of each of the N sampling convolutional layers decreases in the order in which the sampling convolutions are performed, the preset receptive field keeps decreasing. The output matrix of any sampling convolutional layer can also be fed back into the same layer for sampling convolution again, i.e. the convolution kernel size is kept unchanged while the convolution, batch normalization and maximum pooling are repeated. For example, if the convolution kernel for the first sampling convolutional layer is 7 × 7, that kernel size is kept and the convolution, batch normalization and maximum pooling are repeated 3 times to obtain an intermediate matrix; the intermediate matrix is input to the second sampling convolutional layer with a 5 × 5 kernel, and the processing is repeated 2 times to obtain another intermediate matrix; this is input to the third sampling convolutional layer with a 3 × 3 kernel, and the processing is repeated 2 times to obtain the fourth matrix. The kernel size thus varies 7 × 7, 5 × 5, 3 × 3 in sequence and decreases overall, corresponding to the preset, gradually smaller receptive field.
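The following sketch, assuming PyTorch with 1-D convolutions along the character dimension, mirrors the example above: kernel sizes 7, 5, 3 with repeat counts 3, 2, 2, each pass being convolution, batch normalization and maximum pooling. The channel count of 768 and the input length of 256 are illustrative assumptions.

```python
# Hypothetical sketch of S31: third matrix -> N sampled convolution layers.
import torch
import torch.nn as nn

def sampled_conv_layer(channels: int, kernel_size: int) -> nn.Sequential:
    """One pass of convolution, batch normalization and maximum pooling."""
    return nn.Sequential(
        nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm1d(channels),
        nn.MaxPool1d(kernel_size=2),  # each pass halves the character dimension
    )

# Kernel sizes decrease 7 -> 5 -> 3; repeat counts (3, 2, 2) follow the example.
layers = []
for kernel_size, repeats in [(7, 3), (5, 2), (3, 2)]:
    layers += [sampled_conv_layer(768, kernel_size) for _ in range(repeats)]
network = nn.Sequential(*layers)

third_matrix = torch.randn(2, 256, 768)                # (batch, length, channels)
fourth_matrix = network(third_matrix.transpose(1, 2))  # Conv1d expects (B, C, L)
print(fourth_matrix.shape)  # torch.Size([2, 768, 2]): length 256 halved 7 times
```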
S32: and inputting the fourth matrix into a full connection layer of the classification convolutional network for classification processing to obtain a classification result of the target text.
In a convolutional neural network, the convolutional and pooling layers extract features, while the fully connected layer classifies. In the convolutional neural network structure, one or more fully connected layers follow the convolutional and pooling layers; every neuron in a fully connected layer is connected to all neurons of the previous layer, integrating the locally class-discriminative information of the convolutional or pooling layers. In an embodiment, the classification result of the target text produced by the fully connected layer is represented as a two-dimensional matrix: one dimension is the number of label classes and the other is the batch size, the values in the matrix represent the probabilities of the corresponding classes, and the class with the highest probability is taken as the classification result of the target text.
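A sketch of this classification step, assuming PyTorch and continuing the shapes of the previous sketch; flattening the fourth matrix before the fully connected layer is an illustrative assumption:

```python
# Hypothetical sketch of S32: fourth matrix -> fully connected layer -> result.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_label_classes = 3
fourth_matrix = torch.randn(2, 768, 2)           # (batch size, channels, length)

fc = nn.Linear(768 * 2, num_label_classes)       # the fully connected layer
logits = fc(fourth_matrix.flatten(start_dim=1))  # (batch size, number of classes)
probs = F.softmax(logits, dim=-1)                # matrix values: class probabilities
result = probs.argmax(dim=-1)                    # class with the highest probability
print(result)                                    # classification result per text
```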
In this embodiment, the classification layer on top of the pre-trained BERT model is improved from the existing single fully connected layer to a character-level classification convolutional network comprising multiple convolution pooling layers, with the receptive field of the whole classification convolutional network kept decreasing; this improves the classification performance of the classifier and the overall accuracy of classifying the target text.
The method performs text feature extraction on the character sequence obtained by dividing the target text to obtain the first matrix, which represents the text features of the target text; inputs the first matrix and the second matrix obtained from the label information of the target text into the label attention network for feature fusion to obtain the third matrix; and inputs the third matrix into the classification convolutional network for classification to obtain the classification result of the target text. The text features describe the associations among the words of the target text and are extracted from its actual content, i.e. they relate to what the target text actually says; the label features are the vectorization of the target text's label information, and the label indicates what the target text shares with, or how it differs from, other texts. By fusing the text feature vectors of the target text with its label features, classification with the fused features weighs both the actual content of the target text and its similarities to and differences from other texts, which improves the accuracy of text classification.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In an embodiment, a text classification device is provided, and the text classification device corresponds to the text classification method in the above embodiments one to one. As shown in fig. 7, the text classification apparatus includes a feature extraction module 10, a feature fusion module 20, and a text classification module 30, and each functional module is described in detail as follows:
the feature extraction module 10 is configured to perform text feature extraction on a character sequence obtained by dividing a target text to obtain a first matrix, where the first matrix is used to represent text features of the target text.
And the feature fusion module 20 is configured to input the first matrix and a second matrix obtained based on the tag information of the target text into a tag attention network for feature fusion, so as to obtain a third matrix.
And the text classification module 30 is configured to input the third matrix into a classification convolutional network for classification processing, so as to obtain a classification result of the target text.
For the specific definition of the text classification device, reference may be made to the above definition of the text classification method, which is not repeated here. The modules in the text classification device can be realized wholly or partially in software, hardware or a combination of the two. The modules can be embedded in hardware form in, or be independent of, a processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a client or a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a readable storage medium and an internal memory. The readable storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the readable storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of text classification.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the text classification method in the above embodiments is implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the method for text classification in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and memory bus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of text classification, comprising:
performing text feature extraction on a character sequence obtained by dividing a target text to obtain a first matrix, wherein the first matrix is used for representing text features of the target text;
inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix;
and inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
2. The text classification method according to claim 1, wherein the text feature extraction of the character sequence obtained by dividing the target text to obtain the first matrix comprises:
dividing the target text into characters to obtain a character sequence;
inputting the character sequence into a BERT model to obtain an initial embedded vector corresponding to each character in the character sequence;
and forming a matrix by using the initial embedded vectors corresponding to all the characters to obtain the first matrix.
3. The text classification method according to claim 1, wherein the step of inputting the first matrix and a second matrix obtained based on the tag information of the target text into a tag attention network for feature fusion to obtain a third matrix comprises:
determining an initial label matrix according to the label information of the target text, and performing random initialization on the initial label matrix to obtain a second matrix;
respectively carrying out normalization processing on the first matrix and the second matrix to obtain a first normalization matrix corresponding to the first matrix and a second normalization matrix corresponding to the second matrix;
performing matrix multiplication on the first normalization matrix and the second normalization matrix to obtain a preprocessing matrix;
and performing feature extraction on the preprocessing matrix to obtain an attention matrix, and performing matrix dot multiplication on the attention matrix and the first matrix to obtain a third matrix.
4. The method of text classification according to claim 3, wherein said extracting features from said pre-processing matrix to obtain an attention matrix comprises:
performing convolution pooling on the preprocessing matrix to obtain a matrix to be activated;
and processing the matrix to be activated by using an activation function to obtain the attention matrix.
5. The text classification method according to claim 1, wherein the inputting the third matrix into a classification convolutional network for classification to obtain a classification result of the target text, comprises:
sampling convolution is carried out on the third matrix according to a preset receptive field in the classification convolution network, and a fourth matrix is obtained;
and inputting the fourth matrix into a full connection layer of the classification convolutional network for classification processing to obtain a classification result of the target text.
6. The text classification method according to claim 5, wherein the performing sampling convolution on the third matrix according to a preset receptive field in the classification convolutional network to obtain a fourth matrix comprises:
performing sampling convolution on the third matrix through N sampling convolution layers in the classification convolution network to obtain a fourth matrix; wherein N is an integer greater than 1, and each of the sampled convolutional layers has a different receptive field.
7. The text classification method according to claim 6, characterized in that, in the N sampled convolutional layers, the convolutional kernel size of each of the sampled convolutional layers is configured to be sequentially decreased according to the precedence order of the sampled convolutions on the third matrix.
8. A text classification apparatus, characterized in that the text classification apparatus comprises:
the feature extraction module is used for extracting the text features of a character sequence obtained by dividing a target text to obtain a first matrix, and the first matrix is used for representing the text features of the target text;
the feature fusion module is used for inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix;
and the text classification module is used for inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the text classification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the text classification method according to any one of claims 1 to 7.
CN202110776201.0A 2021-07-08 2021-07-08 Text classification method, text classification device, computer device, and storage medium Active CN113486175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110776201.0A 2021-07-08 2021-07-08 Text classification method, text classification device, computer device, and storage medium (granted as CN113486175B)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110776201.0A 2021-07-08 2021-07-08 Text classification method, text classification device, computer device, and storage medium (granted as CN113486175B)

Publications (2)

Publication Number Publication Date
CN113486175A (en) 2021-10-08
CN113486175B (en) 2024-03-15

Family

ID=77938258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110776201.0A Active CN113486175B (en) 2021-07-08 2021-07-08 Text classification method, text classification device, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN113486175B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180137168A (en) * 2017-06-16 2018-12-27 (주)이스트소프트 Apparatus for classifying category of a text based on neural network, method thereof and computer recordable medium storing program to perform the method
CN109492101A (en) * 2018-11-01 2019-03-19 山东大学 File classification method, system and medium based on label information and text feature
CN110209823A (en) * 2019-06-12 2019-09-06 齐鲁工业大学 A kind of multi-tag file classification method and system
CN111428026A (en) * 2020-02-20 2020-07-17 西安电子科技大学 Multi-label text classification processing method and system and information data processing terminal
CN111666406A (en) * 2020-04-13 2020-09-15 天津科技大学 Short text classification prediction method based on word and label combination of self-attention

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈玮 (CHEN Wei) et al.: "基于神经网络融合标签相关性的多标签情感预测研究" (Multi-label emotion prediction based on neural networks fusing label correlation), 中文信息学报 (Journal of Chinese Information Processing), vol. 35, no. 1, pages 104-112 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293138A (en) * 2022-08-03 2022-11-04 北京中科智加科技有限公司 Text error correction method and computer equipment
CN117746167A (en) * 2024-02-20 2024-03-22 四川大学 Training method and classifying method for oral panorama image swing bit error classification model
CN117746167B (en) * 2024-02-20 2024-04-19 四川大学 Training method and classifying method for oral panorama image swing bit error classification model
CN117877043A (en) * 2024-03-11 2024-04-12 深圳市壹倍科技有限公司 Model training method, text recognition method, device, equipment and medium

Also Published As

Publication number Publication date
CN113486175B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN113486175B (en) Text classification method, text classification device, computer device, and storage medium
CN108764195B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN109948149B (en) Text classification method and device
CN107169485B (en) Mathematical formula identification method and device
Mushtaq et al. UrduDeepNet: offline handwritten Urdu character recognition using deep neural network
CN111652332B (en) Deep learning handwritten Chinese character recognition method and system based on two classifications
US20220382553A1 (en) Fine-grained image recognition method and apparatus using graph structure represented high-order relation discovery
WO2023134084A1 (en) Multi-label identification method and apparatus, electronic device, and storage medium
Prashanth et al. Handwritten devanagari character recognition using modified lenet and alexnet convolution neural networks
Mane et al. Visualizing and understanding customized convolutional neural network for recognition of handwritten Marathi numerals
Alrobah et al. Arabic handwritten recognition using deep learning: A survey
CN111639186B (en) Multi-category multi-label text classification model and device with dynamic embedded projection gating
Naseer et al. Meta features-based scale invariant OCR decision making using LSTM-RNN
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN108985442B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN111985525A (en) Text recognition method based on multi-mode information fusion processing
Ghadhban et al. Survey of offline Arabic handwriting word recognition
Das et al. Enhancing the power of CNN using data augmentation techniques for Odia handwritten character recognition
Elaraby et al. A Novel Siamese Network for Few/Zero-Shot Handwritten Character Recognition Tasks.
Rashid et al. Scrutinization of Urdu handwritten text recognition with machine learning approach
Neri et al. A Convolutional Neural Network for Handwritten Digit Recognition.
Nayak et al. Odia character recognition using backpropagation network with binary features
Wahi et al. Handwritten Tamil character recognition using Zernike moments and legendre polynomial
Inunganbi et al. Manipuri handwritten character recognition by convolutional neural network
Kishan et al. Handwritten character recognition using CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant