CN113486175B - Text classification method, text classification device, computer device, and storage medium - Google Patents
Text classification method, text classification device, computer device, and storage medium
- Publication number
- CN113486175B CN113486175B CN202110776201.0A CN202110776201A CN113486175B CN 113486175 B CN113486175 B CN 113486175B CN 202110776201 A CN202110776201 A CN 202110776201A CN 113486175 B CN113486175 B CN 113486175B
- Authority
- CN
- China
- Prior art keywords
- matrix
- text
- classification
- target text
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 59
- 239000011159 matrix material Substances 0.000 claims abstract description 265
- 239000013598 vector Substances 0.000 claims abstract description 40
- 238000012545 processing Methods 0.000 claims abstract description 36
- 230000004927 fusion Effects 0.000 claims abstract description 19
- 238000010606 normalization Methods 0.000 claims description 45
- 238000011176 pooling Methods 0.000 claims description 32
- 238000000605 extraction Methods 0.000 claims description 28
- 238000005070 sampling Methods 0.000 claims description 21
- 238000007781 pre-processing Methods 0.000 claims description 16
- 230000006870 function Effects 0.000 claims description 15
- 238000004590 computer program Methods 0.000 claims description 14
- 230000004913 activation Effects 0.000 claims description 8
- 230000011218 segmentation Effects 0.000 claims description 8
- 230000007423 decrease Effects 0.000 claims description 4
- 238000003058 natural language processing Methods 0.000 abstract description 9
- 238000007500 overflow downdraw method Methods 0.000 abstract 1
- 239000010410 layer Substances 0.000 description 55
- 238000013527 convolutional neural network Methods 0.000 description 12
- 238000012549 training Methods 0.000 description 8
- 230000008569 process Effects 0.000 description 7
- 238000004364 calculation method Methods 0.000 description 6
- 238000013528 artificial neural network Methods 0.000 description 5
- 238000010801 machine learning Methods 0.000 description 5
- 210000002569 neuron Anatomy 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 230000003213 activating effect Effects 0.000 description 3
- 230000002457 bidirectional effect Effects 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 238000007635 classification algorithm Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 238000004140 cleaning Methods 0.000 description 1
- 230000001149 cognitive effect Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000002356 single layer Substances 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application belongs to the technical field of natural language processing and discloses a text classification method, a text classification device, computer equipment, and a storage medium. The text classification method comprises: extracting text features from a character sequence obtained by segmenting a target text to obtain a first matrix, where the first matrix is used to represent the text features of the target text; inputting the first matrix and a second matrix obtained based on label information of the target text into a label attention network for feature fusion to obtain a third matrix; and inputting the third matrix into a classification convolutional network for classification to obtain a classification result of the target text. Because the text feature vector of the target text is fused with its label features, classification based on the fused features weighs both the actual content of the target text and the points of similarity or difference between the target text and other texts, which improves the accuracy of text classification.
Description
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a text classification method, a text classification device, a computer device, and a storage medium.
Background
Text classification is one of the most basic tasks in the field of natural language processing (Natural Language Processing, NLP), and classification accuracy is one of the important criteria for evaluating a text classification method. Accuracy can be improved through steps such as character segmentation, data cleaning, feature extraction, model building, and corpus training. At present, text classification is generally performed based on a convolutional neural network (Convolutional Neural Network, CNN) model. A CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage range, which gives it excellent performance in image processing; when a CNN is applied to text classification, the input text must first undergo text feature extraction before it can be classified.
Existing text classification algorithms only perform feature coding on the characters or words in an input text and then classify the text based on the resulting feature vectors. The text information they consider is limited, so the quality of the features extracted from the input text is low, which affects the overall accuracy of text classification.
Disclosure of Invention
The embodiments of the present application provide a text classification method, a text classification device, a computer device, and a storage medium, to solve the problem that existing text classification methods, when extracting features from a target text, consider only limited text information and extract features of low quality, resulting in low text classification accuracy.
In a first aspect, an embodiment of the present application provides a text classification method, including:
and extracting text features of a character sequence obtained by dividing the target text to obtain a first matrix, wherein the first matrix is used for representing the text features of the target text.
And inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix.
And inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
In a second aspect, an embodiment of the present application provides a text classification apparatus, including:
The feature extraction module is used for extracting text features from a character sequence obtained by segmenting a target text to obtain a first matrix, and the first matrix is used for representing the text features of the target text.
And the feature fusion module is used for inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix.
And the text classification module is used for inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
In a third aspect, embodiments of the present application provide a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the text classification method when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the text classification method.
According to the text classification method, the text classification device, the computer device, and the storage medium, text feature extraction is performed on a character sequence obtained by segmenting a target text to obtain a first matrix used for representing the text features of the target text; the first matrix and a second matrix obtained based on label information of the target text are input into a label attention network for feature fusion to obtain a third matrix; and the third matrix is input into a classification convolutional network for classification to obtain a classification result of the target text. The text features describe the associations among the words in the target text and are extracted from its actual content, i.e., they are related to what the target text actually says; the label features are a vectorization of the label information of the target text, and the labels indicate the points of similarity or difference between the target text and other texts. By fusing the text feature vector of the target text with its label features, classification based on the fused features weighs both the actual content of the target text and its similarities to and differences from other texts, which improves the accuracy of text classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic view of an application environment of a text classification method according to an embodiment of the present application;
FIG. 2 is a flow chart of an implementation of a text classification method in an embodiment of the present application;
FIG. 3 is a flow chart of step S10 of a text classification method in an embodiment of the present application;
FIG. 4 is a flowchart of step S20 of a text classification method in an embodiment of the present application;
FIG. 5 is a schematic diagram of a tag attention network architecture of a text classification method according to an embodiment of the present application;
FIG. 6 is a flowchart of step S30 of a text classification method in an embodiment of the present application;
FIG. 7 is a functional block diagram of a text classification device in an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is evident that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the protection scope of the present application.
The text classification method provided in this embodiment may be applied in an application environment as shown in FIG. 1, in which a client communicates with a server. The client includes, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices; the server may be implemented as an independent server or as a server cluster formed by multiple servers.
The text classification method provided by the embodiment can be executed on the server side. For example, a user sends a target text to be classified to a server through a client, the server executes the text classification method provided by the embodiment based on the target text to be classified, so as to obtain a classification result of classifying the target text, and finally sends the classification result to the client.
In some scenarios other than that of FIG. 1, the text classification method may be executed by the client: the client directly executes the text classification method provided in this embodiment on the determined target text to obtain a classification result of the target text, and then sends the classification result to the server for storage.
In one embodiment, as shown in fig. 2, a text classification method is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps S10-S30:
s10: and extracting text features of a character sequence obtained by dividing the target text to obtain a first matrix, wherein the first matrix is used for representing the text features of the target text.
In S10, the target text is Chinese text, including Chinese characters and punctuation marks. The character sequence is the character set formed by splitting the Chinese characters and punctuation marks in the target text into independent characters separated by spaces or other symbols. In natural language processing tasks, how words are represented in a computer must be considered first. In general, there are two representation modes: discrete representation and distributed representation. Traditional rule-based or statistics-based natural language processing methods treat a word as an atomic symbol; this is known as discrete representation, which represents each word as a long vector whose dimension equals the vocabulary size, with a single dimension set to 1 (identifying the current word) and all other dimensions set to 0. For example, the discrete representation of the word "apple" might be [0,0,0,1,0,0,0,0,0]. A discrete representation amounts to assigning an id to each word, but it cannot express relationships between words. A distributed representation, by contrast, represents a word as a continuous dense vector of fixed length, so that similarity relationships between words can be expressed by the corresponding word vectors; existing word vector generation methods include word2vec-based, ELMo-based, and BERT-based word vector generation. In this embodiment, the characters in the character sequence are given a distributed representation. Because the character sequence is obtained from the content of the target text, the word-to-word associations in the target text are also expressed by these word vectors, and all word vectors generated from the characters in the character sequence form the first matrix, so that the first matrix characterizes the feature information of the target text.
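As a minimal numerical illustration of the two representation modes described above (the vocabulary, vector dimensions, and values below are hypothetical, chosen only to mirror the "apple" example):

```python
import numpy as np

# Hypothetical 9-word vocabulary, mirroring the "apple" example in the text.
vocab = ["I", "am", "a", "apple", "student", "pear", "red", "blue", "yellow"]

# Discrete (one-hot) representation: vector length = vocabulary size, a single 1 marks the word.
one_hot_apple = np.zeros(len(vocab))
one_hot_apple[vocab.index("apple")] = 1.0   # [0,0,0,1,0,0,0,0,0]: just an id, no word-word relation

# Distributed representation: a fixed-length dense vector (4-dimensional here for brevity).
dense_apple = np.array([0.21, -0.73, 0.05, 0.48])
dense_pear = np.array([0.19, -0.70, 0.11, 0.45])

# Dense vectors can express similarity between related words, which one-hot vectors cannot.
cos = dense_apple @ dense_pear / (np.linalg.norm(dense_apple) * np.linalg.norm(dense_pear))
print(f"cosine similarity of 'apple' and 'pear': {cos:.3f}")
```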
In a text classification task, word segmentation is the most basic step. In English, words are the unit and are separated by spaces; in Chinese, there are no such separators, and all the words in a sentence are written consecutively to express a meaning. For example, the English sentence "I am a student" corresponds to the Chinese sentence "我是一个学生". A computer can tell from the spaces that "student" is a word, but it cannot as easily tell that the characters "学" and "生" taken together form the word "学生" ("student"). For the sentence "我是一个学生", word-level segmentation yields "我_是_一个_学生". In this embodiment, however, the segmentation of the target text differs from ordinary word segmentation: it splits the Chinese characters and punctuation marks of the target text into independent characters, so that for the same sentence the result is "我_是_一_个_学_生". The punctuation marks in the target text are also treated as characters to be processed. All the independent characters together form the character sequence, the order of the characters in the sequence is their order in the target text, and the length of the character sequence is the number of characters in the target text.
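A minimal sketch of this character-level splitting; the function and variable names are illustrative, not taken from the patent:

```python
def split_into_characters(text: str) -> list[str]:
    """Split a Chinese text into independent characters, keeping punctuation.

    Whitespace is dropped; every remaining character (Chinese character or
    punctuation mark) becomes one element of the character sequence.
    """
    return [ch for ch in text if not ch.isspace()]

chars = split_into_characters("我是一个学生。")
print(chars)        # ['我', '是', '一', '个', '学', '生', '。']
print(len(chars))   # 7, the length of the character sequence
```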
S20: and inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix.
In S20, the label information of the target text is single-label multi-class information; that is, on the basis of a certain attribute of the target text, different target texts are divided into different categories according to how they differ in that attribute. For example, when the label is color, target texts can be classified into categories such as blue, yellow, and red according to the color attribute. The second matrix is obtained based on the label information of the target text: the size of the matrix can be determined from the preset number of label categories and the dimension of the embedding vectors in the BERT model, and the matrix is then further processed to obtain the second matrix. Because the second matrix is obtained based on the label information of the target text, it characterizes the label features of the target text.
Existing text classification algorithms only perform feature coding on the words in an input text and classify on that basis; they do not consider the label information of the text and ignore the feature associations between the input text and its label information. In this embodiment, the first matrix and the second matrix are input into a label attention network, and the text features and label features of the target text are fused to obtain the third matrix. The label attention network is based on the attention mechanism used in neural networks. The attention mechanism originates from research on human vision: in cognitive science, because of the bottleneck of information processing, a human selectively focuses on part of the available information and ignores the rest; for example, when reading, only a small number of the words to be read are attended to and processed. In this embodiment, the label attention network constructs a new attention mechanism that fuses the text features of the target text with its label features, strengthening the feature association between the content of the target text and the label information and improving the quality of the features extracted from the target text.
S30: and inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text.
In S30, the classification convolutional network is based on a convolutional neural network, which comprises several convolutional layers, pooling layers, a fully connected layer, and so on. In this embodiment, the classification convolutional network connects multiple convolutional and pooling layers, adjusts the size of the convolution kernels in the convolutional layers, performs feature extraction on the third matrix, and then feeds the result into the fully connected layer for the final classification, thereby obtaining the classification result of the target text.
In this embodiment, text feature extraction is performed on a character sequence obtained by segmenting a target text to obtain a first matrix used for representing the text features of the target text; the first matrix and a second matrix obtained based on label information of the target text are input into a label attention network for feature fusion to obtain a third matrix; and the third matrix is input into a classification convolutional network for classification to obtain a classification result of the target text. In this way, both the text features and the label features of the target text are extracted and fused, which improves the quality of text feature extraction and the classification accuracy of the classification convolutional network.
Fig. 3 shows a flow chart of the text classification method step S10 of the present application. As shown in fig. 3, step S10 includes steps S11 to S13, specifically:
s11: and dividing the target text into words to obtain a character sequence.
In S11, the character sequence comprises Chinese characters and punctuation marks. All the characters in the character sequence are independent of one another, their order is the order in which they appear in the target text, and the length of the character sequence is the number of characters.
S12: and inputting the character sequence into a BERT model to obtain an initial embedded vector corresponding to each character in the character sequence.
In the NLP field, a BERT model based on the Transformer encoder can be used for feature extraction. BERT (Bidirectional Encoder Representations from Transformers) is a large-scale pre-trained language model built on the Transformer encoder. BERT is trained in two phases: in the first phase, a bidirectional Transformer model is pre-trained with the MLM (masked language model) and NSP (next sentence prediction) strategies; in the second phase, the model is fine-tuned and applied to downstream tasks. Compared with an LSTM, the Transformer model has no sequence-length limitation and captures contextual features better, and bidirectional training captures context more comprehensively than unidirectional training. The BERT model in this embodiment is formed by stacking several Transformer encoder layers in series, for example 12 or 24 layers, which is not limited here.
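A sketch of obtaining the initial embedding vectors with a pre-trained BERT model; the Hugging Face transformers library and the bert-base-chinese checkpoint are assumptions made for illustration and are not specified by the patent:

```python
import torch
from transformers import BertTokenizer, BertModel

# Assumed checkpoint; the patent only requires a BERT model of e.g. 12 or 24 layers.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "我是一个学生。"
# bert-base-chinese tokenizes Chinese text character by character,
# matching the character sequence described in S11.
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = bert(**inputs)

# One embedding vector per character (plus [CLS]/[SEP] special tokens);
# stacking them yields the first matrix of shape (batch, seq_len, hidden_dim).
first_matrix = outputs.last_hidden_state
print(first_matrix.shape)   # e.g. torch.Size([1, 9, 768])
```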
S13: and forming a matrix by the initial embedded vectors corresponding to all the characters to obtain the first matrix.
In S13, after the character sequence is input into the BERT model to obtain the initial embedding vector corresponding to each character, all the initial embedding vectors are arranged according to the order of their corresponding characters in the target text, and the first matrix is assembled according to a preset batch size. The first matrix is a three-dimensional matrix whose three dimensions are determined by the batch size, the character sequence length, and the dimension of the initial embedding vectors.
In this embodiment, the target text is first segmented to obtain a character sequence, and the character sequence is then input into the BERT model for feature extraction to obtain the first matrix, so that the first matrix characterizes the text features of the target text; vectorizing the text features in this way facilitates subsequent mathematical calculation.
Fig. 4 shows a flowchart of step S20 of the text classification method of the present application. As shown in fig. 4, step S20 includes steps S21 to S24, specifically:
s21: and determining an initial tag matrix according to the tag information of the target text, and carrying out random initialization on the initial tag matrix to obtain a second matrix.
In S21, the label information in this embodiment refers to multi-class information for a single label. For example, the attribute "color" is used as a label, with the specific classes "red", "yellow", and "blue"; multi-label multi-class classification, by contrast, would add further labels such as "shape", under which texts could be classified into "circle", "square", and "triangle". The initial label matrix is determined from the label information of the target text: specifically, its size is determined by the number of classes of the single label and the dimension of the initial embedding vectors in the BERT model. The initial label matrix is then randomly initialized to obtain the second matrix, where random initialization means filling the initial label matrix with random numbers drawn from a Gaussian distribution.
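A minimal sketch of constructing the second matrix, assuming a hypothetical single-label task with three classes and the common BERT hidden size of 768:

```python
import torch

num_classes = 3     # e.g. "red", "yellow", "blue" in a hypothetical single-label task
embed_dim = 768     # dimension of the BERT initial embedding vectors

# Random initialization from a Gaussian distribution, as described in S21.
second_matrix = torch.randn(num_classes, embed_dim)
print(second_matrix.shape)   # torch.Size([3, 768])
```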
S22: and respectively carrying out normalization processing on the first matrix and the second matrix to obtain a first normalization matrix corresponding to the first matrix and a second normalization matrix corresponding to the second matrix.
Normalization is a common data preprocessing technique in mathematical statistics. It usually maps each dimension of a data vector into the interval (0, 1) or (-1, 1), or scales the vector so that a chosen norm equals 1. This has two benefits. First, it removes the influence of units, converting dimensional data into dimensionless standard data; for example, an adult's height may be 150-200 cm and weight 50-90 kg, so height and weight have different units and different value ranges, and the raw data cannot be fed directly into a machine learning model. Mapping all data into (0, 1) by a specific method puts every feature into the same range. Second, it speeds up convergence. Suppose the input vector accepted by a machine learning model has only two dimensions x1 and x2, where x1 ranges over 0-2000 and x2 over 0-3. Without normalization, the data correspond to a very flat ellipse during gradient descent, and the optimization easily zigzags along directions perpendicular to the contour lines, so each iteration is expensive, many iterations are needed, and the model converges slowly.
In this embodiment, the first matrix and the second matrix are each normalized by the L2 norm, vector by vector. For example, if a vector X is L2-normalized to X2 and another vector Y is L2-normalized to Y2, the Euclidean distance between X2 and Y2 and their cosine similarity are equivalent, so after L2 normalization the Euclidean distances of a set of vectors and their cosine similarities are interchangeable. A major advantage is that once the Euclidean distance between two L2-normalized vectors has been computed, their cosine similarity can be obtained from it in O(1) time by a simple formula. Moreover, some machine learning packages only support Euclidean distance and not cosine similarity; for example, the KMeans clustering package of scikit-learn (Sklearn) can only cluster data under the Euclidean distance. In the NLP field, the similarity of words or documents is often defined as the cosine similarity of their vectors, so scikit-learn's KMeans cannot be applied directly. If, however, the word vectors of the words or the text vectors of the documents are first L2-normalized, Euclidean distance and cosine similarity become equivalent, and clustering with scikit-learn's KMeans becomes possible.
In this embodiment, L2-norm normalization is applied to the first matrix and the second matrix to obtain the first normalized matrix corresponding to the first matrix and the second normalized matrix corresponding to the second matrix; normalization does not change the dimensions of the matrices.
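A sketch of the L2 normalization step in PyTorch; normalizing along the embedding dimension is an assumption that is consistent with the matrix multiplication in S23:

```python
import torch
import torch.nn.functional as F

first_matrix = torch.randn(8, 128, 768)   # (batch, seq_len, embed_dim), illustrative sizes
second_matrix = torch.randn(3, 768)       # (num_classes, embed_dim)

# L2 normalization of every vector along the embedding dimension;
# the shapes of the matrices are unchanged.
first_norm = F.normalize(first_matrix, p=2, dim=-1)
second_norm = F.normalize(second_matrix, p=2, dim=-1)
print(first_norm.shape, second_norm.shape)   # same shapes as the inputs
```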
S23: and multiplying the first normalized matrix and the second normalized matrix by a matrix to obtain a preprocessing matrix.
For two-dimensional matrices, matrix multiplication is possible only when the number of columns of the first matrix equals the number of rows of the second. Likewise, in this embodiment the first normalized matrix and the second normalized matrix are multiplied with the first normalized matrix on the left, i.e., the first normalized matrix comes first in the multiplication, yielding the preprocessing matrix. The preprocessing matrix is a three-dimensional matrix whose three dimensions are determined by the batch size, the character sequence length, and the number of classes of the single label.
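A sketch of this multiplication with illustrative shapes; transposing the second normalized matrix so that the embedding dimensions align is an assumption made so the shapes match:

```python
import torch
import torch.nn.functional as F

first_norm = F.normalize(torch.randn(8, 128, 768), p=2, dim=-1)   # (batch, seq_len, embed_dim)
second_norm = F.normalize(torch.randn(3, 768), p=2, dim=-1)       # (num_classes, embed_dim)

# (batch, seq_len, embed_dim) @ (embed_dim, num_classes) -> (batch, seq_len, num_classes)
pre_matrix = torch.matmul(first_norm, second_norm.transpose(0, 1))
print(pre_matrix.shape)   # torch.Size([8, 128, 3])
```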
S24: and performing feature extraction on the preprocessing matrix to obtain an attention matrix, and performing matrix dot multiplication on the attention matrix and the first matrix to obtain a third matrix.
In S24, the feature extraction of the preprocessing matrix to obtain an attention matrix includes: carrying out convolution pooling on the pretreatment matrix to obtain a matrix to be activated; and processing the matrix to be activated by using an activation function to obtain the attention matrix.
Convolution and pooling of matrices are common operations in machine learning and are not described in detail here. By convolving the preprocessing matrix with convolution kernels of different sizes, the matrix features, i.e., text features of the target text, can be extracted. After the convolution produces further text features, pooling reduces the number of parameters while keeping the main features, i.e., it reduces the dimensionality, which lowers the amount of computation and prevents overfitting; common pooling operations include maximum pooling and mean pooling. The role of an activation function is to increase the nonlinear expressive capacity of the model: it introduces nonlinear factors into otherwise linear inputs and outputs so that the neural network can fit nonlinear models. Common activation functions include the Sigmoid, tanh, and softmax functions. In this embodiment, the softmax function, which is commonly used for classification, is used as the activation function to map the convolved and pooled output into the interval (0, 1). The preprocessing matrix is convolved and pooled to obtain the matrix to be activated, the matrix to be activated is processed with the activation function to obtain the attention matrix, and the attention matrix is dot-multiplied with the first matrix to obtain the third matrix. The convolution and pooling of the preprocessing matrix extract the features of each class, so the label-class dimension of the three-dimensional matrix becomes 1; the activation function does not change the dimensions, so the three dimensions of the attention matrix are the batch size, the character sequence length, and 1.
FIG. 5 shows a schematic diagram of the label attention network structure of the text classification method of the present application. As an example, as shown in FIG. 5, the first input layer and the second input layer of the label attention network are used to input the first matrix and the second matrix, respectively, and the normalization layer of the label attention network normalizes the first matrix and the second matrix. Specifically, the normalization may be L2-norm normalization, giving the first and second normalized matrices corresponding to the first and second matrices. The first fusion layer of the label attention network then performs the first matrix fusion, i.e., the matrix multiplication of the first normalized matrix and the second normalized matrix, to obtain the preprocessing matrix. Next, the convolution-pooling layer of the label attention network convolves and pools the preprocessing matrix to obtain the matrix to be activated, the activation layer activates the matrix to be activated to obtain the attention matrix, and finally the second fusion layer performs the matrix dot-multiplication of the attention matrix with the first matrix from the first input layer to obtain the third matrix at the output layer.
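Putting the steps of FIG. 5 together, a minimal PyTorch sketch of such a label attention network is given below; the 1x1 convolution kernel, the omission of an explicit pooling step, the dimension over which softmax is applied, and all layer names are illustrative assumptions rather than details fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttention(nn.Module):
    """Fuses text features (first matrix) with label features (second matrix)."""

    def __init__(self, num_classes: int, embed_dim: int = 768):
        super().__init__()
        # Second matrix: randomly initialized label embeddings (S21).
        self.label_embed = nn.Parameter(torch.randn(num_classes, embed_dim))
        # Convolution over the class dimension, collapsing it to 1 (S24);
        # kernel_size=1 is an illustrative choice, and the pooling step of S24 is omitted here.
        self.conv = nn.Conv1d(num_classes, 1, kernel_size=1)

    def forward(self, first_matrix: torch.Tensor) -> torch.Tensor:
        # first_matrix: (batch, seq_len, embed_dim)
        first_norm = F.normalize(first_matrix, p=2, dim=-1)            # S22
        second_norm = F.normalize(self.label_embed, p=2, dim=-1)
        # S23: (batch, seq_len, embed_dim) @ (embed_dim, num_classes)
        pre = torch.matmul(first_norm, second_norm.t())                # (batch, seq_len, C)
        # S24: convolution over the class dimension, then softmax over sequence positions.
        to_activate = self.conv(pre.transpose(1, 2)).transpose(1, 2)   # (batch, seq_len, 1)
        attention = torch.softmax(to_activate, dim=1)                  # attention matrix
        # Element-wise (dot) multiplication with the first matrix -> third matrix.
        return attention * first_matrix                                # (batch, seq_len, embed_dim)

# Usage sketch with illustrative shapes.
third = LabelAttention(num_classes=3)(torch.randn(8, 128, 768))
print(third.shape)   # torch.Size([8, 128, 768])
```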
In this embodiment, a second matrix is obtained based on the label information of the target text and characterizes that label information; the first matrix and the second matrix are then input into the label attention network for further processing, and the text features and label features of the target text are fused through matrix multiplication, normalization, convolution, pooling, and so on, which improves the quality of text feature extraction.
Fig. 6 shows a flowchart of step S30 of the text classification method of the present application. As shown in fig. 6, step S30 includes steps S31 to S32, specifically:
s31: and carrying out sampling convolution on the third matrix according to a preset receptive field in the classification convolution network to obtain a fourth matrix.
In a convolutional neural network, the receptive field is the size of the region of the original input from which a point on the feature map output by each layer is mapped, where a feature map is typically represented as a matrix. Input content outside a neuron's receptive field has no effect on the value of that neuron, so it must be ensured that the receptive fields of the neurons cover all relevant regions of the input. In practice, the receptive field of the network is controlled by adjusting parameters such as the depth of the network and the size of the convolution kernels.
In S31, the performing sampling convolution on the third matrix according to a preset receptive field in the classification convolution network to obtain a fourth matrix, including: carrying out sampling convolution on the third matrix through N sampling convolution layers in the classification convolution network to obtain a fourth matrix; wherein N is an integer greater than 1 and each of the sampled convolutional layers has a different receptive field. Specifically, the sampling convolution layer performs sampling convolution on the third matrix, including performing convolution, batch normalization and maximum pooling processing on the third matrix in sequence. Wherein, in the N sampling convolution layers, the convolution kernel size of each sampling convolution layer is configured to decrease sequentially according to the sequence of sampling convolution on the third matrix.
Batch normalization (Batch Normalization) is a technique for training deep neural networks that not only accelerates model convergence but also alleviates, to some extent, the problem of gradient dispersion in deep networks, making the training of deep models easier and more stable. As the name implies, batch normalization normalizes each batch of training data {x1, x2, ..., xn} using its mean and variance. Before batch normalization, normalization was typically applied only at the data input layer, whereas batch normalization can be applied either to the input data or to the output of any intermediate layer of the neural network. Common pooling methods include max pooling and mean pooling. The error of feature extraction in deep learning comes mainly from two sources: first, the limited size of the neighborhood increases the variance of the estimated values; second, parameter errors in the convolutional layer shift the estimated mean. Mean pooling reduces the first kind of error, and max pooling reduces the second.
Specifically, in this embodiment the number of sampling convolution layers is N, where N is an integer greater than 1. After a sampling convolution layer performs the convolution calculation on the third matrix, it applies batch normalization and max pooling to obtain an intermediate matrix. The intermediate matrix may either be input, as the fourth matrix, to the fully connected layer for classification, or be passed as an intermediate result to the next sampling convolution layer for further convolution, batch normalization, and max pooling. In other words, the sequence of convolution, batch normalization, and max pooling applied to the third matrix may be repeated several times, and because the convolution kernel sizes of the N sampling convolution layers are configured to decrease in the order in which they process the third matrix, the preset receptive field keeps shrinking. The output matrix of any sampling convolution layer can also be fed back into the same layer as its input for another round of sampling convolution, i.e., with the kernel size unchanged, convolution, batch normalization, and max pooling are applied repeatedly. For example, if the convolution kernel of the first sampling convolution layer is 7×7, the 7×7 convolution, batch normalization, and max pooling may be repeated 3 times to obtain an intermediate matrix; this matrix is input to the second sampling convolution layer with a 5×5 kernel, where convolution, batch normalization, and max pooling are repeated 2 times to obtain another intermediate matrix; that matrix is input to the third sampling convolution layer with a 3×3 kernel, where convolution, batch normalization, and max pooling are repeated 2 times to obtain the fourth matrix. The kernel size thus changes from 7×7 to 5×5 to 3×3, decreasing overall and conforming to the preset decreasing receptive field.
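A sketch of one possible stack of sampling convolution layers following the 7×7 / 5×5 / 3×3 example above; the channel counts, padding, pooling size, and input length are illustrative assumptions:

```python
import torch
import torch.nn as nn

def sampling_block(channels: int, kernel_size: int, repeats: int) -> nn.Sequential:
    """One sampling convolution layer: (convolution -> batch norm -> max pooling) x repeats."""
    layers = []
    for _ in range(repeats):
        layers += [
            nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm1d(channels),
            nn.MaxPool1d(kernel_size=2),
        ]
    return nn.Sequential(*layers)

# Decreasing kernel sizes 7 -> 5 -> 3, matching the decreasing receptive field.
conv_net = nn.Sequential(
    sampling_block(channels=768, kernel_size=7, repeats=3),
    sampling_block(channels=768, kernel_size=5, repeats=2),
    sampling_block(channels=768, kernel_size=3, repeats=2),
)

third_matrix = torch.randn(8, 128, 768)            # (batch, seq_len, embed_dim)
fourth = conv_net(third_matrix.transpose(1, 2))    # Conv1d expects (batch, channels, length)
print(fourth.shape)                                # torch.Size([8, 768, 1]) after 7 halvings of length 128
```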
S32: and inputting the fourth matrix into a full-connection layer of the classification convolutional network to perform classification processing to obtain a classification result of the target text.
In a convolutional neural network, the layers before the fully connected layer extract features, and the fully connected layer performs the classification. In the CNN structure, one or more fully connected layers are connected after the convolutional and pooling layers; each neuron in a fully connected layer is connected to all neurons of the previous layer, integrating the class-discriminative local information from the convolutional or pooling layers. In this embodiment, the classification result of the target text obtained by the fully connected layer is represented as a two-dimensional matrix: one dimension is the number of label categories and the other is the batch size, each value in the matrix is the probability of the corresponding category, and the category with the largest probability is taken as the classification result of the target text.
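A short sketch of this final classification step, continuing the illustrative shapes above; the flattening step and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

num_classes = 3
fourth_matrix = torch.randn(8, 768, 1)              # output of the sampling convolution layers

fc = nn.Linear(768, num_classes)                    # fully connected classification layer
logits = fc(fourth_matrix.flatten(start_dim=1))     # (batch, num_classes)
probs = torch.softmax(logits, dim=-1)               # probability of each category
prediction = probs.argmax(dim=-1)                   # category with the largest probability
print(probs.shape, prediction.shape)                # torch.Size([8, 3]) torch.Size([8])
```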
In this embodiment, the classification-layer structure on top of the pre-trained BERT model is improved from the conventional single fully connected layer to a character-level classification convolutional network containing multiple convolution-pooling layers, while the receptive field of the classification convolutional network as a whole keeps decreasing. This improves the performance of the classifier and therefore the overall accuracy of classifying the target text.
With the above text classification method, text feature extraction is performed on the character sequence obtained by segmenting the target text to obtain a first matrix used for representing the text features of the target text; the first matrix and a second matrix obtained based on the label information of the target text are input into a label attention network for feature fusion to obtain a third matrix; and the third matrix is input into a classification convolutional network for classification to obtain the classification result of the target text. The text features describe the associations among the words in the target text and are extracted from its actual content, i.e., they are related to what the target text actually says; the label features are a vectorization of the label information of the target text, and the labels indicate the points of similarity or difference between the target text and other texts. By fusing the text feature vector of the target text with its label features, classification based on the fused features weighs both the actual content of the target text and its similarities to and differences from other texts, which improves the accuracy of text classification.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In one embodiment, a text classification device is provided, and the text classification device corresponds one-to-one with the text classification method in the above embodiments. As shown in FIG. 7, the text classification device includes a feature extraction module 10, a feature fusion module 20, and a text classification module 30. Each functional module is described in detail as follows:
the feature extraction module 10 is configured to perform text feature extraction on a character sequence obtained by word segmentation on a target text, so as to obtain a first matrix, where the first matrix is used for representing text features of the target text.
And the feature fusion module 20 is configured to input the first matrix and a second matrix obtained based on the label information of the target text into the label attention network for feature fusion to obtain a third matrix.
And the text classification module 30 is used for inputting the third matrix into a classification convolution network to perform classification processing to obtain a classification result of the target text.
For specific limitations of the text classification device, reference may be made to the above limitations of the text classification method, which are not repeated here. The modules in the above text classification device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor in the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a client or a server, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a readable storage medium, an internal memory. The readable storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the readable storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a text classification method.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the text classification method of the above embodiments when the computer program is executed by the processor.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the text classification method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (9)
1. A method of text classification, comprising:
extracting text features of a character sequence obtained by dividing a target text to obtain a first matrix, wherein the first matrix is used for representing the text features of the target text;
inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix;
inputting the third matrix into a classification convolution network for classification processing to obtain a classification result of the target text;
wherein the inputting of the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix comprises:
determining an initial label matrix according to the label information of the target text, and carrying out random initialization on the initial label matrix to obtain a second matrix;
respectively carrying out normalization processing on the first matrix and the second matrix to obtain a first normalization matrix corresponding to the first matrix and a second normalization matrix corresponding to the second matrix;
performing matrix multiplication on the first normalization matrix and the second normalization matrix to obtain a preprocessing matrix;
and performing feature extraction on the preprocessing matrix to obtain an attention matrix, and performing matrix dot multiplication on the attention matrix and the first matrix to obtain a third matrix.
2. The text classification method of claim 1, wherein the text feature extraction of the character sequence obtained by word segmentation of the target text to obtain a first matrix comprises:
dividing the target text into words to obtain a character sequence;
inputting the character sequence into a BERT model to obtain an initial embedded vector corresponding to each character in the character sequence;
and forming a matrix by the initial embedded vectors corresponding to all the characters to obtain the first matrix.
3. The text classification method of claim 1, wherein said feature extraction of said pre-processing matrix to obtain an attention matrix comprises:
carrying out convolution pooling on the pretreatment matrix to obtain a matrix to be activated;
and processing the matrix to be activated by using an activation function to obtain the attention matrix.
4. The text classification method of claim 1, wherein said classifying the third matrix input into a classification convolutional network to obtain a classification result of the target text, comprises:
sampling convolution is carried out on the third matrix according to a preset receptive field in the classified convolution network, so that a fourth matrix is obtained;
and inputting the fourth matrix into a full-connection layer of the classification convolutional network to perform classification processing to obtain a classification result of the target text.
5. The text classification method of claim 4, wherein said performing a sample convolution on said third matrix according to a predetermined receptive field in said classification convolution network to obtain a fourth matrix comprises:
carrying out sampling convolution on the third matrix through N sampling convolution layers in the classification convolution network to obtain a fourth matrix; wherein N is an integer greater than 1 and each of the sampled convolutional layers has a different receptive field.
6. The text classification method of claim 5, wherein the convolution kernel size of each of the N sample convolution layers is configured to sequentially decrease according to a sequential order of sample convolutions to the third matrix.
7. A text classification device, the text classification device comprising:
the feature extraction module is used for extracting text features from a character sequence obtained by segmenting a target text to obtain a first matrix, and the first matrix is used for representing text features of the target text;
the feature fusion module is used for inputting the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix;
the text classification module is used for inputting the third matrix into a classification convolution network to perform classification processing to obtain a classification result of the target text;
wherein the inputting of the first matrix and a second matrix obtained based on the label information of the target text into a label attention network for feature fusion to obtain a third matrix comprises:
determining an initial label matrix according to the label information of the target text, and carrying out random initialization on the initial label matrix to obtain a second matrix;
respectively carrying out normalization processing on the first matrix and the second matrix to obtain a first normalization matrix corresponding to the first matrix and a second normalization matrix corresponding to the second matrix;
performing matrix multiplication on the first normalization matrix and the second normalization matrix to obtain a preprocessing matrix;
and performing feature extraction on the preprocessing matrix to obtain an attention matrix, and performing matrix dot multiplication on the attention matrix and the first matrix to obtain a third matrix.
8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the text classification method according to any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the text classification method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110776201.0A CN113486175B (en) | 2021-07-08 | 2021-07-08 | Text classification method, text classification device, computer device, and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113486175A CN113486175A (en) | 2021-10-08 |
CN113486175B (en) | 2024-03-15
Family
ID=77938258
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110776201.0A Active CN113486175B (en) | 2021-07-08 | 2021-07-08 | Text classification method, text classification device, computer device, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113486175B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114880474A (en) * | 2022-05-06 | 2022-08-09 | 江苏大学 | Mathematical problem text multi-label classification method based on mathematical characteristic extraction |
CN115293138B (en) * | 2022-08-03 | 2023-06-09 | 北京中科智加科技有限公司 | Text error correction method and computer equipment |
CN117746167B (en) * | 2024-02-20 | 2024-04-19 | 四川大学 | Training method and classifying method for oral panorama image swing bit error classification model |
CN117877043B (en) * | 2024-03-11 | 2024-07-09 | 深圳市壹倍科技有限公司 | Model training method, text recognition method, device, equipment and medium |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180137168A (en) * | 2017-06-16 | 2018-12-27 | (주)이스트소프트 | Apparatus for classifying category of a text based on neural network, method thereof and computer recordable medium storing program to perform the method |
CN109492101A (en) * | 2018-11-01 | 2019-03-19 | 山东大学 | File classification method, system and medium based on label information and text feature |
CN110209823A (en) * | 2019-06-12 | 2019-09-06 | 齐鲁工业大学 | A kind of multi-tag file classification method and system |
CN111428026A (en) * | 2020-02-20 | 2020-07-17 | 西安电子科技大学 | Multi-label text classification processing method and system and information data processing terminal |
CN111666406A (en) * | 2020-04-13 | 2020-09-15 | 天津科技大学 | Short text classification prediction method based on word and label combination of self-attention |
Non-Patent Citations (1)
Title |
---|
Research on multi-label emotion prediction based on neural networks fusing label correlation; Chen Wei et al.; Journal of Chinese Information Processing (中文信息学报); Vol. 35, No. 1; pp. 104-112 *
Also Published As
Publication number | Publication date |
---|---|
CN113486175A (en) | 2021-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113486175B (en) | Text classification method, text classification device, computer device, and storage medium | |
WO2023134084A1 (en) | Multi-label identification method and apparatus, electronic device, and storage medium | |
US20220382553A1 (en) | Fine-grained image recognition method and apparatus using graph structure represented high-order relation discovery | |
CN109948149B (en) | Text classification method and device | |
CN111652332B (en) | Deep learning handwritten Chinese character recognition method and system based on two classifications | |
CN107169485B (en) | Mathematical formula identification method and device | |
CN111639186B (en) | Multi-category multi-label text classification model and device with dynamic embedded projection gating | |
Naseer et al. | Meta features-based scale invariant OCR decision making using LSTM-RNN | |
CN108985442B (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
CN113849648A (en) | Classification model training method and device, computer equipment and storage medium | |
CN111666931A (en) | Character and image recognition method, device and equipment based on mixed convolution and storage medium | |
CN111985525A (en) | Text recognition method based on multi-mode information fusion processing | |
Inunganbi et al. | Handwritten Meitei Mayek recognition using three‐channel convolution neural network of gradients and gray | |
CN114881169A (en) | Self-supervised contrast learning using random feature corruption | |
CN110929724A (en) | Character recognition method, character recognition device, computer equipment and storage medium | |
Hegadi et al. | Recognition of Marathi handwritten numerals using multi-layer feed-forward neural network | |
Dan et al. | PF‐ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition | |
CN112287662A (en) | Natural language processing method, device and equipment based on multiple machine learning models | |
CN112307749A (en) | Text error detection method and device, computer equipment and storage medium | |
Liu et al. | Multi-digit recognition with convolutional neural network and long short-term memory | |
Nayak et al. | Odia character recognition using backpropagation network with binary features | |
Maity et al. | Handwritten Bengali character recognition using deep convolution neural network | |
Prashanth et al. | Handwritten recognition of Tamil vowels using deep learning | |
Karim et al. | Bangla Sign Language Recognition using YOLOv5 | |
CN114694150A (en) | Method and system for improving generalization capability of digital image classification model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||