CN109299246B - Text classification method and device

Info

Publication number
CN109299246B
Authority
CN
China
Prior art keywords
vector
text
weight
linear transformation
linear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811475663.3A
Other languages
Chinese (zh)
Other versions
CN109299246A (en)
Inventor
王栋
曾国卿
许志强
孙昌勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ronglian Yitong Information Technology Co ltd
Original Assignee
Beijing Ronglian Yitong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ronglian Yitong Information Technology Co ltd filed Critical Beijing Ronglian Yitong Information Technology Co ltd
Priority to CN201811475663.3A priority Critical patent/CN109299246B/en
Publication of CN109299246A publication Critical patent/CN109299246A/en
Application granted granted Critical
Publication of CN109299246B publication Critical patent/CN109299246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a text classification method and a text classification device, wherein the method comprises the following steps: acquiring a text vector of a target text; performing a first linear transformation and a second linear transformation on the text vector, respectively, to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein the weights of the two linear transformations are different; operating on the first text vector and the second text vector to obtain a weight vector; obtaining a target feature vector according to the weight vector and the text vector; mapping the target feature vector into a one-dimensional vector according to the category mapping of a fully-connected layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories; and determining the text category of the target text according to the dimension of the maximum value in the one-dimensional vector. The embodiment of the application improves the comprehensiveness of feature extraction and thereby improves the accuracy of text classification.

Description

Text classification method and device
Technical Field
The application relates to the technical field of text classification, in particular to a text classification method and device.
Background
Text classification is widely used in practice, for example to identify whether received e-mails are spam, to recognize the emotional tendency of a text, or to extract investment information. With the rapid growth of text data, traditional text classification methods can no longer meet the demand; deep learning algorithms emerged in response and achieve good results on large-scale text classification.
In the prior art, the CNN (Convolutional Neural Network) and the RNN (Recurrent Neural Network) are the commonly used deep learning networks. When classifying text, a CNN decomposes the input text into many small-range character sequences and extracts the important information within each of them; with this approach, the association information between characters is easily lost. An RNN, for its part, can only process sentences of a fixed length: when an input sentence exceeds the set length, the out-of-range characters are automatically discarded. Using a CNN or an RNN alone therefore loses a great deal of text information, the extracted important information is incomplete, and the accuracy of text classification is reduced.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and an apparatus for text classification, so as to improve comprehensiveness of feature extraction and further improve accuracy of text classification.
In a first aspect, an embodiment of the present application provides a text classification method, including:
acquiring a text vector of a target text;
respectively performing first linear transformation and second linear transformation on the text vector to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein weights of the first linear transformation and the second linear transformation are different;
calculating the first text vector and the second text vector to obtain a weight vector;
obtaining a target feature vector according to the weight vector and the text vector;
mapping the target feature vector into a one-dimensional vector according to category mapping of a full connection layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories;
and determining the text category of the target text according to the dimension of the maximum value in the one-dimensional vector.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where performing a first linear transformation and a second linear transformation on the text vector respectively includes:
initializing a first linear weight and a first offset of a first fully-connected layer and a second linear weight and a second offset of a second fully-connected layer, wherein the first linear weight is different from the second linear weight and/or the first offset is different from the second offset;
respectively inputting the text vectors into a first initialized full-connection layer and a second initialized full-connection layer for linear transformation, obtaining a first optimized linear weight and a first optimized offset of the first full-connection layer, which enable the loss function to be minimum, and obtaining a second optimized linear weight and a second optimized offset of the second full-connection layer, which enable the loss function to be minimum;
and respectively carrying out linear transformation on the text vector according to the first full-connection layer of the first optimized linear weight and the first optimized offset and the second full-connection layer of the second optimized linear weight and the second optimized offset.
With reference to the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where performing an operation on the first text vector and the second text vector to obtain a weight vector includes:
transposing the second text vector to obtain a transposed vector;
obtaining a product of the first text vector and the transposed vector to obtain an initial weight vector;
and calculating the initial weight vector by using a regression algorithm to obtain a weight vector.
With reference to the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where obtaining a target feature vector according to the weight vector and the text vector includes:
performing third linear transformation on the text vector to obtain a third text vector after the third linear transformation, wherein the coefficient of the third linear transformation is different from the coefficients of the first linear transformation and the second linear transformation;
obtaining a product of the weight vector and the third text vector to obtain a feature vector;
and extracting the features of the feature vector to obtain a target feature vector.
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where determining a text category of a target text according to a dimension of a maximum value in the one-dimensional vector includes:
normalizing the one-dimensional vector through an output layer to obtain a normalized one-dimensional vector;
and determining the category corresponding to the maximum value in the normalized one-dimensional vector as the text category of the target text.
With reference to the first aspect, the first possible implementation manner of the first aspect, the second possible implementation manner of the first aspect, the third possible implementation manner of the first aspect, or the fourth possible implementation manner of the first aspect, an embodiment of the present application provides a fifth possible implementation manner of the first aspect, where after the obtaining of the target feature vector, before mapping the target feature vector into a one-dimensional vector according to a class mapping of a fully-connected layer, the method further includes:
judging whether the number of times the target feature vector has been obtained reaches a preset count threshold, and if not, updating the text vector by using the target feature vector to obtain an updated text vector, and replacing the text vector by using the updated text vector;
and executing the steps of respectively performing the first linear transformation and the second linear transformation on the text vector until the number of times the target feature vector has been obtained reaches the preset count threshold.
In a second aspect, an embodiment of the present application provides a text classification apparatus, including:
a text representation module, configured to acquire a text vector of a target text;
a feature extraction module, configured to perform a first linear transformation and a second linear transformation on the text vector, respectively, to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein weights of the two linear transformations are different, to operate on the first text vector and the second text vector to obtain a weight vector, and to obtain a target feature vector according to the weight vector and the text vector;
and a text classification module, configured to map the target feature vector into a one-dimensional vector according to category mapping of the full connection layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories, and to determine the text category of the target text according to the dimension of the maximum value in the one-dimensional vector.
With reference to the second aspect, an embodiment of the present application provides a first possible implementation manner of the second aspect, where the feature extraction module obtains the first text vector and the second text vector in the following manner:
initializing a first linear weight and a first offset of a first fully-connected layer and a second linear weight and a second offset of a second fully-connected layer, wherein the first linear weight is different from the second linear weight and/or the first offset is different from the second offset;
respectively inputting the text vectors into a first initialized full-connection layer and a second initialized full-connection layer for linear transformation, obtaining a first optimized linear weight and a first optimized offset of the first full-connection layer, which enable the loss function to be minimum, and obtaining a second optimized linear weight and a second optimized offset of the second full-connection layer, which enable the loss function to be minimum;
and respectively carrying out linear transformation on the text vector according to the first full-connection layer of the first optimized linear weight and the first optimized offset and the second full-connection layer of the second optimized linear weight and the second optimized offset.
With reference to the second aspect, an embodiment of the present application provides a second possible implementation manner of the second aspect, where the feature extraction module obtains a weight vector according to the following operation on the first text vector and the second text vector:
transposing the second text vector to obtain a transposed vector;
obtaining a product of the first text vector and the transposed vector to obtain an initial weight vector;
and calculating the initial weight vector by using a regression algorithm to obtain a weight vector.
With reference to the second aspect, an embodiment of the present application provides a third possible implementation manner of the second aspect, where the feature extraction module obtains a target feature vector according to the weight vector and the text vector in the following manner:
performing third linear transformation on the text vector to obtain a third text vector after the third linear transformation, wherein the coefficient of the third linear transformation is different from the coefficients of the first linear transformation and the second linear transformation;
obtaining a product of the weight vector and the third text vector to obtain a feature vector;
and extracting the features of the feature vector to obtain a target feature vector.
According to the text classification method and device provided by the embodiments of the application, a self-attention mechanism is used to selectively attend to the content input by the user, the features of the important content are extracted, and finally the text is classified according to the extracted features. Compared with the prior art, in which classifying text with a CNN or an RNN alone loses a large amount of text information, extracts incomplete features, and therefore yields low classification accuracy, the method retains a large amount of text information, improves the comprehensiveness of the extracted features, and thereby improves the accuracy of text classification.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating a text classification method provided in an embodiment of the present application;
FIG. 2 is a diagram illustrating a text vector provided by an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for classifying text provided by an embodiment of the present application;
fig. 4 is a schematic diagram illustrating an operation process of a full connection layer according to an embodiment of the present application.
Reference numerals: 1, target feature vector; 2, convolution kernel of the fully-connected layer; 3, one-dimensional vector.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
In view of the fact that, in the prior art, classifying text with a CNN or an RNN alone loses a large amount of text information and extracts incomplete features, so that the accuracy of text classification is low, embodiments of the present application provide a text classification method and apparatus, which are described below by way of embodiments.
In order to solve the problem of low classification accuracy caused by incomplete text information, an embodiment of the present application provides a method for classifying texts, which includes the following steps, as shown in fig. 1:
step S101, obtaining a text vector of a target text.
In the embodiment of the application, the target text refers to one or several sentences input by the user, and the text vector refers to the text representation of the sentences input by the user. Optionally, step S101 includes:
(1) obtaining a word vector corresponding to each character in the target text according to a pre-trained correspondence between characters and word vectors.
Specifically, each word vector is a d-dimensional real-valued vector, and each character corresponds to one word vector, where d may be an integer greater than 1. After the user inputs a sentence, the word vector corresponding to each character of the sentence is obtained automatically from this correspondence.
(2) constructing a vector matrix from the word vectors corresponding to the acquired characters, in the input order of the characters in the target text, to obtain the text vector.
Specifically, each obtained word vector is used as one row of the vector matrix, in the input order of the characters in the target text; the number of rows of the vector matrix is the number of characters in the target text, and the number of columns is the dimension of the word vectors. This forms the word-vector representation of the target text, i.e., the text vector. As shown in fig. 2, n is the length of the sentence (the number of characters it contains), d is the dimension of the word vectors, and the dimension of the text vector is n × d. For example, if the word-vector dimension is 10 and the user inputs a sentence of 8 characters, the resulting text vector is a vector matrix of dimension 8 × 10.
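As an illustrative sketch only, step S101 can be expressed in a few lines of numpy; the embedding table and the sample sentence below are hypothetical stand-ins for the pre-trained character-to-word-vector correspondence.

```python
# Illustrative sketch (not the patent's code): build the n x d text vector
# from a pretrained embedding table. The table contents and the sample
# sentence are hypothetical stand-ins for the pre-trained correspondence.
import numpy as np

rng = np.random.default_rng(0)
d = 10                                           # word-vector dimension
vocab = {ch: rng.standard_normal(d) for ch in "这是一条测试文本"}

def text_to_matrix(sentence):
    # One d-dimensional row per character, stacked in input order -> n x d.
    return np.stack([vocab[ch] for ch in sentence])

X = text_to_matrix("这是测试文本")                  # n = 6 characters
print(X.shape)                                   # (6, 10)
```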
Step S102, respectively carrying out first linear transformation and second linear transformation on the text vector to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein weights of the first linear transformation and the second linear transformation are different.
The weights comprise a linear weight and an offset; the weights being different means that at least one of the linear weight and the offset differs between the two linear transformations.
In this embodiment, as an alternative embodiment, the text vector may be linearly transformed through a full-connected layer.
Specifically, the performing the first linear transformation and the second linear transformation on the text vector, respectively, includes:
initializing a first linear weight and a first offset of a first fully connected layer and a second linear weight and a second offset of a second fully connected layer, wherein the first linear weight is different from the second linear weight and/or the first offset is different from the second offset;
respectively inputting the text vectors into the initialized first full-link layer and the initialized second full-link layer for linear transformation, acquiring a first optimized linear weight and a first optimized offset of the first full-link layer, which enable the loss function to be minimum, and acquiring a second optimized linear weight and a second optimized offset of the second full-link layer, which enable the loss function to be minimum;
and respectively carrying out linear transformation on the text vector according to the first full-connection layer of the first optimized linear weight and the first optimized offset and the second full-connection layer of the second optimized linear weight and the second optimized offset.
In the embodiment of the application, the text vector is subjected to linear transformation according to a first optimized linear weight and a first full-connection layer of a first optimized offset to obtain a first text vector; and performing linear transformation on the text vector according to the second optimized linear weight and the second full-connection layer of the second optimized offset to obtain a second text vector.
In the embodiment of the present application, the fully-connected layer performs the linear transformation on the text vector as follows: let the input text vector be x, and initialize the linear weight w and the offset b in the linear transformation formula of the fully-connected layer, y = wx + b, where y is the linearly transformed text vector. The optimizer of the fully-connected layer then continuously adjusts the linear weight and the offset through a gradient descent algorithm to obtain the linear weight and the offset that minimize the loss function, where the loss function measures the degree of inconsistency between the predicted value and the true value, here between the linearly transformed text vector and the target text.
In the embodiment of the application, two different groups of linear weights and offsets are set by initializing the linear weights and offsets of the two fully-connected layers, and the two initialized fully-connected layers each perform a linear transformation on the text vector. Assuming that the linear weights and offsets that minimize the loss functions of the two fully-connected layers are (w1, b1) and (w2, b2) respectively, the first linear transformation formula of the first fully-connected layer, containing the first optimized linear weight and the first optimized offset, and the second linear transformation formula of the second fully-connected layer, containing the second optimized linear weight and the second optimized offset, can be expressed as formula (1) and formula (2), respectively:

y1 = w1 * x + b1    (1)

y2 = w2 * x + b2    (2)

where y1 denotes the first linearly transformed text vector and y2 denotes the second linearly transformed text vector.
The text vector is substituted into formula (1) and formula (2), respectively, to obtain the first text vector and the second text vector.
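A minimal sketch of formulas (1) and (2), assuming the weights have already been optimized; the gradient-descent training that produces (w1, b1) and (w2, b2) is omitted and random placeholders are used instead.

```python
# Sketch of formulas (1) and (2): two fully-connected layers with different
# weights map the same text vector X to Q (first text vector) and K (second
# text vector). The weights here are random placeholders; in the method they
# are the optimized values that minimize the loss function.
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 10
X = rng.standard_normal((n, d))                  # text vector

w1, b1 = rng.standard_normal((d, d)), rng.standard_normal(d)
w2, b2 = rng.standard_normal((d, d)), rng.standard_normal(d)

Q = X @ w1 + b1                                  # y1 = w1 * x + b1   (1)
K = X @ w2 + b2                                  # y2 = w2 * x + b2   (2)
print(Q.shape, K.shape)                          # (6, 10) (6, 10)
```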
Step S103, the first text vector and the second text vector are operated to obtain a weight vector.
The weight vector represents the importance of each character or word in the sentence input by the user, where importance refers to the degree of influence on the text classification. Methods for computing the weight vector include Dot product, General (weight-network mapping) and Concat (concatenation mapping), of which the dot-product operation is the simplest and most commonly used.
Optionally, an initial weight vector is obtained according to formula (3) based on the first text vector and the second text vector:
M = Q * Kᵀ (3),
wherein Q is a first text vector, K is a second text vector, and M is an initial weight vector.
And calculating the initial weight vector by using a regression algorithm to obtain the weight vector.
In particular, the regression algorithm refers specifically to the softmax regression algorithm, whose function is to normalize the initial weight vector.
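The computation of the weight vector in step S103 can be sketched as follows; the stabilizing subtraction of the row maximum inside the softmax is a standard numerical safeguard, not something the patent specifies.

```python
# Sketch of step S103: M = Q @ K.T (formula (3)), then softmax regression
# normalizes each row of M to give the weight vector S.
import numpy as np

def softmax(m):
    e = np.exp(m - m.max(axis=-1, keepdims=True))    # stabilized exponent
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(Q, K):
    M = Q @ K.T                                      # initial weight vector
    return softmax(M)                                # normalized weight vector

rng = np.random.default_rng(2)
Q, K = rng.standard_normal((6, 10)), rng.standard_normal((6, 10))
S = attention_weights(Q, K)
print(S.shape, S.sum(axis=-1))                       # (6, 6), rows sum to 1
```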
And step S104, obtaining a target characteristic vector according to the weight vector and the text vector.
The attention mechanism algorithm mimics the internal process of biological observation, i.e., it aligns internal experience with external perception to increase the fineness of observation of selected regions. The self-attention mechanism algorithm is an improvement on the attention mechanism algorithm: it reduces the dependence on external information and is better at capturing the internal correlations of the data or features. The self-attention mechanism algorithm thus enables selective attention to the sentence input by the user.
In the embodiment of the application, different weights are given to the text vector according to the numerical values in the weight vector, so that the sentence input by the user can be selectively focused on based on the self-attention mechanism.
In the embodiment of the present application, optionally, obtaining the target feature vector according to the weight vector and the text vector includes steps S1041 to S1043, as shown in fig. 3, which are specifically as follows:
step S1041, performing a third linear transformation on the text vector to obtain a third text vector after the third linear transformation, where coefficients of the third linear transformation are different from coefficients of the first linear transformation and the second linear transformation.
In the embodiment of the present application, the coefficient includes a weight and an offset. Specifically, the third fully-connected layer is initialized, a third weight for a third linear transformation of the third fully-connected layer is different from the first weight and the second weight, and/or a third offset for the third linear transformation of the third fully-connected layer is different from the first offset and the second offset.
Step S1042, obtaining a feature vector according to the weight vector and the third text vector.
In the embodiment of the present application, the feature vector is calculated according to formula (4):
T = S * V (4)
wherein S is a weight vector, V is a third text vector, and T is a feature vector.
Wherein the feature vector is an intermediate result of obtaining the target feature vector.
And step S1043, extracting the features of the feature vectors to obtain target feature vectors.
Optionally, step S1043 specifically includes:
(1) normalizing the feature vector to obtain a normalized feature vector.
Specifically, normalization can be achieved by layer normalization, whose principle is to fix the mean and variance of a layer's input and adjust the input data according to that fixed mean and variance. Normalization accelerates the convergence of the loss to its minimum.
(2) extracting features from the normalized feature vector to obtain the target feature vector.
Specifically, feature extraction is performed on the normalized feature vector by a position-wise feed-forward network. The operation of the position-wise feed-forward network is equivalent to two consecutive one-dimensional convolution layers, where the kernel size of both layers is 1 × 1 and the output of the former layer is connected to the input of the latter. The former layer extracts features from the normalized feature vector, and the latter converts the former's output into a matrix with the same dimensions as the feature vector. The output of the latter layer is normalized by layer normalization, and the normalized result is the target feature vector.
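Steps S1041 to S1043 can be sketched as below; the hidden width h and the ReLU activation between the two per-position linear maps are assumptions, since the patent does not name them.

```python
# Sketch of steps S1041-S1043: V from a third linear transformation,
# T = S @ V (formula (4)), layer normalization, then a position-wise
# feed-forward network (two per-position linear maps, equivalent to 1x1
# convolutions). The ReLU is an assumed activation.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(axis=-1, keepdims=True), x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def target_feature_vector(S, V, Wf1, bf1, Wf2, bf2):
    T = layer_norm(S @ V)                       # formula (4) + normalization
    hidden = np.maximum(0.0, T @ Wf1 + bf1)     # first 1x1 conv (assumed ReLU)
    out = hidden @ Wf2 + bf2                    # second 1x1 conv, back to d dims
    return layer_norm(out)                      # normalized result

rng = np.random.default_rng(3)
n, d, h = 6, 10, 32
S, V = rng.standard_normal((n, n)), rng.standard_normal((n, d))
params = (rng.standard_normal((d, h)), np.zeros(h),
          rng.standard_normal((h, d)), np.zeros(d))
print(target_feature_vector(S, V, *params).shape)   # (6, 10)
```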
In the embodiment of the present application, in order to retain as much of the information contained in the target text as possible, after obtaining the target feature vector the method further includes:
A11, determining whether the number of times the target feature vector has been obtained reaches a preset count threshold; if not, updating the text vector with the target feature vector to obtain an updated text vector, and replacing the text vector with the updated text vector;
in the embodiment of the present application, the text vector is updated according to formula (5):
X' = X + N (5)
wherein X' is an updated text vector, X is a text vector, and N is a target feature vector.
In the embodiment of the present application, if the number of times the target feature vector has been obtained reaches the preset count threshold, the subsequent step S105 is executed with the obtained target feature vector.
A12, executing the steps of performing the first linear transformation and the second linear transformation on the text vector, respectively, until the number of times the target feature vector has been obtained reaches the preset count threshold.
Specifically, the updated text vector is used as the text vector of the target text, and the target feature vector is calculated again according to the method of steps S102 to S104. This process may be repeated multiple times; preferably, the preset count threshold is 6. In this way the original text information is retained rather than lost during feature extraction, and the importance of highly important characters or words is continuously reinforced.
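A sketch of the A11/A12 loop under stated assumptions; extract_target_features is a hypothetical helper standing in for one pass of steps S102 to S104, with the feed-forward stage omitted for brevity.

```python
# Sketch of the A11/A12 loop: formula (5), X' = X + N, applied until the
# preset count threshold (preferably 6) is reached.
import numpy as np

rng = np.random.default_rng(4)
n, d = 6, 10
X = rng.standard_normal((n, d))

def extract_target_features(X):
    # Hypothetical one-pass extraction: Q, K, V projections, softmax weights,
    # weighted sum. Placeholder weights; a trained model would reuse its own.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    M = Q @ K.T
    S = np.exp(M - M.max(axis=-1, keepdims=True))
    S /= S.sum(axis=-1, keepdims=True)
    return S @ V

for _ in range(6):                       # preset count threshold
    X = X + extract_target_features(X)   # formula (5): X' = X + N
print(X.shape)                           # the n x d shape is preserved
```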
Step S105, mapping the target feature vector into a one-dimensional vector according to the category mapping of the fully-connected layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories;
The number and content of the categories can be set as needed. To map the target feature vector into a one-dimensional vector, the target feature vector is input into the category-mapping sublayer of the fully-connected layer, and the fully-connected layer outputs the one-dimensional vector. The role of the fully-connected layer is to map the target feature vector into the category space; the one-dimensional vector is the result of that mapping, and each of its dimensions corresponds to a unique category.
The operation of the fully-connected layer, as shown in fig. 4, is as follows: assuming the dimension of the target feature vector is n × d, a fully-connected layer with m preset categories is provided with m convolution kernels of size n × d, each holding different values. Convolving the target feature vector with each kernel yields one value, so the m kernels yield m values, which form a one-dimensional column vector. After the convolution operation of the fully-connected layer, the target feature vector thus becomes a one-dimensional column vector of dimension 1 × m.
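The kernel view of the class-mapping layer can be sketched as follows; m = 4 is an arbitrary example of the preset category count.

```python
# Sketch of step S105: the class-mapping layer as m convolution kernels of
# size n x d. Each kernel produces one value (elementwise product summed over
# the whole n x d input), so the output is a one-dimensional vector of size m.
import numpy as np

rng = np.random.default_rng(5)
n, d, m = 6, 10, 4                       # m preset categories (example value)
target = rng.standard_normal((n, d))     # target feature vector
kernels = rng.standard_normal((m, n, d)) # m kernels, each n x d, all different

logits = np.einsum('nd,mnd->m', target, kernels)
print(logits.shape)                      # (4,): a 1 x m one-dimensional vector
```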
And step S106, determining the text type of the target text according to the dimension of the maximum value in the one-dimensional vector.
Optionally, step S106 includes:
(1) normalizing the one-dimensional vector through the output layer to obtain a normalized one-dimensional vector.
(2) determining the category corresponding to the maximum value in the normalized one-dimensional vector as the text category of the target text.
Specifically, the one-dimensional vector is input to the output layer, which may be a softmax layer; the one-dimensional vector is normalized by the softmax function so that its elements sum to 1, at which point each element represents the probability of the category corresponding to that dimension. The category with the highest probability is the category of the target text.
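A sketch of step S106; the category names are illustrative placeholders, not from the patent.

```python
# Sketch of step S106: softmax-normalize the one-dimensional vector so its
# elements sum to 1, then pick the dimension of the maximum value.
import numpy as np

categories = ["spam", "not spam", "positive", "negative"]   # hypothetical
logits = np.array([0.2, 1.7, -0.3, 0.5])                    # from step S105

probs = np.exp(logits - logits.max())
probs /= probs.sum()                                        # sums to 1
print(categories[int(np.argmax(probs))])                    # -> "not spam"
```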
An embodiment of the present application further provides a text classification apparatus, including:
a text representation module, configured to acquire a text vector of a target text;
a feature extraction module, configured to perform a first linear transformation and a second linear transformation on the text vector, respectively, to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein weights of the two linear transformations are different, to operate on the first text vector and the second text vector to obtain a weight vector, and to obtain a target feature vector according to the weight vector and the text vector;
and a text classification module, configured to map the target feature vector into a one-dimensional vector according to the category mapping of the fully-connected layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories, and to determine the text category of the target text according to the dimension of the maximum value in the one-dimensional vector.
In the embodiment of the application, the target text refers to one or several sentences input by the user, and the text vector refers to the text representation of the sentences input by the user.
Optionally, the text representation module is specifically configured to: (1) obtain a word vector corresponding to each character in the target text according to a pre-trained correspondence between characters and word vectors; and (2) construct a vector matrix from the word vectors corresponding to the acquired characters, in the input order of the characters in the target text, to obtain the text vector.
Specifically, each word vector is a d-dimensional real-valued vector, each character corresponds to one word vector, and d may be an integer greater than 1. Together the word vectors form the word-vector representation of the target text, i.e., the text vector.
Optionally, the feature extraction module obtains the first text vector and the second text vector as follows:
initializing a first linear weight and a first offset of a first fully connected layer and a second linear weight and a second offset of a second fully connected layer, wherein the first linear weight is different from the second linear weight and/or the first offset is different from the second offset;
respectively inputting the text vectors into the initialized first full-link layer and the initialized second full-link layer for linear transformation, acquiring a first optimized linear weight and a first optimized offset of the first full-link layer, which enable the loss function to be minimum, and acquiring a second optimized linear weight and a second optimized offset of the second full-link layer, which enable the loss function to be minimum;
and respectively carrying out linear transformation on the text vector according to the first full-connection layer of the first optimized linear weight and the first optimized offset and the second full-connection layer of the second optimized linear weight and the second optimized offset.
Specifically, the loss function refers to the degree of inconsistency between the predicted value and the true value, and here refers to the degree of inconsistency between the linearly transformed text vector and the target text.
Optionally, the feature extraction module obtains the weight vector by performing an operation on the first text vector and the second text vector as follows:
transposing the second text vector to obtain a transposed vector;
obtaining a product of the first text vector and the transposed vector to obtain an initial weight vector;
and calculating the initial weight vector by using a regression algorithm to obtain the weight vector.
In particular, the regression algorithm refers specifically to the softmax regression algorithm, whose function is to normalize the initial weight vector.
Optionally, the feature extraction module obtains the target feature vector according to the weight vector and the text vector in the following manner:
performing third linear transformation on the text vector to obtain a third text vector after the third linear transformation, wherein the coefficient of the third linear transformation is different from the coefficients of the first linear transformation and the second linear transformation;
obtaining a product of the weight vector and the third text vector to obtain a feature vector;
and extracting features from the feature vector to obtain a target feature vector.
In the embodiment of the present application, the coefficient includes a weight and an offset. Specifically, the third fully-connected layer is initialized, a third weight for a third linear transformation of the third fully-connected layer is different from the first weight and the second weight, and/or a third offset for the third linear transformation of the third fully-connected layer is different from the first offset and the second offset.
Optionally, the feature extraction module performs feature extraction on the feature vector as follows: normalize the feature vector to obtain a normalized feature vector; then extract features from the normalized feature vector to obtain the target feature vector.
Specifically, the normalization can be achieved by layer normalization, which accelerates the convergence of the loss to its minimum. Feature extraction is performed on the normalized feature vector by a position-wise feed-forward network; the output of the position-wise feed-forward network is normalized by layer normalization, and the normalized result is the target feature vector.
In the embodiment of the present application, in order to retain as much of the information contained in the target text as possible, the feature extraction module is further configured, after obtaining the target feature vector, to:
A11, determine whether the number of times the target feature vector has been obtained reaches a preset count threshold; if not, update the text vector with the target feature vector to obtain an updated text vector, and replace the text vector with the updated text vector;
in the embodiment of the present application, the text vector is updated according to formula (5):
X' = X + N (5)
wherein X' is an updated text vector, X is a text vector, and N is a target feature vector.
A12, execute the steps of performing the first linear transformation and the second linear transformation on the text vector, respectively, until the number of times the target feature vector has been obtained reaches the preset count threshold.
Specifically, the updated text vector is used as the text vector of the target text, and the feature extraction module calculates the target feature vector again. This process may be repeated multiple times; preferably, the preset count threshold is 6.
Optionally, the text classification module determines the text category of the target text according to the dimension of the maximum value in the one-dimensional vector as follows: normalize the one-dimensional vector through the output layer to obtain a normalized one-dimensional vector; then determine the category corresponding to the maximum value in the normalized one-dimensional vector as the text category of the target text.
Specifically, the one-dimensional vector is input to the output layer, which may be a softmax layer; the one-dimensional vector is normalized by the softmax function so that its elements sum to 1, at which point each element represents the probability of the category corresponding to that dimension. The category with the highest probability is the category of the target text.
Based on the above analysis, the text classification method provided by the embodiment of the application uses a self-attention mechanism to selectively attend to the content input by the user, extracts the features of the important content, and classifies the text according to the extracted features. Compared with the related art, in which classifying text with a CNN or an RNN alone loses a large amount of text information, extracts incomplete features, and therefore yields low classification accuracy, the method retains a large amount of text information, improves the comprehensiveness of the extracted features, and thereby improves the accuracy of text classification.
The computer program product of the method and the apparatus for text classification provided in the embodiment of the present application includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and is not described herein again.
The text classification device provided by the embodiment of the present application may be specific hardware on a device, or software or firmware installed on a device, etc. The device provided by the embodiment of the present application has the same implementation principle and technical effect as the foregoing method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiments where no part of the device embodiments is mentioned. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a division of one logic function, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the art may still, within the technical scope disclosed in the present application, modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A method of text classification, comprising:
acquiring a text vector of a target text;
respectively performing a first linear transformation and a second linear transformation on the text vector to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein weights of the first linear transformation and the second linear transformation are different; the weights comprise a linear weight and an offset, and the weights being different means that at least one of the linear weight and the offset of the linear transformations differs; the linear weight and the offset are preset values in the linear transformation formula y = wx + b, in which y is the linearly transformed text vector, x is the text vector, w is the linear weight, and b is the offset;
calculating the first text vector and the second text vector to obtain a weight vector;
obtaining a target feature vector according to the weight vector and the text vector;
mapping the target feature vector into a one-dimensional vector according to category mapping of a full connection layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories;
determining the text type of the target text according to the dimension of the maximum value in the one-dimensional vector;
calculating the first text vector and the second text vector to obtain a weight vector, including: transposing the second text vector to obtain a transposed vector; obtaining a product of the first text vector and the transposed vector to obtain an initial weight vector; calculating the initial weight vector by using a regression algorithm to obtain a weight vector;
obtaining a target feature vector according to the weight vector and the text vector, wherein the obtaining of the target feature vector comprises: performing a third linear transformation on the text vector to obtain a third text vector after the third linear transformation, wherein the coefficient of the third linear transformation is different from the coefficients of the first linear transformation and the second linear transformation; obtaining a product of the weight vector and the third text vector to obtain a feature vector; and extracting the features of the feature vector to obtain a target feature vector.
2. The method of claim 1, wherein performing a first linear transformation and a second linear transformation on the text vector comprises:
initializing a first linear weight and a first offset of a first fully-connected layer and a second linear weight and a second offset of a second fully-connected layer, wherein the first linear weight is different from the second linear weight and/or the first offset is different from the second offset;
respectively inputting the text vectors into a first initialized full-connection layer and a second initialized full-connection layer for linear transformation, obtaining a first optimized linear weight and a first optimized offset of the first full-connection layer, which enable the loss function to be minimum, and obtaining a second optimized linear weight and a second optimized offset of the second full-connection layer, which enable the loss function to be minimum;
and respectively carrying out linear transformation on the text vector according to the first full-connection layer of the first optimized linear weight and the first optimized offset and the second full-connection layer of the second optimized linear weight and the second optimized offset.
3. The method of claim 1, wherein determining the text category of the target text according to the dimension of the maximum value in the one-dimensional vector comprises:
normalizing the one-dimensional vector through an output layer to obtain a normalized one-dimensional vector;
and determining the category corresponding to the maximum value in the normalized one-dimensional vector as the text category of the target text.
4. The method according to any of claims 1-3, wherein after obtaining the target feature vector, before mapping the target feature vector into a one-dimensional vector according to a class mapping of a full connection layer, the method further comprises:
judging whether the number of times the target feature vector has been obtained reaches a preset count threshold, and if not, updating the text vector by using the target feature vector to obtain an updated text vector, and replacing the text vector by using the updated text vector;
and executing the steps of respectively performing the first linear transformation and the second linear transformation on the text vector until the number of times the target feature vector has been obtained reaches the preset count threshold.
5. An apparatus for classifying text, the apparatus comprising:
a text representation module, configured to acquire a text vector of a target text;
a feature extraction module, configured to respectively perform a first linear transformation and a second linear transformation on the text vector to obtain a first text vector after the first linear transformation and a second text vector after the second linear transformation, wherein weights of the first linear transformation and the second linear transformation are different, to operate on the first text vector and the second text vector to obtain a weight vector, and to obtain a target feature vector according to the weight vector and the text vector; the weights comprise a linear weight and an offset, and the weights being different means that at least one of the linear weight and the offset of the linear transformations differs; the linear weight and the offset are preset values in the linear transformation formula y = wx + b, in which y is the linearly transformed text vector, x is the text vector, w is the linear weight, and b is the offset;
a text classification module, configured to map the target feature vector into a one-dimensional vector according to category mapping of a full connection layer, wherein the dimensions of the one-dimensional vector correspond one-to-one to preset categories, and to determine the text category of the target text according to the dimension of the maximum value in the one-dimensional vector;
the feature extraction module obtains a weight vector by operating the first text vector and the second text vector according to the following mode:
transposing the second text vector to obtain a transposed vector;
obtaining a product of the first text vector and the transposed vector to obtain an initial weight vector;
calculating the initial weight vector by using a regression algorithm to obtain a weight vector;
the feature extraction module obtains a target feature vector according to the weight vector and the text vector in the following way:
performing third linear transformation on the text vector to obtain a third text vector after the third linear transformation, wherein the coefficient of the third linear transformation is different from the coefficients of the first linear transformation and the second linear transformation;
obtaining a product of the weight vector and the third text vector to obtain a feature vector;
and extracting the features of the feature vector to obtain a target feature vector.
6. The apparatus of claim 5, wherein the feature extraction module obtains the first text vector and the second text vector as follows:
initializing a first linear weight and a first offset of a first fully-connected layer and a second linear weight and a second offset of a second fully-connected layer, wherein the first linear weight is different from the second linear weight and/or the first offset is different from the second offset;
respectively inputting the text vectors into a first initialized full-connection layer and a second initialized full-connection layer for linear transformation, obtaining a first optimized linear weight and a first optimized offset of the first full-connection layer, which enable the loss function to be minimum, and obtaining a second optimized linear weight and a second optimized offset of the second full-connection layer, which enable the loss function to be minimum;
and respectively carrying out linear transformation on the text vector according to the first full-connection layer of the first optimized linear weight and the first optimized offset and the second full-connection layer of the second optimized linear weight and the second optimized offset.
CN201811475663.3A 2018-12-04 2018-12-04 Text classification method and device Active CN109299246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811475663.3A CN109299246B (en) 2018-12-04 2018-12-04 Text classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811475663.3A CN109299246B (en) 2018-12-04 2018-12-04 Text classification method and device

Publications (2)

Publication Number Publication Date
CN109299246A CN109299246A (en) 2019-02-01
CN109299246B true CN109299246B (en) 2021-08-03

Family

ID=65142453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811475663.3A Active CN109299246B (en) 2018-12-04 2018-12-04 Text classification method and device

Country Status (1)

Country Link
CN (1) CN109299246B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160016B (en) * 2019-04-15 2022-05-03 深圳碳云智能数字生命健康管理有限公司 Semantic recognition method and device, computer readable storage medium and computer equipment
CN110717023B (en) * 2019-09-18 2023-11-07 平安科技(深圳)有限公司 Method and device for classifying interview answer text, electronic equipment and storage medium
CN111241263A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Text generation method and device and electronic equipment
CN112528016B (en) * 2020-11-19 2024-05-07 重庆兆光科技股份有限公司 Text classification method based on low-dimensional spherical projection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108462A (en) * 2017-12-29 2018-06-01 河南科技大学 A kind of text emotion analysis method of feature based classification
CN108875000A (en) * 2018-06-14 2018-11-23 广东工业大学 A kind of semantic relation classification method merging more syntactic structures

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897268B (en) * 2017-02-28 2020-06-02 科大讯飞股份有限公司 Text semantic understanding method, device and system
CN107665248A (en) * 2017-09-22 2018-02-06 齐鲁工业大学 File classification method and device based on deep learning mixed model
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 A kind of combined type file classification method based on deep learning
CN108763216A (en) * 2018-06-01 2018-11-06 河南理工大学 A kind of text emotion analysis method based on Chinese data collection

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108462A (en) * 2017-12-29 2018-06-01 河南科技大学 A kind of text emotion analysis method of feature based classification
CN108875000A (en) * 2018-06-14 2018-11-23 广东工业大学 A kind of semantic relation classification method merging more syntactic structures

Also Published As

Publication number Publication date
CN109299246A (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN110347835B (en) Text clustering method, electronic device and storage medium
CN109299246B (en) Text classification method and device
CN108628971B (en) Text classification method, text classifier and storage medium for unbalanced data set
CN109271521B (en) Text classification method and device
CN110598206A (en) Text semantic recognition method and device, computer equipment and storage medium
EP3499384A1 (en) Word and sentence embeddings for sentence classification
CN110858269B (en) Fact description text prediction method and device
CN110569500A (en) Text semantic recognition method and device, computer equipment and storage medium
CN111191457B (en) Natural language semantic recognition method, device, computer equipment and storage medium
US11062092B2 (en) Few-shot language model training and implementation
JP2005158010A (en) Apparatus, method and program for classification evaluation
CN111611374A (en) Corpus expansion method and device, electronic equipment and storage medium
CN101138001A (en) Learning processing method, learning processing device, and program
CN107329954B (en) Topic detection method based on document content and mutual relation
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN112106040A (en) Event prediction device, prediction model generation device, and event prediction program
CN111611807A (en) Keyword extraction method and device based on neural network and electronic equipment
WO2019085332A1 (en) Financial data analysis method, application server, and computer readable storage medium
KR102666635B1 (en) User equipment, method, and recording medium for creating recommendation keyword
JP2020521408A (en) Computerized method of data compression and analysis
CN113449084A (en) Relationship extraction method based on graph convolution
CN112632256A (en) Information query method and device based on question-answering system, computer equipment and medium
Schofield et al. Identifying hate speech in social media
CN113239668B (en) Keyword intelligent extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant