CN113220874A - Multi-label text classification method and system - Google Patents

Multi-label text classification method and system

Info

Publication number
CN113220874A
CN113220874A CN202110272724.1A CN202110272724A
Authority
CN
China
Prior art keywords
label
text
sequence
convolution
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110272724.1A
Other languages
Chinese (zh)
Other versions
CN113220874B (en)
Inventor
解福
郑兴芳
徐传杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202110272724.1A priority Critical patent/CN113220874B/en
Publication of CN113220874A publication Critical patent/CN113220874A/en
Application granted granted Critical
Publication of CN113220874B publication Critical patent/CN113220874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a multi-label text classification method and system, which combines a convolutional neural network and a self-attention mechanism as the encoder and designs a novel decoder to decode and generate the label sequence. The proposed method not only fully considers the interpretable fine-grained information in the source text, but also effectively uses this information when generating the label sequence. When predicting labels, global and local information are combined effectively, which improves the accuracy of label prediction and thereby achieves accurate multi-label text classification.

Description

Multi-label text classification method and system
Technical Field
The disclosure belongs to the technical field of computer processing, and particularly relates to a multi-label text classification method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the advent of the big-data era, even though we are constantly warned about leaks of private information, the same data also brings great convenience to daily life: advertisement recommendation, search optimization, text summarization and the like serve us all the time. In natural language processing, multi-label text classification is a very complex task, because some labels are often highly correlated, which makes classification more difficult. A typical example is a financial news item, where similar-looking labels such as "fund", "stock" and "bond" may all appear and are often difficult to distinguish.
On this basis, many feasible methods have been tried and have achieved certain results. However, the inventors have found that earlier approaches tend to suffer from several disadvantages: the data sets they can support are small, they are slow, and the algorithms are complex. With the continuous development of modern computer technology, the appearance of neural networks has provided a more novel way of solving the problem with convolutional neural network models. However, these methods still fail to fully exploit the label dependencies and interpretable semantics that can be obtained from the source text, and they are not friendly enough to data sets that are orders of magnitude smaller.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a method and a system for classifying multi-label texts, in which local and global features of a text sequence are obtained through a convolutional neural network and an attention mechanism, so that prediction accuracy of text sequence labels is effectively improved, and accuracy of multi-label text classification is further improved.
According to a first aspect of the embodiments of the present disclosure, there is provided a multi-label text classification method, including:
determining a label space in advance according to text content;
performing word segmentation on a text to be classified to obtain a text sequence;
embedding a position vector in the text sequence, inputting the resulting sequence into a trained multi-label text classification model, and outputting the label prediction of the text sequence;
the multi-label text classification model comprises an encoder and a decoder; local information and global information of the text sequence are obtained respectively through a convolution block in the encoder and a self-attention mechanism, and the combined local and global information is decoded by the decoder to obtain the label prediction result.
Further, the convolution block in the encoder adopts a one-dimensional convolution and a nonlinear activation function, wherein the width of the convolution kernel of the one-dimensional convolution is the same as the number of words of the text sequence it covers; meanwhile, in order to obtain higher-level local information, a stacked network is used and residual connections are added to the block output; the nonlinear activation function adopts a gated linear unit, which implements a gating mechanism on the convolution output.
Furthermore, in consideration of the correlation among labels, the label predicted at the previous time step is used when predicting the current label, and the decoder adopts a long short-term memory (LSTM) network as the basic recurrent unit to decode the sequence and obtain the final predicted labels.
According to a second aspect of the embodiments of the present disclosure, there is provided a multi-label text classification system, including:
a tag space acquisition unit for determining a tag space in advance from text contents;
the text sequence acquisition unit is used for segmenting words of a text to be classified to obtain a text sequence;
a label prediction unit for embedding a position vector in the text sequence, inputting the resulting sequence into a trained multi-label text classification model, and outputting the label prediction of the text sequence;
the multi-label text classification model comprises an encoder and a decoder; local information and global information of the text sequence are obtained respectively through a convolution block in the encoder and a self-attention mechanism, and the combined local and global information is decoded by the decoder to obtain the label prediction result.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the multi-label text classification method when executing the program.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-label text classification method described above.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) according to the scheme, the local information and the global information of the text sequence are respectively acquired through the convolution block and the self-attention mechanism in the encoder, and the accuracy of text characteristic information extraction is guaranteed through the combination of the local information and the global information, so that the accuracy of text label prediction is improved.
(2) The scheme of the disclosure provides a multi-label text classification model, which can accurately extract information and effectively convert a text sequence into a multi-label sequence.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, are incorporated in and constitute a part of this disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.
Fig. 1 is a diagram of a multi-label text classification model structure according to a first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise, and it should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The first embodiment is as follows:
the embodiment aims to provide a multi-label text classification method.
A multi-label text classification method comprises the following steps:
determining a label space in advance according to text content;
performing word segmentation on a text to be classified to obtain a text sequence;
embedding a position vector in the text sequence, inputting the resulting sequence into a trained multi-label text classification model, and outputting the label prediction of the text sequence;
the multi-label text classification model comprises an encoder and a decoder; local information and global information of the text sequence are obtained respectively through a convolution block in the encoder and a self-attention mechanism, and the combined local and global information is decoded by the decoder to obtain the label prediction result.
Specifically, for ease of understanding, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings:
(I) Basic definitions
First we explicitly define some notation for multi-label text classification. Assume that the label space contains N labels, Y = {y_1, y_2, ..., y_N}, and let X denote a text sequence. The task is to assign to the text sequence X a set of labels y, with the goal of finding the target label sequence y that maximizes the conditional probability p(y|x). The formula is as follows:

p(y|x) = ∏_{i=1}^{n} p(y_i | y_1, y_2, ..., y_{i-1}, x)    (1)

where n is the number of labels assigned to X.
Suppose the text sequence X contains L words, (w_1, w_2, ..., w_L). From the embedding matrix E ∈ R^{d×|V|} we obtain the word representation matrix e = (e_1, e_2, ..., e_L), e_i ∈ R^d, where d is the dimension of the embedding vector and |V| is the vocabulary size; the embedding matrix is obtained by vectorizing the text with word2vec.
Fig. 1 shows the overall structure of the multi-label text classification model in the present disclosure. In the first step, the encoder uses the convolution block and the self-attention mechanism to obtain local information h_l and full-text (global) information h_g. Before the text sequence X is input, we embed the position vector p = (p_1, p_2, ..., p_L), p_i ∈ R^d, so that the positional relations of the words are fully captured; the position vectors record the position information of each word, and because the position of some words is very important, embedding this information further improves the accuracy of text classification. The sequence actually input to the encoder is therefore X = (e_1 + p_1, e_2 + p_2, ..., e_L + p_L). After the sequence is encoded, the convolutional neural network is used to obtain the local feature information of the sequence and the self-attention mechanism is used to obtain its global information; the local and global information are then combined, and finally the decoder decodes the result to complete the label prediction.
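As a concrete illustration of the input construction described above, the following is a minimal sketch (not the reference implementation of this disclosure) that adds a learned position vector to each word embedding to form X = (e_1 + p_1, ..., e_L + p_L); the class and parameter names (vocab_size, d, max_len) are assumptions for illustration, and in practice the word embeddings would be initialized from word2vec as stated above.

import torch
import torch.nn as nn

class EmbeddingWithPosition(nn.Module):
    """Sketch of the encoder input: word embedding e_i plus position vector p_i."""
    def __init__(self, vocab_size: int, d: int, max_len: int = 512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d)   # e_i, could be initialized from word2vec
        self.pos_emb = nn.Embedding(max_len, d)       # p_i, one learned vector per position

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, L) word indices of the segmented text sequence
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.word_emb(token_ids) + self.pos_emb(positions)  # (batch, L, d)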
(II) Model construction
(1) Encoder
A convolutional neural network uses a fixed-size input and produces an output of constant size. This means that the convolution block scans the text sequence with a fixed-size kernel to obtain local information. The self-attention mechanism captures the correlation between contexts and can handle full-text (long-range) dependencies. The convolutional network and the self-attention mechanism operate on the same original text embedding vectors at the same time, and finally output h_l and h_g.
(1.1) Convolution block
Based on long-term research experience, we choose a one-dimensional convolution together with a nonlinear activation function. It concentrates on a small number of words at a time and filters out irrelevant information, which makes full use of the information in the input text sequence.
We set the convolution kernel width of the one-dimensional convolution to k, so the input to a kernel is X ∈ R^{k×d}, a window of k consecutive words. To obtain a higher level of local information we choose to use a stacked network and add residual connections to the block output. For the nonlinear activation function we choose the gated linear unit (GLU). This function implements a simple gating mechanism on the output of the convolution [A|B] ∈ R^{2d}, producing V([A|B]) ∈ R^d:

V([A|B]) = A ⊙ σ(B)    (2)

where ⊙ denotes point-wise multiplication, σ is the sigmoid function, A denotes the convolved representation of the current text window, B denotes the gating half of the convolution output, and V([A|B]) is the gated result of convolving the text vector.
Furthermore, to ensure an even distribution of the input data, we use layer normalization to control the distribution and variance of the input sequence. The output of the l-th convolution block is calculated as follows:
h^l_i = V( W^l [ h^{l-1}_{i-⌊k/2⌋} ; ... ; h^{l-1}_{i+⌊k/2⌋} ] + b^l )    (3)

h^l_i = LayerNorm( h^l_i + h^{l-1}_i )    (4)
Here W^l ∈ R^{2d×kd} is a weight parameter and h^l_i is the local information around the i-th word. As the depth of the stack increases, the information obtained by the convolution block lies farther away from the head word, yet what we need is high-level information around the i-th word. We therefore set the convolution kernel widths of the convolution blocks to be monotonically decreasing. This reduces interference from more distant context, and also reduces the number of parameters and the computational complexity.
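The convolution block described by equations (2)-(4) could be sketched as follows. This is a minimal, hedged example rather than the disclosure's reference implementation: the module names, the use of 'same' padding to keep h^l_i aligned with word i, and the example kernel widths (5, 3, 1) for the monotonically decreasing schedule are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """One convolution block: 1-D convolution producing 2d channels, a GLU gate
    V([A|B]) = A * sigmoid(B), a residual connection and layer normalization."""
    def __init__(self, d: int, kernel_width: int):
        super().__init__()
        # 'same'-style padding keeps one output per word so h^l_i stays aligned with word i
        self.conv = nn.Conv1d(d, 2 * d, kernel_width, padding=kernel_width // 2)
        self.norm = nn.LayerNorm(d)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, L, d), the output of the previous block (or the embedded input)
        gated = F.glu(self.conv(h.transpose(1, 2)), dim=1).transpose(1, 2)  # eq. (2)-(3)
        return self.norm(gated + h)                                          # residual + layer norm, eq. (4)

class ConvEncoder(nn.Module):
    """Stacked blocks with monotonically decreasing kernel widths, e.g. 5, 3, 1."""
    def __init__(self, d: int, kernel_widths=(5, 3, 1)):
        super().__init__()
        self.blocks = nn.ModuleList(ConvBlock(d, k) for k in kernel_widths)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x  # h_l: local information, (batch, L, d)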
(1.2) Self-attention mechanism
The solution described in this disclosure adopts a multi-head self-attention mechanism, which is, in short, a superposition of several self-attention mechanisms. Here we set the number of heads to M, and let d_v, d_k, d_q denote the depths of the values, keys, and queries respectively; the value, key, and query depths on the m-th head are denoted accordingly. For a given input sequence X ∈ R^{L×d}, the multi-head self-attention output is calculated as follows:
head_m = Attention(X W_m^Q, X W_m^K, X W_m^V) = softmax( (X W_m^Q)(X W_m^K)^T / √d_k ) X W_m^V    (5)
MHA(X) = Concat[head_1, head_2, ..., head_M] W^O    (6)
where the projections are the parameter matrices W_m^Q ∈ R^{d×d_q}, W_m^K ∈ R^{d×d_k}, W_m^V ∈ R^{d×d_v}, and W^O ∈ R^{M·d_v×d}.
Similar to the convolution blocks in the network, the same residual operation is performed for the self-attention modules, so the output of the g-th self-attention module is:
h^g = MHA(h^{g-1}) + h^{g-1}    (7)
Finally we combine the obtained h_l and h^G to get the final state H of the encoder:

H = h_l ⊙ h^G    (8)

where ⊙ denotes point-wise multiplication.
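A corresponding sketch of the self-attention branch and the combination step of equations (5)-(8) follows; it is an illustrative assumption rather than the disclosure's implementation (the number of heads and the number of stacked modules are placeholders, and PyTorch's built-in multi-head attention stands in for equations (5)-(6)).

import torch
import torch.nn as nn

class SelfAttentionEncoder(nn.Module):
    """Stacked multi-head self-attention modules with residual connections (eq. 7)."""
    def __init__(self, d: int, num_heads: int = 8, num_blocks: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(d, num_heads, batch_first=True) for _ in range(num_blocks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for attn in self.blocks:
            out, _ = attn(h, h, h)   # MHA(h^{g-1}), standing in for eq. (5)-(6)
            h = out + h              # residual connection, eq. (7)
        return h                     # h_G: global information, (batch, L, d)

def combine(h_l: torch.Tensor, h_g: torch.Tensor) -> torch.Tensor:
    # Final encoder state H = h_l ⊙ h_G (point-wise multiplication), eq. (8)
    return h_l * h_g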
(2) Decoder
In this section the sequence is decoded using an LSTM as the basic recurrent unit to obtain the final predicted labels. In consideration of the correlation among labels, the label predicted at the previous time step is used in the prediction of the current label.
s_t is the hidden state at time t; since the hidden state at time t-1 is needed here, it is calculated as follows:

s_{t-1} = LSTM(s_{t-2}, y_{t-2})    (9)
where y_{t-2} in (9) is the label predicted at time t-2. After the hidden state s is obtained, the final label prediction is completed by passing it through the softmax layer to obtain y:

y_t = softmax( f(s_{t-1}) + U_{t-1} )    (10)
in the above formula, Ut∈RNIs a mask vector for preventing the prediction of duplicate tags, and there are two possibilities that the current tag is given an infinitesimal value when it appears before, e.g. no over-assignment of 0 occurs. f is the nonlinear activation function.
(Ut)nTag of ∞ previous prediction (11)
(Ut)nOther than 0 (12)
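The decoding procedure of equations (9)-(12) could be sketched as a greedy loop as below. This is an illustrative assumption, not the disclosure's implementation: the linear projection from the hidden state to label scores, the mean-pooling of the encoder state H to initialize the LSTM, and the start label are hypothetical details added only to make the sketch self-contained.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelDecoder(nn.Module):
    """Greedy decoding sketch: each step feeds the previously predicted label back in
    (eq. 9) and masks already-emitted labels with -inf before the softmax (eq. 10-12)."""
    def __init__(self, num_labels: int, d: int):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, d)
        self.cell = nn.LSTMCell(d, d)
        self.out = nn.Linear(d, num_labels)   # assumed projection from hidden state to label scores

    def forward(self, H: torch.Tensor, max_steps: int = 5) -> torch.Tensor:
        # H: (batch, L, d) encoder state; summarized here by mean pooling (an assumption)
        batch = H.size(0)
        context = H.mean(dim=1)
        state = (context, torch.zeros_like(context))
        prev = torch.zeros(batch, dtype=torch.long, device=H.device)        # assumed start label id 0
        mask = torch.zeros(batch, self.out.out_features, device=H.device)   # U_t, eq. (11)-(12)
        labels = []
        for _ in range(max_steps):
            state = self.cell(self.label_emb(prev), state)                  # s_t = LSTM(s_{t-1}, y_{t-1}), eq. (9)
            probs = F.softmax(self.out(state[0]) + mask, dim=-1)            # eq. (10)
            prev = probs.argmax(dim=-1)
            mask.scatter_(1, prev.unsqueeze(1), float("-inf"))              # forbid repeating this label
            labels.append(prev)
        return torch.stack(labels, dim=1)                                    # (batch, max_steps) label ids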
The second embodiment is as follows:
the embodiment aims to provide a multi-label text classification system.
A multi-label text classification system comprising:
a tag space acquisition unit for determining a tag space in advance from text contents;
the text sequence acquisition unit is used for segmenting words of a text to be classified to obtain a text sequence;
a label prediction unit for embedding a position vector in the text sequence, inputting the resulting sequence into a trained multi-label text classification model, and outputting the label prediction of the text sequence;
the multi-label text classification model comprises an encoder and a decoder; local information and global information of the text sequence are obtained respectively through a convolution block in the encoder and a self-attention mechanism, and the combined local and global information is decoded by the decoder to obtain the label prediction result.
Further, the convolution block in the encoder adopts a one-dimensional convolution and a nonlinear activation function, wherein the width of the convolution kernel of the one-dimensional convolution is the same as the number of words of the text sequence it covers; meanwhile, in order to obtain higher-level local information, a stacked network is used and residual connections are added to the block output; the nonlinear activation function adopts a gated linear unit, which implements a gating mechanism on the convolution output.
Furthermore, in consideration of the correlation among labels, the label predicted at the previous time step is used when predicting the current label, and the decoder adopts a long short-term memory (LSTM) network as the basic recurrent unit to decode the sequence and obtain the final predicted labels.
The scheme of the present disclosure considers multi-label text classification from the viewpoint of label sequence generation and keeps the basic form of a sequence-to-sequence model (namely, from a text sequence to a multi-label sequence). It provides a novel multi-label text classification model designed to extract information from both global and local viewpoints, comprising an encoding block and a decoding block with a self-attention mechanism. The encoder is mainly composed of a convolutional neural network and a self-attention mechanism, and the decoder is composed of an LSTM as the basic unit together with the self-attention mechanism. The method first acquires the local information of the sequence with the convolutional neural network and the global information with the self-attention mechanism, then combines the obtained global and local information, and finally completes the label prediction with the decoder.
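Pulling the pieces together, the end-to-end flow summarized in the previous paragraph could be sketched as follows, reusing the illustrative modules from the earlier sketches (EmbeddingWithPosition, ConvEncoder, SelfAttentionEncoder, combine, LabelDecoder); all hyper-parameter values below are placeholders, not values prescribed by this disclosure.

import torch
import torch.nn as nn

class MultiLabelTextClassifier(nn.Module):
    """End-to-end sketch: embed + position, convolution branch (local), self-attention
    branch (global), point-wise combination, then LSTM decoding of the label sequence."""
    def __init__(self, vocab_size: int, num_labels: int, d: int = 256):
        super().__init__()
        self.embed = EmbeddingWithPosition(vocab_size, d)
        self.local = ConvEncoder(d)                   # produces h_l
        self.global_ = SelfAttentionEncoder(d)        # produces h_G
        self.decoder = LabelDecoder(num_labels, d)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)                     # X = (e_i + p_i)
        H = combine(self.local(x), self.global_(x))   # H = h_l ⊙ h_G
        return self.decoder(H)                        # predicted label sequence

# Hypothetical usage with placeholder sizes:
model = MultiLabelTextClassifier(vocab_size=30000, num_labels=54)
token_ids = torch.randint(0, 30000, (2, 40))          # batch of 2 segmented texts, 40 words each
predicted_labels = model(token_ids)                    # (2, max_steps) label ids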
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, implement the method of the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.
The method in the first embodiment may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, it is not described in detail here.
Those of ordinary skill in the art will appreciate that the elements of the various examples, i.e., the algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or in combination with computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The multi-label text classification method and the multi-label text classification system can be realized and have wide application prospects.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A multi-label text classification method is characterized by comprising the following steps:
determining a label space in advance according to text content;
performing word segmentation on a text to be classified to obtain a text sequence;
embedding a position vector in the text sequence, inputting the resulting sequence into a trained multi-label text classification model, and outputting the label prediction of the text sequence;
the multi-label text classification model comprises an encoder and a decoder; local information and global information of the text sequence are obtained respectively through a convolution block in the encoder and a self-attention mechanism, and the combined local and global information is decoded by the decoder to obtain the label prediction result.
2. The method of claim 1, wherein the convolution block in the encoder employs a one-dimensional convolution and a nonlinear activation function, wherein the width of the convolution kernel of the one-dimensional convolution is the same as the number of words of the text sequence it covers, a stacked network is used for obtaining higher-level local information, and residual connections are added to the block output; the nonlinear activation function adopts a gated linear unit, which implements a gating mechanism on the convolution output.
3. The method of claim 1, wherein in order to ensure uniform distribution of input data, distribution and variance of input sequences are controlled using layer normalization by setting width of convolution kernel of convolution block to be monotonically decreasing form.
4. The method as claimed in claim 1, wherein the self-attention mechanism in the encoder is a multi-head self-attention mechanism, that is, a plurality of self-attention mechanisms are superimposed.
5. The method as claimed in claim 1, wherein, in consideration of the correlation between labels, the label predicted at the previous time step is used to predict the current label, and the decoder uses a long short-term memory (LSTM) network as the basic recurrent unit to decode the sequence and obtain the final predicted labels.
6. A multi-label text classification system, comprising:
a tag space acquisition unit for determining a tag space in advance from text contents;
the text sequence acquisition unit is used for segmenting words of a text to be classified to obtain a text sequence;
the label prediction unit is used for embedding a position vector in the text sequence, inputting the resulting sequence into a trained multi-label text classification model, and outputting the label prediction of the text sequence;
the multi-label text classification model comprises an encoder and a decoder; local information and global information of the text sequence are obtained respectively through a convolution block in the encoder and a self-attention mechanism, and the combined local and global information is decoded by the decoder to obtain the label prediction result.
7. The multi-label text classification system according to claim 6, wherein the convolution block in the encoder uses a one-dimensional convolution and a nonlinear activation function, wherein the width of the convolution kernel of the one-dimensional convolution is the same as the number of words of the text sequence it covers; in order to obtain a higher level of local information, a stacked network is used and residual connections are added to the block output; the nonlinear activation function adopts a gated linear unit, which implements a gating mechanism on the convolution output.
8. The system of claim 6, wherein, in consideration of the correlation between labels, the label predicted at the previous time step is used to predict the current label, and the decoder uses a long short-term memory (LSTM) network as the basic recurrent unit to decode the sequence and obtain the final predicted labels.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the multi-label text classification method according to any one of claims 1 to 5 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a multi-label text classification method according to any one of claims 1 to 5.
CN202110272724.1A 2021-03-13 2021-03-13 Multi-label text classification method and system Active CN113220874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110272724.1A CN113220874B (en) 2021-03-13 2021-03-13 Multi-label text classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110272724.1A CN113220874B (en) 2021-03-13 2021-03-13 Multi-label text classification method and system

Publications (2)

Publication Number Publication Date
CN113220874A true CN113220874A (en) 2021-08-06
CN113220874B CN113220874B (en) 2023-04-07

Family

ID=77083633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110272724.1A Active CN113220874B (en) 2021-03-13 2021-03-13 Multi-label text classification method and system

Country Status (1)

Country Link
CN (1) CN113220874B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network
CN110232122A (en) * 2019-05-15 2019-09-13 上海海事大学 A kind of Chinese Question Classification method based on text error correction and neural network
CN110134789A (en) * 2019-05-17 2019-08-16 电子科技大学 A kind of multi-tag long text classification method introducing multi-path choice syncretizing mechanism
CN110209823A (en) * 2019-06-12 2019-09-06 齐鲁工业大学 A kind of multi-tag file classification method and system
CN110442707A (en) * 2019-06-21 2019-11-12 电子科技大学 A kind of multi-tag file classification method based on seq2seq
CN110609897A (en) * 2019-08-12 2019-12-24 北京化工大学 Multi-category Chinese text classification method fusing global and local features
CN111428026A (en) * 2020-02-20 2020-07-17 西安电子科技大学 Multi-label text classification processing method and system and information data processing terminal
CN111428038A (en) * 2020-03-26 2020-07-17 国网浙江杭州市萧山区供电有限公司 Self-attention mechanism-based electric power complaint work order multi-label text classification method
CN111814489A (en) * 2020-07-23 2020-10-23 苏州思必驰信息科技有限公司 Spoken language semantic understanding method and system
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
CN113947161A (en) * 2021-10-28 2022-01-18 广东工业大学 Attention mechanism-based multi-label text classification method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947161A (en) * 2021-10-28 2022-01-18 广东工业大学 Attention mechanism-based multi-label text classification method and system
CN115905533A (en) * 2022-11-24 2023-04-04 重庆邮电大学 Intelligent multi-label text classification method
CN115905533B (en) * 2022-11-24 2023-09-19 湖南光线空间信息科技有限公司 Multi-label text intelligent classification method

Also Published As

Publication number Publication date
CN113220874B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110738090B (en) System and method for end-to-end handwritten text recognition using neural networks
CN108615036B (en) Natural scene text recognition method based on convolution attention network
US11373390B2 (en) Generating scene graphs from digital images using external knowledge and image reconstruction
US10380236B1 (en) Machine learning system for annotating unstructured text
CN114556443A (en) Multimedia data semantic analysis system and method using attention-based converged network
Mohamed et al. Content-based image retrieval using convolutional neural networks
US20220245347A1 (en) Entity recognition method, apparatus, electronic device and computer readable storage medium
CN113220874B (en) Multi-label text classification method and system
Lei et al. Scene text recognition using residual convolutional recurrent neural network
US10373022B1 (en) Text image processing using stroke-aware max-min pooling for OCR system employing artificial neural network
CN110728147B (en) Model training method and named entity recognition method
CN114254071A (en) Querying semantic data from unstructured documents
CN116341651A (en) Entity recognition model training method and device, electronic equipment and storage medium
CN115186147A (en) Method and device for generating conversation content, storage medium and terminal
CN112804558B (en) Video splitting method, device and equipment
Sortino et al. Transformer-based image generation from scene graphs
CN115292439A (en) Data processing method and related equipment
CN114065771A (en) Pre-training language processing method and device
CN115082840B (en) Action video classification method and device based on data combination and channel correlation
CN114925658B (en) Open text generation method and storage medium
CN116168379A (en) Text recognition method, device, equipment and medium for self-supervision attention
CN113792120B (en) Graph network construction method and device, reading and understanding method and device
CN115470346A (en) User label information classification method, system and medium
Cai et al. Hcadecoder: A hybrid ctc-attention decoder for chinese text recognition
CN117371447A (en) Named entity recognition model training method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant