CN108334499B - Text label labeling device and method and computing device - Google Patents


Info

Publication number
CN108334499B
CN108334499B (application CN201810129331.3A)
Authority
CN
China
Prior art keywords
text
vector
input
module
label
Prior art date
Legal status
Active
Application number
CN201810129331.3A
Other languages
Chinese (zh)
Other versions
CN108334499A (en)
Inventor
郭龙
张东祥
陈李江
Current Assignee
Hainan Avanti Technology Co ltd
Original Assignee
Hainan Yunjiang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hainan Yunjiang Technology Co ltd filed Critical Hainan Yunjiang Technology Co ltd
Priority to CN201810129331.3A
Publication of CN108334499A
Application granted
Publication of CN108334499B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text label labeling device, which is used for labeling a text label, and comprises: the input module is suitable for receiving text input and converting and outputting the text into a vector matrix; the convolutional neural network module is connected with the input module and is suitable for outputting the local semantic features of the text according to the vector matrix; the cyclic neural network module is connected with the input module and is suitable for outputting the long-distance semantic features of the text according to the vector matrix; the attention model module is connected with the convolutional neural network module and the cyclic neural network module and is suitable for outputting the weight of each single character in the text according to the local semantic features and the long-distance semantic features; and the output module is connected with the attention model module and is suitable for receiving the weight of each single character in the text and outputting the text label and the probability of each label. The invention also discloses a training method of the text label labeling equipment, and a corresponding text label labeling method and corresponding computing equipment.

Description

Text label labeling device and method and computing device
Technical Field
The invention relates to the field of text data analysis, in particular to text label labeling equipment, a training method, a text label labeling method and computing equipment.
Background
With the development of computer and internet technology, practice and examination questions in primary and secondary education, and even in university education, are stored electronically and uploaded to the network for students to use. The number of questions grows ever larger over time. Since each question relates to specific knowledge points and has a specific difficulty, it becomes very difficult to find, from a huge question bank, a question that covers certain knowledge points and has a specific difficulty. The current common solution is for teachers and instructors to manually label each question to specify which knowledge points it corresponds to. However, this increases teachers' workload and is time-consuming, labor-intensive and inefficient.
Therefore, there is a need for an artificial intelligence tagging technique for automatically tagging subject labels.
Disclosure of Invention
In view of the above problems, the present invention provides a text label labeling apparatus and training method, a text label labeling method and a computing apparatus, which aim to solve, or at least alleviate, the above problems.
According to an aspect of the present invention, there is provided a text label labeling apparatus for labeling a text label, the apparatus comprising: the input module is suitable for receiving text input and converting and outputting the text into a vector matrix; the convolutional neural network module is connected with the input module and is suitable for outputting the local semantic features of the text according to the vector matrix; the cyclic neural network module is connected with the input module and is suitable for outputting the long-distance semantic features of the text according to the vector matrix; the attention model module is connected with the convolutional neural network module and the cyclic neural network module and is suitable for outputting the weight of each single character in the text according to the local semantic features and the long-distance semantic features; and the output module is connected with the attention model module and is suitable for receiving the weight of each single character in the text and outputting text labels and the probability of each label.
Optionally, in the text label labeling apparatus according to the present invention, the convolutional neural network module includes: the first input layer is suitable for receiving the vector matrix output by the input module; the convolution layers are respectively connected with the first input layer in parallel and are suitable for performing convolution operation on the vector matrix to obtain a plurality of characteristic vectors; the first pooling layer is connected with the plurality of convolution layers and is suitable for pooling the plurality of characteristic vectors and outputting a pooling result; and the first full connection layer is connected with the first pooling layer and is suitable for performing dimensionality reduction operation on the pooling result to obtain the output of the convolutional neural network module, and the output represents the local semantic features of the text.
Optionally, in the text label labeling apparatus according to the present invention, a plurality of convolution layers are adapted to perform convolution operation on the vector matrix at the same time, each convolution layer obtains one feature vector, and a numerical type included in each feature vector is a floating point decimal; the first pooling layer is suitable for respectively extracting the maximum floating point decimal number in each feature vector to form a multi-dimensional vector.
Optionally, in the text label labeling apparatus according to the present invention, the input dimension of the convolutional neural network module is w × h, where h is the height of the input text matrix and w is its width, and the output dimension is 200; the convolutional neural network module comprises convolution layers with 3 different convolutional kernel sizes, wherein the convolutional kernels of the first, second and third convolutional layers are 3 × h, 4 × h and 5 × h respectively, and each convolution layer comprises 256 feature maps; the output vector dimension of the first pooling layer is 768, the weight parameter dimension of the first fully-connected layer is 768 × 200, and its output vector dimension is 200.
Optionally, in the text label labeling apparatus according to the present invention, the recurrent neural network module includes: the second input layer is suitable for receiving the vector matrix output by the input module; the hidden layer is connected with the second input layer and is suitable for representing the word vector of each single word in the text as a new form vector formed by connecting the word vector with a forward context vector and a backward context vector; the second pooling layer is connected with the hidden layer and is suitable for pooling the new-form vectors of all the single characters and outputting a pooling result; and the second full connection layer is connected with the second pooling layer and is suitable for performing dimensionality reduction operation on the pooling result to obtain the output of the recurrent neural network module, and the output represents the long-distance semantic features of the text.
Optionally, in the text label labeling apparatus according to the present invention, the hidden layer employs a bidirectional LSTM long-term memory network or a bidirectional GRU hidden unit; the second pooling layer is adapted to retain the maximum value in the corresponding columns of all word vectors to obtain a one-dimensional vector of fixed length.
Optionally, in the text label labeling apparatus according to the present invention, the new form vector x_i of the current single word obtained by using the bidirectional LSTM is:

x_i = [c_l(w_i); e(w_i); c_r(w_i)]

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))

c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))

where c_l(w_i) and c_r(w_i) respectively represent the forward and backward context outputs of the current LSTM cell, c_l(w_{i-1}) and c_r(w_{i+1}) respectively represent the outputs of the preceding and succeeding LSTM cells, e(w_{i-1}) and e(w_{i+1}) respectively represent the word-embedding vectors of the previous and the next single word, W^(l) and W^(r) respectively represent the weights of the preceding and succeeding LSTM cells, W^(sl) and W^(sr) respectively represent the weights of the previous and next word-embedding vectors, and f represents the activation function.
Optionally, in the text label labeling apparatus according to the present invention, the input of the attention model module includes a context input and an n-dimensional sequence input y_1, y_2, …, y_n, wherein the context input is the output p of the convolutional neural network module and the n-dimensional sequence input is the output h of the recurrent neural network module, where y_i is a one-dimensional vector in the output h; the attention model module is adapted to calculate the similarity between the vector y_i and the context input so as to calculate the weight of the input y_i.
Optionally, in the text label labeling apparatus according to the present invention, the attention model module is adapted to calculate the similarity between the vector y_i and the context input by a dot product operation to obtain a similarity vector u_i, and to calculate the weight vector a_i corresponding to u_i by the softmax function.
Optionally, in the text label labeling apparatus according to the present invention, the output Z of the attention model module is adapted to be calculated according to the following formulas:

u = tanh(W_h h + b_h)

a_t = exp(u_t^T p) / Σ_{t'} exp(u_{t'}^T p)

Z = a h

where u is the vector obtained by applying a nonlinear activation function to the input h, a is the weight vector obtained by the vector dot product operation and the softmax function operation, tanh is the activation function, W_h and b_h are weight parameters of the activation function, p is the context input, the subscript t denotes a time step (a position in the sentence sequence), and the superscript T denotes matrix transposition.
Alternatively, in the text labeling apparatus according to the present invention, the weight of each single word in the output of the attention model module is expressed in floating point decimals and the sum of all the floating point decimals is 1.
Optionally, in the text label labeling apparatus according to the present invention, the output module performs classification using a softmax classifier, which takes the output of the attention model module as input, calculates the probability of the text sequence on each label, and selects the label with the highest probability as the correct label of the text.
Optionally, in the text label labeling apparatus according to the present invention, the text is topic (question) text.
Optionally, in the text label labeling apparatus according to the present invention, the input module is adapted to: collecting subject texts of each subject, and training a word embedding corpus for each subject; and converting each single word in the subject text of the corresponding subject into a word vector according to the word embedding corpus of each subject, thereby converting the subject text into a vector matrix.
Optionally, in the text labeling apparatus according to the present invention, the input module is adapted to convert each single word in the text into a one-dimensional floating-point vector, thereby converting the text into a two-dimensional floating-point matrix.
According to a further aspect of the present invention, there is provided a training method for a text labeling apparatus, adapted to train the text labeling apparatus as described above, executed in a computing apparatus, the method comprising the steps of: collecting a training sample set, wherein each sample in the training sample set comprises a text and one or more labels preset for the text; and inputting the training sample set into text label labeling equipment to obtain a predicted label of each text, and performing iterative training on the text label labeling equipment according to the predicted label and a text actual label corresponding to the predicted label to obtain the trained text label labeling equipment.
According to a further aspect of the present invention, there is provided a text label labeling method, adapted to be executed in a computing device, in which a trained text label labeling device is stored, the text label labeling device being trained by the training method of the text label labeling device as described above, the text label labeling method including the steps of: acquiring a text to be marked; and inputting the text into the trained text label labeling equipment to obtain one or more labels of the text and the probability of each label.
According to yet another aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the training method and/or the text labeling method of the text labeling apparatus as described above.
According to yet another aspect of the present invention, there is provided a readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the training method and/or the text label labeling method of the text label labeling device as described above.
According to the technical scheme of the invention, the convolutional neural network (CNN) model can capture the local semantic features of a sentence well for text classification, but it is limited by the fixed receptive field of the convolutional layer and cannot take the long-distance semantic dependencies and the word order of the sentence into account. The recurrent neural network (RNN) model can model longer sequence information, but it treats all words in a text sequence equally and cannot distinguish the importance of different words. Based on the respective advantages and disadvantages of the convolutional and recurrent neural network models, the invention provides an attention-based hybrid CNN-RNN model, CRAN. The convolutional neural network model and the recurrent neural network model are effectively combined through an attention mechanism, so that the advantages of both models are retained, their shortcomings in processing text sequences are compensated, and a better classification effect is achieved compared with existing deep-learning text classification models.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a block diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 is a block diagram of a text label labeling apparatus 200 according to an embodiment of the present invention;
FIG. 3 is a detailed block diagram of a text label labeling apparatus according to an embodiment of the present invention;
FIG. 4 shows a block diagram of the convolutional neural network module 220, according to one embodiment of the present invention;
FIG. 5 shows a block diagram of the recurrent neural network module 230, according to one embodiment of the present invention;
FIG. 6 illustrates a schematic structural diagram of an attention model module 240 according to one embodiment of the present invention;
FIG. 7 illustrates a flow diagram of a method 700 of training a text label labeling apparatus according to one embodiment of the present invention; and
FIG. 8 is a flowchart of a method 800 for labeling text labels according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processing, including but not limited to: a microprocessor (μ P), a microcontroller (μ C), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. The program data 124 includes instructions, and in the computing device 100 according to the present invention, the program data 124 includes instructions for performing the training method 700 and/or the text label labeling method 800 of the text label labeling device.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media, such as carrier waves or other transport mechanisms, in a modulated data signal. A "modulated data signal" may be a signal that has one or more of its data set or its changes made in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, a database server, an application server, a WEB server, etc., or as part of a small-form factor portable (or mobile) electronic device, such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless WEB-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations. In some embodiments, the computing device 100 is configured to perform the training method 700 and/or the text label labeling method 800 of the text label labeling device according to the present invention.
Fig. 2 shows a schematic structural diagram of a text label labeling apparatus 200 according to an embodiment of the present invention, and fig. 3 shows a detailed structural diagram of the text label labeling apparatus. As shown in fig. 2 and 3, the model includes an input module 210, a convolutional neural network module 220, a recurrent neural network module 230, an attention model module 240, and an output module 250. The equipment is mainly based on a Hybrid Attention Neural Network (CRAN, Hybrid CNN-RNN Attention-based Neural Network), and a Convolutional Neural Network (CNN) model and a Recurrent Neural Network (RNN) model are fused through an Attention model mechanism, so that the advantages of the two models are integrated, and the defects of the two models are made up.
In particular, the input module 210 is adapted to receive text input, which may be topic text, and to convert the text output as a vector matrix. Generally, the input module 210 may convert the text sequence into a high-dimensional space vector matrix through a currently popular word embedding technique, and of course, other text vector conversion methods may also be adopted, which is not limited in the present invention. Further, the input module 210 may convert each single word in the text into a one-dimensional floating-point vector, thereby converting the text into a two-dimensional floating-point matrix (two-dimensional value matrix), where each circle in fig. 3 represents a floating-point type element.
According to one embodiment, for the topic text, the input module 210 may collect the topic text of each subject, train a word embedding corpus for each subject, and convert each single word in the topic text of the corresponding subject into a word vector according to the word embedding corpus of each subject, thereby converting the topic text into a vector matrix. Specifically, for each subject, all topic texts belonging to the subject in the topic library are collected first, and then the text set is input into a word embedding model (such as the word2vec model developed by Google, though not limited thereto) to obtain a word embedding corpus. Thus, each word in a question can be converted into a one-dimensional floating-point vector, such as [0.14, 0.52, …, 0.23], using the corresponding word embedding model, and finally the question is converted into a two-dimensional floating-point matrix.
For example, one topic in the topic library reads: given that the value of 2x²+3x+1 is 10, then the value of the algebraic expression 4x²+6x+1 is ____. The input module 210 converts each individual word in the topic (including single Chinese characters, English letters, numerals, punctuation marks, etc.) into a one-dimensional vector of fixed length (the length can be freely specified, such as 300) using word embedding technology. For example, the word "has" is converted to [0.6425, 0.5252, 0.1923]. In this way, the topic is converted into a two-dimensional matrix of size m × n, where m is the number of words in the topic (here m equals 32) and n is the length of the one-dimensional vector (here n equals 300).
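For illustration only, the following minimal Python sketch shows how such a per-subject word-embedding corpus and the character-to-matrix conversion could be implemented; the use of gensim's word2vec and all function names, parameters and dimensions here are assumptions for illustration, not the patent's reference implementation.

```python
import numpy as np
from gensim.models import Word2Vec

EMBED_DIM = 300  # length n of each single-character vector (freely chosen, as in the text)

def train_subject_corpus(subject_texts):
    """Train a word-embedding corpus for one subject from all of its question texts."""
    tokenized = [list(text) for text in subject_texts]   # split each question into single characters
    return Word2Vec(tokenized, vector_size=EMBED_DIM, window=5, min_count=1)

def text_to_matrix(text, w2v):
    """Convert one question into an m x n floating-point matrix (m = number of characters)."""
    rows = [w2v.wv[ch] if ch in w2v.wv else np.zeros(EMBED_DIM, dtype=np.float32)
            for ch in text]
    return np.stack(rows)   # shape (m, n): the vector matrix passed to the two branches
```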
The convolutional neural network module 220 is connected to the input module 210 and is adapted to output local semantic features of the text according to the vector matrix, and its main purpose is to extract a phrase sequence with most abundant semantics from the input text matrix.
According to one embodiment, as shown in fig. 4, the convolutional neural network module 220 may include a first input layer 221, a plurality of convolutional layers 222, a first pooling layer 223, and a first fully-connected layer 224. The first input layer 221 is adapted to receive the vector matrix output by the input module. The convolution layers 222 are connected to the first input layer 221 in parallel, and are adapted to perform convolution operation on the vector matrix to obtain a plurality of feature vectors. Further, the plurality of convolutional layers 222 are adapted to perform convolution operation on the vector matrix simultaneously, each convolutional layer obtains a feature vector, each feature vector represents a feature of the text, and finally the N convolutional layers can obtain N vectors corresponding to the N features of the text. Typically, the feature vector contains numerical types that are floating point decimal numbers.
The first pooling layer 223 is connected to the plurality of convolutional layers 222, and is adapted to perform a pooling operation on the plurality of feature vectors and output a pooling result. Wherein the pooling operation may be maximal pooling, where the first pooling layer takes the largest fractional value in each vector to form an N-dimensional vector. Of course, the pooling operation may be an average pooling, which is not a limitation of the present invention. Further, the first pooling layer 223 is adapted to extract the maximum floating point decimal in each feature vector separately, forming a multi-dimensional vector. The first fully connected layer 224 is connected to the first pooling layer 223 and is adapted to perform a dimensionality reduction operation on the pooling result to obtain an output of the convolutional neural network module 220, where the output represents a local semantic feature of the text.
That is, the convolutional neural network module 220 performs a convolution operation on the text sequence, obtains a one-dimensional vector with a fixed dimension through a maximum pooling operation, and finally adjusts the dimension of the vector through a full connection layer to obtain an output p of the CNN layer.
The specific structure and parameters of the convolutional neural network can be set by those skilled in the art according to the needs. According to one embodiment, the convolutional neural network comprises convolutional layers of 3 different convolutional kernel sizes, wherein the convolutional kernel size of the first convolutional layer is 3 × h, h is the height of the input text matrix and contains 256 feature maps, the convolutional kernel size of the second convolutional layer is 4 × h and contains 256 feature maps, and the convolutional kernel size of the third convolutional layer is 5 × h and contains 256 feature maps. The output vector dimension of the first pooling layer is 768, the weight parameter dimension of the first fully-connected layer is 768 × 200, and the output vector dimension is 200. The input dimension of the whole convolutional neural network is w x h, w is the width of the input text matrix, and the output dimension is 200.
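As an aid to understanding, the convolutional branch just described (three parallel convolution layers with kernel sizes 3, 4 and 5 along the text axis, 256 feature maps each, max pooling to a 768-dimensional vector and a 768 × 200 fully connected layer) could be sketched as follows; this is a hypothetical PyTorch reconstruction, not the implementation disclosed in the patent.

```python
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    """Hypothetical sketch of the convolutional branch (local semantic features)."""
    def __init__(self, embed_dim=300, n_filters=256, kernel_sizes=(3, 4, 5), out_dim=200):
        super().__init__()
        # Three parallel convolution layers; each kernel spans the full embedding dimension.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, kernel_size=k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(n_filters * len(kernel_sizes), out_dim)  # 768 -> 200

    def forward(self, x):                        # x: (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                    # (batch, embed_dim, seq_len)
        pooled = []
        for conv in self.convs:
            c = torch.relu(conv(x))              # (batch, n_filters, seq_len - k + 1)
            pooled.append(c.max(dim=2).values)   # max pooling: largest value per feature map
        p = torch.cat(pooled, dim=1)             # (batch, 768): concatenated pooling result
        return self.fc(p)                        # (batch, 200): output p, local semantic features
```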
It should be appreciated that if a classifier layer (e.g., softmax classifier) is added after the full-link layer in the convolutional neural network module 220, a C-dimensional vector can be obtained by transferring the N-dimensional vector output by the pooling layer to the full-link + softmax layer, where C represents the number of text labels. Each element in the vector corresponds to a label, and the numerical size of the label represents the probability that the text belongs to the label, so that the first labels with the highest probability can be selected by a person skilled in the art as the labels of the text. Although this method can also output text labels and label probabilities, it can only capture local semantic features of sentences during text classification, is limited by the fixed view of the convolutional layer, and cannot consider long-distance semantic dependence and orderliness of sentences, which have a crucial role in text classification. Therefore, an accurate classification result cannot be obtained only by using the convolutional neural network method, and the recurrent neural network (such as the text recurrent neural network TextRNN) can better make up for the defects of the convolutional neural network in long-distance feature capture, so that a better classification effect is obtained. Therefore, the text label labeling apparatus 200 of the present invention further adds the recurrent neural network module 230 to perform more comprehensive feature output.
In particular, the recurrent neural network module 230 is also connected to the input module 210 and is adapted to output long-distance semantic features of the text according to the vector matrix. The recurrent neural network module 230 also uses word embedding techniques to convert text into word vectors as feature inputs, with the primary purpose of extracting sequential and long-range semantic dependent features from the input text sequence.
According to one embodiment, as shown in fig. 5, the recurrent neural network module 230 may include a second input layer 231, a hidden layer 232, a second pooling layer 233, and a second fully-connected layer 234. The second input layer 231 is adapted to receive the vector matrix output by the input module 210. The hidden layer 232 is connected to the second input layer 231 and is adapted to represent the word vector of each individual word in the text as a new form vector in which the word vector is connected to a forward and backward context vector. In general, the hidden layer 232 may employ a bidirectional LSTM long-term memory network or bidirectional GRU hidden units. The second pooling layer 233 is connected to the hidden layer 232 and is adapted to perform pooling operations on the new form vectors of all the words and output pooled results. In general, the second pooling layer 233 is adapted to retain the maximum value in the corresponding columns of all word vectors to obtain a one-dimensional vector of fixed length. The second fully connected layer 234 is connected to the second pooling layer 233 and is adapted to perform a dimensionality reduction operation on the pooled result to obtain an output h of the recurrent neural network module 230, which represents the long-distance semantic features of the text. That is, the recurrent neural network module 230 first extracts context information of the text sequence through the hidden layer, then connects the context information and passes through a full connection layer, and finally obtains an output of the RNN layer.
The specific structure and parameters of the recurrent neural network can be set by those skilled in the art according to need. According to one embodiment, the hidden unit layer dimensions of the forward GRU and the backward GRU in the recurrent neural network are both set to 200, the input dimension is w × h (w is the width of the input text matrix, h is the height of the input text matrix), and the output dimension is w × 200.
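A hypothetical PyTorch sketch of the recurrent branch is given below: a bidirectional GRU with hidden size 200 per direction supplies the forward and backward context vectors, each character is re-represented as [c_l(w_i); e(w_i); c_r(w_i)], and a per-position linear layer maps the concatenation back to a 200-dimensional vector. The class name, the projection layer and the choice of GRU rather than LSTM are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RNNBranch(nn.Module):
    """Hypothetical sketch of the recurrent branch (long-distance semantic features)."""
    def __init__(self, embed_dim=300, hidden=200, out_dim=200):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden + embed_dim, out_dim)

    def forward(self, x):                            # x: (batch, seq_len, embed_dim)
        ctx, _ = self.rnn(x)                         # (batch, seq_len, 2*hidden)
        fwd, bwd = ctx.chunk(2, dim=2)               # forward and backward context vectors
        new_form = torch.cat([fwd, x, bwd], dim=2)   # x_i = [c_l(w_i); e(w_i); c_r(w_i)]
        h = torch.tanh(self.proj(new_form))          # (batch, seq_len, out_dim): sequence output h
        return h                                     # fed to the attention module as y_1 ... y_n
```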
According to another embodiment, to better capture long-range semantic dependencies and orderliness in text, the recurrent neural network module 230 can use bi-directional LSTM to get forward and backward context representations of text:
c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))

c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))

where c_l(w_i) and c_r(w_i) respectively represent the forward and backward context outputs of the current LSTM cell, c_l(w_{i-1}) and c_r(w_{i+1}) respectively represent the outputs of the preceding and succeeding LSTM cells, e(w_{i-1}) and e(w_{i+1}) respectively represent the word-embedding vectors of the previous and the next single word, W^(l) and W^(r) respectively represent the weights of the preceding and succeeding LSTM cells, W^(sl) and W^(sr) respectively represent the weights of the previous and next word-embedding vectors, and f represents an activation function, such as the sigmoid or tanh activation function, although not limited thereto. That is, an LSTM cell has two inputs, one of which is the output of the previous LSTM cell and the other is the word-embedding representation of the current word, and it produces an output. The first formula considers the previous word of the current word when generating the corresponding feature vector, and the second formula considers the next word of the current word. The representation of a single word thus becomes the concatenation of its word-embedding vector with the forward and backward context vectors, i.e. the new form vector x_i of the current word obtained at this point is:

x_i = [c_l(w_i); e(w_i); c_r(w_i)]
and then, the maximum value in the corresponding columns of all the word vectors is reserved through a maximum pooling layer to obtain a one-dimensional vector with a fixed size. For example, if three vectors corresponding to a text of length 3 are [0.13,0.24,0.69], [0.04,0.29,0.23] and [0.88,0.23,0.28], respectively, then after the max pooling operation, a one-dimensional vector of length 3 [0.88,0.29,0.69] is obtained, where each element is the maximum value of the corresponding position of the three vectors. The pooling operation may enable the output of a fixed length vector for any length of text, since the length of the vector is dependent only on the length of the word embedding vector and the context vector, and not on the length of the text.
It should also be understood that text classification may also be performed if a classifier layer (e.g., softmax classifier) is added after the fully connected layer in the recurrent neural network module 230. Although the classification method can model longer sequence information, the classification method can only treat words in the text sequence equally and cannot distinguish the importance of different words. The present invention thus integrates the recurrent neural network module 230(RNN Layer) and the convolutional neural network module 220(CNN Layer) and effectively combines the outputs of the two together through the Attention model module 240(Attention Layer), while preserving the advantages of the two modules and making up for the respective disadvantages.
Specifically, the attention model module 240 is connected to the convolutional neural network module 220 and the cyclic neural network module 230, and is adapted to output the weight of each word in the text according to the local semantic features and the long-distance semantic features respectively output by the two modules.
According to one embodiment, as shown in FIG. 6, the attention model module 240 has two inputs, a context input and an n-dimensional sequence input y_1, y_2, …, y_n, and returns a vector Z based on these two inputs. The n-dimensional sequence input is usually a two-dimensional matrix: for example, it may be the word-embedding matrix converted from the text, or the two-dimensional matrix obtained after that word-embedding matrix has passed through the recurrent neural network module 230. The latter is used in the present invention, i.e. the n-dimensional sequence input is the output h of the recurrent neural network module 230, and y_i is a one-dimensional vector in the output h. In addition, the weight of each word in the returned vector Z of the attention model module 240 is expressed as a floating-point decimal, and the sum of all the floating-point decimals is 1. Each floating-point number represents the weight of a word at the corresponding position in the text sequence; for example, the vector Z corresponding to [next, generation, number, formula] is [0.05, 0.05, 0.3, 0.3, 0.3], where the weight of the single word "generation" is 0.3.
The attention model mechanism can give higher weight to important words in the input sequence, so as to achieve better classification effect. And what supports this mechanism is the Context input (Context input) of the attention model module 240, which is the output p of the convolutional neural network module 220.
According to one embodiment, the attention model module 240 assigns a different weight to each input by calculating the relevance of the n inputs to the context input, under the guidance of the context input. In particular, the module is adapted to calculate the similarity between the vector y_i and the context input so as to compute the weight of the input y_i. In practice, it may calculate the similarity between the vector y_i and the context input by a dot product operation to obtain a similarity vector u_i, and compute the weight vector a_i corresponding to u_i by the softmax function.
The present invention uses the output of the recurrent neural network module 230 as the n-dimensional input to the attention model module 240, so that the input represents the long-distance semantic features of the original text sequence; the output of the convolutional neural network module 220 is used as the context input to the attention model module 240, so that the context represents the local semantic features of the original text sequence. This design allows the attention model module 240 to preserve the text order and long-range semantic features extracted by the recurrent neural network module 230 while computing different weights for the n-dimensional input sequence through the local semantic features extracted by the convolutional neural network module 220. According to one embodiment, the output Z of the attention model module 240 is adapted to be calculated according to the following formulas:
u = tanh(W_h h + b_h)

a_t = exp(u_t^T p) / Σ_{t'} exp(u_{t'}^T p)

Z = a h

where u is the vector obtained by applying a nonlinear activation function to the input h, a is the weight vector obtained by the vector dot product operation and the softmax function operation, tanh is the activation function, W_h and b_h are weight parameters of the activation function, p is the context input, the subscript t denotes a time step (a position in the sentence sequence), and the superscript T denotes matrix transposition.
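The combination expressed by the above formulas could be sketched as follows; the batched PyTorch formulation, the class name and the single linear layer holding W_h and b_h are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical sketch of the attention module combining the two branches."""
    def __init__(self, dim=200):
        super().__init__()
        self.w = nn.Linear(dim, dim)              # holds W_h and b_h

    def forward(self, h, p):                      # h: (batch, seq_len, dim), p: (batch, dim)
        u = torch.tanh(self.w(h))                 # u = tanh(W_h h + b_h)
        scores = torch.bmm(u, p.unsqueeze(2)).squeeze(2)   # dot product u_t . p per character
        a = torch.softmax(scores, dim=1)          # per-character weights, summing to 1
        z = torch.bmm(a.unsqueeze(1), h).squeeze(1)        # Z = a h (weighted sum over characters)
        return z, a                               # Z goes to the output module; a are the weights
```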
The output module 250 is connected to the attention model module 240 and adapted to receive the weight of each individual character in the text output by the attention model module 240 and output the text label and the probability of each label. The output module 250 performs classification by using a softmax classifier, which takes the output of the attention model module 240 as input, calculates the probability of the text sequence on each label, and selects the label with the highest probability as the correct label of the text. The probability is obtained by calculation through a softmax classifier, and the specific calculation formula is as follows:
P_t = exp(Z_t) / Σ_i exp(Z_i)

where t denotes a label; Z is the output of the attention model module 240, whose size equals the number of label types; Z_i denotes the value at the position of the Z vector corresponding to label i, and Z_t denotes the value at the position corresponding to label t. The formula normalizes the values in the vector Z so that the resulting vector P is a probability distribution over the labels. It should of course be understood that other classifiers may also be used as the output module 250, as long as label classification can be achieved, and the present invention is not limited in this respect.
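A small numeric illustration of this softmax normalization, with made-up scores, is given below.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())             # subtract the maximum for numerical stability
    return e / e.sum()

Z = np.array([1.2, 0.3, 2.5, -0.7])     # made-up scores, one per candidate label
P = softmax(Z)                          # probability of the text on each label
print(P, P.argmax())                    # the highest-probability label is chosen for the text
```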
FIG. 7 illustrates a training method 700 for a text labeling apparatus, suitable for training the text labeling apparatus 200 as described above, which may be implemented in a computing apparatus, such as the computing apparatus 100, according to an embodiment of the present invention. As shown in fig. 7, the method begins at step S720.
In step S720, a training sample set is collected, where each sample in the training sample set includes a text and one or more labels preset for the text.
Subsequently, in step S740, the training sample set is input into the text label labeling device to obtain the predicted label of each text, and the text label labeling model is iteratively trained according to the predicted label and the actual text label corresponding to the predicted label, so as to obtain the trained text label labeling device. The training method may adopt the existing conventional model training method, for example, the cross-validation method of the training set and the validation set, which is not limited by the present invention.
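For illustration, a minimal training loop of the kind described in step S740 might look like the following sketch; the optimizer, loss function and hyper-parameters are assumptions and are not specified by the patent.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """Iteratively train the labeling device on (text matrix, label) samples."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)   # assumed optimiser
    loss_fn = nn.CrossEntropyLoss()                           # assumed loss
    for _ in range(epochs):
        for matrix, label in loader:        # matrix: output of the input module for one batch
            logits = model(matrix)          # predicted label scores
            loss = loss_fn(logits, label)   # compare prediction against the preset label
            optimiser.zero_grad()
            loss.backward()                 # back-propagate through CNN, RNN, attention, output
            optimiser.step()
    return model
```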
Fig. 8 illustrates a text label labeling method 800 according to an embodiment of the present invention, which may be executed in a computing device having a trained text label labeling apparatus stored therein, the text label labeling apparatus being trained by the training method of the text label labeling apparatus as described above.
As shown in fig. 8, the method begins at step S820. In step S820, a text to be annotated is acquired. Subsequently, in step S840, the text is input into a trained text label labeling apparatus, and one or more labels and probabilities of the labels of the text are obtained. When a plurality of labels and a plurality of probabilities exist, the labels can be displayed in a descending order according to the size of the probability value.
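A minimal sketch of this labeling step is shown below; the helper names (text_to_matrix, label_names) and the top-k presentation are illustrative assumptions. The results are sorted in descending order of probability, as described above.

```python
import torch

def label_text(model, text, text_to_matrix, label_names, top_k=3):
    """Return the top-k labels of a text with their probabilities, in descending order."""
    model.eval()
    with torch.no_grad():
        matrix = torch.as_tensor(text_to_matrix(text)).unsqueeze(0)  # add a batch dimension
        probs = torch.softmax(model(matrix), dim=1).squeeze(0)
    order = torch.argsort(probs, descending=True)[:top_k].tolist()
    return [(label_names[i], float(probs[i])) for i in order]
```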
Therefore, for any text whose labels are to be annotated, inputting the text into the trained text label labeling device yields the labels to which the text belongs and the probability of each label. For question texts, when facing a huge question bank (for example, 66 million questions), it is obviously unrealistic to label the question texts manually one by one, and a good text classification effect cannot be achieved by using a CNN model or an RNN model alone. The attention-based hybrid CNN-RNN model CRAN effectively combines the CNN model and the RNN model using the attention mechanism, which simultaneously retains the advantages of CNN and RNN and makes up for their shortcomings in processing text sequences, thereby achieving a better classification effect than existing deep-learning text classification models.
A7, the text label labeling apparatus of A6, wherein the new form vector x_i of the current single word obtained by using the bidirectional LSTM is:

x_i = [c_l(w_i); e(w_i); c_r(w_i)]

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))

c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))

where c_l(w_i) and c_r(w_i) respectively represent the forward and backward context outputs of the current LSTM cell, c_l(w_{i-1}) and c_r(w_{i+1}) respectively represent the outputs of the preceding and succeeding LSTM cells, e(w_{i-1}) and e(w_{i+1}) respectively represent the word-embedding vectors of the previous and the next single word, W^(l) and W^(r) respectively represent the weights of the preceding and succeeding LSTM cells, W^(sl) and W^(sr) respectively represent the weights of the previous and next word-embedding vectors, and f represents the activation function.
A8, the text label labeling apparatus of any one of A1-A7, wherein the input of the attention model module includes a context input and an n-dimensional sequence input y_1, y_2, …, y_n, wherein the context input is the output p of the convolutional neural network module and the n-dimensional sequence input is the output h of the recurrent neural network module, where y_i is a one-dimensional vector in the output h, and the attention model module is adapted to calculate the similarity between the vector y_i and the context input so as to calculate the weight of the input y_i.
A9, the text label labeling apparatus of A8, wherein the attention model module is adapted to calculate the similarity between the vector y_i and the context input by a dot product operation to obtain a similarity vector u_i, and to calculate the weight vector a_i corresponding to u_i by the softmax function.
A10, the text label labeling apparatus of A9, wherein the output Z of the attention model module is adapted to be calculated according to the following formulas:

u = tanh(W_h h + b_h)

a_t = exp(u_t^T p) / Σ_{t'} exp(u_{t'}^T p)

Z = a h

where u is the vector obtained by applying a nonlinear activation function to the input h, a is the weight vector obtained by the vector dot product operation and the softmax function operation, tanh is the activation function, W_h and b_h are weight parameters of the activation function, p is the context input, the subscript t denotes a time step (a position in the sentence sequence), and the superscript T denotes matrix transposition.
A11, the text label labeling apparatus of A1, wherein the weight of each single word in the output of the attention model module is expressed in floating point decimal and the sum of all floating point decimals is 1.
A12, the text label labeling apparatus according to any one of A1-A11, wherein the output module classifies the text sequence by using a softmax classifier, the classifier takes the output of the attention model module as input, calculates the probability of the text sequence on each label, and selects the label with the highest probability as the correct label of the text.
A13, the text label labeling apparatus of A1, wherein the text is topic (question) text.
A14, the text label labeling apparatus of A13, wherein the input module is adapted to: collecting subject texts of each subject, and training a word embedding corpus for each subject; and converting each single word in the subject text of the corresponding subject into a word vector according to the word embedding corpus of each subject, thereby converting the subject text into a vector matrix.
A15, the text label labeling apparatus of a1, wherein the input module is adapted to convert each single word in the text into a one-dimensional floating-point vector, thereby converting the text into a two-dimensional floating-point matrix.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the method of the present invention according to instructions in said program code stored in the memory.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (15)

1. A text label labeling apparatus for labeling text labels, the apparatus comprising:
the input module is suitable for receiving text input and converting and outputting the text into a vector matrix;
a convolutional neural network module connected to the input module, comprising:
a first input layer adapted to receive the vector matrix output by the input module;
the convolution layers are respectively connected with the first input layer in parallel and are suitable for performing convolution operation on the vector matrix to obtain a plurality of characteristic vectors;
the first pooling layer is connected with the convolution layers and is suitable for pooling the characteristic vectors and outputting a pooling result;
the first full-connection layer is connected with the first pooling layer and is suitable for performing dimensionality reduction operation on the pooling result to obtain the output of the convolutional neural network module, and the output represents the local semantic features of the text;
a recurrent neural network module connected to the input module, comprising:
a second input layer adapted to receive the vector matrix output by the input module;
a hidden layer connected to the second input layer and adapted to represent the word vector of each single word in the text as a new-form vector formed by concatenating the word vector with a forward context vector and a backward context vector; wherein the hidden layer uses bidirectional LSTM (long short-term memory) units or bidirectional GRU hidden units, and with bidirectional LSTM the new-form vector x_i of the current single word is obtained as:
x_i = [c_l(w_i); e(w_i); c_r(w_i)]
c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
wherein c_l(w_i) and c_r(w_i) respectively represent the forward and backward outputs of the current LSTM unit, c_l(w_{i-1}) and c_r(w_{i+1}) respectively represent the outputs of the preceding and succeeding LSTM units, e(w_i) represents the word embedding vector of the current single word, e(w_{i-1}) and e(w_{i+1}) respectively represent the word embedding vectors of the preceding and succeeding single words, W^(l) and W^(r) respectively represent the weights of the preceding and succeeding LSTM units, W^(sl) and W^(sr) respectively represent the weights of the embedding vectors of the preceding and succeeding words, and f represents an activation function;
a second pooling layer connected to the hidden layer and adapted to pool the new-form vectors of all the single words and output a pooling result;
a second fully connected layer connected to the second pooling layer and adapted to perform a dimensionality-reduction operation on the pooling result to obtain the output of the recurrent neural network module, the output representing the long-distance semantic features of the text;
an attention model module connected to the convolutional neural network module and the recurrent neural network module and adapted to output the weight of each single word in the text according to the local semantic features and the long-distance semantic features; and
an output module connected to the attention model module and adapted to receive the weight of each single word in the text and to output text labels and the probability of each label.
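A minimal NumPy sketch of the new-form vector construction recited in claim 1 follows. It replaces the full bidirectional LSTM/GRU cells with the simple forward and backward recurrences written out in the claim; the dimensions, the tanh activation and the random weights are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, ctx_dim, n_words = 8, 6, 5

E = rng.normal(size=(n_words, emb_dim))          # e(w_1)..e(w_n): word embedding vectors
W_l  = rng.normal(size=(ctx_dim, ctx_dim))       # W^(l): weight on the previous left context
W_sl = rng.normal(size=(ctx_dim, emb_dim))       # W^(sl): weight on the previous word embedding
W_r  = rng.normal(size=(ctx_dim, ctx_dim))       # W^(r): weight on the next right context
W_sr = rng.normal(size=(ctx_dim, emb_dim))       # W^(sr): weight on the next word embedding
f = np.tanh                                      # activation function f

c_l = np.zeros((n_words, ctx_dim))               # left-context vectors c_l(w_i)
c_r = np.zeros((n_words, ctx_dim))               # right-context vectors c_r(w_i)
for i in range(1, n_words):                      # forward pass builds the left context
    c_l[i] = f(W_l @ c_l[i - 1] + W_sl @ E[i - 1])
for i in range(n_words - 2, -1, -1):             # backward pass builds the right context
    c_r[i] = f(W_r @ c_r[i + 1] + W_sr @ E[i + 1])

# New-form vector x_i = [c_l(w_i); e(w_i); c_r(w_i)] for every single word
X = np.concatenate([c_l, E, c_r], axis=1)
print(X.shape)                                   # (5, 20): n_words x (ctx_dim + emb_dim + ctx_dim)
```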
2. The text label labeling apparatus of claim 1,
wherein the plurality of convolutional layers are adapted to perform convolution operations on the vector matrix simultaneously, each convolutional layer obtaining one feature vector, and every value contained in each feature vector being a floating-point number;
the first pooling layer is adapted to extract the largest floating-point number from each feature vector, the extracted values forming a multi-dimensional vector.
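A hedged PyTorch sketch of the convolutional branch of claims 1-2: several convolutional layers with different kernel widths run in parallel over the vector matrix, each feature map is max-pooled to its largest value, and a fully connected layer reduces the concatenated result to the local semantic features. The kernel sizes, channel counts and output dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    def __init__(self, emb_dim=128, n_filters=64, kernel_sizes=(2, 3, 4), out_dim=100):
        super().__init__()
        # parallel convolutional layers, all fed by the same input layer
        self.convs = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        # first fully connected layer for dimensionality reduction
        self.fc = nn.Linear(n_filters * len(kernel_sizes), out_dim)

    def forward(self, x):                 # x: (batch, seq_len, emb_dim) vector matrix
        x = x.transpose(1, 2)             # (batch, emb_dim, seq_len) as Conv1d expects
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]  # largest value per feature map
        return self.fc(torch.cat(pooled, dim=1))   # local semantic features p

p = ConvBranch()(torch.randn(4, 50, 128))
print(p.shape)                            # torch.Size([4, 100])
```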
3. The text label labeling apparatus of claim 1,
wherein the second pooling layer is adapted to retain the maximum value in each corresponding column across all the word vectors to obtain a one-dimensional vector of fixed length.
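The column-wise pooling of claim 3 reduces to a single NumPy reduction; the shapes below are illustrative.

```python
import numpy as np

word_vectors = np.random.rand(37, 20)   # 37 single words, 20-dimensional new-form vectors
pooled = word_vectors.max(axis=0)       # keep the maximum of each column across all words
print(pooled.shape)                     # (20,): fixed length regardless of the number of words
```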
4. The text label labeling apparatus of any of claims 1-3, wherein the input to the attention model module comprises a context input and an n-dimensional sequence input y_1, y_2, ..., y_n, wherein
the context input is the output p of the convolutional neural network module, and the n-dimensional sequence input is the output of the recurrent neural network module, where y_i is a one-dimensional vector in the output of the recurrent neural network module; the attention model module is adapted to calculate the similarity between the vector y_i and the context input in order to compute the weight of y_i.
5. The text label labeling apparatus of claim 4, wherein the attention model module is adapted to calculate the similarity between the vector y_i and the context input by a dot-product operation to obtain a similarity vector u_i, and to calculate the weight vector a_i corresponding to u_i by a softmax function.
6. The text label labeling apparatus of claim 5, wherein the attention model module is adapted to calculate its output Z according to the following formulas:
u = tanh(W_h h + b_h)
a = softmax(u^T p)
Z = a h
wherein u is the vector obtained by nonlinear transformation of the input h through the activation function, a is the weight vector obtained by the vector dot-product operation followed by the softmax function operation, tanh is the activation function, W_h and b_h respectively represent the weight parameters in the activation function, the softmax is taken over the time steps t (the positions in the sentence sequence), and the superscript T denotes the transpose of the matrix.
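A minimal NumPy sketch of the attention computation of claims 4-6: the per-word hidden states h from the recurrent branch are nonlinearly transformed, their dot-product similarity with the context input p (the convolutional branch's output) is normalised by softmax, and the resulting weights combine h into Z. The exact softmax formula appears only as an image in the original filing, so this form is a reconstruction from the surrounding definitions, with illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, hid_dim = 6, 10
h   = rng.normal(size=(n_words, hid_dim))    # per-word hidden states from the recurrent branch
p   = rng.normal(size=hid_dim)               # context input: output of the convolutional branch
W_h = rng.normal(size=(hid_dim, hid_dim))
b_h = rng.normal(size=hid_dim)

u = np.tanh(h @ W_h.T + b_h)                 # u = tanh(W_h h + b_h), one row per word
scores = u @ p                               # dot-product similarity of each u_t with the context
a = np.exp(scores) / np.exp(scores).sum()    # softmax: weight of each single word, summing to 1
Z = a @ h                                    # Z = a h: weighted combination of the hidden states
print(a.round(3), a.sum())                   # per-word weights and their total of 1.0
```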
7. The text label labeling apparatus of claim 1, wherein the weight of each single word in the output of the attention model module is expressed as a floating-point number, and the sum of all the floating-point numbers is 1.
8. The text label labeling apparatus of any one of claims 1-3, wherein the output module performs classification using a softmax classifier that takes the output of the attention model module as input, calculates the probability of the text sequence on each label, and selects the label with the highest probability as the correct label for the text.
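A hedged sketch of the output module of claim 8: a softmax classifier maps the attention output Z to a probability for every label and picks the most probable one. The example label set and layer size are illustrative assumptions.

```python
import torch
import torch.nn as nn

labels = ["algebra", "geometry", "probability", "functions"]   # assumed label set
classifier = nn.Linear(10, len(labels))                        # maps Z (dim 10) to label scores

Z = torch.randn(1, 10)                                         # attention module output for one text
probs = torch.softmax(classifier(Z), dim=1)[0]                 # probability of the text on each label
print({lab: round(float(pr), 3) for lab, pr in zip(labels, probs)})
print("predicted label:", labels[probs.argmax().item()])       # label with the highest probability
```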
9. The text label labeling apparatus of claim 1, wherein the text is a title text.
10. The text label labeling apparatus of claim 9, wherein the input module is adapted to:
collect title texts of each subject and train a word embedding corpus for each subject; and
convert each single word in a title text of the corresponding subject into a word vector according to the word embedding corpus of that subject, thereby converting the title text into a vector matrix.
11. The text label labeling apparatus of claim 1, wherein the input module is adapted to convert each single word in the text into a one-dimensional floating-point vector, thereby converting the text into a two-dimensional floating-point matrix.
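A minimal sketch of the input module of claims 10-11: a per-subject embedding table (which would be trained separately for each subject, e.g. with a word2vec-style tool) maps every single character of a title to a one-dimensional float vector, so the title becomes a two-dimensional float matrix. The toy embedding table and fallback vector are illustrative assumptions.

```python
import numpy as np

math_embeddings = {                      # assumed per-subject embedding corpus (dimension 4)
    "求": np.array([0.1, -0.3, 0.7, 0.2], dtype=np.float32),
    "函": np.array([0.5,  0.1, -0.2, 0.4], dtype=np.float32),
    "数": np.array([0.3,  0.6,  0.0, -0.1], dtype=np.float32),
}
unk = np.zeros(4, dtype=np.float32)      # fallback for characters outside the corpus

title = "求函数"
matrix = np.stack([math_embeddings.get(ch, unk) for ch in title])
print(matrix.shape, matrix.dtype)        # (3, 4) float32: one row per single character
```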
12. A method of training a text label labeling apparatus, adapted to train the text label labeling apparatus of any one of claims 1-11 and executed in a computing device, the method comprising the steps of:
collecting a training sample set, wherein each sample in the training sample set comprises a text and one or more labels preset for the text; and
inputting the training sample set into the text label labeling apparatus to obtain a predicted label for each text, and iteratively training the text label labeling apparatus according to each predicted label and the corresponding actual label of the text, so as to obtain a trained text label labeling apparatus.
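A hedged sketch of the training procedure of claim 12, assuming the sample texts have already been converted to vector matrices by the input module. The loss function, optimiser and hyperparameters are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

def train(device_model, samples, epochs=5, lr=1e-3):
    """samples: list of (vector_matrix tensor, multi-hot label tensor) pairs."""
    criterion = nn.BCEWithLogitsLoss()                     # assumed multi-label objective
    optimiser = torch.optim.Adam(device_model.parameters(), lr=lr)
    for _ in range(epochs):                                # iterative training
        for matrix, actual_labels in samples:
            predicted = device_model(matrix.unsqueeze(0))  # predicted label scores for the text
            loss = criterion(predicted, actual_labels.unsqueeze(0))
            optimiser.zero_grad()                          # update parameters from the
            loss.backward()                                # predicted vs. actual labels
            optimiser.step()
    return device_model                                    # the trained labeling device
```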
13. A text label labeling method adapted to be executed in a computing device in which a trained text label labeling device is stored, the text label labeling device being trained by the training method according to claim 12, the text label labeling method comprising the steps of:
acquiring a text to be labeled; and
inputting the text into the trained text label labeling device to obtain one or more labels for the text and the probability of each label.
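A minimal sketch of the labeling method of claim 13: convert the text to be labeled into its vector matrix, run the trained device, and read off every label with its probability. The text_to_matrix helper and the trained_model object are assumptions standing in for the input module and the trained device.

```python
import torch

def label_text(trained_model, text_to_matrix, text, labels):
    with torch.no_grad():
        logits = trained_model(text_to_matrix(text).unsqueeze(0))      # scores over all labels
        probs = torch.softmax(logits, dim=1)[0]                        # probability of each label
    return sorted(zip(labels, probs.tolist()), key=lambda x: -x[1])    # labels ranked by probability
```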
14. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 12 or 13.
15. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 12 or 13.
CN201810129331.3A 2018-02-08 2018-02-08 Text label labeling device and method and computing device Active CN108334499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810129331.3A CN108334499B (en) 2018-02-08 2018-02-08 Text label labeling device and method and computing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810129331.3A CN108334499B (en) 2018-02-08 2018-02-08 Text label labeling device and method and computing device

Publications (2)

Publication Number Publication Date
CN108334499A CN108334499A (en) 2018-07-27
CN108334499B true CN108334499B (en) 2022-03-18

Family

ID=62927293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810129331.3A Active CN108334499B (en) 2018-02-08 2018-02-08 Text label labeling device and method and computing device

Country Status (1)

Country Link
CN (1) CN108334499B (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146152A (en) * 2018-08-01 2019-01-04 北京京东金融科技控股有限公司 Incident classification prediction technique and device on a kind of line
CN109241522B (en) * 2018-08-02 2023-04-07 义语智能科技(上海)有限公司 Encoding and decoding method and device
CN109189922B (en) * 2018-08-07 2021-06-29 创新先进技术有限公司 Comment evaluation model training method and device
CN109189882A (en) * 2018-08-08 2019-01-11 北京百度网讯科技有限公司 Answer type recognition methods, device, server and the storage medium of sequence content
CN109214002A (en) * 2018-08-27 2019-01-15 成都四方伟业软件股份有限公司 A kind of transcription comparison method, device and its computer storage medium
CN109034378B (en) * 2018-09-04 2023-03-31 腾讯科技(深圳)有限公司 Network representation generation method and device of neural network, storage medium and equipment
CN109522920B (en) * 2018-09-18 2020-10-13 义语智能科技(上海)有限公司 Training method and device of synonymy discriminant model based on combination of semantic features
CN109299291B (en) * 2018-09-28 2022-04-29 武汉大学 Question-answering community label recommendation method based on convolutional neural network
CN109086463B (en) * 2018-09-28 2022-04-29 武汉大学 Question-answering community label recommendation method based on regional convolutional neural network
CN111046859B (en) * 2018-10-11 2023-09-29 杭州海康威视数字技术股份有限公司 Character recognition method and device
CN109543030B (en) * 2018-10-12 2023-04-07 平安科技(深圳)有限公司 Method, device, equipment and storage medium for classifying session texts of customer service robot
CN109634578B (en) * 2018-10-19 2021-04-02 北京大学 Program generation method based on text description
CN111180019A (en) * 2018-11-09 2020-05-19 上海云贵信息科技有限公司 Compound parameter automatic extraction method based on deep learning
CN109582789B (en) * 2018-11-12 2021-07-09 北京大学 Text multi-label classification method based on semantic unit information
CN111191038B (en) * 2018-11-15 2024-05-10 第四范式(北京)技术有限公司 Neural network training method and device and named entity recognition method and device
CN109271521B (en) * 2018-11-16 2021-03-30 北京九狐时代智能科技有限公司 Text classification method and device
CN109710919A (en) * 2018-11-27 2019-05-03 杭州电子科技大学 A kind of neural network event extraction method merging attention mechanism
CN109359198A (en) * 2018-12-04 2019-02-19 北京容联易通信息技术有限公司 A kind of file classification method and device
CN109620154A (en) * 2018-12-21 2019-04-16 平安科技(深圳)有限公司 Borborygmus voice recognition method and relevant apparatus based on deep learning
CN109902293B (en) * 2019-01-30 2020-11-24 华南理工大学 Text classification method based on local and global mutual attention mechanism
CN110008466A (en) * 2019-01-30 2019-07-12 阿里巴巴集团控股有限公司 A kind of processing method of data, device and equipment
CN111612024B (en) * 2019-02-25 2023-12-08 北京嘀嘀无限科技发展有限公司 Feature extraction method, device, electronic equipment and computer readable storage medium
CN110032645B (en) * 2019-04-17 2021-02-09 携程旅游信息技术(上海)有限公司 Text emotion recognition method, system, device and medium
CN110263122B (en) * 2019-05-08 2022-05-17 北京奇艺世纪科技有限公司 Keyword acquisition method and device and computer readable storage medium
CN110119786B (en) * 2019-05-20 2021-11-16 北京奇艺世纪科技有限公司 Text topic classification method and device
CN110309304A (en) * 2019-06-04 2019-10-08 平安科技(深圳)有限公司 A kind of file classification method, device, equipment and storage medium
CN110610003B (en) * 2019-08-15 2023-09-15 创新先进技术有限公司 Method and system for assisting text annotation
CN110610489B (en) * 2019-08-30 2021-11-23 西安电子科技大学 Optical laryngoscope image lesion area marking method based on attention mechanism
CN110825949A (en) * 2019-09-19 2020-02-21 平安科技(深圳)有限公司 Information retrieval method based on convolutional neural network and related equipment thereof
CN111858923A (en) * 2019-12-24 2020-10-30 北京嘀嘀无限科技发展有限公司 Text classification method, system, device and storage medium
CN111309933B (en) * 2020-02-13 2023-11-10 中国科学院自动化研究所 Automatic labeling system for cultural resource data
CN111444339B (en) * 2020-02-29 2024-05-03 平安国际智慧城市科技股份有限公司 Text question difficulty labeling method and device and computer readable storage medium
CN111539653A (en) * 2020-05-27 2020-08-14 山西东易园智能家居科技有限公司 Intelligent filling construction progress management method
CN111951792B (en) * 2020-07-30 2022-12-16 北京先声智能科技有限公司 Punctuation marking model based on grouping convolution neural network
CN112417156B (en) * 2020-11-30 2024-05-14 百度国际科技(深圳)有限公司 Multi-task learning method, device, equipment and storage medium
CN112860889A (en) * 2021-01-29 2021-05-28 太原理工大学 BERT-based multi-label classification method
CN113392986B (en) * 2021-02-01 2023-04-07 重庆交通大学 Highway bridge information extraction method based on big data and management maintenance system
CN112906382B (en) * 2021-02-05 2022-06-21 山东省计算中心(国家超级计算济南中心) Policy text multi-label labeling method and system based on graph neural network
CN113220876B (en) * 2021-04-16 2022-12-06 山东师范大学 Multi-label classification method and system for English text
CN113343703B (en) * 2021-08-09 2021-10-29 北京惠每云科技有限公司 Medical entity classification extraction method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220231A (en) * 2016-03-22 2017-09-29 索尼公司 Electronic equipment and method and training method for natural language processing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777011A (en) * 2016-12-07 2017-05-31 中山大学 A kind of file classification method based on depth multi-task learning
CN107169035A (en) * 2017-04-19 2017-09-15 华南理工大学 A kind of file classification method for mixing shot and long term memory network and convolutional neural networks
CN107291795A (en) * 2017-05-03 2017-10-24 华南理工大学 A kind of dynamic word insertion of combination and the file classification method of part-of-speech tagging
CN107608956A (en) * 2017-09-05 2018-01-19 广东石油化工学院 A kind of reader's mood forecast of distribution algorithm based on CNN GRNN

Also Published As

Publication number Publication date
CN108334499A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
CN108334499B (en) Text label labeling device and method and computing device
US11507800B2 (en) Semantic class localization digital environment
JP7302022B2 (en) A text classification method, apparatus, computer readable storage medium and text classification program.
Wieting et al. Charagram: Embedding words and sentences via character n-grams
Zhang et al. Multiview convolutional neural networks for multidocument extractive summarization
US10504010B2 (en) Systems and methods for fast novel visual concept learning from sentence descriptions of images
WO2020114429A1 (en) Keyword extraction model training method, keyword extraction method, and computer device
CN105404632B (en) System and method for carrying out serialized annotation on biomedical text based on deep neural network
Zhou et al. Modelling sentence pairs with tree-structured attentive encoder
CN111914097A (en) Entity extraction method and device based on attention mechanism and multi-level feature fusion
CN110866098B (en) Machine reading method and device based on transformer and lstm and readable storage medium
Zhang et al. Deep autoencoding topic model with scalable hybrid Bayesian inference
CN112990887B (en) Resume and post matching method and computing device
CN112101031B (en) Entity identification method, terminal equipment and storage medium
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
CN110968697A (en) Text classification method, device and equipment and readable storage medium
CN111597815A (en) Multi-embedded named entity identification method, device, equipment and storage medium
CN112000778A (en) Natural language processing method, device and system based on semantic recognition
CN111339775A (en) Named entity identification method, device, terminal equipment and storage medium
Liebeskind et al. Deep learning for period classification of historical Hebrew texts
Al-Qatf et al. Image captioning with novel topics guidance and retrieval-based topics re-weighting
CN112559711A (en) Synonymous text prompting method and device and electronic equipment
CN117152770A (en) Handwriting input-oriented writing capability intelligent evaluation method and system
CN111767720A (en) Title generation method, computer and readable storage medium
He et al. Image captioning algorithm based on multi-branch cnn and bi-lstm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 571924 Hainan Ecological Software Park, Laocheng High tech Industrial Demonstration Zone, Haikou City, Hainan Province

Patentee after: Hainan Avanti Technology Co.,Ltd.

Address before: 571924 Hainan Ecological Software Park, Old City High tech Industrial Demonstration Zone, Provincial and County level Administrative Divisions, Hainan Province

Patentee before: HAINAN YUNJIANG TECHNOLOGY CO.,LTD.
