CN108334499A - Text label tagging device, method and computing device - Google Patents
Text label tagging device, method and computing device
- Publication number: CN108334499A
- Application number: CN201810129331.3A
- Authority
- CN
- China
- Prior art keywords
- text
- vector
- module
- label
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/30 — Handling natural language data; Semantic analysis
- G06F40/289 — Handling natural language data; Natural language analysis; Recognition of textual entities; Phrasal analysis, e.g. finite state techniques or chunking
- G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
Abstract
The invention discloses a text label tagging device for labeling text. The device comprises: an input module, adapted to receive input text and convert it into a vector matrix; a convolutional neural network module, connected to the input module and adapted to output local semantic features of the text from the vector matrix; a recurrent neural network module, connected to the input module and adapted to output long-range semantic features of the text from the vector matrix; an attention model module, connected to the convolutional neural network module and the recurrent neural network module, and adapted to output a weight for each character of the text from the local and long-range semantic features; and an output module, connected to the attention model module and adapted to receive the character weights and output text labels and the probability of each label. The invention also discloses a training method for the text label tagging device, a corresponding text label tagging method, and a computing device.
Description
Technical field
The present invention relates to the field of text data analysis, and in particular to a text label tagging device and training method, a text label tagging method, and a computing device.
Background
With the development of computers and Internet technology, exercises and test questions for primary, secondary, and even university education are now stored electronically and can be uploaded to the network for students to use. Over time, the number of questions keeps growing. Since each question relates to specific knowledge points and has a specific difficulty, it becomes very hard to find, among a massive pool of questions, the individual questions that cover certain knowledge points at a given difficulty. The common solution today is for teachers and teaching assistants to label questions manually, specifying which knowledge points each question covers. This, however, increases the teachers' workload, is time-consuming and laborious, and is inefficient.
Therefore, an artificial intelligence labeling technique is needed to tag question labels automatically.
Summary of the invention
In view of the above problems, the present invention proposes a text label tagging device and training method, a text label tagging method, and a computing device that seek to solve, or at least mitigate, the problems above.
According to one aspect of the present invention, a text label tagging device for labeling text is provided. The device comprises: an input module, adapted to receive input text and convert it into a vector matrix; a convolutional neural network module, connected to the input module and adapted to output local semantic features of the text from the vector matrix; a recurrent neural network module, connected to the input module and adapted to output long-range semantic features of the text from the vector matrix; an attention model module, connected to the convolutional neural network module and the recurrent neural network module, and adapted to output a weight for each character of the text from the local and long-range semantic features; and an output module, connected to the attention model module and adapted to receive the character weights and output text labels and the probability of each label.
Optionally, in the text label tagging device according to the present invention, the convolutional neural network module comprises: a first input layer, adapted to receive the vector matrix output by the input module; multiple convolutional layers, connected in parallel to the first input layer and adapted to perform convolution operations on the vector matrix to obtain multiple feature vectors; a first pooling layer, connected to the multiple convolutional layers and adapted to pool the feature vectors and output the pooling result; and a first fully connected layer, connected to the first pooling layer and adapted to reduce the dimensionality of the pooling result, yielding the output of the convolutional neural network module, which represents the local semantic features of the text.
Optionally, in the text label tagging device according to the present invention, the multiple convolutional layers are adapted to convolve the vector matrix simultaneously, each convolutional layer producing one feature vector whose values are floating-point numbers; the first pooling layer is adapted to extract the maximum floating-point value of each feature vector, forming a multi-dimensional vector.
Optionally, in the text label tagging device according to the present invention, the input dimension of the convolutional neural network module is w*h, where h is the height of the input text matrix and w is its width, and the output dimension is 200. The module contains convolutional layers of 3 different kernel sizes: the kernel sizes of the first to third convolutional layers are 3*h, 4*h and 5*h respectively, and each convolutional layer contains 256 feature maps. The output vector dimension of the first pooling layer is 768, the weight parameter dimension of the first fully connected layer is 768*200, and its output vector dimension is 200.
Optionally, in the text label tagging device according to the present invention, the recurrent neural network module comprises: a second input layer, adapted to receive the vector matrix output by the input module; a hidden layer, connected to the second input layer and adapted to represent the word vector of each character of the text as a new representation vector that concatenates the word vector with its forward and backward context vectors; a second pooling layer, connected to the hidden layer and adapted to pool the new representation vectors of all characters and output the pooling result; and a second fully connected layer, connected to the second pooling layer and adapted to reduce the dimensionality of the pooling result, yielding the output of the recurrent neural network module, which represents the long-range semantic features of the text.
Optionally, in the text label tagging device according to the present invention, the hidden layer uses a bidirectional LSTM (long short-term memory) network or bidirectional GRU hidden units; the second pooling layer is adapted to keep the maximum value of each column across all word vectors, obtaining a fixed-length one-dimensional vector.
Optionally, in the text label tagging device according to the present invention, the new representation vector x_i of the current character obtained with the bidirectional LSTM is:

x_i = [c_l(w_i); e(w_i); c_r(w_i)]
c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))

where c_l(w_i) and c_r(w_i) are the left and right context outputs of the current LSTM unit, c_l(w_{i-1}) and c_r(w_{i+1}) are the outputs of the previous and next LSTM units, e(w_{i-1}) and e(w_{i+1}) are the word embedding vectors of the previous and next characters, W^(l) and W^(r) are the weights of the previous and next LSTM units, W^(sl) and W^(sr) are the weights of the previous and next word embedding vectors, and f is the activation function.
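A minimal NumPy sketch of the left/right context recurrences above, assuming tanh as the activation f; the dimensions, random initialization, and zero initial contexts are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, ctx_dim, n = 4, 3, 6          # embedding size, context size, sequence length
e = rng.normal(size=(n, emb_dim))      # e(w_i): word embedding of each character
W_l  = rng.normal(scale=0.1, size=(ctx_dim, ctx_dim))   # W^(l)
W_sl = rng.normal(scale=0.1, size=(ctx_dim, emb_dim))   # W^(sl)
W_r  = rng.normal(scale=0.1, size=(ctx_dim, ctx_dim))   # W^(r)
W_sr = rng.normal(scale=0.1, size=(ctx_dim, emb_dim))   # W^(sr)

c_l = np.zeros((n, ctx_dim))           # left context c_l(w_i), scanned left to right
c_r = np.zeros((n, ctx_dim))           # right context c_r(w_i), scanned right to left
for i in range(1, n):
    c_l[i] = np.tanh(W_l @ c_l[i - 1] + W_sl @ e[i - 1])
for i in range(n - 2, -1, -1):
    c_r[i] = np.tanh(W_r @ c_r[i + 1] + W_sr @ e[i + 1])

# x_i = [c_l(w_i); e(w_i); c_r(w_i)]: concatenate left context, embedding, right context
x = np.concatenate([c_l, e, c_r], axis=1)
print(x.shape)  # (6, 10) -> ctx_dim + emb_dim + ctx_dim per character
```

In the actual device the recurrent cells are LSTM or GRU units rather than this single-matrix recurrence; the sketch only shows how each character's representation is assembled from its embedding plus both context directions.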
Optionally, in the text label tagging device according to the present invention, the input of the attention model module comprises a context input and an n-dimensional sequence input y_1, y_2, …, y_n, where the context input is the output p of the convolutional neural network module and the sequence input is the output h of the recurrent neural network module, y_i being a one-dimensional vector in the output h. The attention model module is adapted to compute the weight of each input y_i from the similarity between y_i and the context input.
Optionally, in the text label tagging device according to the present invention, the attention model module is adapted to compute the similarity between each vector y_i and the context input by a dot-product operation, obtaining a similarity vector u_i, and to compute the corresponding weight vector a_i from u_i with the softmax function.
Optionally, in the text label tagging device according to the present invention, the output Z of the attention model module is computed according to the following formulas:

u_t = tanh(W_h h_t + b_h)
a_t = softmax(u_t^T p)
Z = Σ_t a_t h_t

where u is the vector obtained by passing the input h through a nonlinear transformation by the activation function, a is the weight vector obtained by the dot-product operation and the softmax function, tanh is the activation function, W_h and b_h are the weight parameters of the activation function, t denotes a time step (a position in the sentence sequence), and T denotes the transpose.
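A minimal NumPy sketch of this attention computation, treating p as the CNN context vector and h as the RNN output sequence; the dimensions and random values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 8                       # sequence length, feature dimension
h = rng.normal(size=(n, d))       # RNN output: one vector h_t per character
p = rng.normal(size=(d,))         # CNN output: context vector
W_h = rng.normal(scale=0.1, size=(d, d))
b_h = np.zeros(d)

u = np.tanh(h @ W_h.T + b_h)      # u_t = tanh(W_h h_t + b_h)
scores = u @ p                    # dot-product similarity u_t^T p
a = np.exp(scores) / np.exp(scores).sum()   # softmax weights a_t
z = a @ h                         # Z = sum_t a_t h_t

print(round(a.sum(), 6))  # 1.0 (the character weights sum to 1, as stated below)
print(z.shape)            # (8,)
```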
Optionally, in the text label tagging device according to the present invention, in the output of the attention model module the weight of each character is represented by a floating-point number, and all the floating-point weights sum to 1.
Optionally, in the text label tagging device according to the present invention, the output module classifies with a softmax classifier, which takes the output of the attention model module as input, computes the probability of the text sequence on each label, and selects the label with the highest probability as the correct label of the text.
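The softmax classification step in the output module can be sketched as follows; the label names and classifier scores here are made-up illustrations, not from the patent:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

labels = ["quadratic equation", "linear function", "geometry"]  # hypothetical label set
logits = np.array([2.0, 0.5, -1.0])   # hypothetical classifier scores for one text

probs = softmax(logits)               # probability of the text on each label
best = labels[int(np.argmax(probs))]  # label with the highest probability wins
print(best)                   # prints the chosen label
print(round(probs.sum(), 6))  # 1.0
```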
Optionally, in the text label tagging device according to the present invention, the text is question text.
Optionally, in the text label tagging device according to the present invention, the input module is adapted to: collect question texts of each subject and train a word embedding corpus for each subject; and convert each character of a question text into a word vector using the word embedding corpus of the corresponding subject, thereby converting the question text into a vector matrix.
Optionally, in the text label tagging device according to the present invention, the input module is adapted to convert each character of the text into a one-dimensional floating-point vector, thereby converting the text into a two-dimensional floating-point matrix.
According to another aspect of the present invention, a training method for a text label tagging device is provided, adapted to train the text label tagging device described above and executed in a computing device. The method comprises the steps of: collecting a training sample set, each sample of which contains a text and one or more labels set for that text in advance; and feeding the training sample set into the text label tagging device, obtaining the predicted labels of each text, and iteratively training the device according to the predicted labels and the actual labels of the corresponding texts, to obtain a trained text label tagging device.
According to another aspect of the present invention, a text label tagging method is provided, adapted to be executed in a computing device in which a trained text label tagging device is stored, the device having been trained with the training method described above. The method comprises the steps of: obtaining a text to be labeled; and feeding the text into the trained text label tagging device, obtaining one or more labels of the text and the probability of each label.
According to another aspect of the present invention, a computing device is provided, comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and include instructions for performing the training method of the text label tagging device and/or the text label tagging method described above.
According to another aspect of the present invention, a readable storage medium storing program instructions is provided. When the program instructions are read and executed by a computing device, the computing device performs the training method of the text label tagging device and/or the text label tagging method described above.
According to the technical scheme of the present invention, a convolutional neural network (CNN) model can capture the local semantic features of a sentence well for text classification, but it is limited by the fixed receptive field of its convolutional layers and cannot account for the long-range semantic dependencies and ordering within a sentence. A recurrent neural network (RNN) model can model longer sequence information, but it treats all words in a text sequence equally and cannot distinguish the importance of different words. Based on the respective strengths and weaknesses of the CNN and RNN models, the present invention proposes CRAN, a CNN-RNN hybrid model based on an attention model. By using the attention mechanism to combine the CNN model and the RNN model effectively, the advantages of both models are retained while the shortcomings of each in processing text sequences are compensated for, achieving a better classification effect than existing deep learning text classification models.
Brief description of the drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and the drawings. These aspects are indicative of the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the disclosure will become apparent from the following detailed description read in conjunction with the drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.
Fig. 1 shows a block diagram of a computing device 100 according to an embodiment of the invention;
Fig. 2 shows a block diagram of a text label tagging device 200 according to an embodiment of the invention;
Fig. 3 shows a detailed block diagram of a text label tagging device according to an embodiment of the invention;
Fig. 4 shows a block diagram of a convolutional neural network module 220 according to an embodiment of the invention;
Fig. 5 shows a block diagram of a recurrent neural network module 230 according to an embodiment of the invention;
Fig. 6 shows a schematic diagram of an attention model module 240 according to an embodiment of the invention;
Fig. 7 shows a flow chart of a training method 700 of a text label tagging device according to an embodiment of the invention; and
Fig. 8 shows a flow chart of a text label tagging method 800 according to an embodiment of the invention.
Detailed description
Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In a computing device 100 according to the present invention, the program data 124 includes instructions for performing the training method 700 of the text label tagging device and/or the text label tagging method 800.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate via one or more I/O ports 158 with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, database server, application server, or web server, or as part of a small-form-factor portable (or mobile) electronic device, such as a cellular phone, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device, or a hybrid device including any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook configurations. In some embodiments, computing device 100 is configured to perform the training method 700 of the text label tagging device and/or the text label tagging method 800 according to the present invention.
Fig. 2 shows a schematic diagram of a text label tagging device 200 according to an embodiment of the invention, and Fig. 3 shows a detailed schematic diagram of the device. As shown in Figs. 2 and 3, the model includes an input module 210, a convolutional neural network module 220, a recurrent neural network module 230, an attention model module 240, and an output module 250. The device is mainly based on a hybrid CNN-RNN attention-based neural network (CRAN), which fuses a convolutional neural network (CNN) model and a recurrent neural network (RNN) model through an attention model mechanism, so as to combine the advantages of the two models while compensating for their shortcomings.
Specifically, the input module 210 is adapted to receive input text and convert it into a vector matrix, where the text may be question text. Typically the input module 210 may convert the text sequence into a high-dimensional vector matrix by the currently popular word embedding technique, although other text vectorization methods may also be used; the invention is not limited in this regard. Further, the input module 210 may convert each character of the text into a one-dimensional floating-point vector, thereby converting the text into a two-dimensional floating-point matrix (a two-dimensional numerical matrix); each circle in Fig. 3 represents one floating-point element.
According to one embodiment, for question text, the input module 210 may collect the question texts of each subject, train a word embedding corpus for each subject, and convert each character of a question text into a word vector using the word embedding corpus of the corresponding subject, thereby converting the question text into a vector matrix. Specifically, for each subject, all question texts belonging to that subject in the question bank are collected first, and then the text collection is fed into a word embedding model (such as the word2vec model developed by Google, though certainly not limited to this) to obtain a word embedding corpus. Each character in a question can then be converted into a one-dimensional floating-point vector, such as [0.14, 0.52, …, 0.23], using the corresponding word embedding model, so that the question is finally converted into a two-dimensional floating-point matrix.
For example, suppose one question in the question bank reads: Given that the value of 2x²+3x+1 is 10, the value of the algebraic expression 4x²+6x+1 is ___. The input module 210 then uses the word embedding technique to convert each character of the question (including individual Chinese characters, English letters, digits, punctuation marks, etc.) into a one-dimensional vector of fixed length (the length can be chosen freely, e.g., 300). For example, a single character might be converted into [0.6425, 0.5252, 0.1923, …, 0.234]. In this way, the question is converted into a two-dimensional matrix of size m × n, where m is the number of characters of the question (here m = 32) and n is the length of the one-dimensional vector (here n = 300).
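The character-to-matrix conversion above can be sketched with a toy embedding table; the vocabulary, random vectors, and the small embedding length are made-up illustrations (the patent's example uses a word2vec-style corpus per subject and n = 300):

```python
import numpy as np

rng = np.random.default_rng(2)
n_dim = 6   # embedding length (the patent's example uses 300)

# Hypothetical per-subject embedding table mapping each character to a vector.
# In the device this table comes from a word embedding model trained on the
# question bank of the corresponding subject.
question = list("已知2x2+3x+1的值")
vocab = {ch: rng.normal(size=n_dim) for ch in set(question)}

# Convert the question into an m x n floating-point matrix, one row per character.
matrix = np.stack([vocab[ch] for ch in question])
print(matrix.shape)  # (12, 6): m characters, each an n_dim-dimensional row
```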
The convolutional neural network module 220 is connected to the input module 210 and is adapted to output the local semantic features of the text from the vector matrix; its main purpose is to extract the semantically richest phrase sequences from the input text matrix.
According to one embodiment, as shown in figure 4, convolutional neural networks module 220 may include the first input layer 221, it is more
A convolutional layer 222, the first pond layer 223 and the first full articulamentum 224.Wherein, the first input layer 221 is suitable for receiving input module
The vector matrix exported.Multiple convolutional layers 222 respectively with the first input layer 221 be connected in parallel, be suitable for the vector matrix into
Row convolution operation obtains multiple feature vectors.Further, multiple convolutional layers 222 are suitable for carrying out convolution to vector matrix simultaneously
Operation, each convolutional layer obtain a feature vector, and each feature vector represents a feature of the text, final N number of volume
Lamination can obtain N number of vector, the N item features of the corresponding text.In general, the value type that feature vector includes is that floating-point is small
Number.
The first pooling layer 223 is connected to the multiple convolutional layers 222 and is adapted to perform a pooling operation on the multiple feature vectors and output the pooling result. The pooling operation may be max pooling, in which case the first pooling layer takes the largest floating-point value in each vector to form an N-dimensional vector. The pooling operation may of course also be average pooling; the invention is not limited in this regard. Further, the first pooling layer 223 is adapted to extract the largest floating-point number from each feature vector to form a multi-dimensional vector. The first fully connected layer 224 is connected to the first pooling layer 223 and is adapted to perform a dimensionality-reduction operation on the pooling result to obtain the output of the convolutional neural network module 220, which represents the local semantic features of the text.
That is, the convolutional neural network module 220 first performs convolution operations on the text sequence, then obtains a one-dimensional vector of fixed size through a max-pooling operation, and finally adjusts the dimension of that vector through one fully connected layer, obtaining the CNN-layer output p.
The specific structure and parameters of the convolutional neural network can be set by those skilled in the art as needed. According to one embodiment, the network includes convolutional layers of three different kernel sizes: the kernel size of the first convolutional layer is 3*h, where h is the height of the input text matrix, with 256 feature maps; the kernel size of the second convolutional layer is 4*h, with 256 feature maps; and the kernel size of the third convolutional layer is 5*h, with 256 feature maps. The output vector dimension of the first pooling layer is 768, the weight matrix of the first fully connected layer has dimension 768*200, and its output vector dimension is 200. The input dimension of the entire convolutional neural network is w*h, where w is the width of the input text matrix, and the output dimension is 200.
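The shape bookkeeping above (three kernel widths over the full embedding height, max pooling per feature map, concatenation, then a fully connected layer) can be sanity-checked with a minimal NumPy sketch. This is an illustration, not the patent's implementation: the dimensions are scaled down (the embodiment uses 256 maps per kernel size and a 768*200 weight matrix), and the weights are random.

```python
import numpy as np

rng = np.random.default_rng(0)
w, h = 20, 8              # width (text length) and height (embedding size); small hypothetical values
n_maps = 4                # feature maps per kernel size (256 in the embodiment)
x = rng.standard_normal((w, h))   # input text matrix

def conv_max_pool(x, k, n_maps, rng):
    """n_maps kernels of size k*h slide over the text; max pooling keeps one value per map."""
    w, h = x.shape
    kernels = rng.standard_normal((n_maps, k, h)) * 0.1
    feats = np.array([[float(np.sum(kernels[m] * x[i:i + k]))
                       for i in range(w - k + 1)] for m in range(n_maps)])
    return feats.max(axis=1)       # max over positions: one value per feature map

# Kernel sizes 3*h, 4*h, 5*h as in the embodiment; pooled outputs are concatenated.
pooled = np.concatenate([conv_max_pool(x, k, n_maps, rng) for k in (3, 4, 5)])
W_fc = rng.standard_normal((pooled.size, 5)) * 0.1   # fully connected layer (768*200 in the embodiment)
p = pooled @ W_fc                                    # CNN-layer output p
```

With the embodiment's numbers (n_maps = 256, FC output 200), `pooled` would have length 3*256 = 768 and `p` length 200, matching the dimensions stated in the text.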
It should be appreciated that if a classifier layer (such as a softmax classifier) is added after the fully connected layer in the convolutional neural network module 220, then feeding the N-dimensional vector output by the pooling layer through this fully-connected + softmax layer yields a C-dimensional vector, where C is the number of text labels. Each element of this vector corresponds to one label, and its magnitude represents the probability that the text belongs to that label, so those skilled in the art could take the labels with the highest probabilities as the labels of the text. Although this approach can also output text labels and label probabilities, in text classification it can only capture the local semantic features of a sentence: constrained by the fixed receptive field of the convolutional layers, it cannot take into account the long-range semantic dependency and order of a sentence, and long-range semantic dependency and order are crucially important for text classification. Using only this convolutional neural network method therefore cannot yield accurate classification results, whereas a recurrent neural network (such as a text recurrent neural network, TextRNN) can better compensate for the convolutional neural network's weakness in capturing long-term dependencies and thus achieve a better classification effect. For this reason, the text label labeling device 200 of the present invention further adds a recurrent neural network module 230, to produce a more comprehensive feature output.
Specifically, the recurrent neural network module 230 is likewise connected to the input module 210 and is adapted to output the long-range semantic features of the text according to the vector matrix. The recurrent neural network module 230 likewise takes as its feature input the word vectors into which the text is converted by word embedding; its main purpose is to extract order and long-range semantic dependency features from the input text sequence.
According to one embodiment, as shown in Fig. 5, the recurrent neural network module 230 may include a second input layer 231, a hidden layer 232, a second pooling layer 233, and a second fully connected layer 234. The second input layer 231 is adapted to receive the vector matrix output by the input module 210. The hidden layer 232 is connected to the second input layer 231 and is adapted to represent the word vector of each character in the text as a new representation vector in which that word vector is concatenated with its forward and backward context vectors. In general, the hidden layer 232 may use a bidirectional long short-term memory (LSTM) network or bidirectional GRU hidden units. The second pooling layer 233 is connected to the hidden layer 232 and is adapted to perform a pooling operation on the new representation vectors of all characters and output the pooling result. In general, the second pooling layer 233 is adapted to retain the maximum value in each corresponding column across all word vectors, so as to obtain a one-dimensional vector of fixed length. The second fully connected layer 234 is connected to the second pooling layer 233 and is adapted to perform a dimensionality-reduction operation on the pooling result to obtain the output h of the recurrent neural network module 230, which represents the long-range semantic features of the text. That is, the recurrent neural network module 230 first extracts the context information of the text sequence through the hidden layer, then concatenates the context information and passes it through one fully connected layer, finally obtaining the RNN-layer output.
The specific structure and parameters of the recurrent neural network can be set by those skilled in the art as needed. According to one embodiment, the hidden-unit layer dimensions of the forward GRU and the backward GRU in the recurrent neural network are both set to 200, the input dimension is w*h (w is the width of the input text matrix and h its height), and the output dimension is w*200.
According to another embodiment, in order to better capture the long-range semantic dependency and order in the text, the recurrent neural network module 230 may use a bidirectional LSTM to obtain the forward and backward context representations of the text:

c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))

where c_l(w_i) and c_r(w_i) respectively denote the outputs of the current LSTM unit, c_l(w_{i-1}) and c_r(w_{i+1}) respectively denote the outputs of the previous and next LSTM units, e(w_{i-1}) and e(w_{i+1}) respectively denote the word embedding vectors of the previous and next characters, W^(l) and W^(r) respectively denote the weights of the previous and next LSTM units, W^(sl) and W^(sr) respectively denote the weights of the previous and next word embedding vectors, and f denotes an activation function, such as a sigmoid or tanh activation function, though it is of course not limited to these.
That is, an LSTM unit has two inputs — one is the output of the previous LSTM unit, and the other is the word embedding representation of the current character — and it produces one output. The first formula takes the previous character of the current character into account to generate its corresponding feature vector, while the second formula takes the next character of the current character into account to generate its corresponding feature vector. The representation of a character thus becomes the concatenation of its word embedding vector with its forward and backward context vectors, i.e. the new representation vector x_i of the current character obtained at this point is:

x_i = [c_l(w_i); e(w_i); c_r(w_i)]
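The two recurrences and the concatenation can be sketched as follows. This is a simplified illustration, not the patent's implementation: the full gated LSTM cell is replaced by a plain tanh recurrence of the same form as the two formulas above, and the dimensions are small hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                   # embedding / context size (hypothetical)
n = 5                                   # sequence length
e = rng.standard_normal((n, d))         # word embeddings e(w_i)

W_l, W_sl = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
W_r, W_sr = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1

c_l = np.zeros((n, d))                  # forward context c_l(w_i), left to right
for i in range(1, n):
    c_l[i] = np.tanh(W_l @ c_l[i - 1] + W_sl @ e[i - 1])

c_r = np.zeros((n, d))                  # backward context c_r(w_i), right to left
for i in range(n - 2, -1, -1):
    c_r[i] = np.tanh(W_r @ c_r[i + 1] + W_sr @ e[i + 1])

# New representation of each character: x_i = [c_l(w_i); e(w_i); c_r(w_i)]
x = np.concatenate([c_l, e, c_r], axis=1)
```

Each row of `x` has length 3*d — the concatenated length depends only on the embedding and context sizes, which is why (as the next paragraph notes) the pooled output is independent of text length.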
A max-pooling layer then retains the maximum value in each corresponding column across all word vectors to obtain a one-dimensional vector of fixed size. For example, if the three vectors corresponding to a text of length 3 are [0.13, 0.24, 0.69], [0.04, 0.29, 0.23], and [0.88, 0.23, 0.28], then after the max-pooling operation a one-dimensional vector of length 3, [0.88, 0.29, 0.69], is obtained, in which each element is the maximum value at the corresponding position across the three vectors. This pooling operation ensures that a vector of fixed length can be output for a text of arbitrary length, because the length of that vector depends only on the lengths of the word embedding and context vectors, not on the length of the text.
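The column-wise max pooling in this example can be reproduced directly:

```python
# Column-wise max pooling over the three example vectors from the text.
vectors = [[0.13, 0.24, 0.69],
           [0.04, 0.29, 0.23],
           [0.88, 0.23, 0.28]]

# zip(*vectors) iterates over columns; max of each column gives the pooled vector.
pooled = [max(col) for col in zip(*vectors)]
print(pooled)  # [0.88, 0.29, 0.69]
```

The result has one entry per column regardless of how many row vectors are supplied, which is the length-independence property described above.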
It should likewise be understood that if a classifier layer (such as a softmax classifier) is added after the fully connected layer in the recurrent neural network module 230, text classification can also be performed. Although this classification approach can model longer sequence information, it treats all words in the text sequence equally and cannot distinguish the importance of different words. The present invention therefore combines the recurrent neural network module 230 (RNN Layer) with the convolutional neural network module 220 (CNN Layer) and effectively fuses their outputs through the attention model module 240 (Attention Layer), retaining the advantages of both modules while compensating for their respective shortcomings.
Specifically, the attention model module 240 is connected to the convolutional neural network module 220 and the recurrent neural network module 230, and is adapted to output the weight of each character in the text according to the local semantic features and long-range semantic features output by those two modules.
According to one embodiment, as shown in Fig. 6, the attention model module 240 has two inputs — a context input (Context) and an n-dimensional sequence input y_1, y_2, …, y_n — and returns a vector Z according to these two inputs. The n-dimensional sequence input may typically be, for example, the word embedding matrix into which the text is converted, or the two-dimensional matrix into which that word embedding matrix is converted by the recurrent neural network module 230. The present invention may take the latter, i.e. the n-dimensional sequence input is the output h of the recurrent neural network module 230, and y_i is a one-dimensional vector in h. It should further be noted that the weight of each character in the vector Z returned by the attention model module 240 is expressed as a floating-point number, and all these floating-point numbers sum to 1. Each floating-point value represents the weight of the corresponding character in the text sequence; for example, a five-character sequence such as 以下代数式 ("the following algebraic expression") might correspond to a vector Z of [0.05, 0.05, 0.3, 0.3, 0.3], in which the weight of the character 代 is 0.3.
The attention mechanism can assign higher weights to important words in the input sequence, thereby achieving a better classification effect. What drives this mechanism is the context input (Context) of the attention model module 240, and here the context input is the output p of the convolutional neural network module 220.
According to one embodiment, guided by the context input, the attention model module 240 assigns a different weight to each of the n inputs by computing the correlation between that input and the context input. Specifically, the module is adapted to compute the weight of an input y_i by computing the similarity between the vector y_i and the context input. In this case, it can compute the similarity between the vector y_i and the context input by a dot-product operation to obtain a similarity vector u_i, and compute the weight vector a_i corresponding to u_i by a softmax function.
The present invention uses the output of the recurrent neural network module 230 as the n-dimensional input of the attention model module 240, so this input represents the long-range semantic features of the original text sequence; it uses the output of the convolutional neural network module 220 as the context input of the attention model module 240, so the context represents the local semantic features of the original text sequence. This design allows the attention model module 240 to retain the text order and long-range semantic features extracted by the recurrent neural network module 230, while computing the different weights of the n-dimensional input sequence from the local semantic features extracted by the convolutional neural network module 220. According to one embodiment, the output Z of the attention model module 240 is adapted to be computed according to the following formulas:
u = tanh(W_h h + b_h)
a = softmax(u^T p)
Z = a h

where u is the vector obtained by passing the input h through a nonlinear transformation by the activation function, a is the weight vector obtained by the dot-product operation and the softmax function, tanh is the activation function, W_h and b_h respectively denote the weight parameters in the activation function, t denotes a time step — which can also be understood as a particular step in a sentence sequence — and T denotes the matrix transpose.
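Under this reading — h from the RNN layer, p from the CNN layer as context, dot-product scores passed through a softmax, and Z a weighted sum of the sequence vectors — a minimal NumPy sketch is as follows. The dimensions and weights are hypothetical; this illustrates the computation, not the patent's trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 8                      # sequence length and feature size (hypothetical)
h = rng.standard_normal((n, d))  # RNN-layer output h: one row y_i per character
p = rng.standard_normal(d)       # CNN-layer output p: the context input

W_h = rng.standard_normal((d, d)) * 0.1
b_h = np.zeros(d)

u = np.tanh(h @ W_h + b_h)       # nonlinear transformation of each y_i
scores = u @ p                   # dot-product similarity with the context input
a = np.exp(scores - scores.max())
a = a / a.sum()                  # softmax: per-character weights summing to 1
Z = a @ h                        # Z = a h, a weighted sum of the sequence vectors
```

Subtracting `scores.max()` before exponentiating is the standard numerically stable form of softmax and does not change the resulting weights.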
The output module 250 is connected to the attention model module 240, is adapted to receive the weight of each character in the text output by the attention model module 240, and outputs the text labels and the probability of each label. The output module 250 classifies with a softmax classifier, which takes the output of the attention model module 240 as input, computes the probability of the text sequence on each label, and selects the label with the highest probability as the correct label of the text. The probability is computed by the softmax classifier; the specific formula is as follows:

P_t = e^{Z_t} / Σ_i e^{Z_i}

where t denotes a label; Z is the output of the attention model module 240, whose size equals the number of label types; Z_i denotes the value at the position corresponding to label i in the vector Z, and Z_t denotes the value at the position corresponding to label t in the vector Z. This formula normalizes the values in the vector Z, so the resulting vector P is a probability distribution over the labels. It should of course be understood that the output module 250 may also use other classifiers, as long as label annotation can be achieved; the invention is not limited in this regard.
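The probability computation can be illustrated with a small, numerically stable softmax over hypothetical label scores:

```python
import math

def softmax(z):
    """Turn label scores into a probability distribution (numerically stable)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

# Hypothetical scores Z for four labels:
Z = [2.0, 1.0, 0.1, -1.0]
P = softmax(Z)
best = max(range(len(P)), key=lambda i: P[i])  # label with highest probability
```

Since softmax is monotonic, the label with the largest score in Z is always the one with the largest probability in P, which is the label the output module would select.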
Fig. 7 shows a training method 700 for a text label labeling device according to an embodiment of the invention, which is adapted to train the text label labeling device 200 described above and can be executed in a computing device, for example in the computing device 100. As shown in Fig. 7, the method starts at step S720.
In step S720, a training sample set is collected, each sample of which includes a text and one or more labels set in advance for that text.
Then, in step S740, the training sample set is input into the text label labeling device to obtain the predicted labels of each text, and the text label labeling model is iteratively trained according to the predicted labels and the actual labels of the corresponding texts, so as to obtain a trained text label labeling device. The training may use existing conventional model-training methods, for example cross-validation with a training set and a validation set; the invention is not limited in this regard.
Fig. 8 shows a text label labeling method 800 according to an embodiment of the invention, which can be executed in a computing device in which a trained text label labeling device is stored, the text label labeling device having been trained using the training method described above.
As shown in Fig. 8, the method starts at step S820. In step S820, a text to be labeled is obtained. Then, in step S840, the text is input into the trained text label labeling device to obtain one or more labels of the text and the probability of each label. When there are multiple labels and probabilities, the labels may also be displayed in descending order of probability value.
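The descending-probability display amounts to a simple stable sort over (label, probability) pairs; the label names and values below are hypothetical:

```python
# Rank candidate labels by predicted probability, descending.
labels = ["algebra", "geometry", "calculus"]
probs = [0.15, 0.70, 0.15]

ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
print(ranked[0][0])  # "geometry"
```

Python's sort is stable, so labels with equal probability keep their original relative order.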
In this way, for any text whose labels are to be annotated, the text is input into the trained text label labeling device, which outputs the labels of the text and the probability of each label. For exam-question texts, when facing a massive question bank (for example 66 million questions), labeling them one by one manually is plainly unrealistic, and using a CNN model alone or an RNN model alone cannot achieve a good text-classification effect either. The attention-based CNN-RNN hybrid model CRAN designed by the present invention effectively couples the CNN model and the RNN model by using the attention mechanism; it can simultaneously retain the advantages of CNN and RNN while compensating for their respective shortcomings in processing text sequences, and can thereby obtain a better classification effect than existing deep-learning text-classification models.
A7. The text label labeling device of A6, wherein the new representation vector x_i of the current character obtained using the bidirectional LSTM is:

x_i = [c_l(w_i); e(w_i); c_r(w_i)]
c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))

where c_l(w_i) and c_r(w_i) respectively denote the outputs of the current LSTM unit, c_l(w_{i-1}) and c_r(w_{i+1}) respectively denote the outputs of the previous and next LSTM units, e(w_{i-1}) and e(w_{i+1}) respectively denote the word embedding vectors of the previous and next characters, W^(l) and W^(r) respectively denote the weights of the previous and next LSTM units, W^(sl) and W^(sr) respectively denote the weights of the previous and next word embedding vectors, and f denotes an activation function.
A8. The text label labeling device of any one of A1-A7, wherein the input of the attention model module includes a context input and an n-dimensional sequence input y_1, y_2, …, y_n, wherein the context input is the output p of the convolutional neural network module, the n-dimensional sequence input is the output h of the recurrent neural network module, y_i is a one-dimensional vector in the output h, and the attention model module is adapted to compute the weight of an input y_i by computing the similarity between the vector y_i and the context input.
A9. The text label labeling device of A8, wherein the attention model module is adapted to compute the similarity between the vector y_i and the context input by a dot-product operation to obtain a similarity vector u_i, and to compute the weight vector a_i corresponding to u_i by a softmax function.
A10. The text label labeling device of A9, wherein the output Z of the attention model module is adapted to be computed according to the following formulas:

u = tanh(W_h h + b_h)
a = softmax(u^T p)
Z = a h

where u is the vector obtained by passing the input h through a nonlinear transformation by the activation function, a is the weight vector obtained by the dot-product operation and the softmax function, tanh is the activation function, W_h and b_h respectively denote the weight parameters in the activation function, t denotes a time step or a particular step in a sentence sequence, and T denotes the matrix transpose.
A11. The text label labeling device of A1, wherein the weight of each character in the output of the attention model module is expressed as a floating-point number and all the floating-point numbers sum to 1.
A12. The text label labeling device of any one of A1-A11, wherein the output module classifies with a softmax classifier, which takes the output of the attention model module as input, computes the probability of the text sequence on each label, and selects the label with the highest probability as the correct label of the text.
A13. The text label labeling device of A1, wherein the text is an exam-question text.
A14. The text label labeling device of A13, wherein the input module is adapted to: collect the question texts of each subject and train a word embedding corpus for each subject; and convert each character in the question text of a subject into a word vector according to the word embedding corpus of that subject, so as to convert the question text into a vector matrix.
A15. The text label labeling device of A1, wherein the input module is adapted to convert each character in the text into a one-dimensional floating-point vector, so as to convert the text into a two-dimensional floating-point matrix.
Numerous specific details are set forth in the description provided here. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. The disclosed method, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may furthermore be divided into multiple submodules.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of them. Thus, the methods and devices of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e. instructions) embedded in tangible media, such as floppy disks, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes a device for practicing the invention.
In the case of program code executing on programmable computers, the computing device generally includes a processor, a processor-readable storage medium (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the methods of the present invention according to the instructions in the program code stored in the memory.
Furthermore, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. A processor having the necessary instructions for carrying out such a method or method element thus forms a means for carrying out the method or method element. Furthermore, an element described herein of a device embodiment is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. Additionally, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, rather than to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive, the scope of the invention being defined by the appended claims.
Claims (10)
1. A text label labeling device for labeling text labels, the device comprising:
an input module, adapted to receive a text input and convert the text into a vector matrix for output;
a convolutional neural network module, connected to the input module, adapted to output local semantic features of the text according to the vector matrix;
a recurrent neural network module, connected to the input module, adapted to output long-range semantic features of the text according to the vector matrix;
an attention model module, connected to the convolutional neural network module and the recurrent neural network module, adapted to output the weight of each character in the text according to the local semantic features and the long-range semantic features; and
an output module, connected to the attention model module, adapted to receive the weight of each character in the text and output the text labels and the probability of each label.
2. The text label labeling device as claimed in claim 1, wherein the convolutional neural network module comprises:
a first input layer, adapted to receive the vector matrix output by the input module;
multiple convolutional layers, each connected in parallel to the first input layer, adapted to perform convolution operations on the vector matrix to obtain multiple feature vectors;
a first pooling layer, connected to the multiple convolutional layers, adapted to perform a pooling operation on the multiple feature vectors and output the pooling result; and
a first fully connected layer, connected to the first pooling layer, adapted to perform a dimensionality-reduction operation on the pooling result to obtain the output of the convolutional neural network module, the output representing the local semantic features of the text.
3. The text label labeling device as claimed in claim 2, wherein
the multiple convolutional layers are adapted to perform convolution operations on the vector matrix simultaneously, each convolutional layer obtaining one feature vector, the values contained in each feature vector being floating-point numbers; and
the first pooling layer is adapted to extract the largest floating-point number from each feature vector to form a multi-dimensional vector.
4. The text label labeling device as claimed in claim 2, wherein the input dimension of the convolutional neural network module is w*h, h being the height of the input text matrix and w its width, and the output dimension is 200;
the convolutional neural network module comprises convolutional layers of 3 different kernel sizes, wherein the kernel sizes of the first to third convolutional layers are respectively 3*h, 4*h, and 5*h, and each convolutional layer comprises 256 feature maps; and
the output vector dimension of the first pooling layer is 768, the weight matrix of the first fully connected layer has dimension 768*200, and its output vector dimension is 200.
5. The text label labeling device as claimed in claim 1, wherein the recurrent neural network module comprises:
a second input layer, adapted to receive the vector matrix output by the input module;
a hidden layer, connected to the second input layer, adapted to represent the word vector of each character in the text as a new representation vector in which that word vector is concatenated with its forward and backward context vectors;
a second pooling layer, connected to the hidden layer, adapted to perform a pooling operation on the new representation vectors of all characters and output the pooling result; and
a second fully connected layer, connected to the second pooling layer, adapted to perform a dimensionality-reduction operation on the pooling result to obtain the output of the recurrent neural network module, the output representing the long-range semantic features of the text.
6. The text label labeling device as claimed in claim 5, wherein
the hidden layer uses a bidirectional long short-term memory (LSTM) network or bidirectional GRU hidden units; and
the second pooling layer is adapted to retain the maximum value in each corresponding column across all word vectors, so as to obtain a one-dimensional vector of fixed length.
7. A training method for a text label labeling device, adapted to train the text label labeling device of any one of claims 1-6 and executed in a computing device, the method comprising the steps of:
collecting a training sample set, each sample of which includes a text and one or more labels set in advance for the text; and
inputting the training sample set into the text label labeling device to obtain the predicted labels of each text, and iteratively training the text label labeling model according to the predicted labels and the actual labels of the corresponding texts, so as to obtain a trained text label labeling device.
8. A text label labeling method, adapted to be executed in a computing device in which a trained text label labeling device is stored, the text label labeling device having been trained using the training method as claimed in claim 7, the text label labeling method comprising the steps of:
obtaining a text to be labeled; and
inputting the text into the trained text label labeling device to obtain one or more labels of the text and the probability of each label.
9. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods according to claim 7 or 8.
10. A computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by a computing device, cause the computing device to perform either of the methods of claims 7 and 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810129331.3A CN108334499B (en) | 2018-02-08 | 2018-02-08 | Text label labeling device and method and computing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108334499A true CN108334499A (en) | 2018-07-27 |
CN108334499B CN108334499B (en) | 2022-03-18 |
Family
ID=62927293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810129331.3A Active CN108334499B (en) | 2018-02-08 | 2018-02-08 | Text label labeling device and method and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334499B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777011A (en) * | 2016-12-07 | 2017-05-31 | 中山大学 | Text classification method based on deep multi-task learning |
CN107169035A (en) * | 2017-04-19 | 2017-09-15 | 华南理工大学 | Text classification method combining long short-term memory networks and convolutional neural networks |
US20170278510A1 (en) * | 2016-03-22 | 2017-09-28 | Sony Corporation | Electronic device, method and training method for natural language processing |
CN107291795A (en) * | 2017-05-03 | 2017-10-24 | 华南理工大学 | Text classification method combining dynamic word embeddings and part-of-speech tagging |
CN107608956A (en) * | 2017-09-05 | 2018-01-19 | 广东石油化工学院 | Reader emotion distribution prediction algorithm based on CNN-GRNN |
2018-02-08: application CN201810129331.3A filed in China; patent CN108334499B granted, status active
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109146152A (en) * | 2018-08-01 | 2019-01-04 | 北京京东金融科技控股有限公司 | Online incident classification prediction method and device |
CN109241522A (en) * | 2018-08-02 | 2019-01-18 | 义语智能科技(上海)有限公司 | Encoding and decoding method and device |
CN109189922A (en) * | 2018-08-07 | 2019-01-11 | 阿里巴巴集团控股有限公司 | Comment evaluation model training method and device |
CN109189922B (en) * | 2018-08-07 | 2021-06-29 | 创新先进技术有限公司 | Comment evaluation model training method and device |
CN109189882A (en) * | 2018-08-08 | 2019-01-11 | 北京百度网讯科技有限公司 | Answer type recognition method, device, server and storage medium for sequential content |
CN109214002A (en) * | 2018-08-27 | 2019-01-15 | 成都四方伟业软件股份有限公司 | Transcription comparison method, device and computer storage medium |
CN109034378A (en) * | 2018-09-04 | 2018-12-18 | 腾讯科技(深圳)有限公司 | Network representation generation method and device of neural network, storage medium and equipment |
CN109034378B (en) * | 2018-09-04 | 2023-03-31 | 腾讯科技(深圳)有限公司 | Network representation generation method and device of neural network, storage medium and equipment |
US11875220B2 (en) | 2018-09-04 | 2024-01-16 | Tencent Technology (Shenzhen) Company Limited | Method, apparatus, and storage medium for generating network representation for neural network |
CN109522920B (en) * | 2018-09-18 | 2020-10-13 | 义语智能科技(上海)有限公司 | Training method and device of synonymy discriminant model based on combination of semantic features |
CN109522920A (en) * | 2018-09-18 | 2019-03-26 | 义语智能科技(上海)有限公司 | Training method and device of a synonymy discrimination model based on combined semantic features |
CN109299291A (en) * | 2018-09-28 | 2019-02-01 | 武汉大学 | Question-answering community label recommendation method based on convolutional neural networks |
CN109086463B (en) * | 2018-09-28 | 2022-04-29 | 武汉大学 | Question-answering community label recommendation method based on regional convolutional neural network |
CN109086463A (en) * | 2018-09-28 | 2018-12-25 | 武汉大学 | Question-answering community label recommendation method based on regional convolutional neural networks |
CN109299291B (en) * | 2018-09-28 | 2022-04-29 | 武汉大学 | Question-answering community label recommendation method based on convolutional neural network |
CN111046859B (en) * | 2018-10-11 | 2023-09-29 | 杭州海康威视数字技术股份有限公司 | Character recognition method and device |
CN111046859A (en) * | 2018-10-11 | 2020-04-21 | 杭州海康威视数字技术股份有限公司 | Character recognition method and device |
CN109543030B (en) * | 2018-10-12 | 2023-04-07 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for classifying session texts of customer service robot |
CN109543030A (en) * | 2018-10-12 | 2019-03-29 | 平安科技(深圳)有限公司 | Customer service robot session text classification method, device, equipment and storage medium |
CN109634578A (en) * | 2018-10-19 | 2019-04-16 | 北京大学 | Program generation method based on textual description |
CN111180019A (en) * | 2018-11-09 | 2020-05-19 | 上海云贵信息科技有限公司 | Automatic compound parameter extraction method based on deep learning |
CN109582789A (en) * | 2018-11-12 | 2019-04-05 | 北京大学 | Text multi-label classification method based on semantic unit information |
CN109582789B (en) * | 2018-11-12 | 2021-07-09 | 北京大学 | Text multi-label classification method based on semantic unit information |
CN111191038A (en) * | 2018-11-15 | 2020-05-22 | 第四范式(北京)技术有限公司 | Neural network training method and device and named entity identification method and device |
CN111191038B (en) * | 2018-11-15 | 2024-05-10 | 第四范式(北京)技术有限公司 | Neural network training method and device and named entity recognition method and device |
CN109271521A (en) * | 2018-11-16 | 2019-01-25 | 北京九狐时代智能科技有限公司 | Text classification method and device |
CN109710919A (en) * | 2018-11-27 | 2019-05-03 | 杭州电子科技大学 | Neural network event extraction method incorporating an attention mechanism |
CN109359198A (en) * | 2018-12-04 | 2019-02-19 | 北京容联易通信息技术有限公司 | Text classification method and device |
CN109620154A (en) * | 2018-12-21 | 2019-04-16 | 平安科技(深圳)有限公司 | Borborygmus sound recognition method and related apparatus based on deep learning |
CN110008466A (en) * | 2019-01-30 | 2019-07-12 | 阿里巴巴集团控股有限公司 | Data processing method, device and equipment |
CN109902293A (en) * | 2019-01-30 | 2019-06-18 | 华南理工大学 | Text classification method based on local and global mutual attention mechanisms |
CN111612024B (en) * | 2019-02-25 | 2023-12-08 | 北京嘀嘀无限科技发展有限公司 | Feature extraction method, device, electronic equipment and computer readable storage medium |
CN111612024A (en) * | 2019-02-25 | 2020-09-01 | 北京嘀嘀无限科技发展有限公司 | Feature extraction method and device, electronic equipment and computer-readable storage medium |
CN110032645B (en) * | 2019-04-17 | 2021-02-09 | 携程旅游信息技术(上海)有限公司 | Text emotion recognition method, system, device and medium |
CN110032645A (en) * | 2019-04-17 | 2019-07-19 | 携程旅游信息技术(上海)有限公司 | Text emotion recognition method, system, equipment and medium |
CN110263122A (en) * | 2019-05-08 | 2019-09-20 | 北京奇艺世纪科技有限公司 | Keyword acquisition method, device and computer-readable storage medium |
CN110263122B (en) * | 2019-05-08 | 2022-05-17 | 北京奇艺世纪科技有限公司 | Keyword acquisition method and device and computer readable storage medium |
CN110119786B (en) * | 2019-05-20 | 2021-11-16 | 北京奇艺世纪科技有限公司 | Text topic classification method and device |
CN110119786A (en) * | 2019-05-20 | 2019-08-13 | 北京奇艺世纪科技有限公司 | Text topic classification method and device |
CN110309304A (en) * | 2019-06-04 | 2019-10-08 | 平安科技(深圳)有限公司 | Text classification method, device, equipment and storage medium |
CN110610003B (en) * | 2019-08-15 | 2023-09-15 | 创新先进技术有限公司 | Method and system for assisting text annotation |
CN110610003A (en) * | 2019-08-15 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Method and system for assisting text annotation |
CN110610489B (en) * | 2019-08-30 | 2021-11-23 | 西安电子科技大学 | Optical laryngoscope image lesion area marking method based on attention mechanism |
CN110610489A (en) * | 2019-08-30 | 2019-12-24 | 西安电子科技大学 | Optical laryngoscope image lesion area marking method based on attention mechanism |
CN110825949A (en) * | 2019-09-19 | 2020-02-21 | 平安科技(深圳)有限公司 | Information retrieval method based on convolutional neural network and related equipment thereof |
CN111858923A (en) * | 2019-12-24 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Text classification method, system, device and storage medium |
CN111309933A (en) * | 2020-02-13 | 2020-06-19 | 中国科学院自动化研究所 | Automatic marking system for cultural resource data |
CN111309933B (en) * | 2020-02-13 | 2023-11-10 | 中国科学院自动化研究所 | Automatic labeling system for cultural resource data |
CN111444339B (en) * | 2020-02-29 | 2024-05-03 | 平安国际智慧城市科技股份有限公司 | Text question difficulty labeling method and device and computer readable storage medium |
CN111444339A (en) * | 2020-02-29 | 2020-07-24 | 平安国际智慧城市科技股份有限公司 | Text question difficulty labeling method and device and computer readable storage medium |
CN111539653A (en) * | 2020-05-27 | 2020-08-14 | 山西东易园智能家居科技有限公司 | Intelligent filling construction progress management method |
CN111951792A (en) * | 2020-07-30 | 2020-11-17 | 北京先声智能科技有限公司 | Punctuation labeling model based on grouped convolutional neural networks |
CN111951792B (en) * | 2020-07-30 | 2022-12-16 | 北京先声智能科技有限公司 | Punctuation labeling model based on grouped convolutional neural networks |
CN112417156A (en) * | 2020-11-30 | 2021-02-26 | 百度国际科技(深圳)有限公司 | Multitask learning method, device, equipment and storage medium |
CN112417156B (en) * | 2020-11-30 | 2024-05-14 | 百度国际科技(深圳)有限公司 | Multi-task learning method, device, equipment and storage medium |
CN112860889A (en) * | 2021-01-29 | 2021-05-28 | 太原理工大学 | BERT-based multi-label classification method |
CN113392986A (en) * | 2021-02-01 | 2021-09-14 | 重庆交通大学 | Highway bridge information extraction method based on big data and management maintenance system |
CN112906382A (en) * | 2021-02-05 | 2021-06-04 | 山东省计算中心(国家超级计算济南中心) | Policy text multi-label labeling method and system based on graph neural network |
CN113220876B (en) * | 2021-04-16 | 2022-12-06 | 山东师范大学 | Multi-label classification method and system for English text |
CN113220876A (en) * | 2021-04-16 | 2021-08-06 | 山东师范大学 | Multi-label classification method and system for English text |
CN113343703A (en) * | 2021-08-09 | 2021-09-03 | 北京惠每云科技有限公司 | Medical entity classification extraction method and device, electronic equipment and storage medium |
CN113343703B (en) * | 2021-08-09 | 2021-10-29 | 北京惠每云科技有限公司 | Medical entity classification extraction method and device, electronic equipment and storage medium |
CN113987187A (en) * | 2021-11-09 | 2022-01-28 | 重庆大学 | Multi-label embedding-based public opinion text classification method, system, terminal and medium |
Also Published As
Publication number | Publication date |
---|---|
CN108334499B (en) | 2022-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108334499A (en) | Text label labeling apparatus and method, and computing device | |
CN109271493B (en) | Language text processing method and device, and storage medium | |
CN106980683B (en) | Blog text abstract generation method based on deep learning | |
CN110232183A (en) | Keyword extraction model training method, keyword extraction method, device and storage medium | |
CN109559300A (en) | Image processing method, electronic device and computer-readable storage medium | |
CN108959246A (en) | Answer selection method and device based on an improved attention mechanism, and electronic device | |
Zhou et al. | Modelling sentence pairs with tree-structured attentive encoder | |
CN107836000A (en) | Improved artificial neural networks for language modeling and prediction | |
CN113312500A (en) | Method for constructing an event graph for safe dam operation | |
CN110348535A (en) | Visual question-answering model training method and device | |
CN108875074A (en) | Answer selection method and device based on a cross-attention neural network, and electronic device | |
CN107292352A (en) | Image classification method and device based on convolutional neural networks | |
CN111666376B (en) | Answer generation method and device based on paragraph boundary scan prediction and word mover's distance cluster matching | |
CN107832794A (en) | Convolutional neural network generation method, vehicle model recognition method and computing device | |
CN114925320B (en) | Data processing method and related device | |
CN109740158A (en) | Text semantic parsing method and device | |
CN112000778A (en) | Natural language processing method, device and system based on semantic recognition | |
CN108984555A (en) | User status mining and information recommendation method, device and equipment | |
CN110287857A (en) | Training method for a feature point detection model | |
CN108920446A (en) | Processing method for engineering documents | |
CN116775836A (en) | Textbook text question-answering method and system based on multi-level attention | |
JPWO2019187696A1 (en) | Vectorizer, language processing method and program | |
CN111767720B (en) | Title generation method, computer and readable storage medium | |
Vafadar et al. | Academic Education in the Era of Generative Artificial Intelligence | |
CN111445545B (en) | Text transfer mapping method and device, storage medium and electronic device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: 571924 Hainan Ecological Software Park, Laocheng High-tech Industrial Demonstration Zone, Haikou City, Hainan Province
Patentee after: Hainan Avanti Technology Co.,Ltd.
Address before: 571924 Hainan Ecological Software Park, Laocheng High-tech Industrial Demonstration Zone, Hainan Province
Patentee before: HAINAN YUNJIANG TECHNOLOGY CO.,LTD.