CN110046250A - Word, part-of-speech and word-length three-embedding convolutional neural network model and text multi-classification method thereof - Google Patents


Info

Publication number
CN110046250A
CN110046250A (application CN201910200666.4A)
Authority
CN
China
Prior art keywords
word
speech
matrix
word length
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910200666.4A
Other languages
Chinese (zh)
Inventor
朱定局
田娟
欧珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xizang Minzu University
South China Normal University
Original Assignee
Xizang Minzu University
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xizang Minzu University and South China Normal University
Priority to CN201910200666.4A
Publication of CN110046250A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of text classification and relates to a word, part-of-speech and word-length three-embedding convolutional neural network model and a text multi-classification method based on it. The model comprises an input layer, lookup tables, a convolutional layer, a feature pooling layer, a feature connection layer, a feature selection layer and a classification output layer connected in sequence. New Chinese words are additionally recognized with a new-word recognition method, which improves the accuracy of word segmentation and thus the accuracy of text classification.

Description

Word, part-of-speech and word-length three-embedding convolutional neural network model and text multi-classification method thereof
Technical Field
The invention belongs to the technical field of text classification and relates to a word, part-of-speech and word-length three-embedding convolutional neural network model and a text multi-classification method based on it.
Background
Text classification is an important task in natural language processing: it helps people organize and manage massive text information, quickly and accurately acquire the information they need, and enables personalized information recommendation. Text classification is applied in numerous fields such as web search, information filtering, sentiment analysis, text indexing, automatic summarization, information retrieval and push, digital libraries, and question-answering systems.
The text classification pipeline comprises feature extraction, model training and classification, with feature extraction being the core task. Traditional feature extraction methods are mainly rule-based or statistics-based. The former build expert systems by accumulating expert rules via knowledge engineering; such methods depend on rules crafted for specific data sets and scenarios, so they generalize poorly to other data sets. The latter extract text features from a machine-learning angle through statistical regularities and shallow classification models, with good results; existing methods mainly include TF-IDF, information gain, mutual information, expected cross entropy, the LDA model, the N-gram algorithm, and the like. However, these traditional methods often ignore the contextual information or the word order of the text, and the resulting features are high-dimensional and sparse, leading to problems such as the curse of dimensionality.
With the success of deep learning in image processing in recent years, more and more researchers have applied deep-learning techniques to natural language processing. Word embedding and deep learning provide a new way to address feature sparsity. Word embedding is a distributed representation of words; the Neural Network Language Model (NNLM) proposed by Bengio et al. in 2003 (see: Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model [J]. Journal of Machine Learning Research, 2003, 3(Feb): 1137-1155.) trains word vectors jointly with a language model. In 2013, Tomas Mikolov et al. proposed the CBOW and Skip-Gram word embedding models (see: Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality [C]// Advances in Neural Information Processing Systems. 2013: 3111-3119.), which are broadly similar to NNLM except that the non-linear hidden layer is removed and the prediction objective differs: CBOW predicts the current word from its context words, while Skip-Gram does the opposite. Text classification models then solve automatic feature extraction (i.e., feature representation) with deep networks such as RNNs and CNNs and their variants.
The recurrent neural network (RNN) commonly used in deep learning expresses contextual information well: it can implicitly extract sentence representations while preserving word-order information, and it can analyze the semantics of a whole document without sentence boundaries. Weninger et al., using an LSTM + RNN model (see: Weninger F, Bergmann J, Schuller B. Introducing CURRENNT: the Munich open-source CUDA recurrent neural network toolkit [J]. The Journal of Machine Learning Research, 2015, 16(1): 547-551.), can avoid the vanishing- and exploding-gradient problems of RNNs in text classification, but that model treats later words as more dominant than earlier ones when processing text. Thus, when it is used to capture the semantics of an entire document, its effectiveness may drop, because the key component may appear anywhere in the document rather than at the end. The TextCNN classification model proposed by Kim in 2014 (see: Kim Y. Convolutional Neural Networks for Sentence Classification [C]// Proceedings of Empirical Methods in Natural Language Processing. 2014: 1746-1751.) combines a CNN with word embeddings to capture the local correlations of text, addressing the missing relationships between words. Although TextCNN performs well on many tasks, the biggest problem of the CNN is its fixed filter_size view: on one hand it cannot model longer sequence information, and on the other hand tuning the filter_size hyperparameter is cumbersome.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a word, part-of-speech and word-length three-embedding convolutional neural network model: the input layer receives preprocessed text; lookup tables are added for convenient lookup; dimension reduction is performed in the pooling layer to improve classification accuracy; local text features are extracted by the convolution and pooling operations; a dropout mechanism is applied to the merged feature matrix, randomly deleting part of the features; and the resulting classifier input matrix is fed into the classifier.
The invention also provides a text multi-classification method based on the word, part-of-speech and word-length three-embedding convolutional neural network model. A new-word recognition method recognizes new Chinese words and expands the text word-segmentation corpus before segmentation, yielding a word, part-of-speech and word-length library and improving segmentation accuracy.
The word, part-of-speech and word-length three-embedding convolutional neural network model for text multi-classification is realized by the following technical scheme:
A word, part-of-speech and word-length three-embedding convolutional neural network model comprises an input layer, lookup tables, a convolutional layer, a feature pooling layer, a feature connection layer, a feature selection layer and a classification output layer connected in sequence, wherein:
Input layer: receives the preprocessed text and performs vector encoding to obtain a word feature vector mapping matrix, a part-of-speech feature vector mapping matrix and a word-length feature vector mapping matrix;
Lookup tables: store the word, part-of-speech and word-length feature vector mapping matrices, and comprise a word lookup table, a part-of-speech lookup table and a word-length lookup table;
Convolutional layer: obtains the word matrix, part-of-speech matrix and word-length matrix to be processed through the lookup tables, and performs convolution on them to obtain word, part-of-speech and word-length local features;
Feature pooling layer: performs dimension reduction on the word, part-of-speech and word-length local features;
Feature connection layer: merges the dimension-reduced word, part-of-speech and word-length local features to obtain a fused feature matrix; during merging, different preset weights are assigned to the word, part-of-speech and word-length local features for fusion;
Feature selection layer: applies a dropout mechanism to the fused feature matrix, randomly deleting part of the features to obtain the classifier input matrix;
Classification output layer: takes the classifier input matrix as input, analyzes the features and completes the multi-classification of the text;
wherein,
the preset weights are determined as follows: take several groups of different preset weights; for each group, form a connection layer and test the resulting word, part-of-speech and word-length three-embedding convolutional neural network model on text classification; compute the classification accuracy corresponding to each group; and take the group of preset weights whose model achieves the highest classification accuracy as the preset weights of the connection layer.
Preferably, when training the word, part-of-speech and word-length three-embedding convolutional neural network model, the parameters are set as follows: the convolution kernel window sizes h are 3, 4 and 5; the maximum word-vector dimension d is 300; there are 100 kernels of each size; the dropout probability is 0.5; and the number of chunks in chunk-max pooling is 5.
Preferably, the feature pooling layer adopts a chunk-wise max-pooling dimension-reduction strategy, and the classification output layer uses softmax as the classifier.
The text multi-classification method is realized by adopting the following technical scheme:
a text multi-classification method based on a word, part of speech and word length three-embedded convolutional neural network model comprises the following steps:
preprocessing data to obtain an input data set of a word, a part of speech and a word length embedded into a convolutional neural network model;
the word, part of speech and word length three-embedded convolutional neural network model receives the preprocessed input data set, carries out word vector coding and obtains a word characteristic vector mapping matrix, a part of speech characteristic vector mapping matrix and a word length characteristic vector mapping matrix;
constructing a word lookup table, a part of speech lookup table and a word length lookup table;
obtaining a word matrix, a part of speech matrix and a word length matrix which need to be processed by searching the word lookup table, the part of speech lookup table and the word length lookup table, and performing convolution operation on the obtained word matrix, the part of speech matrix and the word length matrix to obtain local characteristics of words, parts of speech and word length;
performing dimensionality reduction processing on the word, the part of speech and the word length local characteristic;
the word length local feature fusion processing module is used for carrying out merging processing on the words, the part of speech and the word length local features after the dimension reduction processing to obtain a fusion feature matrix; different weights are given to the words, the part of speech and the word length local characteristics for fusion when merging processing is carried out;
a dropout mechanism is adopted for the fusion characteristic matrix, and partial characteristics are deleted randomly to obtain a classifier input matrix;
and inputting the II into a classification output layer, analyzing the characteristics and finishing multi-classification of the text.
Further, the data preprocessing comprises: recognizing new Chinese words with a new-word recognition method, expanding the text word-segmentation corpus, and then segmenting to obtain a word, part-of-speech and word-length library.
Further, the new-word recognition method takes the solidification degree, information entropy and pointwise mutual information of adjacent characters as reference information.
Let $L_{av}(\xi)$ denote the number of distinct characters immediately to the left of the word string $\xi$, and $R_{av}(\xi)$ the number of distinct characters immediately to the right; $L_{av}(\xi)$ and $R_{av}(\xi)$ together characterize the probability that $\xi$ forms a word in different semantic environments. The solidification degree of adjacent characters is computed as:

$$G_{av}(\xi) = \log Av(\xi)$$

where $Av(\xi) = \min\{L_{av}(\xi), R_{av}(\xi)\}$.
The information entropy is computed as:

$$H(X) = \sum_i P(x_i)\, I(x_i) = -\sum_i P(x_i)\log P(x_i)$$

where $I(x_i) = -\log P(x_i)$ denotes the self-information of $x_i$ and $P(x_i)$ denotes the probability of $x_i$.
The pointwise mutual information is computed as:

$$\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$$

where x and y are words or characters in the corpus; if x and y are independent, then P(x, y) = P(x)P(y). The larger the PMI value, the more correlated x and y are.
Further, obtaining the word local features comprises:
① Input the concatenated word vector $x_{i:j}$:

$$x_{i:j} = x_i \oplus x_{i+1} \oplus \cdots \oplus x_j$$

where $x_i \in \mathbb{R}^d$ is the d-dimensional word vector of the i-th word of a sentence of length n, $\oplus$ is the concatenation operation, and $x_{i:j}$ is the concatenation of the word vectors of the i-th through j-th words;
② If the convolution kernel size is h × d, the feature map of a word window after the convolution operation is:

$$s_i = f(w \cdot x_{i:i+h-1} + b)$$

where $x_{i:i+h-1}$ denotes a word window, w is the weight matrix of the convolution kernel, h is the number of words input to the kernel, b is a bias term, and f is the activation function;
③ Performing the convolution operation on the concatenated word vectors $x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}$ of a sentence of length n yields the word local feature map matrix S:

$$S = [s_1, s_2, s_3, \ldots, s_{n-h+1}]$$
Further, the method also comprises: training the word, part-of-speech and word-length three-embedding convolutional neural network model with the Adam gradient-descent method.
Preferably, the method further comprises evaluating the text multi-classification effect. Given a sample set T of size N, let the true category of sample $y_i$ be $n_i$ and its predicted category be $m_i$; the evaluation formula is:

$$P = \frac{1}{N}\sum_{i=1}^{N} |n_i = m_i|$$

where $|n_i = m_i|$ takes the value 1 if the condition is true and 0 if it is false.
Compared with the prior art, the invention has the following beneficial effects:
(1) By analyzing the sample set and recognizing new Chinese words with a new-word recognition method, the text word-segmentation corpus is expanded and segmentation accuracy improves.
(2) Without changing the number of convolution kernels, the model obtains the best result through the three inputs of word, part of speech and word length, improving its ability to recognize text semantics; considering word, part of speech and word length simultaneously also makes the classification results predicted by the convolutional neural network model more accurate.
(3) Chunk-wise max-pooling (chunk-max pooling) down-sampling is added, making feature extraction more accurate and retaining more feature information.
(4) A dropout mechanism randomly deletes part of the features, improving the generalization ability of the model.
Drawings
FIG. 1 is a structural block diagram of the word, part-of-speech and word-length three-embedding convolutional neural network model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a text multi-classification method according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
The convolutional neural network (CNN) is a feedforward neural network formed by several layers of neurons; its feature extractor consists of convolutional layers and sub-sampling layers, and sub-sampling can be regarded as a special convolution process. A convolutional layer comprises several feature planes, each composed of neurons arranged in a matrix; neurons of the same feature plane share weights, which reduces the number of connections between network layers and the risk of overfitting. The convolutional layer extracts local features through the convolution operation, while the down-sampling layer retains the optimal features and reduces the dimensionality of the feature structure.
The word, part-of-speech and word-length three-embedding convolutional neural network model is shown in FIG. 1 and comprises an input layer, lookup tables (Look-up tables), a convolution operation layer, a chunk-wise max-pooling operation layer, a connection operation layer, a feature selection operation layer and a classification output layer connected in sequence, wherein:
Input layer: receives the preprocessed text and performs vector encoding to obtain a word feature vector mapping matrix, a part-of-speech feature vector mapping matrix and a word-length feature vector mapping matrix.
Lookup tables: store the word, part-of-speech and word-length feature vector mapping matrices, and comprise a word lookup table, a part-of-speech lookup table and a word-length lookup table.
Convolution operation layer: obtains the word matrix, part-of-speech matrix and word-length matrix to be processed through the lookup tables, and performs convolution on them to obtain word, part-of-speech and word-length local features.
Chunk-wise max-pooling operation layer: performs dimension reduction on the word, part-of-speech and word-length local features.
Connection operation layer: merges the dimension-reduced word, part-of-speech and word-length local features to obtain a fused feature matrix. During merging, different weights are assigned to the word, part-of-speech and word-length local features: if their weights during fusion are K1, K2 and K3 respectively, the different local features contribute in proportion to their importance, so the fused features better reflect the overall characteristics of the text. K1, K2 and K3 are all numbers between 0 and 1 with K1 + K2 + K3 = 1. Determining K1, K2 and K3 is very important; the specific steps are as follows: take several groups of different values for K1, K2 and K3 satisfying these constraints (for example, a first group K11, K21, K31, a second group K12, K22, K32, and a third group K13, K23, K33); run text classification tests with the connection layers formed by the different value groups; compute the classification accuracy corresponding to each group; and take the group with the highest classification accuracy as the values of the connection layer, as in the sketch below.
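A minimal sketch of this weight search, assuming a hypothetical `train_and_evaluate(k1, k2, k3)` callback that builds the model with the given fusion weights and returns validation classification accuracy (the callback and the 0.1-step grid are illustrative assumptions, not fixed by the patent):

```python
import itertools

def select_fusion_weights(train_and_evaluate, step=0.1):
    """Grid search over the fusion weights K1, K2, K3 described above.

    Candidates satisfy 0 < Ki < 1 and K1 + K2 + K3 = 1; the group whose
    model reaches the highest classification accuracy is returned.
    """
    grid = [round(step * i, 2) for i in range(1, int(1 / step))]  # 0.1 .. 0.9
    best_weights, best_acc = None, -1.0
    for k1, k2 in itertools.product(grid, grid):
        k3 = round(1.0 - k1 - k2, 2)
        if not 0 < k3 < 1:
            continue
        acc = train_and_evaluate(k1, k2, k3)  # hypothetical: trains and scores one model
        if acc > best_acc:
            best_weights, best_acc = (k1, k2, k3), acc
    return best_weights, best_acc
```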
Feature selection operation layer: applies a dropout mechanism to the fused feature matrix, randomly deleting part of the features to obtain the classifier input matrix.
Classification output layer: takes the classifier input matrix as input, analyzes the features and completes the multi-classification of the text.
Further, an L2 regularization penalty term is applied to the classification output layer weights to constrain the parameters. Softmax is used as the classifier and softmax loss as the loss function of the classifier.
In the text multi-classification method based on the word, part-of-speech and word-length three-embedding convolutional neural network model, the input layer receives the preprocessed text; chunk-max pooling is used in the chunk-wise max-pooling operation layer to improve classification accuracy; local text features are extracted automatically by the convolution and pooling operations; a dropout mechanism is applied to the fused feature matrix after concatenation and fusion, randomly deleting part of the features; the resulting classifier input matrix is fed into a softmax classifier; an L2 regularization penalty term is applied to the classification output layer weights; and Adam is selected as the gradient update method.
In this embodiment, the principle of the text multi-classification method based on the word, part-of-speech and word-length three-embedding convolutional neural network model is as follows. At input time, three inputs are created: word, part of speech and word length; after input-layer processing, a word feature vector mapping matrix (WF), a part-of-speech feature vector mapping matrix (VF) and a word-length feature vector mapping matrix (LF) are obtained. The word feature vector mapping matrix WF is computed with word2vec after segmentation by the jieba word-segmentation tool; the part-of-speech feature vector mapping matrix VF is extracted and computed after jieba segmentation; and the word-length feature vector mapping matrix LF is likewise extracted and computed from the word lengths after jieba segmentation. WF, VF and LF serve as the three input channels of the convolutional layer; after the convolution and pooling computations, the final feature fusion of WF, VF and LF yields the word vector space set, denoted VT:

$$VT = WF \times index(VF) \times index(LF)$$

where index() is an index function.
A text multi-classification method based on the word, part-of-speech and word-length three-embedding convolutional neural network model is shown in FIG. 2 and comprises the following steps:
S1: preprocess the data to obtain the input data set of the word, part-of-speech and word-length three-embedding convolutional neural network model.
In this embodiment, the data preprocessing flow is as shown in FIG. 2: new Chinese words are recognized with a new-word recognition method through sample-set analysis, the text word-segmentation corpus is expanded, and segmentation is then performed, yielding a word, part-of-speech and word-length library that serves as the input data set of the model.
Chinese word segmentation currently suffers from slow lexicon updates while new words appear quickly, so new words cannot be accurately distinguished during segmentation. In this embodiment, the new-word recognition method merges the recognized new Chinese words into the existing lexicon to obtain the expanded word-segmentation corpus. The data structure used for new-word recognition is a dictionary tree (trie): the root node contains no character, every other node contains exactly one character, and the string formed along the path from the root to a node is a candidate word. The trie is used because it supports fast frequency statistics over characters, so the string most likely to form a word can be quickly selected as the output.
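A minimal sketch of such a trie with frequency counts (the node layout and helper names are illustrative assumptions):

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.count = 0       # frequency of the string spelled from the root to here

class Trie:
    """Dictionary tree for fast frequency statistics over candidate words."""

    def __init__(self):
        self.root = TrieNode()          # the root node contains no character

    def insert(self, string):
        node = self.root
        for ch in string:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1             # every prefix on the path is counted

    def frequency(self, string):
        node = self.root
        for ch in string:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count

def index_ngrams(trie, text, max_n=4):
    """Insert every n-gram (n <= max_n) of the corpus so candidate-word
    frequencies can be looked up quickly."""
    for i in range(len(text)):
        trie.insert(text[i:i + max_n])
```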
The new-word recognition method takes the solidification degree, information entropy and pointwise mutual information of the preceding and following adjacent characters as reference information. Let $L_{av}(\xi)$ denote the number of distinct characters immediately to the left of the word string $\xi$, and $R_{av}(\xi)$ the number of distinct characters immediately to the right; then $L_{av}(\xi)$ and $R_{av}(\xi)$ indicate the possibility that $\xi$ forms a word in different semantic environments. The solidification degree of the adjacent characters is computed as:

$$G_{av}(\xi) = \log Av(\xi)$$

where $Av(\xi) = \min\{L_{av}(\xi), R_{av}(\xi)\}$.
The information entropy measures the stability with which two fragments form a word. Generally, the larger the entropy, the less stable the relationship between a character or word fragment and its left and right neighbors, and the more likely it forms a word as an independent unit; the smaller the entropy, the closer the relationship between the fragment and its neighbors, and the greater their likelihood of merging into one word. The entropy of fragment characters and words is computed as:

$$H(X) = -\sum_i P(x_i)\log P(x_i)$$

where $I(x_i) = -\log P(x_i)$ denotes the self-information of $x_i$ and $P(x_i)$ denotes the probability of $x_i$. The entropy of the left-adjacent characters of a fragment W is $H_L$, computed as:

$$H_L(W) = -\sum_{a \in A} P(aW \mid W)\log P(aW \mid W)$$

and the entropy of the right-adjacent characters is:

$$H_R(W) = -\sum_{b \in B} P(Wb \mid W)\log P(Wb \mid W)$$

where A and B are the sets of characters appearing immediately to the left and right of W, respectively.
in data analysis, Point Mutual Information (PMI) is used for measuring the correlation between two things, and the PMI is used for measuring the coupling of the co-occurrence of N-element words in new word recognition, and the calculation formula is as follows:
wherein, x and y are words or characters in the corpus, and if x and y are independent, P (x, y) ═ P (x) P (y); the larger the value of PMI, the more relevant both x, y are. In the process of finding a new word, the larger the mutual information value between two independent words is, the more the possibility that the two words are combined into one word is.
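A sketch of the three statistics over a raw corpus string (plain counting instead of the trie, for brevity; the single-split PMI and the probability estimates are simplifying assumptions):

```python
import math
from collections import Counter

def new_word_stats(text, candidate, total):
    """Gav (solidification via neighbour variety), left/right adjacency
    entropies HL/HR and PMI for one candidate string, per the formulas
    above. `total` is the corpus length used for probability estimates."""
    left, right, occur = Counter(), Counter(), 0
    pos = text.find(candidate)
    while pos != -1:
        occur += 1
        if pos > 0:
            left[text[pos - 1]] += 1
        end = pos + len(candidate)
        if end < len(text):
            right[text[end]] += 1
        pos = text.find(candidate, pos + 1)
    if occur == 0:
        return None

    # Gav(xi) = log Av(xi), Av = min{Lav, Rav}
    gav = math.log(min(len(left), len(right)) or 1)

    def entropy(counter):                      # H = -sum p log p
        s = sum(counter.values())
        return -sum(c / s * math.log(c / s) for c in counter.values()) if s else 0.0

    hl, hr = entropy(left), entropy(right)

    # PMI over the split candidate = x + y (first character vs. remainder)
    pmi = 0.0
    if len(candidate) >= 2:
        x, y = candidate[0], candidate[1:]
        p_xy, p_x, p_y = occur / total, text.count(x) / total, text.count(y) / total
        if p_x and p_y:
            pmi = math.log(p_xy / (p_x * p_y))
    return gav, hl, hr, pmi
```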
Unlike English text, where words are separated by spaces, Chinese text is continuous, and words must be extracted with text-segmentation techniques. This embodiment adopts the Python Chinese word-segmentation component "jieba". The texts in this embodiment are brief descriptions of enterprises' basic attributes, for example: "Bank of Communications is one of the most important financial service providers in China; its business scope covers commercial banking, securities, trusts, financial leasing, fund management, insurance, offshore finance and other comprehensive financial services. As the first nationwide commercial bank approved to carry out deep reform, Bank of Communications is moving toward the strategic goal of 'taking the international and comprehensive road and building a first-class public shareholding bank group characterized by wealth management'." From this paragraph, the category of the enterprise can be determined through keywords such as "bank", "securities" and "finance"; however, jieba cannot distinguish some industry proper nouns well, so developers can specify a custom dictionary to include words absent from the jieba lexicon. Although jieba can recognize new words, adding the new words explicitly ensures higher accuracy.
In this embodiment, the text data set comprises the profiles of 8000 enterprises in 10 industries, such as finance, internet, heavy industry, energy and raw materials, biomedicine, audio-visual entertainment, and real estate. Before segmentation, professional nouns of these enterprises are gathered and the knowledge base is expanded through a custom dictionary, making the segmentation results more accurate.
Word segmentation is performed on the expanded corpus; after removing stop words, the segmented text has the format: Bank of Communications/nt/4 China/n/2 main/b/2 finance/n/2 service/vt/2 provider/n/3 ….
S2: the word, part-of-speech and word-length three-embedding convolutional neural network model receives the preprocessed input data set and performs vector encoding to obtain the word, part-of-speech and word-length feature vector mapping matrices.
To convert natural-language words into dense word vectors a computer can process, the word and character vector encodings are obtained by training a Word2vec model. Since Word2vec assigns a single vector per word during training and therefore cannot represent Chinese polysemous words well, the training is split into word- and character-level vector training (three-pipeline embedding of word, part of speech and word length), for example: Bank of Communications/nt/4 China/n/2 main/b/2 finance/n/2 service/vt/2 provider/n/3. It should be noted that in other text corpora, the word / part-of-speech / word-length pipeline embedding can be changed into a part-of-speech / word-length / word-pinyin pipeline embedding to improve the recognition of text semantics.
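A sketch of producing the word / part-of-speech / word-length triples with jieba's POS tagger (the custom-dictionary path and the tiny stop-word set are illustrative assumptions):

```python
import jieba
import jieba.posseg as pseg

jieba.load_userdict("custom_dict.txt")   # assumed file of industry proper nouns

STOPWORDS = {"的", "了", "和", "是"}      # illustrative stop-word list

def to_triples(text):
    """Return (word, part_of_speech, word_length) triples,
    e.g. ('交通银行', 'nt', 4)."""
    return [(p.word, p.flag, len(p.word))
            for p in pseg.cut(text)
            if p.word.strip() and p.word not in STOPWORDS]

print(to_triples("交通银行是中国主要的金融服务供应商之一"))
```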
S3: construct the word lookup table, the part-of-speech lookup table and the word-length lookup table.
The word lookup table stores the word feature vector mapping matrix, the part-of-speech lookup table stores the part-of-speech feature vector mapping matrix, and the word-length lookup table stores the word-length feature vector mapping matrix.
S4: obtain the word, part-of-speech and word-length matrices to be processed by searching the word, part-of-speech and word-length lookup tables, and perform convolution on them to obtain the word, part-of-speech and word-length local features.
In this embodiment, a convolution kernel with a size of h × d is used to perform convolution operation on the word matrix, the part-of-speech matrix, and the word length matrix.
Taking the word local features as an example: assuming the word-vector dimension is d and the sentence length is n, the sentence matrix can be represented as $X \in \mathbb{R}^{n \times d}$. First, input the concatenated word vector $x_{i:j}$:

$$x_{i:j} = x_i \oplus x_{i+1} \oplus \cdots \oplus x_j$$

where $x_i \in \mathbb{R}^d$ is the d-dimensional word vector of the i-th word of a sentence of length n, $\oplus$ is the concatenation operation, and $x_{i:j}$ is the concatenation of the word vectors of the i-th through j-th words.
If the convolution kernel size is h × d, the feature map of each word window is:

$$s_i = f(w \cdot x_{i:i+h-1} + b)$$

where $x_{i:i+h-1}$ denotes a word window; w is the weight matrix of the convolution kernel; h is the number of words input to the kernel, i.e., the convolution kernel window size; b is a bias term; and f is the activation function.
Many activation functions are commonly used in neural network models, such as the sigmoid and tanh functions. In this embodiment, the ReLU activation function, which has the highest convergence speed, is selected as the activation function.
Performing the convolution operation on the concatenated word vectors $x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}$ of a sentence of length n yields the word local feature map matrix S, where:

$$S = [s_1, s_2, s_3, \ldots, s_{n-h+1}]$$
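A numpy sketch of the window convolution in the equations above, with ReLU as the activation f (the random data is illustrative):

```python
import numpy as np

def word_local_features(X, W, b, h):
    """S = [s_1, ..., s_{n-h+1}] with s_i = f(w · x_{i:i+h-1} + b).

    X : (n, d) sentence matrix of word vectors
    W : (h, d) convolution kernel weight matrix
    b : scalar bias term
    """
    n, _ = X.shape
    relu = lambda z: np.maximum(z, 0.0)              # activation f
    return np.array([relu(np.sum(W * X[i:i + h]) + b)
                     for i in range(n - h + 1)])

# Example: a 10-word sentence with 300-dim word vectors, window h = 3
X = np.random.randn(10, 300)
W = np.random.randn(3, 300) * 0.01
S = word_local_features(X, W, b=0.0, h=3)
print(S.shape)                                       # (8,) = n - h + 1
```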
and S5, performing dimension reduction processing on the word, the part of speech and the word length local characteristic.
And performing dimensionality reduction processing on the local characteristics of the words, the part of speech and the word length to avoid the occurrence of an over-fitting phenomenon, and adopting a block maximum characteristic Pooling (chunk-Max Pooling) down-sampling strategy in a block maximum characteristic Pooling operation layer. The idea of chunk-Max Pooling is: a certain characteristicAll the eigenvectors of the Convolution layer corresponding to the extractor (Filter) are segmented, and after the eigenvectors are cut into a plurality of segments, the Top n eigenvalues are respectively obtained in each segment. A plurality of relevant local features can be captured through a chunk-Max Pooling downsampling strategy, and redundant features are removed. In this embodiment, we will divide the convolutional layer word, the word character and the word length vector into m segments, and each segment takes the first n maximum eigenvalues as Si(i is more than or equal to 1 and less than or equal to m), and obtaining a characteristic matrix after each section is subjected to convolution operationFor feature matrixObtaining each section of classification characteristic matrix after down-sampling The calculation formula is as follows:
wherein f isflatten() Is a dimension reduction operation, which compresses a multi-dimensional feature matrix into a one-dimensional, fsort() Is a sorting function.
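A sketch of chunk-max pooling on a single filter's feature vector, following the f_flatten / f_sort description (the chunk count m = 5 matches the embodiment; top-n per chunk is a parameter):

```python
import numpy as np

def chunk_max_pooling(feature_map, m=5, n=1):
    """Split one filter's features into m chunks and keep the n largest
    values of each chunk (top-n obtained by sorting, as f_sort above)."""
    flat = np.ravel(feature_map)                 # f_flatten: to one dimension
    chunks = np.array_split(flat, m)
    pooled = [np.sort(chunk)[::-1][:n] for chunk in chunks]
    return np.concatenate(pooled)                # length m * n

S = np.random.randn(23)                          # e.g. one filter's feature map
print(chunk_max_pooling(S, m=5, n=1))            # 5 pooled values
```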
S6: merge the dimension-reduced word, part-of-speech and word-length local features to obtain the fused feature matrix. During merging, different weights are assigned to the word, part-of-speech and word-length local features: if their weights during fusion are K1, K2 and K3 respectively, the different local features contribute in proportion to their importance, so the fused features better reflect the overall characteristics of the text. K1, K2 and K3 are all numbers between 0 and 1 with K1 + K2 + K3 = 1, and they are determined by the search described above: take several groups of values satisfying the constraints, run text classification tests with the connection layers formed by each group, compute the corresponding classification accuracies, and keep the group with the highest accuracy.
The m chunk classification feature matrices $\hat{S}_1, \ldots, \hat{S}_m$ of the three channels are fused to obtain the fused feature matrix $V_t$:

$$V_t = K_1 \hat{S}^{(word)} \oplus K_2 \hat{S}^{(pos)} \oplus K_3 \hat{S}^{(len)}, \qquad \hat{S}^{(c)} = [\hat{S}_1^{(c)}, \ldots, \hat{S}_m^{(c)}]$$
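The weighted fusion could then be assembled from the three pooled channels as follows (a sketch consistent with the weights K1, K2, K3 above; concatenation as the merge operation is an assumption):

```python
import numpy as np

def fuse_channels(word_feats, pos_feats, len_feats, k=(0.5, 0.3, 0.2)):
    """Weighted concatenation of the pooled word, part-of-speech and
    word-length features; k = (K1, K2, K3) with K1 + K2 + K3 = 1."""
    assert abs(sum(k) - 1.0) < 1e-9
    return np.concatenate([k[0] * word_feats,
                           k[1] * pos_feats,
                           k[2] * len_feats])
```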
and S7, randomly deleting partial features of the fusion feature matrix by using a dropout mechanism to obtain a classifier input matrix.
In order to improve the generalization capability of the model, a dropout mechanism is adopted for the fusion feature matrix, and partial features are deleted randomly to obtain a classifier input matrix. In an embodiment, the dropout value is set to 0.5, i.e., half of the parameters are randomly discarded.
S8: input the classifier input matrix into the classification output layer, analyze the features and complete the multi-classification of the text.
In this embodiment, an L2 regularization penalty term is applied to the classification output layer weights to constrain the parameters. Softmax is used as the classifier and softmax loss as the loss function of the classifier.
S9: train the word, part-of-speech and word-length three-embedding convolutional neural network model.
In this embodiment, the model trains its parameters with the Adam gradient-descent method: using the first- and second-moment estimates of each parameter's gradient of the loss function, the Adam algorithm dynamically adjusts the learning rate of every parameter. Adam is also a gradient-descent-based method, but the learning step of each parameter at every iteration stays within a bounded range, so a large gradient does not produce a large learning step and the parameter values remain stable.
To evaluate the reliability of the obtained convolutional neural network model, k-fold cross validation is performed on the original text corpus in the model training stage.
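In PyTorch (again an implementation assumption), the Adam training step could look as follows; `ThreeEmbeddingTextCNN` refers to the sketch given earlier, the placeholder batch stands in for a real DataLoader, and `weight_decay` only approximates the L2 penalty (PyTorch applies it to all parameters, not just the output layer as the embodiment specifies):

```python
import torch
import torch.nn as nn

model = ThreeEmbeddingTextCNN(n_words=8616, n_pos=96, n_lens=20, n_classes=10)
criterion = nn.CrossEntropyLoss()                    # softmax loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

train_loader = [(torch.randint(0, 8616, (16, 30)),   # placeholder batch;
                 torch.randint(0, 96, (16, 30)),     # really a DataLoader over the
                 torch.randint(0, 20, (16, 30)),     # segmented enterprise profiles
                 torch.randint(0, 10, (16,)))]

for word_ids, pos_ids, len_ids, labels in train_loader:
    optimizer.zero_grad()
    logits = model(word_ids, pos_ids, len_ids)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()                                 # Adam bounds the step size per parameter
```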
S10: evaluate the text multi-classification effect.
In this embodiment, the multi-classification effect is measured mainly by classification accuracy. Given a sample set T of size N, let the true category of sample $y_i$ be $n_i$ and its predicted category be $m_i$; the evaluation formula is:

$$P = \frac{1}{N}\sum_{i=1}^{N} |n_i = m_i|$$

where $|n_i = m_i|$ takes the value 1 if the condition is true and 0 if it is false.
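The evaluation formula reduces to plain classification accuracy; a minimal sketch:

```python
def multiclass_accuracy(true_labels, pred_labels):
    """P = (1/N) * sum(1 if n_i == m_i else 0), the formula above."""
    assert len(true_labels) == len(pred_labels) and true_labels
    return sum(int(t == p) for t, p in zip(true_labels, pred_labels)) / len(true_labels)

print(multiclass_accuracy([0, 1, 2, 1], [0, 2, 2, 1]))  # 0.75
```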
The specific parameter settings are as follows:
Word, part-of-speech and word-length vector parameters: this embodiment builds a data set from the enterprise-introduction texts of the Lagou recruitment website, containing 8000 pieces of basic enterprise information divided into 10 categories: finance, internet, heavy industry, energy and raw materials, biomedicine, audio-visual entertainment, real estate, agricultural products, logistics, and home services. For word-vector training, the Skip-gram model of the Word2vec tool is used to pre-train word vectors on the 8000 enterprise-profile texts. The word-vector dimension is set to 256, words with frequency below 3 are filtered out, and the final trained word-vector vocabulary size is 8616. The part-of-speech vectors are initialized randomly with dimension 64, and the part-of-speech table size is 96. Word-vector training parameters: the Skip-gram model is selected, the context sliding window is 5, and the number of iterations (iter) is 40.
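With the gensim library (an assumed tooling choice; the patent only names Word2vec), the stated parameters map onto the Skip-gram trainer as follows (parameter names follow gensim 4.x; the repeated one-sentence corpus is a placeholder so min_count=3 still builds a vocabulary):

```python
from gensim.models import Word2Vec

tokenized_corpus = [["交通银行", "中国", "金融", "服务", "供应商"]] * 3
# placeholder; really the 8000 segmented enterprise profiles

model = Word2Vec(
    sentences=tokenized_corpus,
    vector_size=256,   # word-vector dimension
    min_count=3,       # filter words with frequency below 3
    sg=1,              # Skip-gram model
    window=5,          # context sliding window of 5
    epochs=40,         # 40 iterations (iter)
)
```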
Training parameters of the word, part-of-speech and word-length three-embedding convolutional neural network model: the convolution kernel window sizes h are 3, 4 and 5; the maximum word-vector dimension d is 300; there are 100 kernels of each size; the dropout probability is 0.5; and the number of chunks in chunk-max pooling is 5.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. A word, part-of-speech and word-length three-embedding convolutional neural network model, characterized by comprising an input layer, lookup tables, a convolutional layer, a feature pooling layer, a feature connection layer, a feature selection layer and a classification output layer connected in sequence, wherein:
the input layer receives the preprocessed text and performs vector encoding to obtain a word feature vector mapping matrix, a part-of-speech feature vector mapping matrix and a word-length feature vector mapping matrix;
the lookup tables store the word, part-of-speech and word-length feature vector mapping matrices, and comprise a word lookup table, a part-of-speech lookup table and a word-length lookup table;
the convolutional layer obtains the word matrix, part-of-speech matrix and word-length matrix to be processed through the lookup tables, and performs convolution on them to obtain word, part-of-speech and word-length local features;
the feature pooling layer performs dimension reduction on the word, part-of-speech and word-length local features;
the feature connection layer merges the dimension-reduced word, part-of-speech and word-length local features to obtain a fused feature matrix, assigning different preset weights to the word, part-of-speech and word-length local features for fusion during merging;
the feature selection layer applies a dropout mechanism to the fused feature matrix, randomly deleting part of the features to obtain the classifier input matrix;
and the classification output layer takes the classifier input matrix as input, analyzes the features and completes the multi-classification of the text;
wherein,
the preset weights are determined as follows: take several groups of different preset weights; for each group, form a connection layer and test the resulting word, part-of-speech and word-length three-embedding convolutional neural network model on text classification; compute the classification accuracy corresponding to each group; and take the group of preset weights whose model achieves the highest classification accuracy as the preset weights of the connection layer.
2. The word, part-of-speech and word-length three-embedding convolutional neural network model according to claim 1, characterized in that, when training the model, its parameters are set as follows: the convolution kernel window sizes h are 3, 4 and 5; the maximum word-vector dimension d is 300; there are 100 kernels of each size; the dropout probability is 0.5; and the number of chunks in chunk-max pooling is 5.
3. The word, part-of-speech and word-length three-embedding convolutional neural network model according to claim 1, characterized in that the feature pooling layer adopts a chunk-wise max-pooling dimension-reduction strategy, and the classification output layer uses softmax as the classifier.
4. A text multi-classification method based on the word, part-of-speech and word-length three-embedding convolutional neural network model of any one of claims 1-3, comprising:
preprocessing the data to obtain the input data set of the word, part-of-speech and word-length three-embedding convolutional neural network model;
receiving the preprocessed input data set with the model and performing vector encoding to obtain the word, part-of-speech and word-length feature vector mapping matrices;
constructing a word lookup table, a part-of-speech lookup table and a word-length lookup table;
obtaining the word, part-of-speech and word-length matrices to be processed by searching the three lookup tables, and performing convolution on them to obtain the word, part-of-speech and word-length local features;
performing dimension reduction on the word, part-of-speech and word-length local features;
merging the dimension-reduced word, part-of-speech and word-length local features to obtain a fused feature matrix, assigning different weights to the word, part-of-speech and word-length local features for fusion during merging;
applying a dropout mechanism to the fused feature matrix, randomly deleting part of the features to obtain the classifier input matrix;
and inputting the classifier input matrix into the classification output layer, analyzing the features and completing the multi-classification of the text.
5. The text multi-classification method according to claim 4, characterized in that the data preprocessing comprises: recognizing new Chinese words with a new-word recognition method, expanding the text word-segmentation corpus, and then segmenting to obtain a word, part-of-speech and word-length library.
6. The text multi-classification method according to claim 5, characterized in that the new-word recognition method takes the solidification degree, information entropy and pointwise mutual information of adjacent characters as reference information.
7. The method according to claim 6, characterized in that, letting $L_{av}(\xi)$ denote the number of distinct characters immediately to the left of the word string $\xi$ and $R_{av}(\xi)$ the number of distinct characters immediately to the right, $L_{av}(\xi)$ and $R_{av}(\xi)$ characterize the probability that $\xi$ forms a word in different semantic environments, and the solidification degree of adjacent characters is computed as:

$$G_{av}(\xi) = \log Av(\xi)$$

where $Av(\xi) = \min\{L_{av}(\xi), R_{av}(\xi)\}$;

the information entropy is computed as:

$$H(X) = -\sum_i P(x_i)\log P(x_i)$$

where $I(x_i) = -\log P(x_i)$ denotes the self-information of $x_i$ and $P(x_i)$ denotes the probability of $x_i$;

the pointwise mutual information is computed as:

$$\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$$

where x and y are words or characters in the corpus; if x and y are independent, then P(x, y) = P(x)P(y); the larger the PMI value, the more correlated x and y are.
8. The text multi-classification method according to any one of claims 4-7, characterized in that obtaining the word local features comprises:
① inputting the concatenated word vector $x_{i:j}$:

$$x_{i:j} = x_i \oplus x_{i+1} \oplus \cdots \oplus x_j$$

where $x_i \in \mathbb{R}^d$ is the d-dimensional word vector of the i-th word of a sentence of length n, $\oplus$ is the concatenation operation, and $x_{i:j}$ is the concatenation of the word vectors of the i-th through j-th words;
② if the convolution kernel size is h × d, computing the feature map of a word window after the convolution operation as:

$$s_i = f(w \cdot x_{i:i+h-1} + b)$$

where $x_{i:i+h-1}$ denotes a word window, w is the weight matrix of the convolution kernel, h is the number of words input to the kernel, b is a bias term, and f is the activation function;
③ performing the convolution operation on the concatenated word vectors $x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}$ of a sentence of length n to obtain the word local feature map matrix S:

$$S = [s_1, s_2, s_3, \ldots, s_{n-h+1}].$$
9. The text multi-classification method according to claim 8, further comprising the step of: training the word, part-of-speech and word-length three-embedding convolutional neural network model with the Adam gradient-descent method.
10. The text multi-classification method according to any one of claims 4-7 and 9, further comprising the step of evaluating the text multi-classification effect: given a sample set T of size N, let the true category of sample $y_i$ be $n_i$ and its predicted category be $m_i$; the evaluation formula is:

$$P = \frac{1}{N}\sum_{i=1}^{N} |n_i = m_i|$$

where $|n_i = m_i|$ takes the value 1 if the condition is true and 0 if it is false.
CN201910200666.4A 2019-03-17 2019-03-17 Three embedded convolutional neural networks model and its more classification methods of text Pending CN110046250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910200666.4A CN110046250A (en) 2019-03-17 2019-03-17 Three embedded convolutional neural networks model and its more classification methods of text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910200666.4A CN110046250A (en) 2019-03-17 2019-03-17 Three embedded convolutional neural networks model and its more classification methods of text

Publications (1)

Publication Number Publication Date
CN110046250A true CN110046250A (en) 2019-07-23

Family

ID=67273775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910200666.4A Pending CN110046250A (en) 2019-03-17 2019-03-17 Three embedded convolutional neural networks model and its more classification methods of text

Country Status (1)

Country Link
CN (1) CN110046250A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489559A (en) * 2019-08-28 2019-11-22 北京达佳互联信息技术有限公司 A kind of file classification method, device and storage medium
CN110705298A (en) * 2019-09-23 2020-01-17 四川长虹电器股份有限公司 Improved field classification method combining prefix tree and cyclic neural network
CN110717047A (en) * 2019-10-22 2020-01-21 湖南科技大学 Web service classification method based on graph convolution neural network
CN110929724A (en) * 2019-11-28 2020-03-27 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and storage medium
CN111324734A (en) * 2020-02-17 2020-06-23 昆明理工大学 Case microblog comment emotion classification method integrating emotion knowledge
CN111460170A (en) * 2020-03-27 2020-07-28 深圳价值在线信息科技股份有限公司 Word recognition method and device, terminal equipment and storage medium
CN111538840A (en) * 2020-06-23 2020-08-14 基建通(三亚)国际科技有限公司 Text classification method and device
CN112100344A (en) * 2020-08-18 2020-12-18 淮阴工学院 Financial field knowledge question-answering method based on knowledge graph
CN112651753A (en) * 2020-12-30 2021-04-13 杭州趣链科技有限公司 Intelligent contract generation method and system based on block chain and electronic equipment
CN112818118A (en) * 2021-01-22 2021-05-18 大连民族大学 Reverse translation-based Chinese humor classification model
CN113553844A (en) * 2021-08-11 2021-10-26 四川长虹电器股份有限公司 Domain identification method based on prefix tree features and convolutional neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2624149A2 (en) * 2012-02-02 2013-08-07 Xerox Corporation Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space
CN105320960A (en) * 2015-10-14 2016-02-10 北京航空航天大学 Voting based classification method for cross-language subjective and objective sentiments
CN107291795A (en) * 2017-05-03 2017-10-24 华南理工大学 A kind of dynamic word insertion of combination and the file classification method of part-of-speech tagging
CN108009148A (en) * 2017-11-16 2018-05-08 天津大学 Text emotion classification method for expressing based on deep learning
CN109241530A (en) * 2018-08-29 2019-01-18 昆明理工大学 A kind of more classification methods of Chinese text based on N-gram vector sum convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2624149A2 (en) * 2012-02-02 2013-08-07 Xerox Corporation Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space
CN105320960A (en) * 2015-10-14 2016-02-10 北京航空航天大学 Voting based classification method for cross-language subjective and objective sentiments
CN107291795A (en) * 2017-05-03 2017-10-24 华南理工大学 A kind of dynamic word insertion of combination and the file classification method of part-of-speech tagging
CN108009148A (en) * 2017-11-16 2018-05-08 天津大学 Text emotion classification method for expressing based on deep learning
CN109241530A (en) * 2018-08-29 2019-01-18 昆明理工大学 A kind of more classification methods of Chinese text based on N-gram vector sum convolutional neural networks

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489559A (en) * 2019-08-28 2019-11-22 北京达佳互联信息技术有限公司 A kind of file classification method, device and storage medium
CN110705298B (en) * 2019-09-23 2022-06-21 四川长虹电器股份有限公司 Improved prefix tree and cyclic neural network combined field classification method
CN110705298A (en) * 2019-09-23 2020-01-17 四川长虹电器股份有限公司 Improved field classification method combining prefix tree and cyclic neural network
CN110717047A (en) * 2019-10-22 2020-01-21 湖南科技大学 Web service classification method based on graph convolution neural network
CN110717047B (en) * 2019-10-22 2022-06-28 湖南科技大学 Web service classification method based on graph convolution neural network
CN110929724A (en) * 2019-11-28 2020-03-27 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and storage medium
CN111324734A (en) * 2020-02-17 2020-06-23 昆明理工大学 Case microblog comment emotion classification method integrating emotion knowledge
CN111460170A (en) * 2020-03-27 2020-07-28 深圳价值在线信息科技股份有限公司 Word recognition method and device, terminal equipment and storage medium
CN111460170B (en) * 2020-03-27 2024-02-13 深圳价值在线信息科技股份有限公司 Word recognition method, device, terminal equipment and storage medium
CN111538840A (en) * 2020-06-23 2020-08-14 基建通(三亚)国际科技有限公司 Text classification method and device
CN111538840B (en) * 2020-06-23 2023-04-28 基建通(三亚)国际科技有限公司 Text classification method and device
CN112100344A (en) * 2020-08-18 2020-12-18 淮阴工学院 Financial field knowledge question-answering method based on knowledge graph
CN112100344B (en) * 2020-08-18 2024-02-27 淮阴工学院 Knowledge graph-based financial domain knowledge question-answering method
CN112651753A (en) * 2020-12-30 2021-04-13 杭州趣链科技有限公司 Intelligent contract generation method and system based on block chain and electronic equipment
CN112818118A (en) * 2021-01-22 2021-05-18 大连民族大学 Reverse translation-based Chinese humor classification model
CN112818118B (en) * 2021-01-22 2024-05-21 大连民族大学 Reverse translation-based Chinese humor classification model construction method
CN113553844A (en) * 2021-08-11 2021-10-26 四川长虹电器股份有限公司 Domain identification method based on prefix tree features and convolutional neural network
CN113553844B (en) * 2021-08-11 2023-07-25 四川长虹电器股份有限公司 Domain identification method based on prefix tree features and convolutional neural network

Similar Documents

Publication Publication Date Title
CN110046250A (en) Three embedded convolutional neural networks model and its more classification methods of text
CN110717047B (en) Web service classification method based on graph convolution neural network
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN112732916B (en) BERT-based multi-feature fusion fuzzy text classification system
CN110825877A (en) Semantic similarity analysis method based on text clustering
CN112328900A (en) Deep learning recommendation method integrating scoring matrix and comment text
CN115146629B (en) News text and comment correlation analysis method based on contrast learning
CN107895000A (en) A kind of cross-cutting semantic information retrieval method based on convolutional neural networks
CN110956044A (en) Attention mechanism-based case input recognition and classification method for judicial scenes
Gangadharan et al. Paraphrase detection using deep neural network based word embedding techniques
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN114611491A (en) Intelligent government affair public opinion analysis research method based on text mining technology
CN117009521A (en) Knowledge-graph-based intelligent process retrieval and matching method for engine
CN114265936A (en) Method for realizing text mining of science and technology project
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN116304020A (en) Industrial text entity extraction method based on semantic source analysis and span characteristics
CN115577080A (en) Question reply matching method, system, server and storage medium
CN115017879A (en) Text comparison method, computer device and computer storage medium
CN112989830B (en) Named entity identification method based on multiple features and machine learning
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN112347247A (en) Specific category text title binary classification method based on LDA and Bert
CN113051886A (en) Test question duplicate checking method and device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190723