CN109766553A - A kind of Chinese word cutting method of the capsule model combined based on more regularizations - Google Patents
Abstract
The present invention provides a Chinese word segmentation method based on a capsule model with joint multiple regularizations. By adding a capsule sliding window, the capsule model is transferred to the natural language processing (NLP) sequence labeling task of Chinese word segmentation, solving the technical problem that the capsule model is not suited to sequence labeling tasks. By jointly applying multiple regularization terms, the invention achieves simple domain transfer, adapts the capsule model to the sequence labeling task, completes Chinese word segmentation with higher accuracy, and supports more complex natural language processing tasks. The joint regularization terms improve the generalization ability of the model, achieve a degree of domain transfer, and reduce the amount of manual corpus annotation, lowering the labor and time cost of annotating corpora in natural language processing research.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a Chinese word segmentation method based on a capsule model with joint multiple regularizations.
Background technique
With the development of information technology, machine learning, and related technologies, automatic information processing has gradually been applied to many scenarios, such as mining user preferences from film reviews and shopping comments, or automatically generating a short summary of an article. All of these require automatic text processing, and as Chinese users become increasingly active on the Internet, the amount of information they generate keeps growing, making automatic processing of text information ever more necessary. These developments have spread natural language processing technology into every corner of society. Within natural language processing, and in particular for the development of domestic natural language processing technology, Chinese automatic word segmentation is one of the most fundamental and most critical technologies.
Chinese word segmentation is the task of dividing a Chinese sentence into word boundaries so that a machine can more easily understand the Chinese language. Chinese differs from English: English words are separated by spaces, whereas in Chinese, especially modern Chinese, words are generally formed by joining two or more Chinese characters and cannot be understood simply by treating single characters as boundaries. Therefore, when a computer automatically processes Chinese text, it must first segment the text into words. Many Chinese natural language processing tasks, such as part-of-speech tagging, named entity recognition, text classification, text summarization, event extraction, and information retrieval, depend heavily on Chinese word segmentation. The quality of word segmentation positively or negatively affects all of these downstream technologies, which shows how fundamental the Chinese word segmentation task is, and how important building a good segmentation system is for high-performance automatic processing of text information.
The Chinese word segmentation task has the computer automatically insert boundary markers, such as spaces, between the words of a Chinese text by means of an algorithm. Chinese automatic word segmentation has developed for more than 20 years, starting from the earliest dictionary-matching methods, namely the classical forward/backward maximum matching, followed by segmentation algorithms with probabilistic disambiguation models, and then traditional conditional random fields (Conditional Random Fields, CRFs), the structured perceptron (Structure Perceptron), the maximum entropy model (Maximum Entropy, ME), and today's neural network (Neural Network, NN) segmentation models. Segmentation algorithms keep improving, and at present most sequence labeling tasks rely mainly on machine learning algorithms: sequence labeling assigns a class label to each member of an observed sequence. Chinese word segmentation is generally also treated as a sequence labeling task, labeling each character according to the position in which it occurs within a word and deriving the segmentation from the resulting label sequence.
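To make the sequence labeling formulation concrete, the following minimal sketch decodes per-character position tags back into segmented words. The four-tag scheme assumed here (B=begin, M=middle, E=end, S=single-character word) is the conventional one; the document only states that there are 4 classes, so the tag names are an assumption.

```python
# Hedged sketch: recovering words from per-character position tags (B/M/E/S assumed).

def tags_to_words(chars, tags):
    """Join characters into words; E and S tags close the current word."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):   # word boundary after this character
            words.append(buf)
            buf = ""
    if buf:                     # flush a dangling (malformed) tail
        words.append(buf)
    return words

print(tags_to_words(list("ABCDE"), ["B", "E", "S", "B", "E"]))
# three words: "AB", "C", "DE"
```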
For neural network methods, large-scale, carefully annotated data has a decisive influence on the final result of each task. However, such annotation must be done by humans, incurring great labor and time costs. Domain transfer aims to reduce the annotation required in other domains by transferring knowledge from already annotated data. For Chinese word segmentation, for example, large-scale annotated data is concentrated in the news domain, while annotated text data in other domains is comparatively scarce. How to transfer the key information of a large annotated dataset to a dataset from another domain with only a small amount of annotation, or even none, has therefore become a difficult problem, and domain transfer technology has become increasingly important.
As shown in Figure 1, the article "Dynamic Routing between Capsules", a first piece of prior art, proposes a capsule model as a technical solution to the handwritten digit recognition problem:
First, a handwritten digit picture is converted into a 28x28 matrix as input. Second, features are extracted from the input matrix by a convolutional layer with 256 convolution kernels of size 9x9 and stride 1; after an activation function, preferably the rectified linear unit (Rectified Linear Unit, ReLU), the output is 256 feature maps, each of size 20x20. Third, the output of the previous step is passed through a convolutional primary capsule layer (convolutional primary capsule layer), yielding 32 feature maps; each map consists of 6x6 capsules, and each capsule is an 8-dimensional vector. Fourth, the output of the previous step is passed through a digit capsule layer whose size equals the number of classes; for handwritten digit recognition there are 10 classes, so this layer has 10 capsules, each a 16-dimensional vector. Fifth, the probability of each class is obtained by a nonlinear calculation, and the class with the highest probability is the predicted class.
As shown in Fig. 2, the input and output of a capsule are computed by the dynamic routing algorithm, which operates only between layers that contain capsules.
The length of a capsule's output vector represents the probability that the entity represented by the capsule exists in the current input. A nonlinear "squashing" (squash) function therefore compresses short vectors to nearly zero length and long vectors to a length slightly below 1, so that discriminative learning can make good use of this nonlinearity:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

where v_j is the output of capsule j and s_j is its total input.
For all capsules except those in the first layer, the total input s_j of a capsule is a weighted sum over the prediction vectors û_{j|i} from the capsules in the layer below:

s_j = Σ_i c_ij û_{j|i},  û_{j|i} = W_ij u_i

where each prediction vector û_{j|i} is obtained by multiplying the output u_i of a lower-layer capsule by a weight matrix W_ij.
where c_ij is a coupling coefficient determined by the iterative routing process.
The coupling coefficients between capsule i and all capsules in the layer above sum to 1 and are determined by a routing softmax whose initial logits b_ij are the log prior probabilities that capsule i should be coupled to capsule j:

c_ij = exp(b_ij) / Σ_k exp(b_ik)

These log priors can be learned discriminatively together with the other weights. They depend on the location and type of the two capsules, not on the current input image. The initial coupling coefficients are then iteratively refined by measuring the agreement between the current output v_j of each capsule j in the layer above and the prediction û_{j|i} made by capsule i.
The agreement is simply the scalar product a_ij = v_j · û_{j|i}. This agreement is treated as if it were a log-likelihood and is added to the initial logit b_ij before new values are computed for all the coupling coefficients linking capsule i to the capsules in the layer above.
In a convolutional capsule layer, each capsule outputs a local grid of vectors to each type of capsule in the layer above, using a different transformation matrix for each member of the grid as well as for each capsule type.
As shown in Fig. 3, the article "Deep Learning for Chinese Word Segmentation and POS Tagging", a second piece of prior art, proposes a feedforward neural network as a technical solution to the Chinese word segmentation problem:
First, Chinese characters are converted into vectors; every occurrence of the same character is converted into the same d-dimensional vector, implemented via a dictionary. Second, the vectors from the previous step are converted into hidden-unit representations by a feedforward neural network. Third, the output of the previous step is mapped nonlinearly through a sigmoid function. Fourth, the output of the previous step is passed through a feedforward layer whose output dimension equals the number of classes, which is 4 for Chinese word segmentation. Fifth, a softmax is applied to the four-dimensional output to obtain the probability of each character belonging to each class; these probabilities are decoded with the Viterbi algorithm to compute the optimal sequence and produce the segmentation.
During their research, the inventors found that the technology of "Dynamic Routing between Capsules" has the following problems:
1. It targets only the handwritten digit recognition task, and the overall model is not suited to sequence labeling tasks;
2. Its image reconstruction regularization term is not applicable to the Chinese word segmentation task.
Because of these technical problems, the prior art has the following disadvantages:
1. Accuracy is mediocre;
2. The capsule model is not suited to sequence labeling tasks;
3. Generalization is mediocre, i.e. the model can only be trained and tested on the same corpus; when tested on a corpus from a different domain, performance drops sharply.
Summary of the invention
To solve the above technical problems, the present invention provides a Chinese word segmentation method based on a capsule model with joint multiple regularizations. By applying the capsule model, it achieves Chinese word segmentation with higher accuracy, so that results of training and testing on the same corpus are more accurate; by modifying and extending the capsule model, it makes the capsule model applicable to sequence labeling tasks; and by jointly applying multiple regularization terms, it achieves domain transfer for the Chinese word segmentation task on the capsule model.
The present invention provides a Chinese word segmentation method based on a capsule model with joint multiple regularizations. When training on a corpus from a particular domain, the method comprises:
Step 1: identify the maximum sentence length in the corpus, and pad every shorter sentence to the maximum length using a pre-stored padding character;
Step 2: map the Chinese characters of each sentence in the corpus to vector representations;
Step 3: extract features from the vector representations through a convolutional layer;
Step 4: input the extracted features into a primary capsule layer (primary capsule layer); through convolution operations, obtain one scalar per character per feature map, and concatenate the resulting scalars into a vector serving as one capsule, obtaining the primary capsules (primary capsules);
Step 5: pass the obtained primary capsules (primary capsules) through a capsule sliding window (capsule sliding window) to obtain the feature representation of each character, adapting the model to the sequence labeling task;
Step 6: use the features obtained after the capsule sliding window (capsule sliding window) as the input of the routing algorithm, and obtain the tag capsules (tag capsules) via the routing algorithm;
Step 7: take the modulus of each tag capsule (tag capsules) output to obtain the tag probabilities of each character;
Step 8: compute the cross entropy between the tag probabilities of each character and the true tag probabilities, and input them into a conditional random field (Conditional Random Field, CRF) to compute the likelihood probability;
Step 9: form the loss function as a weighted sum of the cross entropy and the log-likelihood probability, with the log-likelihood probability serving as a regularization term of the loss function, and update each layer's weights via the backpropagation algorithm (Backpropagation, BP).
Further, when training on at least two domain corpora, with one of them serving as the target-domain corpus for the final segmentation, Step 9 is replaced as follows:
Step 9: the loss function is a weighted sum over the corpus of the target segmentation domain, its log-likelihood probability, and the corpus of the source segmentation domain, where the likelihood probability of the target-domain corpus serves as a regularization term, and the weight of the cross entropy of the source-domain corpus is smaller than that of the non-regularization term.
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy_target + λ2·Likelihood_target + λ3·CrossEntropy_source

where the first term is the cross entropy of the target-domain corpus, the second term is the likelihood probability of the target domain, the third term is the cross entropy of the source-domain corpus, and the λn are the per-term weights.
Further, in the non-training case, i.e. at segmentation time, Steps 8 and 9 are replaced as follows:
decode the tag probabilities of each character of the test corpus, obtained through Step 8, with the Viterbi algorithm to obtain the optimal sequence and complete the segmentation.
Further, in Step 2, mapping the Chinese characters of each sentence in the corpus to vector representations comprises:
mapping the Chinese characters of each sentence in the corpus to dense (non-sparse) vector representations through a mapping dictionary, using the word embedding method.
Further, this mapping comprises: traversing the training corpus, finding all distinct characters, and numbering each character; identical characters share the same vector representation, different characters have different vector representations, and one additional vector is set aside to represent every character that does not appear in the training corpus, i.e. unknown characters.
When training the network, a dropout mechanism is introduced that randomly sets a portion of the parameters to zero.
Further, in Step 3, extracting features from the vector representations through a convolutional layer comprises:
passing the vector representations through the convolutional layer to obtain a certain number of feature maps, each feature map being a one-dimensional vector whose dimension is the sentence length; the vectors represented by the feature maps are concatenated into a matrix, which is the feature extracted by the convolutional layer.
Further, the update of the log prior b_ij in the routing algorithm is expressed as follows:

b_ij ← b_ij + û_{j|i} · v_j

where û_{j|i} is the prediction vector and v_j is the output of capsule j.
Further, the cross entropy is expressed by the following formula:

CrossEntropy = -Σ_i p_real(i) · log(p_pred(i))

where p_real(i) is the true probability of tag i for a character and p_pred(i) is the predicted probability of tag i for that character; the cross entropies of all characters in a sentence are summed to obtain the cross entropy of the sentence.
The likelihood probability is expressed by the following formula:

Likelihood = p(y_real) = exp(score(y_real)) / Σ_{y'} exp(score(y'))

where p(y_real) is the probability of the correct sequence and y' in the denominator ranges over all possible sequences.
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy + λ2·Likelihood

where CrossEntropy is the cross entropy, Likelihood is the likelihood probability, and the λn are the weights of the two terms.
The Chinese word segmentation method based on a capsule model with joint multiple regularizations provided by the present invention adapts the capsule model to sequence labeling tasks, completes Chinese word segmentation with higher accuracy, and supports more complex natural language processing (Natural Language Processing, NLP) tasks. Through the joint multiple regularization terms, it improves the generalization ability of the model, achieves a degree of domain transfer, and can reduce manual corpus annotation, lowering the labor and time cost of annotating corpora in NLP research.
Brief description of the drawings
Fig. 1 is a schematic diagram of the capsule model for solving the handwritten digit recognition problem;
Fig. 2 is a schematic diagram of the dynamic routing algorithm;
Fig. 3 is a schematic diagram of the neural network structure;
Fig. 4 is a flowchart of the Chinese word segmentation method based on a capsule model with joint multiple regularizations provided by the present invention;
Fig. 5 is a flowchart of Embodiment One;
Fig. 6 is a schematic diagram of the convolution operation.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present invention, the technical solution in the embodiments of the present invention is described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of the present invention. The abbreviations and key terms appearing in this embodiment are defined as follows:
NN: Neural Network;
CNN: Convolutional Neural Network;
LSTM: Long Short-Term Memory neural network;
CTB: Chinese Treebank (Penn Chinese Treebank);
CRFs: Conditional Random Fields;
FNN: Feedforward Neural Network;
ReLU: Rectified Linear Unit, an activation function;
ME: Maximum Entropy model;
NLP: Natural Language Processing.
Embodiment one
Referring to Figs. 4-6: Figs. 4 and 5 show the Chinese word segmentation method based on a capsule model with joint multiple regularizations provided by the present invention. Specifically, when training on a corpus from a particular domain, the method comprises:
Step 1: identify the maximum sentence length in the corpus, and pad every shorter sentence to the maximum length using a pre-stored padding character.
In this embodiment, the maximum sentence length is set to 128 and the corpus is CTB6.0. The purpose of this step is to fix all sentences input to the network model to a uniform length.
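Step 1 can be sketched as a simple right-padding routine. The maximum length of 128 follows the embodiment; the padding symbol name `<PAD>` and the choice to truncate overlong sentences are assumptions, since the patent only says shorter sentences are padded.

```python
# Hedged sketch of Step 1: pad every sentence to a fixed maximum length.
MAX_LEN = 128
PAD = "<PAD>"          # assumed pad symbol; the patent only says "pre-stored character"

def pad_sentence(chars, max_len=MAX_LEN, pad=PAD):
    """Right-pad a list of characters up to max_len (truncating longer ones)."""
    return (chars + [pad] * max_len)[:max_len]

padded = pad_sentence(list("ABC"), max_len=5)
```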
Step 2: map the Chinese characters of each sentence in the corpus to vector representations; through a mapping dictionary, using the word embedding method, the Chinese characters are mapped to dense (non-sparse) vector representations.
Further, this mapping comprises: traversing the training corpus, finding all distinct characters, and numbering each character; identical characters share the same vector representation, different characters have different representations, and one additional vector represents every character that never appears in the training corpus, i.e. unknown characters. When training the network, a dropout mechanism is introduced that randomly sets a portion of the parameters to zero.
In this embodiment, the mapping vector dimension for each character is set to 200. This step is implemented by a mapping dictionary that maps characters to dense vector representations: the training corpus is traversed first and all distinct characters are found and numbered; assuming there are M characters, a matrix of 200 rows (the character vector dimension is 200) and M+1 columns is built. Identical characters share the same vector, different characters have different vectors, and beyond the M characters one extra vector represents all characters that never occur in the training corpus, used for unknown characters. In this step, borrowing the idea of the denoising autoencoder, the invention introduces a dropout mechanism that randomly sets a portion of the parameters to zero during training, which avoids overfitting and provides an effective way to approximately combine exponentially many different neural structures.
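The dictionary-plus-matrix construction above can be sketched as follows. The dimension 200 and the extra unknown-character column follow the embodiment; the random initialization, the dropout rate, and the inverted-dropout scaling are illustrative assumptions.

```python
# Hedged sketch of Step 2: character dictionary, 200 x (M+1) embedding matrix
# with an extra <UNK> column, and dropout on the looked-up vectors.
import numpy as np

rng = np.random.default_rng(0)
corpus = ["ABAB", "BC"]
chars = sorted({c for sent in corpus for c in sent})   # all distinct characters
char2id = {c: i for i, c in enumerate(chars)}          # number each character
UNK = len(char2id)                                     # extra column for unknowns
E = rng.normal(size=(200, len(char2id) + 1))           # 200 rows, M+1 columns

def embed(sentence, drop=0.2, train=True):
    ids = [char2id.get(c, UNK) for c in sentence]
    vecs = E[:, ids]                                   # 200 x sentence_length
    if train:                                          # dropout: zero a portion
        mask = rng.random(vecs.shape) >= drop
        vecs = vecs * mask / (1.0 - drop)
    return vecs

x = embed("ABX", train=False)                          # "X" falls back to <UNK>
```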
Step 3: extract features from the vector representations through a convolutional layer. The vector representations are passed through the convolutional layer to obtain a certain number of feature maps; each feature map is a one-dimensional vector whose dimension is the sentence length, and the feature maps (vectors) obtained after convolution are concatenated into a matrix, which is the feature extracted by the convolutional layer.
In this embodiment, the number of feature maps is set to 200.
In this step, a single-layer convolutional neural network is good at extracting local features, and stacking multiple convolutional layers can also learn context well. Fig. 6 shows the convolution operation on text.
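Step 3 can be sketched as a 1-D "same" convolution over the character-vector matrix, producing one feature map of sentence length per kernel. The 200 feature maps follow the embodiment; the kernel width of 3 and zero padding are illustrative assumptions.

```python
# Hedged sketch of Step 3: 1-D convolution over embedded characters, one
# scalar per position per kernel, giving K feature maps of sentence length.
import numpy as np

rng = np.random.default_rng(1)
D, L, K, W = 200, 128, 200, 3              # embed dim, sent len, maps, width
X = rng.normal(size=(D, L))                # embedded sentence (200 x 128)
kernels = rng.normal(size=(K, D, W))

def conv_features(X, kernels):
    D, L = X.shape
    K, _, W = kernels.shape
    pad = W // 2
    Xp = np.pad(X, ((0, 0), (pad, pad)))   # zero-pad to keep length L
    out = np.empty((K, L))
    for t in range(L):
        window = Xp[:, t:t + W]            # local context of position t
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out                             # K feature maps, each length L

F = conv_features(X, kernels)              # shape (200, 128)
```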
Step 4: input the extracted features into the primary capsule layer (primary capsule layer); through convolution operations, obtain one scalar per character per feature map, and concatenate the resulting scalars into a vector serving as one capsule, obtaining the primary capsules (primary capsules).
In this embodiment, the convolution operation is performed eight times in total; each operation yields one scalar for each character of each feature map, and the eight scalars are concatenated into a vector that forms one capsule, finally yielding a feature matrix represented by capsules.
Step 5: pass the obtained primary capsules (primary capsules) through a capsule sliding window (capsule sliding window) to obtain the feature representation of each character, adapting the model to the sequence labeling task.
In this embodiment, the capsule sliding window is set to 7.
In this step, the capsule sliding window (capsule sliding window) is a sliding window over the capsule feature matrix from the previous step, in which each column represents one character. For each character, the sliding window selects the characters that influence it, i.e. the n characters before and after it (including the character itself), as the input of the dynamic routing algorithm.
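Steps 4 and 5 together can be sketched as stacking the scalars of the eight convolution passes into 8-dimensional capsule columns, then gathering a window of 7 columns around each character. The window size 7 and capsule dimension 8 follow the embodiment; zero-padding at the sentence edges is an assumption, since the patent does not say how boundaries are handled.

```python
# Hedged sketch of Steps 4-5: build the 8 x L capsule matrix, then slide a
# 7-wide window over its columns to collect each character's routing input.
import numpy as np

rng = np.random.default_rng(3)
DIM, L, WIN = 8, 10, 7                       # capsule dim, sentence len, window
passes = [rng.normal(size=L) for _ in range(8)]   # 8 conv passes, 1 scalar/char
primary = np.stack(passes, axis=0)           # column t = capsule of character t

def capsule_windows(primary, win=WIN):
    dim, length = primary.shape
    half = win // 2
    padded = np.pad(primary, ((0, 0), (half, half)))   # assumed zero edges
    # windows[t] holds the `win` capsules feeding the routing for character t
    return np.stack([padded[:, t:t + win] for t in range(length)])

windows = capsule_windows(primary)           # shape (L, DIM, WIN)
```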
Step 6: use the features obtained after the capsule sliding window (capsule sliding window) as the input of the routing algorithm, and obtain the tag capsules (tag capsules) via the routing algorithm.
In this embodiment, the number of tag capsules equals the number of classes, 4; each tag capsule is a 16-dimensional vector; and the number of iterations of the dynamic routing algorithm is 3.
The steps of the routing algorithm are as follows:
First, the capsules (vectors) of the primary capsule layer (primary capsule layer) are passed through the squash nonlinear mapping, formula (2-1), to obtain the output v_j of the primary capsule layer (primary capsule layer).
Second, the output of the primary capsule layer (primary capsule layer) is used as the input of the tag capsule layer (tag capsule); the tag capsules are computed by formula (2-2), and these capsules are then passed through the squash nonlinear mapping to obtain the output of the tag capsule layer (tag capsule).
The update of the softmax logit parameter b_ij is expressed by formula (3-1):

b_ij ← b_ij + û_{j|i} · v_j

where û_{j|i} is the prediction vector and v_j is the output of capsule j.
The coupling coefficients are computed by formula (3-2):

c_ij = exp(b_ij) / Σ_k exp(b_ik)
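The routing loop above can be sketched end to end for one character: 7 windowed 8-dimensional primary capsules are routed into 4 tag capsules of 16 dimensions over 3 iterations, matching the embodiment's settings. The random transformation matrices `W` stand in for learned weights and are purely illustrative.

```python
# Hedged sketch of Step 6: dynamic routing from 7 primary capsules (8-dim)
# to 4 tag capsules (16-dim), 3 iterations, per the embodiment's settings.
import numpy as np

def squash(s, eps=1e-9):
    n2 = np.sum(s * s, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def routing(u, W, iters=3):
    """u: (n_in, d_in) primary capsules; W: (n_in, n_out, d_out, d_in)."""
    n_in, n_out = W.shape[0], W.shape[1]
    u_hat = np.einsum("iodk,ik->iod", W, u)        # prediction vectors u_hat_{j|i}
    b = np.zeros((n_in, n_out))                    # initial logits b_ij
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # softmax (3-2)
        s = np.einsum("io,iod->od", c, u_hat)      # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                              # tag capsule outputs
        b = b + np.einsum("iod,od->io", u_hat, v)  # agreement update (3-1)
    return v

rng = np.random.default_rng(4)
u = rng.normal(size=(7, 8))                        # 7 windowed 8-dim capsules
W = rng.normal(size=(7, 4, 16, 8)) * 0.1
tags = routing(u, W)                               # 4 tag capsules, 16-dim each
probs = np.linalg.norm(tags, axis=1)               # Step 7: modulus = tag scores
```

Because squash maps every vector to a length below 1, the moduli can be read directly as tag scores in [0, 1).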
Step 7: take the modulus of each tag capsule (tag capsules) output to obtain the tag probabilities of each character.
Step 8: compute the cross entropy between the tag probabilities of each character and the true tag probabilities, and input them into a conditional random field (Conditional Random Field, CRF) to compute the likelihood probability.
Step 9: form the loss function as a weighted sum of the cross entropy and the log-likelihood probability, with the log-likelihood probability serving as a regularization term of the loss function, and update each layer's weights via the backpropagation algorithm (Backpropagation, BP).
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy + λ2·Likelihood

where CrossEntropy is the cross entropy, Likelihood is the likelihood probability, and the λn are the weights of the two terms.
When training on at least two domain corpora, with one of them serving as the target-domain corpus for the final segmentation, Step 9 is replaced as follows:
Step 9: the loss function is a weighted sum over the corpus of the target segmentation domain, its log-likelihood probability, and the corpus of the source segmentation domain, where the likelihood probability of the target-domain corpus serves as a regularization term, and the weight of the cross entropy of the source-domain corpus is smaller than that of the non-regularization term.
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy_target + λ2·Likelihood_target + λ3·CrossEntropy_source

where the first term is the cross entropy of the target-domain corpus, the second term is the likelihood probability of the target domain, the third term is the cross entropy of the source-domain corpus, and the λn are the per-term weights.
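The multi-regularization loss can be sketched as a weighted sum of the target-domain cross entropy, a target-domain likelihood regularizer (written here as a negative log-likelihood), and a down-weighted source-domain cross entropy. The concrete weights are illustrative assumptions; the patent only requires the source-domain cross entropy weight to be smaller than the non-regularization term's weight.

```python
# Hedged sketch of the joint loss: lambda1*CE_target + lambda2*NLL_target
# + lambda3*CE_source, with lambda3 < lambda1 per the claim.
import math

def cross_entropy(p_real, p_pred):
    """Per-character CE = -sum_i p_real(i) * log p_pred(i)."""
    return -sum(r * math.log(p) for r, p in zip(p_real, p_pred) if r > 0)

def joint_loss(ce_target, nll_target, ce_source, lam=(1.0, 0.5, 0.3)):
    l1, l2, l3 = lam
    assert l3 < l1          # source-domain CE is weighted below the main term
    return l1 * ce_target + l2 * nll_target + l3 * ce_source

loss = joint_loss(cross_entropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1]),
                  nll_target=0.4, ce_source=0.9)
```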
In the non-training case, i.e. at segmentation time, Steps 8 and 9 are replaced (that is, Step 9 is removed in the non-training case) as follows:
decode the tag probabilities of each character of the test corpus, obtained through Step 8, with the Viterbi algorithm to obtain the optimal sequence and complete the segmentation.
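The decoding step can be sketched as a standard Viterbi search over the four position tags given per-character tag scores. The uniform (zero) transition scores here are an assumption; the patent does not specify how transition scores are obtained.

```python
# Hedged sketch of the decoding step: Viterbi over 4 position tags.
import numpy as np

TAGS = ["B", "M", "E", "S"]                 # assumed 4-tag scheme

def viterbi(emissions, transitions):
    """emissions: (L, T) scores; transitions: (T, T). Returns the best tag path."""
    L, T = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((L, T), dtype=int)
    for t in range(1, L):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)       # best predecessor per current tag
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(L - 1, 0, -1):           # follow backpointers
        path.append(int(back[t][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

em = np.array([[2.0, 0.0, 0.0, 0.0],        # scores strongly prefer B, E, S
               [0.0, 0.0, 2.0, 0.0],
               [0.0, 0.0, 0.0, 2.0]])
best = viterbi(em, np.zeros((4, 4)))        # -> ["B", "E", "S"]
```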
Further, the cross entropy is expressed by the following formula:

CrossEntropy = -Σ_i p_real(i) · log(p_pred(i))

where p_real(i) is the true probability of tag i for a character and p_pred(i) is the predicted probability of tag i for that character; the cross entropies of all characters in a sentence are summed to obtain the cross entropy of the sentence.
The likelihood probability is expressed by the following formula:

Likelihood = p(y_real) = exp(score(y_real)) / Σ_{y'} exp(score(y'))

where p(y_real) is the probability of the correct sequence and y' in the denominator ranges over all possible sequences.
In a preferred embodiment, as shown in Fig. 4, taking the sentence "A friend in need is a friend indeed" as an example, each character of the sentence is first mapped to a dense vector of dimension n, and convolution extracts the feature of each character in the sentence. The convolved features are input into the capsule layer and connected to form the features of the primary capsule layer; through a capsule sliding window and iterative routing, the features of the tag capsule layer are obtained, and the modulus of each vector in the tag capsule layer gives the probability of each tag. Finally, in the tag inference layer, the Viterbi algorithm finds the optimal tag sequence of the sentence and produces the segmentation.
By adding a capsule sliding window (capsule sliding window), Embodiment One of the present invention transfers the capsule model to the NLP sequence labeling task (the Chinese word segmentation task) and completes Chinese word segmentation with higher accuracy, supporting more complex natural language processing (Natural Language Processing, NLP) tasks. Through the joint multiple regularization terms, it improves the generalization ability of the model, achieves a degree of domain transfer, and can reduce manual corpus annotation, lowering the labor and time cost of annotating corpora in NLP research.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person familiar with the art can easily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A Chinese word segmentation method using a capsule model with jointly combined regularizations, characterized in that, when training on a corpus from a particular domain, the method comprises:
Step 1: identify the maximum sentence length in the corpus, and pad every shorter sentence up to that maximum length with a pre-stored padding character;
Step 2: map each Chinese character of the sentences in the corpus to a vector representation;
Step 3: pass the vector representations through a convolutional layer to extract features;
Step 4: feed the extracted features into a primary capsule layer; through a convolution operation, obtain a scalar for each character in each feature map, and concatenate these scalars into a vector that forms one capsule, yielding the primary capsules;
Step 5: pass the obtained primary capsules through a capsule sliding window to obtain the feature representation of each character, adapting the model to the sequence labelling task;
Step 6: use the features produced by the capsule sliding window as the input of a routing algorithm, and obtain the tag capsules through the routing algorithm;
Step 7: take the modulus (vector length) of each tag capsule output to obtain the tag probabilities of each character;
Step 8: compute the cross entropy between the predicted and true tag probabilities of each character, and separately input the predictions into a Conditional Random Field (CRF) to compute the likelihood probability;
Step 9: form the loss function as a weighted sum of the cross entropy and the log-likelihood probability, with the log-likelihood serving as a regularization term of the loss function, and compute and update the weights of each network layer by the backpropagation (BP) algorithm.
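Step 5 above, the capsule sliding window, is the adaptation that makes the capsule model usable for sequence labelling. A minimal sketch of one plausible reading of it is given below; it is not part of the claims, and all shapes, the window size, and the zero-padding at sentence borders are illustrative assumptions.

```python
import numpy as np

def capsule_sliding_window(primary_capsules, window=3):
    """For each character position, gather the primary capsules inside a
    window centred on that position (zero-padded at the sentence borders),
    yielding one per-character feature for sequence labelling.

    primary_capsules: (seq_len, n_caps, cap_dim)
    returns:          (seq_len, window * n_caps, cap_dim)
    """
    seq_len, n_caps, cap_dim = primary_capsules.shape
    half = window // 2
    padded = np.zeros((seq_len + 2 * half, n_caps, cap_dim))
    padded[half:half + seq_len] = primary_capsules
    return np.stack([
        padded[i:i + window].reshape(window * n_caps, cap_dim)
        for i in range(seq_len)
    ])
```

Each character thus receives a stack of capsules describing its local context, which step 6 then routes to the tag capsules.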
2. The method of claim 1, characterized in that, when training on at least two domain corpora with one of them serving as the target domain corpus for the final segmentation, step 9 is replaced as follows:
Step 9: the loss function is a weighted sum over the cross entropy of the target (final segmentation) domain corpus, the log-likelihood probability, and the cross entropy of the source (transferred) domain corpus, wherein the likelihood probability of the target-domain corpus and the cross entropy of the source-domain corpus serve as regularization terms whose weights are smaller than that of the non-regularization term.
3. The method of claim 2, characterized in that the loss function is given by the following formula:
Loss = λ1·CrossEntropy_target + λ2·Likelihood_target + λ3·CrossEntropy_source
wherein the first term is the cross entropy of the target-domain corpus, the second term is the likelihood probability of the target domain, and the third term is the cross entropy of the source-domain corpus; λn denotes the weight of each term.
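The three-term loss of claim 3 reduces to simple arithmetic once the three quantities are available. The sketch below is illustrative only: the claim does not fix the λ values, and the default weights here merely reflect that the regularization terms (λ2, λ3) must be smaller than the main term.

```python
def multi_domain_loss(ce_target, likelihood_target, ce_source,
                      lambdas=(1.0, 0.2, 0.1)):
    """Weighted sum from claim 3:
    Loss = λ1·CE_target + λ2·Likelihood_target + λ3·CE_source.
    The default λ values are illustrative assumptions, not from the patent;
    the regularization weights λ2, λ3 are kept below λ1."""
    l1, l2, l3 = lambdas
    return l1 * ce_target + l2 * likelihood_target + l3 * ce_source
```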
4. The method of claim 1, characterized in that, outside training, when Chinese word segmentation is performed, steps 8 and 9 are replaced as follows:
Step 8: decode the tag probabilities of each character of the test corpus to be segmented with the Viterbi algorithm to obtain the optimal tag sequence, completing the segmentation.
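For illustration, a textbook Viterbi decoder over per-character tag scores and CRF transition scores is sketched below. It is not the patent's implementation; the function names, the log-space score convention, and the use of explicit backpointers are assumptions.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   (seq_len, n_tags) per-character tag scores in log space
                 (e.g. derived from the tag probabilities of step eight)
    transitions: (n_tags, n_tags) CRF tag-to-tag transition scores
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j]: best score ending in tag i at t-1, then tag j at t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow backpointers from the best final tag
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t][best[-1]]))
    return best[::-1]
```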
5. The method of claim 1, characterized in that, in step 2, mapping the Chinese characters of the sentences in the corpus to vector representations comprises:
mapping each Chinese character of the sentences in the corpus, via a mapping dictionary and the word embedding method, to a dense (non-sparse) vector representation.
6. The method of claim 5, characterized in that mapping the Chinese characters of the sentences in the corpus to dense vector representations via a mapping dictionary and the word embedding method comprises:
traversing the training corpus, collecting all distinct characters and assigning each a number, so that identical characters share the same vector representation and different characters have different ones, while one additional vector is reserved to represent every character that does not appear in the training corpus, i.e. the unknown character;
when training the network, introducing a dropout mechanism that randomly sets a portion of the parameters to zero.
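The dictionary construction of claim 6 can be sketched in a few lines. Details such as the reserved id for the unknown character and the `<UNK>` token name are illustrative assumptions, not fixed by the claim.

```python
def build_char_index(corpus_sentences):
    """Traverse the training corpus and number every distinct character;
    id 0 is reserved for characters never seen in training (the
    'unknown character' of claim 6)."""
    char2id = {"<UNK>": 0}
    for sentence in corpus_sentences:
        for ch in sentence:
            if ch not in char2id:
                char2id[ch] = len(char2id)
    return char2id

def encode(sentence, char2id):
    """Map a sentence to ids; every unseen character shares the unknown id,
    so identical characters always get identical vector rows downstream."""
    return [char2id.get(ch, char2id["<UNK>"]) for ch in sentence]
```

Each id then indexes a row of the embedding matrix, giving the dense vector representation of claim 5.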
7. The method of claim 1, characterized in that, in step 3, passing the vector representations through the convolutional layer to extract features comprises:
obtaining a certain number of feature maps from the convolutional layer, each feature map being a one-dimensional vector whose dimension equals the sentence length, and concatenating the vectors of all feature maps into a matrix, which constitutes the features extracted by the convolutional layer.
8. The method of claim 1, characterized in that the update of the logarithmic prior probability b_ij in the routing algorithm is expressed as follows:
b_ij ← b_ij + û_(j|i) · v_j
wherein û_(j|i) is the prediction vector and v_j is the output of the j-th capsule.
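The log-prior update of claim 8 is the agreement step of routing-by-agreement. A compact sketch is given below; the shapes, the iteration count, and the standard squash non-linearity are illustrative assumptions drawn from the usual capsule-network formulation, not from the patent text.

```python
import numpy as np

def squash(v):
    """Capsule non-linearity: keeps direction, maps length into [0, 1)."""
    n2 = (v * v).sum(-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement with the claim-8 update b_ij <- b_ij + û_(j|i)·v_j.

    u_hat: (n_in, n_out, dim) prediction vectors û_(j|i)
    returns: (n_out, dim) outputs v_j of the output (tag) capsules
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # log priors
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum
        v = squash(s)
        b = b + (u_hat * v[None]).sum(-1)                     # b_ij += û·v_j
    return v
```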
9. The method of claim 1, characterized in that the cross entropy is given by the following formula:
CrossEntropy = -Σ_i p_real(i) log(p_pred(i))
wherein p_real(i) denotes the true probability of each tag of a character and p_pred(i) the predicted probability of each tag of the character; the cross entropies of all characters are summed to give the cross entropy of a sentence.
The likelihood probability is given by the following formula:
p(y_real) = exp(s(y_real)) / Σ_{y'} exp(s(y'))
wherein p(y_real) denotes the probability of the correct sequence, s(·) the sequence score, and the denominator sums over all possible sequences y'.
10. The method of claim 1, characterized in that the loss function is given by the following formula:
Loss = λ1·CrossEntropy + λ2·Likelihood
wherein CrossEntropy denotes the cross entropy, Likelihood denotes the likelihood probability, and λn denotes the weights of the two terms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910018546.2A CN109766553A (en) | 2019-01-09 | 2019-01-09 | A kind of Chinese word cutting method of the capsule model combined based on more regularizations |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109766553A true CN109766553A (en) | 2019-05-17 |
Family
ID=66453491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910018546.2A Pending CN109766553A (en) | 2019-01-09 | 2019-01-09 | A kind of Chinese word cutting method of the capsule model combined based on more regularizations |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109766553A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145483A (en) * | 2017-04-24 | 2017-09-08 | 北京邮电大学 | A kind of adaptive Chinese word cutting method based on embedded expression |
CN108920467A (en) * | 2018-08-01 | 2018-11-30 | 北京三快在线科技有限公司 | Polysemant lexical study method and device, search result display methods |
Non-Patent Citations (2)
Title |
---|
SI LI et al.: "Capsules Based Chinese Word Segmentation for Ancient Chinese Medical Books", Special Section on AI-Driven Big Data Processing: Theory, Methodology, and Applications * |
LI Yumeng: "Next-basket recommendation based on users' implicit feedback behaviour", Journal of Chinese Information Processing * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263855A (en) * | 2019-06-20 | 2019-09-20 | 深圳大学 | A method of it is projected using cobasis capsule and carries out image classification |
CN110263855B (en) * | 2019-06-20 | 2021-12-14 | 深圳大学 | Method for classifying images by utilizing common-basis capsule projection |
CN112579746A (en) * | 2019-09-29 | 2021-03-30 | 京东数字科技控股有限公司 | Method and device for acquiring behavior information corresponding to text |
CN110825849A (en) * | 2019-11-05 | 2020-02-21 | 泰康保险集团股份有限公司 | Text information emotion analysis method, device, medium and electronic equipment |
CN111460818A (en) * | 2020-03-31 | 2020-07-28 | 中国测绘科学研究院 | Web page text classification method based on enhanced capsule network and storage medium |
CN112270285A (en) * | 2020-11-09 | 2021-01-26 | 天津工业大学 | SAR image change detection method based on sparse representation and capsule network |
CN112270285B (en) * | 2020-11-09 | 2022-07-08 | 天津工业大学 | SAR image change detection method based on sparse representation and capsule network |
CN116757534A (en) * | 2023-06-15 | 2023-09-15 | 中国标准化研究院 | Intelligent refrigerator reliability analysis method based on neural training network |
CN116757534B (en) * | 2023-06-15 | 2024-03-15 | 中国标准化研究院 | Intelligent refrigerator reliability analysis method based on neural training network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108628823B (en) | Named entity recognition method combining attention mechanism and multi-task collaborative training | |
CN109766553A (en) | A kind of Chinese word cutting method of the capsule model combined based on more regularizations | |
CN109284506B (en) | User comment emotion analysis system and method based on attention convolution neural network | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN110472042B (en) | Fine-grained emotion classification method | |
CN109933670B (en) | Text classification method for calculating semantic distance based on combined matrix | |
Stojanovski et al. | Twitter sentiment analysis using deep convolutional neural network | |
CN109753660B (en) | LSTM-based winning bid web page named entity extraction method | |
CN110929034A (en) | Commodity comment fine-grained emotion classification method based on improved LSTM | |
CN112667818B (en) | GCN and multi-granularity attention fused user comment sentiment analysis method and system | |
CN111401061A (en) | Method for identifying news opinion involved in case based on BERT and Bi L STM-Attention | |
CN109408812A (en) | A method of the sequence labelling joint based on attention mechanism extracts entity relationship | |
CN108182295A (en) | A kind of Company Knowledge collection of illustrative plates attribute extraction method and system | |
CN108268643A (en) | A kind of Deep Semantics matching entities link method based on more granularity LSTM networks | |
CN110765775A (en) | Self-adaptive method for named entity recognition field fusing semantics and label differences | |
Zhang et al. | Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network. | |
CN109214006B (en) | Natural language reasoning method for image enhanced hierarchical semantic representation | |
CN110196980A (en) | A kind of field migration based on convolutional network in Chinese word segmentation task | |
CN110263325A (en) | Chinese automatic word-cut | |
CN111651974A (en) | Implicit discourse relation analysis method and system | |
CN108345583A (en) | Event recognition and sorting technique based on multi-lingual attention mechanism and device | |
CN111582506A (en) | Multi-label learning method based on global and local label relation | |
CN108920446A (en) | A kind of processing method of Engineering document | |
CN112632993A (en) | Electric power measurement entity recognition model classification method based on convolution attention network | |
CN114781382A (en) | Medical named entity recognition system and method based on RWLSTM model fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190517 |