CN109766553A - A kind of Chinese word cutting method of the capsule model combined based on more regularizations - Google Patents
Abstract
The present invention provides a Chinese word segmentation method based on a capsule model with joint multiple regularizations. By adding a capsule sliding window, the capsule model is transferred to the natural language processing (NLP) sequence labeling task of Chinese word segmentation, solving the technical problem that the capsule model is not suited to sequence labeling tasks. By jointly applying multiple regularization terms, the invention achieves simple domain transfer, adapts the capsule model to the sequence labeling task, completes Chinese word segmentation with higher accuracy, and supports more complex natural language processing tasks. The joint regularization terms improve the generalization ability of the model, achieve a degree of domain transfer, and reduce the amount of manual corpus annotation, lowering the labor and time cost of annotating corpora in natural language processing research.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a Chinese word segmentation method based on a capsule model with joint multiple regularizations.
Background technique
With the development of information technology, machine learning, and related technologies, automatic information processing has gradually been applied to many scenarios, such as mining user preferences from film reviews and shopping comments, or automatically generating a short summary of an article. All of these require automatic text processing, and as Chinese users become increasingly active on the Internet, the amount of information they generate keeps growing, making automatic processing of text information ever more necessary. These developments have spread natural language processing technology into every corner of society. Within natural language processing, and in particular for the development of domestic natural language processing technology, Chinese automatic word segmentation is one of the most fundamental and most critical technologies.
Chinese word segmentation is the task of dividing a Chinese sentence into word boundaries so that a machine can more easily understand the Chinese language. Chinese differs from English: English words are separated by spaces, whereas in Chinese, especially modern Chinese, words are generally formed by joining two or more Chinese characters and cannot be understood simply by treating single characters as boundaries. Therefore, when a computer automatically processes Chinese text, it must first segment the text into words. Many Chinese natural language processing tasks, such as part-of-speech tagging, named entity recognition, text classification, text summarization, event extraction, and information retrieval, depend heavily on Chinese word segmentation. The quality of word segmentation positively or negatively affects all of these downstream technologies, which shows how fundamental the Chinese word segmentation task is, and how important building a good segmentation system is for high-performance automatic processing of text information.
The Chinese word segmentation task has the computer automatically insert boundary markers, such as spaces, between the words of a Chinese text by means of an algorithm. Chinese automatic word segmentation has developed for more than 20 years, starting from the earliest dictionary-matching methods, namely the classical forward/backward maximum matching, followed by segmentation algorithms with probabilistic disambiguation models, and then traditional conditional random fields (Conditional Random Fields, CRFs), the structured perceptron (Structure Perceptron), the maximum entropy model (Maximum Entropy, ME), and today's neural network (Neural Network, NN) segmentation models. Segmentation algorithms keep improving, and at present most sequence labeling tasks rely mainly on machine learning algorithms: sequence labeling assigns a class label to each member of an observed sequence. Chinese word segmentation is generally also treated as a sequence labeling task, labeling each character according to the position in which it occurs within a word and deriving the segmentation from the resulting label sequence.
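To make the sequence labeling formulation concrete, the following minimal sketch decodes per-character position tags back into segmented words. The four-tag scheme assumed here (B=begin, M=middle, E=end, S=single-character word) is the conventional one; the document only states that there are 4 classes, so the tag names are an assumption.

```python
# Hedged sketch: recovering words from per-character position tags (B/M/E/S assumed).

def tags_to_words(chars, tags):
    """Join characters into words; E and S tags close the current word."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):   # word boundary after this character
            words.append(buf)
            buf = ""
    if buf:                     # flush a dangling (malformed) tail
        words.append(buf)
    return words

print(tags_to_words(list("ABCDE"), ["B", "E", "S", "B", "E"]))
# three words: "AB", "C", "DE"
```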
For neural network methods, large-scale, carefully annotated data has a decisive influence on the final result of each task. However, such annotation must be done by humans, incurring great labor and time costs. Domain transfer aims to reduce the annotation required in other domains by transferring knowledge from already annotated data. For Chinese word segmentation, for example, large-scale annotated data is concentrated in the news domain, while annotated text data in other domains is comparatively scarce. How to transfer the key information of a large annotated dataset to a dataset from another domain with only a small amount of annotation, or even none, has therefore become a difficult problem, and domain transfer technology has become increasingly important.
As shown in Figure 1, the article "Dynamic Routing between Capsules", a first piece of prior art, proposes a capsule model as a technical solution to the handwritten digit recognition problem:
First, a handwritten digit picture is converted into a 28x28 matrix as input. Second, features are extracted from the input matrix by a convolutional layer with 256 convolution kernels of size 9x9 and stride 1; after an activation function, preferably the rectified linear unit (Rectified Linear Unit, ReLU), the output is 256 feature maps, each of size 20x20. Third, the output of the previous step is passed through a convolutional primary capsule layer (convolutional primary capsule layer), yielding 32 feature maps; each map consists of 6x6 capsules, and each capsule is an 8-dimensional vector. Fourth, the output of the previous step is passed through a digit capsule layer whose size equals the number of classes; for handwritten digit recognition there are 10 classes, so this layer has 10 capsules, each a 16-dimensional vector. Fifth, the probability of each class is obtained by a nonlinear calculation, and the class with the highest probability is the predicted class.
As shown in Fig. 2, the input and output of a capsule are computed by the dynamic routing algorithm, which operates only between layers that contain capsules.
The length of a capsule's output vector represents the probability that the entity represented by the capsule exists in the current input. A nonlinear "squashing" (squash) function therefore compresses short vectors to nearly zero length and long vectors to a length slightly below 1, so that discriminative learning can make good use of this nonlinearity:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

where v_j is the output of capsule j and s_j is its total input.
For all capsules except those in the first layer, the total input s_j of a capsule is a weighted sum over the prediction vectors û_{j|i} from the capsules in the layer below:

s_j = Σ_i c_ij û_{j|i},  û_{j|i} = W_ij u_i

where each prediction vector û_{j|i} is obtained by multiplying the output u_i of a lower-layer capsule by a weight matrix W_ij.
where c_ij is a coupling coefficient determined by the iterative routing process.
The coupling coefficients between capsule i and all capsules in the layer above sum to 1 and are determined by a routing softmax whose initial logits b_ij are the log prior probabilities that capsule i should be coupled to capsule j:

c_ij = exp(b_ij) / Σ_k exp(b_ik)

These log priors can be learned discriminatively together with the other weights. They depend on the location and type of the two capsules, not on the current input image. The initial coupling coefficients are then iteratively refined by measuring the agreement between the current output v_j of each capsule j in the layer above and the prediction û_{j|i} made by capsule i.
The agreement is simply the scalar product a_ij = v_j · û_{j|i}. This agreement is treated as if it were a log-likelihood and is added to the initial logit b_ij before new values are computed for all the coupling coefficients linking capsule i to the capsules in the layer above.
In a convolutional capsule layer, each capsule outputs a local grid of vectors to each type of capsule in the layer above, using a different transformation matrix for each member of the grid as well as for each capsule type.
As shown in Fig. 3, the article "Deep Learning for Chinese Word Segmentation and POS Tagging", a second piece of prior art, proposes a feedforward neural network as a technical solution to the Chinese word segmentation problem:
First, Chinese characters are converted into vectors; every occurrence of the same character is converted into the same d-dimensional vector, implemented via a dictionary. Second, the vectors from the previous step are converted into hidden-unit representations by a feedforward neural network. Third, the output of the previous step is mapped nonlinearly through a sigmoid function. Fourth, the output of the previous step is passed through a feedforward layer whose output dimension equals the number of classes, which is 4 for Chinese word segmentation. Fifth, a softmax is applied to the four-dimensional output to obtain the probability of each character belonging to each class; these probabilities are decoded with the Viterbi algorithm to compute the optimal sequence and produce the segmentation.
During their research, the inventors found that the technology of "Dynamic Routing between Capsules" has the following problems:
1. It targets only the handwritten digit recognition task, and the overall model is not suited to sequence labeling tasks;
2. Its image reconstruction regularization term is not applicable to the Chinese word segmentation task.
Because of these technical problems, the prior art has the following disadvantages:
1. Accuracy is mediocre;
2. The capsule model is not suited to sequence labeling tasks;
3. Generalization is mediocre, i.e. the model can only be trained and tested on the same corpus; when tested on a corpus from a different domain, performance drops sharply.
Summary of the invention
To solve the above technical problems, the present invention provides a Chinese word segmentation method based on a capsule model with joint multiple regularizations. By applying the capsule model, it achieves Chinese word segmentation with higher accuracy, so that results of training and testing on the same corpus are more accurate; by modifying and extending the capsule model, it makes the capsule model applicable to sequence labeling tasks; and by jointly applying multiple regularization terms, it achieves domain transfer for the Chinese word segmentation task on the capsule model.
The present invention provides a Chinese word segmentation method based on a capsule model with joint multiple regularizations. When training on a corpus from a particular domain, the method comprises:
Step 1: identify the maximum sentence length in the corpus, and pad every shorter sentence to the maximum length using a pre-stored padding character;
Step 2: map the Chinese characters of each sentence in the corpus to vector representations;
Step 3: extract features from the vector representations through a convolutional layer;
Step 4: input the extracted features into a primary capsule layer (primary capsule layer); through convolution operations, obtain one scalar per character per feature map, and concatenate the resulting scalars into a vector serving as one capsule, obtaining the primary capsules (primary capsules);
Step 5: pass the obtained primary capsules (primary capsules) through a capsule sliding window (capsule sliding window) to obtain the feature representation of each character, adapting the model to the sequence labeling task;
Step 6: use the features obtained after the capsule sliding window (capsule sliding window) as the input of the routing algorithm, and obtain the tag capsules (tag capsules) via the routing algorithm;
Step 7: take the modulus of each tag capsule (tag capsules) output to obtain the tag probabilities of each character;
Step 8: compute the cross entropy between the tag probabilities of each character and the true tag probabilities, and input them into a conditional random field (Conditional Random Field, CRF) to compute the likelihood probability;
Step 9: form the loss function as a weighted sum of the cross entropy and the log-likelihood probability, with the log-likelihood probability serving as a regularization term of the loss function, and update each layer's weights via the backpropagation algorithm (Backpropagation, BP).
Further, when training on at least two domain corpora, with one of them serving as the target-domain corpus for the final segmentation, Step 9 is replaced as follows:
Step 9: the loss function is a weighted sum over the corpus of the target segmentation domain, its log-likelihood probability, and the corpus of the source segmentation domain, where the likelihood probability of the target-domain corpus serves as a regularization term, and the weight of the cross entropy of the source-domain corpus is smaller than that of the non-regularization term.
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy_target + λ2·Likelihood_target + λ3·CrossEntropy_source

where the first term is the cross entropy of the target-domain corpus, the second term is the likelihood probability of the target domain, the third term is the cross entropy of the source-domain corpus, and the λn are the per-term weights.
Further, in the non-training case, i.e. at segmentation time, Steps 8 and 9 are replaced as follows:
decode the tag probabilities of each character of the test corpus, obtained through Step 8, with the Viterbi algorithm to obtain the optimal sequence and complete the segmentation.
Further, in Step 2, mapping the Chinese characters of each sentence in the corpus to vector representations comprises:
mapping the Chinese characters of each sentence in the corpus to dense (non-sparse) vector representations through a mapping dictionary, using the word embedding method.
Further, this mapping comprises: traversing the training corpus, finding all distinct characters, and numbering each character; identical characters share the same vector representation, different characters have different vector representations, and one additional vector is set aside to represent every character that does not appear in the training corpus, i.e. unknown characters.
When training the network, a dropout mechanism is introduced that randomly sets a portion of the parameters to zero.
Further, in Step 3, extracting features from the vector representations through a convolutional layer comprises:
passing the vector representations through the convolutional layer to obtain a certain number of feature maps, each feature map being a one-dimensional vector whose dimension is the sentence length; the vectors represented by the feature maps are concatenated into a matrix, which is the feature extracted by the convolutional layer.
Further, the update of the log prior b_ij in the routing algorithm is expressed as follows:

b_ij ← b_ij + û_{j|i} · v_j

where û_{j|i} is the prediction vector and v_j is the output of capsule j.
Further, the cross entropy is expressed by the following formula:

CrossEntropy = -Σ_i p_real(i) · log(p_pred(i))

where p_real(i) is the true probability of tag i for a character and p_pred(i) is the predicted probability of tag i for that character; the cross entropies of all characters in a sentence are summed to obtain the cross entropy of the sentence.
The likelihood probability is expressed by the following formula:

Likelihood = p(y_real) = exp(score(y_real)) / Σ_{y'} exp(score(y'))

where p(y_real) is the probability of the correct sequence and y' in the denominator ranges over all possible sequences.
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy + λ2·Likelihood

where CrossEntropy is the cross entropy, Likelihood is the likelihood probability, and the λn are the weights of the two terms.
The Chinese word segmentation method based on a capsule model with joint multiple regularizations provided by the present invention adapts the capsule model to sequence labeling tasks, completes Chinese word segmentation with higher accuracy, and supports more complex natural language processing (Natural Language Processing, NLP) tasks. Through the joint multiple regularization terms, it improves the generalization ability of the model, achieves a degree of domain transfer, and can reduce manual corpus annotation, lowering the labor and time cost of annotating corpora in NLP research.
Brief description of the drawings
Fig. 1 is a schematic diagram of the capsule model for solving the handwritten digit recognition problem;
Fig. 2 is a schematic diagram of the dynamic routing algorithm;
Fig. 3 is a schematic diagram of the neural network structure;
Fig. 4 is a flowchart of the Chinese word segmentation method based on a capsule model with joint multiple regularizations provided by the present invention;
Fig. 5 is a flowchart of Embodiment One;
Fig. 6 is a schematic diagram of the convolution operation.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present invention, the technical solution in the embodiments of the present invention is described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of the present invention. The abbreviations and key terms appearing in this embodiment are defined as follows:
NN: Neural Network;
CNN: Convolutional Neural Network;
LSTM: Long Short-Term Memory neural network;
CTB: Chinese Treebank (Penn Chinese Treebank);
CRFs: Conditional Random Fields;
FNN: Feedforward Neural Network;
ReLU: Rectified Linear Unit, an activation function;
ME: Maximum Entropy model;
NLP: Natural Language Processing.
Embodiment one
Referring to Figs. 4-6: Figs. 4 and 5 show the Chinese word segmentation method based on a capsule model with joint multiple regularizations provided by the present invention. Specifically, when training on a corpus from a particular domain, the method comprises:
Step 1: identify the maximum sentence length in the corpus, and pad every shorter sentence to the maximum length using a pre-stored padding character.
In this embodiment, the maximum sentence length is set to 128 and the corpus is CTB6.0. The purpose of this step is to fix all sentences input to the network model to a uniform length.
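Step 1 can be sketched as a simple right-padding routine. The maximum length of 128 follows the embodiment; the padding symbol name `<PAD>` and the choice to truncate overlong sentences are assumptions, since the patent only says shorter sentences are padded.

```python
# Hedged sketch of Step 1: pad every sentence to a fixed maximum length.
MAX_LEN = 128
PAD = "<PAD>"          # assumed pad symbol; the patent only says "pre-stored character"

def pad_sentence(chars, max_len=MAX_LEN, pad=PAD):
    """Right-pad a list of characters up to max_len (truncating longer ones)."""
    return (chars + [pad] * max_len)[:max_len]

padded = pad_sentence(list("ABC"), max_len=5)
```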
Step 2: map the Chinese characters of each sentence in the corpus to vector representations; through a mapping dictionary, using the word embedding method, the Chinese characters are mapped to dense (non-sparse) vector representations.
Further, this mapping comprises: traversing the training corpus, finding all distinct characters, and numbering each character; identical characters share the same vector representation, different characters have different representations, and one additional vector represents every character that never appears in the training corpus, i.e. unknown characters. When training the network, a dropout mechanism is introduced that randomly sets a portion of the parameters to zero.
In this embodiment, the mapping vector dimension for each character is set to 200. This step is implemented by a mapping dictionary that maps characters to dense vector representations: the training corpus is traversed first and all distinct characters are found and numbered; assuming there are M characters, a matrix of 200 rows (the character vector dimension is 200) and M+1 columns is built. Identical characters share the same vector, different characters have different vectors, and beyond the M characters one extra vector represents all characters that never occur in the training corpus, used for unknown characters. In this step, borrowing the idea of the denoising autoencoder, the invention introduces a dropout mechanism that randomly sets a portion of the parameters to zero during training, which avoids overfitting and provides an effective way to approximately combine exponentially many different neural structures.
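The dictionary-plus-matrix construction above can be sketched as follows. The dimension 200 and the extra unknown-character column follow the embodiment; the random initialization, the dropout rate, and the inverted-dropout scaling are illustrative assumptions.

```python
# Hedged sketch of Step 2: character dictionary, 200 x (M+1) embedding matrix
# with an extra <UNK> column, and dropout on the looked-up vectors.
import numpy as np

rng = np.random.default_rng(0)
corpus = ["ABAB", "BC"]
chars = sorted({c for sent in corpus for c in sent})   # all distinct characters
char2id = {c: i for i, c in enumerate(chars)}          # number each character
UNK = len(char2id)                                     # extra column for unknowns
E = rng.normal(size=(200, len(char2id) + 1))           # 200 rows, M+1 columns

def embed(sentence, drop=0.2, train=True):
    ids = [char2id.get(c, UNK) for c in sentence]
    vecs = E[:, ids]                                   # 200 x sentence_length
    if train:                                          # dropout: zero a portion
        mask = rng.random(vecs.shape) >= drop
        vecs = vecs * mask / (1.0 - drop)
    return vecs

x = embed("ABX", train=False)                          # "X" falls back to <UNK>
```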
Step 3: extract features from the vector representations through a convolutional layer. The vector representations are passed through the convolutional layer to obtain a certain number of feature maps; each feature map is a one-dimensional vector whose dimension is the sentence length, and the feature maps (vectors) obtained after convolution are concatenated into a matrix, which is the feature extracted by the convolutional layer.
In this embodiment, the number of feature maps is set to 200.
In this step, a single-layer convolutional neural network is good at extracting local features, and stacking multiple convolutional layers can also learn context well. Fig. 6 shows the convolution operation on text.
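Step 3 can be sketched as a 1-D "same" convolution over the character-vector matrix, producing one feature map of sentence length per kernel. The 200 feature maps follow the embodiment; the kernel width of 3 and zero padding are illustrative assumptions.

```python
# Hedged sketch of Step 3: 1-D convolution over embedded characters, one
# scalar per position per kernel, giving K feature maps of sentence length.
import numpy as np

rng = np.random.default_rng(1)
D, L, K, W = 200, 128, 200, 3              # embed dim, sent len, maps, width
X = rng.normal(size=(D, L))                # embedded sentence (200 x 128)
kernels = rng.normal(size=(K, D, W))

def conv_features(X, kernels):
    D, L = X.shape
    K, _, W = kernels.shape
    pad = W // 2
    Xp = np.pad(X, ((0, 0), (pad, pad)))   # zero-pad to keep length L
    out = np.empty((K, L))
    for t in range(L):
        window = Xp[:, t:t + W]            # local context of position t
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out                             # K feature maps, each length L

F = conv_features(X, kernels)              # shape (200, 128)
```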
Step 4: input the extracted features into the primary capsule layer (primary capsule layer); through convolution operations, obtain one scalar per character per feature map, and concatenate the resulting scalars into a vector serving as one capsule, obtaining the primary capsules (primary capsules).
In this embodiment, the convolution operation is performed eight times in total; each operation yields one scalar for each character of each feature map, and the eight scalars are concatenated into a vector that forms one capsule, finally yielding a feature matrix represented by capsules.
Step 5: pass the obtained primary capsules (primary capsules) through a capsule sliding window (capsule sliding window) to obtain the feature representation of each character, adapting the model to the sequence labeling task.
In this embodiment, the capsule sliding window is set to 7.
In this step, the capsule sliding window (capsule sliding window) is a sliding window over the capsule feature matrix from the previous step, in which each column represents one character. For each character, the sliding window selects the characters that influence it, i.e. the n characters before and after it (including the character itself), as the input of the dynamic routing algorithm.
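Steps 4 and 5 together can be sketched as stacking the scalars of the eight convolution passes into 8-dimensional capsule columns, then gathering a window of 7 columns around each character. The window size 7 and capsule dimension 8 follow the embodiment; zero-padding at the sentence edges is an assumption, since the patent does not say how boundaries are handled.

```python
# Hedged sketch of Steps 4-5: build the 8 x L capsule matrix, then slide a
# 7-wide window over its columns to collect each character's routing input.
import numpy as np

rng = np.random.default_rng(3)
DIM, L, WIN = 8, 10, 7                       # capsule dim, sentence len, window
passes = [rng.normal(size=L) for _ in range(8)]   # 8 conv passes, 1 scalar/char
primary = np.stack(passes, axis=0)           # column t = capsule of character t

def capsule_windows(primary, win=WIN):
    dim, length = primary.shape
    half = win // 2
    padded = np.pad(primary, ((0, 0), (half, half)))   # assumed zero edges
    # windows[t] holds the `win` capsules feeding the routing for character t
    return np.stack([padded[:, t:t + win] for t in range(length)])

windows = capsule_windows(primary)           # shape (L, DIM, WIN)
```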
Step 6: use the features obtained after the capsule sliding window (capsule sliding window) as the input of the routing algorithm, and obtain the tag capsules (tag capsules) via the routing algorithm.
In this embodiment, the number of tag capsules equals the number of classes, 4; each tag capsule is a 16-dimensional vector; and the number of iterations of the dynamic routing algorithm is 3.
The steps of the routing algorithm are as follows:
First, the capsules (vectors) of the primary capsule layer (primary capsule layer) are passed through the squash nonlinear mapping, formula (2-1), to obtain the output v_j of the primary capsule layer (primary capsule layer).
Second, the output of the primary capsule layer (primary capsule layer) is used as the input of the tag capsule layer (tag capsule); the tag capsules are computed by formula (2-2), and these capsules are then passed through the squash nonlinear mapping to obtain the output of the tag capsule layer (tag capsule).
The update of the softmax logit parameter b_ij is expressed by formula (3-1):

b_ij ← b_ij + û_{j|i} · v_j

where û_{j|i} is the prediction vector and v_j is the output of capsule j.
The coupling coefficients are computed by formula (3-2):

c_ij = exp(b_ij) / Σ_k exp(b_ik)
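The routing loop above can be sketched end to end for one character: 7 windowed 8-dimensional primary capsules are routed into 4 tag capsules of 16 dimensions over 3 iterations, matching the embodiment's settings. The random transformation matrices `W` stand in for learned weights and are purely illustrative.

```python
# Hedged sketch of Step 6: dynamic routing from 7 primary capsules (8-dim)
# to 4 tag capsules (16-dim), 3 iterations, per the embodiment's settings.
import numpy as np

def squash(s, eps=1e-9):
    n2 = np.sum(s * s, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def routing(u, W, iters=3):
    """u: (n_in, d_in) primary capsules; W: (n_in, n_out, d_out, d_in)."""
    n_in, n_out = W.shape[0], W.shape[1]
    u_hat = np.einsum("iodk,ik->iod", W, u)        # prediction vectors u_hat_{j|i}
    b = np.zeros((n_in, n_out))                    # initial logits b_ij
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # softmax (3-2)
        s = np.einsum("io,iod->od", c, u_hat)      # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                              # tag capsule outputs
        b = b + np.einsum("iod,od->io", u_hat, v)  # agreement update (3-1)
    return v

rng = np.random.default_rng(4)
u = rng.normal(size=(7, 8))                        # 7 windowed 8-dim capsules
W = rng.normal(size=(7, 4, 16, 8)) * 0.1
tags = routing(u, W)                               # 4 tag capsules, 16-dim each
probs = np.linalg.norm(tags, axis=1)               # Step 7: modulus = tag scores
```

Because squash maps every vector to a length below 1, the moduli can be read directly as tag scores in [0, 1).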
Step 7: take the modulus of each tag capsule (tag capsules) output to obtain the tag probabilities of each character.
Step 8: compute the cross entropy between the tag probabilities of each character and the true tag probabilities, and input them into a conditional random field (Conditional Random Field, CRF) to compute the likelihood probability.
Step 9: form the loss function as a weighted sum of the cross entropy and the log-likelihood probability, with the log-likelihood probability serving as a regularization term of the loss function, and update each layer's weights via the backpropagation algorithm (Backpropagation, BP).
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy + λ2·Likelihood

where CrossEntropy is the cross entropy, Likelihood is the likelihood probability, and the λn are the weights of the two terms.
When training on at least two domain corpora, with one of them serving as the target-domain corpus for the final segmentation, Step 9 is replaced as follows:
Step 9: the loss function is a weighted sum over the corpus of the target segmentation domain, its log-likelihood probability, and the corpus of the source segmentation domain, where the likelihood probability of the target-domain corpus serves as a regularization term, and the weight of the cross entropy of the source-domain corpus is smaller than that of the non-regularization term.
Further, the loss function is expressed by the following formula:

Loss = λ1·CrossEntropy_target + λ2·Likelihood_target + λ3·CrossEntropy_source

where the first term is the cross entropy of the target-domain corpus, the second term is the likelihood probability of the target domain, the third term is the cross entropy of the source-domain corpus, and the λn are the per-term weights.
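The multi-regularization loss can be sketched as a weighted sum of the target-domain cross entropy, a target-domain likelihood regularizer (written here as a negative log-likelihood), and a down-weighted source-domain cross entropy. The concrete weights are illustrative assumptions; the patent only requires the source-domain cross entropy weight to be smaller than the non-regularization term's weight.

```python
# Hedged sketch of the joint loss: lambda1*CE_target + lambda2*NLL_target
# + lambda3*CE_source, with lambda3 < lambda1 per the claim.
import math

def cross_entropy(p_real, p_pred):
    """Per-character CE = -sum_i p_real(i) * log p_pred(i)."""
    return -sum(r * math.log(p) for r, p in zip(p_real, p_pred) if r > 0)

def joint_loss(ce_target, nll_target, ce_source, lam=(1.0, 0.5, 0.3)):
    l1, l2, l3 = lam
    assert l3 < l1          # source-domain CE is weighted below the main term
    return l1 * ce_target + l2 * nll_target + l3 * ce_source

loss = joint_loss(cross_entropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1]),
                  nll_target=0.4, ce_source=0.9)
```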
In the non-training case, i.e. at segmentation time, Steps 8 and 9 are replaced (that is, Step 9 is removed in the non-training case) as follows:
decode the tag probabilities of each character of the test corpus, obtained through Step 8, with the Viterbi algorithm to obtain the optimal sequence and complete the segmentation.
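The decoding step can be sketched as a standard Viterbi search over the four position tags given per-character tag scores. The uniform (zero) transition scores here are an assumption; the patent does not specify how transition scores are obtained.

```python
# Hedged sketch of the decoding step: Viterbi over 4 position tags.
import numpy as np

TAGS = ["B", "M", "E", "S"]                 # assumed 4-tag scheme

def viterbi(emissions, transitions):
    """emissions: (L, T) scores; transitions: (T, T). Returns the best tag path."""
    L, T = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((L, T), dtype=int)
    for t in range(1, L):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)       # best predecessor per current tag
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(L - 1, 0, -1):           # follow backpointers
        path.append(int(back[t][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

em = np.array([[2.0, 0.0, 0.0, 0.0],        # scores strongly prefer B, E, S
               [0.0, 0.0, 2.0, 0.0],
               [0.0, 0.0, 0.0, 2.0]])
best = viterbi(em, np.zeros((4, 4)))        # -> ["B", "E", "S"]
```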
Further, the cross entropy is expressed by the following formula:

CrossEntropy = -Σ_i p_real(i) · log(p_pred(i))

where p_real(i) is the true probability of tag i for a character and p_pred(i) is the predicted probability of tag i for that character; the cross entropies of all characters in a sentence are summed to obtain the cross entropy of the sentence.
The likelihood probability is expressed by the following formula:

Likelihood = p(y_real) = exp(score(y_real)) / Σ_{y'} exp(score(y'))

where p(y_real) is the probability of the correct sequence and y' in the denominator ranges over all possible sequences.
In a preferred embodiment, as shown in Fig. 4, taking the sentence "A friend in need is a friend indeed" as an example, each character of the sentence is first mapped to a dense vector of dimension n, and convolution extracts the feature of each character in the sentence. The convolved features are input into the capsule layer and connected to form the features of the primary capsule layer; through a capsule sliding window and iterative routing, the features of the tag capsule layer are obtained, and the modulus of each vector in the tag capsule layer gives the probability of each tag. Finally, in the tag inference layer, the Viterbi algorithm finds the optimal tag sequence of the sentence and produces the segmentation.
By adding a capsule sliding window (capsule sliding window), Embodiment One of the present invention transfers the capsule model to the NLP sequence labeling task (the Chinese word segmentation task) and completes Chinese word segmentation with higher accuracy, supporting more complex natural language processing (Natural Language Processing, NLP) tasks. Through the joint multiple regularization terms, it improves the generalization ability of the model, achieves a degree of domain transfer, and can reduce manual corpus annotation, lowering the labor and time cost of annotating corpora in NLP research.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person familiar with the art can easily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A Chinese word segmentation method using a capsule model with jointly combined regularizations, characterized in that, when training on a corpus from a particular domain, the method comprises:
Step 1: identify the maximum sentence length in the corpus, and pad every shorter sentence up to that maximum length with a pre-stored padding character;
Step 2: map each Chinese character of the sentences in the corpus to a vector representation;
Step 3: pass the vector representations through a convolutional layer to extract features;
Step 4: feed the extracted features into a primary capsule layer; through a convolution operation, obtain a scalar for each character in each feature map, and concatenate these scalars into a vector that forms one capsule, yielding the primary capsules;
Step 5: pass the obtained primary capsules through a capsule sliding window to obtain the feature representation of each character, adapting the model to the sequence labelling task;
Step 6: use the features produced by the capsule sliding window as the input of a routing algorithm, and obtain the tag capsules through the routing algorithm;
Step 7: take the modulus (vector length) of each tag capsule output to obtain the tag probabilities of each character;
Step 8: compute the cross entropy between the predicted and true tag probabilities of each character, and separately input the predictions into a Conditional Random Field (CRF) to compute the likelihood probability;
Step 9: form the loss function as a weighted sum of the cross entropy and the log-likelihood probability, with the log-likelihood serving as a regularization term of the loss function, and compute and update the weights of each network layer by the backpropagation (BP) algorithm.
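Step 5 above, the capsule sliding window, is the adaptation that makes the capsule model usable for sequence labelling. A minimal sketch of one plausible reading of it is given below; it is not part of the claims, and all shapes, the window size, and the zero-padding at sentence borders are illustrative assumptions.

```python
import numpy as np

def capsule_sliding_window(primary_capsules, window=3):
    """For each character position, gather the primary capsules inside a
    window centred on that position (zero-padded at the sentence borders),
    yielding one per-character feature for sequence labelling.

    primary_capsules: (seq_len, n_caps, cap_dim)
    returns:          (seq_len, window * n_caps, cap_dim)
    """
    seq_len, n_caps, cap_dim = primary_capsules.shape
    half = window // 2
    padded = np.zeros((seq_len + 2 * half, n_caps, cap_dim))
    padded[half:half + seq_len] = primary_capsules
    return np.stack([
        padded[i:i + window].reshape(window * n_caps, cap_dim)
        for i in range(seq_len)
    ])
```

Each character thus receives a stack of capsules describing its local context, which step 6 then routes to the tag capsules.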
2. The method of claim 1, characterized in that, when training on at least two domain corpora with one of them serving as the target domain corpus for the final segmentation, step 9 is replaced as follows:
Step 9: the loss function is a weighted sum over the cross entropy of the target (final segmentation) domain corpus, the log-likelihood probability, and the cross entropy of the source (transferred) domain corpus, wherein the likelihood probability of the target-domain corpus and the cross entropy of the source-domain corpus serve as regularization terms whose weights are smaller than that of the non-regularization term.
3. The method of claim 2, characterized in that the loss function is given by the following formula:
Loss = λ1·CrossEntropy_target + λ2·Likelihood_target + λ3·CrossEntropy_source
wherein the first term is the cross entropy of the target-domain corpus, the second term is the likelihood probability of the target domain, and the third term is the cross entropy of the source-domain corpus; λn denotes the weight of each term.
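The three-term loss of claim 3 reduces to simple arithmetic once the three quantities are available. The sketch below is illustrative only: the claim does not fix the λ values, and the default weights here merely reflect that the regularization terms (λ2, λ3) must be smaller than the main term.

```python
def multi_domain_loss(ce_target, likelihood_target, ce_source,
                      lambdas=(1.0, 0.2, 0.1)):
    """Weighted sum from claim 3:
    Loss = λ1·CE_target + λ2·Likelihood_target + λ3·CE_source.
    The default λ values are illustrative assumptions, not from the patent;
    the regularization weights λ2, λ3 are kept below λ1."""
    l1, l2, l3 = lambdas
    return l1 * ce_target + l2 * likelihood_target + l3 * ce_source
```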
4. The method of claim 1, characterized in that, outside training, when Chinese word segmentation is performed, steps 8 and 9 are replaced as follows:
Step 8: decode the tag probabilities of each character of the test corpus to be segmented with the Viterbi algorithm to obtain the optimal tag sequence, completing the segmentation.
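For illustration, a textbook Viterbi decoder over per-character tag scores and CRF transition scores is sketched below. It is not the patent's implementation; the function names, the log-space score convention, and the use of explicit backpointers are assumptions.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   (seq_len, n_tags) per-character tag scores in log space
                 (e.g. derived from the tag probabilities of step eight)
    transitions: (n_tags, n_tags) CRF tag-to-tag transition scores
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j]: best score ending in tag i at t-1, then tag j at t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow backpointers from the best final tag
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t][best[-1]]))
    return best[::-1]
```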
5. The method of claim 1, characterized in that, in step 2, mapping the Chinese characters of the sentences in the corpus to vector representations comprises:
mapping each Chinese character of the sentences in the corpus, via a mapping dictionary and the word embedding method, to a dense (non-sparse) vector representation.
6. The method of claim 5, characterized in that mapping the Chinese characters of the sentences in the corpus to dense vector representations via a mapping dictionary and the word embedding method comprises:
traversing the training corpus, collecting all distinct characters and assigning each a number, so that identical characters share the same vector representation and different characters have different ones, while one additional vector is reserved to represent every character that does not appear in the training corpus, i.e. the unknown character;
when training the network, introducing a dropout mechanism that randomly sets a portion of the parameters to zero.
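The dictionary construction of claim 6 can be sketched in a few lines. Details such as the reserved id for the unknown character and the `<UNK>` token name are illustrative assumptions, not fixed by the claim.

```python
def build_char_index(corpus_sentences):
    """Traverse the training corpus and number every distinct character;
    id 0 is reserved for characters never seen in training (the
    'unknown character' of claim 6)."""
    char2id = {"<UNK>": 0}
    for sentence in corpus_sentences:
        for ch in sentence:
            if ch not in char2id:
                char2id[ch] = len(char2id)
    return char2id

def encode(sentence, char2id):
    """Map a sentence to ids; every unseen character shares the unknown id,
    so identical characters always get identical vector rows downstream."""
    return [char2id.get(ch, char2id["<UNK>"]) for ch in sentence]
```

Each id then indexes a row of the embedding matrix, giving the dense vector representation of claim 5.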
7. The method of claim 1, characterized in that, in step 3, passing the vector representations through the convolutional layer to extract features comprises:
obtaining a certain number of feature maps from the convolutional layer, each feature map being a one-dimensional vector whose dimension equals the sentence length, and concatenating the vectors of all feature maps into a matrix, which constitutes the features extracted by the convolutional layer.
8. The method of claim 1, characterized in that the update of the logarithmic prior probability b_ij in the routing algorithm is expressed as follows:
b_ij ← b_ij + û_(j|i) · v_j
wherein û_(j|i) is the prediction vector and v_j is the output of the j-th capsule.
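The log-prior update of claim 8 is the agreement step of routing-by-agreement. A compact sketch is given below; the shapes, the iteration count, and the standard squash non-linearity are illustrative assumptions drawn from the usual capsule-network formulation, not from the patent text.

```python
import numpy as np

def squash(v):
    """Capsule non-linearity: keeps direction, maps length into [0, 1)."""
    n2 = (v * v).sum(-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement with the claim-8 update b_ij <- b_ij + û_(j|i)·v_j.

    u_hat: (n_in, n_out, dim) prediction vectors û_(j|i)
    returns: (n_out, dim) outputs v_j of the output (tag) capsules
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # log priors
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum
        v = squash(s)
        b = b + (u_hat * v[None]).sum(-1)                     # b_ij += û·v_j
    return v
```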
9. The method of claim 1, characterized in that the cross entropy is given by the following formula:
CrossEntropy = -Σ_i p_real(i) log(p_pred(i))
wherein p_real(i) denotes the true probability of each tag of a character and p_pred(i) the predicted probability of each tag of the character; the cross entropies of all characters are summed to give the cross entropy of a sentence.
The likelihood probability is given by the following formula:
p(y_real) = exp(s(y_real)) / Σ_{y'} exp(s(y'))
wherein p(y_real) denotes the probability of the correct sequence, s(·) the sequence score, and the denominator sums over all possible sequences y'.
10. The method of claim 1, characterized in that the loss function is given by the following formula:
Loss = λ1·CrossEntropy + λ2·Likelihood
wherein CrossEntropy denotes the cross entropy, Likelihood denotes the likelihood probability, and λn denotes the weights of the two terms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910018546.2A CN109766553A (en) | 2019-01-09 | 2019-01-09 | A kind of Chinese word cutting method of the capsule model combined based on more regularizations |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109766553A true CN109766553A (en) | 2019-05-17 |
Family
ID=66453491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910018546.2A Pending CN109766553A (en) | 2019-01-09 | 2019-01-09 | A kind of Chinese word cutting method of the capsule model combined based on more regularizations |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109766553A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145483A (en) * | 2017-04-24 | 2017-09-08 | 北京邮电大学 | A kind of adaptive Chinese word cutting method based on embedded expression |
CN108920467A (en) * | 2018-08-01 | 2018-11-30 | 北京三快在线科技有限公司 | Polysemant lexical study method and device, search result display methods |
Non-Patent Citations (2)
Title |
---|
SI LI et al.: "Capsules Based Chinese Word Segmentation for Ancient Chinese Medical Books", Special Section on AI-Driven Big Data Processing: Theory, Methodology, and Applications * |
LI Yumeng: "Next-basket recommendation based on users' implicit feedback behaviour", Journal of Chinese Information Processing * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263855A (en) * | 2019-06-20 | 2019-09-20 | 深圳大学 | A method of it is projected using cobasis capsule and carries out image classification |
CN110263855B (en) * | 2019-06-20 | 2021-12-14 | 深圳大学 | Method for classifying images by utilizing common-basis capsule projection |
CN112579746A (en) * | 2019-09-29 | 2021-03-30 | 京东数字科技控股有限公司 | Method and device for acquiring behavior information corresponding to text |
CN110825849A (en) * | 2019-11-05 | 2020-02-21 | 泰康保险集团股份有限公司 | Text information emotion analysis method, device, medium and electronic equipment |
CN111460818A (en) * | 2020-03-31 | 2020-07-28 | 中国测绘科学研究院 | Web page text classification method based on enhanced capsule network and storage medium |
CN112270285A (en) * | 2020-11-09 | 2021-01-26 | 天津工业大学 | SAR image change detection method based on sparse representation and capsule network |
CN112270285B (en) * | 2020-11-09 | 2022-07-08 | 天津工业大学 | SAR image change detection method based on sparse representation and capsule network |
CN116757534A (en) * | 2023-06-15 | 2023-09-15 | 中国标准化研究院 | Intelligent refrigerator reliability analysis method based on neural training network |
CN116757534B (en) * | 2023-06-15 | 2024-03-15 | 中国标准化研究院 | Intelligent refrigerator reliability analysis method based on neural training network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108628823B (en) | Named entity recognition method combining attention mechanism and multi-task collaborative training | |
CN109766553A (en) | A kind of Chinese word cutting method of the capsule model combined based on more regularizations | |
CN109284506B (en) | User comment emotion analysis system and method based on attention convolution neural network | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN110472042B (en) | Fine-grained emotion classification method | |
CN109933670B (en) | Text classification method for calculating semantic distance based on combined matrix | |
Stojanovski et al. | Twitter sentiment analysis using deep convolutional neural network | |
CN109753660B (en) | LSTM-based winning bid web page named entity extraction method | |
CN110929034A (en) | Commodity comment fine-grained emotion classification method based on improved LSTM | |
CN112667818B (en) | GCN and multi-granularity attention fused user comment sentiment analysis method and system | |
CN111401061A (en) | Method for identifying news opinion involved in case based on BERT and Bi L STM-Attention | |
CN109408812A (en) | A method of the sequence labelling joint based on attention mechanism extracts entity relationship | |
CN108182295A (en) | A kind of Company Knowledge collection of illustrative plates attribute extraction method and system | |
CN108268643A (en) | A kind of Deep Semantics matching entities link method based on more granularity LSTM networks | |
CN110765775A (en) | Self-adaptive method for named entity recognition field fusing semantics and label differences | |
Zhang et al. | Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network. | |
CN109214006B (en) | Natural language reasoning method for image enhanced hierarchical semantic representation | |
CN110196980A (en) | A kind of field migration based on convolutional network in Chinese word segmentation task | |
CN110263325A (en) | Chinese automatic word-cut | |
CN111651974A (en) | Implicit discourse relation analysis method and system | |
CN108345583A (en) | Event recognition and sorting technique based on multi-lingual attention mechanism and device | |
CN111582506A (en) | Multi-label learning method based on global and local label relation | |
CN108920446A (en) | A kind of processing method of Engineering document | |
CN112632993A (en) | Electric power measurement entity recognition model classification method based on convolution attention network | |
CN114781382A (en) | Medical named entity recognition system and method based on RWLSTM model fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190517 |