CN106547737A - Sequence labelling method in natural language processing based on deep learning - Google Patents
Sequence labelling method in natural language processing based on deep learning
- Publication number: CN106547737A (Application CN201610950893.5A)
- Authority: CN (China)
- Prior art keywords: label, network, vector, training, sequence
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention belongs to the technical field of computer natural language processing, and specifically discloses a sequence labelling method for natural language processing based on deep learning. The invention can be applied to sequence labelling tasks in a variety of natural languages, including Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, and named entity recognition. Using deep learning techniques, a computer program outputs the tag type of each component unit of an input sentence. The key elements of the sequence labelling method are: a fast sequence labelling network structure and learning algorithm based on deep learning; a network structure and acceleration algorithm that incorporate preceding label information; and the way these key techniques are integrated. A system implemented with this method has a small parameter scale and runs fast, making it well suited to environments with limited computing resources. It can be deployed on mobile computing platforms such as mobile phones, where computing resources are relatively limited, and can significantly improve system response time and user satisfaction.
Description
Technical field
The invention belongs to the technical field of computer natural language processing, and in particular relates to a sequence labelling method in natural language processing.
Background technology
Deep learning represents a recent breakthrough in artificial intelligence research. It ended a period of roughly ten years in which artificial intelligence saw no major breakthroughs, and it has rapidly made an impact in industry. Unlike narrow artificial intelligence systems (which functionally simulate a particular task), deep learning is a general-purpose artificial intelligence technique that can handle a wide range of situations and problems. It has been applied with great success in fields such as image recognition and speech recognition, and has also achieved notable results in natural language processing (mainly for English). Deep learning is currently the most effective way to realize artificial intelligence, and the one that has produced the greatest results.
Compared with conventional techniques, systems implemented with deep learning also have a small parameter scale and run fast, making them well suited to environments with limited computing resources.
In natural language processing, for sequence labelling problems — including Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, named entity recognition, and other tasks — existing methods based on deep networks can already reach performance comparable to traditional methods, but their models still contain many parameters, they still take a long time to use, and their labelling performance still needs improvement. To address these problems, the present invention proposes a new fast sequence labelling method based on deep learning, which substantially reduces the time required to train and use the labelling network, while improving labelling accuracy by taking preceding tag types into account.
The content of the invention
The object of the present invention is to propose a sequence labelling method for natural language processing with high labelling accuracy and short required time.
In the sequence labelling method for natural language processing provided by the present invention, a computer reads an input sentence and, according to the tag set defined for the task, selects a corresponding tag type for each component unit of the sentence (a character or a word), in order of appearance. The method can be used for sequence labelling tasks in a variety of natural languages, including Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, and named entity recognition.
Sequence labelling means that, given the tag set of a task and an input sentence, a computer program outputs the tag type of each component unit of the sentence (e.g., a Chinese character or an English word). Taking Chinese word segmentation as an example, four tags B, I, E and S are generally used, representing the beginning character of a word, a middle character, the ending character, and a single-character word, respectively. For the input "我喜欢计算机。" ("I like computers."), the correct labelling result is "S B E B I E S" (punctuation marks are usually treated as component units as well), which segments the sentence into "我/喜欢/计算机/。". The sequence labelling method of the present invention is characterized by fast labelling speed, low system configuration requirements (suitable for devices with limited computing and storage resources), and high accuracy.
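The B/I/E/S scheme above can be sketched in a few lines. This is an illustrative helper, not part of the patented method: it converts between a word segmentation and its per-character tag sequence.

```python
# A minimal sketch of the B/I/E/S tagging scheme for Chinese word
# segmentation: given a segmented sentence, produce the per-character
# tag sequence, and recover the segmentation from the tags.

def words_to_tags(words):
    """Convert a list of words into per-character B/I/E/S tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                      # single-character word
        else:
            tags.append("B")                      # beginning character
            tags.extend("I" * (len(w) - 2))       # middle characters
            tags.append("E")                      # ending character
    return tags

def tags_to_words(chars, tags):
    """Recover the segmentation from characters and their tags."""
    words, current = [], ""
    for ch, t in zip(chars, tags):
        current += ch
        if t in ("E", "S"):                       # word boundary
            words.append(current)
            current = ""
    return words
```

For example, the segmentation 我/喜欢/计算机/。 yields the tag sequence S B E B I E S described in the text.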
The sequence labelling method for natural language processing provided by the present invention comprises the following steps:
(1) Train a vector representation for each component unit of the language concerned (e.g., a Chinese character or an English word). This vector representation can be generated at random or pre-trained with an unsupervised method (e.g., the word2vec tool [1] for English words, or the method described in reference [2] for Chinese characters). After training, each unit is converted into its corresponding vector representation by looking it up in the vector table.
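The vector-table lookup in step (1) can be sketched as follows. The vocabulary, dimension, and random initialization are illustrative assumptions, not the patent's actual configuration:

```python
import numpy as np

# A minimal sketch of the vector-table lookup: each unit (character or
# word) is mapped to a row of an embedding matrix.  Unknown units fall
# back to a hypothetical "<PAD>" entry.

rng = np.random.default_rng(0)
vocab = {"我": 0, "喜": 1, "欢": 2, "<PAD>": 3}       # hypothetical vocabulary
dim = 50                                              # within the suggested 50-300 range
table = rng.standard_normal((len(vocab), dim)) * 0.01

def lookup(units):
    """Convert a sequence of units into their vector representations."""
    ids = [vocab.get(u, vocab["<PAD>"]) for u in units]
    return table[ids]                                 # shape: (len(units), dim)

vecs = lookup(["我", "喜", "欢"])
```

In training, the rows of `table` would be treated as ordinary network parameters and adjusted by the same gradient updates as the rest of the model.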
(2) Define the tag set for each sequence labelling task, i.e., determine which tags each task includes. Taking Chinese word segmentation as an example, the tag set {B, I, E, S} can be used, representing the beginning character of a word, a middle character, the ending character, and a single-character word, respectively.
(3) Prepare corpora for sequence labelling tasks in natural language processing, such as Chinese word segmentation, English shallow parsing, part-of-speech tagging, and named entity recognition.
(4) Train the network using the fast sequence labelling network structure (as shown in Fig. 1) or the network structure that incorporates preceding label information (as shown in Fig. 2), with a Perceptron-style algorithm or a Perceptron-style algorithm combined with Max-margin.
When training with the fast sequence labelling network structure and learning algorithm based on deep learning, the fast sequence labelling network structure is as shown in Fig. 1. Each component unit of the sentence (e.g., a Chinese character or an English word) is converted into its vector representation by lookup in the vector table; the vectors of each component unit and its surrounding units are concatenated into a window feature matrix; one-dimensional convolution converts the window feature matrix into a window feature vector; the window feature vector then passes through a nonlinear transformation and a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the probability of the corresponding label; finally, combined with the label transition probability matrix, the Viterbi decoding algorithm finds the highest-probability label sequence as the labelling result.
The concrete implementation is as follows. The label of a component unit is usually related to its surrounding units, so the network uses a window model: when estimating the probability that the current unit belongs to a certain label, the unit and its surroundings are taken together as input. If the window size is set to 5, the current unit and the two units on each side form the input window. If there are not enough units on the left or right to fill the window, special padding symbols are used instead.
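The window model with edge padding can be sketched as follows; the `"<PAD>"` filler symbol is an illustrative choice:

```python
# A minimal sketch of the window model: for each position, take the unit
# and its neighbours, padding the edges with a special filler symbol.

def windows(units, size=5, pad="<PAD>"):
    """Return the size-length context window centred on each unit."""
    half = size // 2
    padded = [pad] * half + list(units) + [pad] * half
    return [padded[i:i + size] for i in range(len(units))]

# For the sentence 我/喜/欢, the first window is
# ['<PAD>', '<PAD>', '我', '喜', '欢'].
```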
Each unit in an input sentence is converted into its vector representation by lookup in the vector table. The representation of each unit can be generated at random or pre-trained with an unsupervised method (e.g., the word2vec tool [1] for English words, or the method described in reference [2] for Chinese characters). The parameters stored in the vector table can also be adjusted continuously during training. These vectors are then concatenated into a feature matrix, whose number of columns equals the window size, each column being the vector representation of the corresponding unit.
A one-dimensional convolution operation is then applied to the feature matrix. One-dimensional convolution means taking the dot product of each row vector of the feature matrix with a corresponding parameter vector (the convolution kernel); different rows use different convolution kernels. Under the one-dimensional convolution, the feature matrix is converted into a vector with the same dimension as a unit vector, and this vector can be regarded as the semantic feature representation of the window, produced by the current unit under the influence of its surrounding units. Using one-dimensional convolution not only reduces the number of model parameters but also shortens the time required for training and use. For example, compared with the methods described in references [3] and [4], the number of parameters the model needs drops from (d × w × h) to (d × w), where d is the vector dimension of a unit (character or word), w is the window size, and h is the number of hidden neurons in the middle layer.
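The row-wise one-dimensional convolution can be sketched as follows. This is a reading of the description above under illustrative shape assumptions (window feature matrix of shape d × w, one length-w kernel per row), not the patent's exact implementation:

```python
import numpy as np

# A minimal sketch of the one-dimensional convolution: the window
# feature matrix has shape (d, w) (one column per unit in the window),
# each row is dot-multiplied with its own length-w kernel, and the
# result is a length-d vector.  Parameter count is d*w, versus d*w*h
# for a full dense hidden layer.

d, w = 50, 5                                   # unit vector dimension, window size
rng = np.random.default_rng(0)
kernels = rng.standard_normal((d, w)) * 0.01   # one kernel per row

def conv1d(window_matrix):
    """Row-wise dot product: (d, w) feature matrix -> (d,) vector."""
    return np.sum(window_matrix * kernels, axis=1)

vec = conv1d(rng.standard_normal((d, w)))
```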
The result then passes through a linear network layer (the middle hidden layer), a nonlinear transformation using the Sigmoid or hardTanh function, and finally another linear layer, which outputs a vector whose dimension equals the number of task labels, each element representing the probability of the corresponding label.
Given a sentence, the window slides from left to right and the network outputs a matrix, in which each element f_θ(t|i) is the estimated probability that the i-th unit of the sentence belongs to label t, where θ denotes the parameters of the network. In sequence labelling tasks there are strong dependencies between adjacent labels, so a matrix A_ij is introduced to represent the probability of jumping from label i to label j (it is also contained in the parameter set θ). Given a sentence s[1:n] containing n units, a label sequence t[1:n] of the same length can be scored as:

Score(s[1:n], t[1:n], θ) = Σ_{i=1}^{n} ( A_{t[i−1], t[i]} + f_θ(t[i] | i) )  (Formula 1)

Given the network parameters, the Viterbi decoding algorithm finds the label sequence with the highest score as the labelling result.
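The Viterbi search over Formula 1 can be sketched as follows. The shapes and the use of raw additive scores (rather than normalized probabilities) are illustrative assumptions consistent with the scoring function above:

```python
import numpy as np

# A minimal sketch of Viterbi decoding: given per-position label scores
# f[i][t] and transition scores A[i][j], find the label sequence that
# maximizes the additive sentence score of Formula 1.

def viterbi(f, A):
    """f: (n, L) unit/label scores, A: (L, L) transitions -> best path."""
    n, L = f.shape
    score = f[0].copy()                    # best score ending in each label
    back = np.zeros((n, L), dtype=int)     # backpointers
    for i in range(1, n):
        cand = score[:, None] + A + f[i][None, :]   # (prev, cur) candidates
        back[i] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0)
    path = [int(np.argmax(score))]
    for i in range(n - 1, 0, -1):          # follow backpointers
        path.append(int(back[i][path[-1]]))
    return path[::-1]
```

The loop body costs O(L²) per position, so decoding a sentence of n units takes O(nL²) time, which is why the forward label dependencies can be handled exactly without enumerating all Lⁿ sequences.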
The training method requires, on the training set, maximizing the probability of the correct labelled sequence of each sample:

θ* = argmax_θ Σ_{(s,t)} log p(t[1:n] | s[1:n], θ)  (Formula 2)

where (s, t) denotes a sample in the training set. Training uses gradient descent, and all parameters of the network are updated by:

θ ← θ + λ · ∂ log p(t[1:n] | s[1:n], θ) / ∂θ  (Formula 3)

where λ denotes the learning step.
When computing the partial derivatives on the right-hand side of Formula 3, to avoid the exponential computation exceeding the range of double-precision numbers and its high computational complexity, a Perceptron-style algorithm is used: only the direction of each parameter adjustment is computed, and its magnitude is fixed at 1, which simplifies the parameter update and speeds up training. The concrete computation is as follows. Under the current network parameters, the highest-scoring labelled sequence is compared with the correct labelled sequence; at each position where they disagree, the partial derivative at the output position of the incorrect labelled sequence is set to −1, and the partial derivative at the corresponding output position of the correct labelled sequence to +1. The same derivative computation also applies to the transition matrix A_ij. Max-margin can additionally be used when training the model parameters: the correct labelled sequence is required not only to score highest, but to exceed the highest score of any incorrect labelled sequence by a specified margin.
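The Perceptron-style direction-only update, applied for concreteness to the transition matrix A, can be sketched as follows. The `step` parameter and the restriction to A alone are illustrative simplifications; the same ±1 direction rule applies to the other network parameters:

```python
import numpy as np

# A minimal sketch of the Perceptron-style update: compare the
# predicted and correct label sequences, and where a label bigram
# differs, shift the transition score toward the gold bigram (+1
# direction) and away from the predicted bigram (-1 direction),
# scaled by a fixed step rather than an exact gradient magnitude.

def perceptron_update_A(A, gold, pred, step=0.1):
    """Return an updated copy of the (L, L) transition matrix A."""
    A = A.copy()
    for i in range(1, len(gold)):
        if gold[i] != pred[i] or gold[i - 1] != pred[i - 1]:
            A[gold[i - 1], gold[i]] += step      # correct sequence: +1 direction
            A[pred[i - 1], pred[i]] -= step      # incorrect sequence: -1 direction
    return A
```

In the Max-margin variant, `pred` would be the highest-scoring sequence under a loss-augmented score, so the update fires until the gold sequence wins by the required margin.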
If the network structure and acceleration algorithm that incorporate preceding label information are used for training, the key part of the network structure is as shown in Fig. 2. Each component unit of the sentence (e.g., a Chinese character or an English word) and each task label are converted into vector representations by lookup in the vector table; the vectors of each component unit and its surrounding units are concatenated into a window feature matrix, and each possible label vector is concatenated with the window feature matrix in turn to produce window feature matrices containing the preceding label; one-dimensional convolution converts each such matrix into a higher-level feature vector; this vector then passes through a nonlinear transformation and a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the probability of the corresponding label; finally, combined with the label transition probability matrix, the Viterbi decoding algorithm finds the highest-probability label sequence as the labelling result.
The concrete implementation is as follows. Through the vector table, each label also corresponds to a vector representation. Each possible label vector is placed alongside the feature matrix of the current window, and a similar one-dimensional convolution then produces the corresponding semantic feature representation (under the hypothesis that the preceding unit has that label). For each possible preceding label of each component unit of the sentence, the network outputs a vector whose dimension equals the number of task labels, each element likewise representing the probability of the corresponding label. Combined with the transition matrix A_ij, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labelling result.
When computing the window semantic feature representations under the different preceding-label hypotheses, intermediate results can be shared, which speeds up the network computation. The concrete acceleration method (the actual computation steps are indicated by the numbers in Fig. 2) is: first compute the overlapping part, i.e., the intermediate result without considering the preceding label; then compute the part affected by the different label vectors; finally, add the label-dependent part to the intermediate result to obtain the final result.
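Because the convolution is linear, the shared-intermediate acceleration can be sketched as follows. The shapes, the single extra "label column", and the kernel layout are illustrative assumptions:

```python
import numpy as np

# A minimal sketch of the acceleration: the window part of the
# convolution is computed once (the overlapping intermediate result),
# and the per-label contribution is added on top, instead of redoing
# the full convolution for every preceding-label hypothesis.

d, w, L = 50, 5, 4                                  # dim, window size, label count
rng = np.random.default_rng(0)
win_kernels = rng.standard_normal((d, w)) * 0.01    # kernels for window columns
lab_kernel = rng.standard_normal(d) * 0.01          # kernel for the label column
labels = rng.standard_normal((L, d)) * 0.01         # label vector table

def features_all_labels(window_matrix):
    """(d, w) window -> (L, d) features, one row per preceding label."""
    shared = np.sum(window_matrix * win_kernels, axis=1)   # computed once
    label_part = labels * lab_kernel                       # (L, d), one pass
    return shared[None, :] + label_part                    # add label effect

feats = features_all_labels(rng.standard_normal((d, w)))
```

The naive scheme would repeat the (d × w) window convolution L times; here it runs once, and only the cheap label term varies per hypothesis.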
When performing sequence labelling tasks in natural language processing, the tag type of the current unit is related not only to its context (the surrounding units) but also to its preceding tag types. For example, in the Chinese string "联盟": if the label of "联" is B (the beginning of a word), the label of "盟" may be E (the end of a word), making "联盟" one word, but it may also be I (the middle of a word), as in "联盟党"; whereas if the label of "联" is I, the label of "盟" is most likely E, as in "南联盟" (the Federal Republic of Yugoslavia). English part-of-speech tagging shows a similar situation: "work" can be a noun or a verb. In the phrase "that work", if "that" is labelled as a determiner, "work" is likely a noun; if "that" is labelled as a relative pronoun, "work" is likely a verb. Therefore, taking preceding tag types into account in the model can improve the accuracy of all kinds of sequence labelling tasks.
(5) After the corpus is augmented or extended, the same training algorithm can be used to adjust the parameters starting from the trained network parameters, or the network can be completely retrained. The detailed training method is as described in step (4).
(6) After network training ends, given the network parameters, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labelling result.
The above sequence labelling method based on deep learning is characterized by:
(1) in the deep network for sequence labelling in natural language processing, using one-dimensional convolution to produce the semantic feature representation of the window, which reduces the number of parameters of the network model and shortens the training and use time of the network;
(2) a deep network structure and acceleration algorithm for sequence labelling tasks that incorporate preceding label information;
(3) a deep network training algorithm combining the Perceptron-style algorithm and Max-margin, which both improves the training effect and speeds up the computation of parameter adjustments, thereby reducing the time needed for training and re-customizing the network;
(4) a suggested network configuration with a character or word vector dimension of 50 to 300, a window size of 3 or 9, and Sigmoid or hardTanh as the nonlinear layer function.
Effects of the invention
After the model in the system developed with the sequence labelling method based on deep learning disclosed by the invention is trained on a training set of samples from representative domains, it reaches the performance shown in Table 1 on the test set:
Table 1. Comparison of model labelling performance
Table 1 also compares the performance of current typical network models. Conv-S denotes the sequence labelling method based on deep learning disclosed by the invention (without considering preceding labels), and Conv-J denotes the network result that incorporates preceding label information. English part-of-speech analysis uses the accuracy metric; the other three tasks use the F1 metric. The F1 metric is computed as 2PR/(P + R), where P is precision and R is recall. Table 2 compares the running speed of each model; for models other than Conv-S, the table lists the multiple of the time required by the Conv-S model. From Tables 1 and 2 it can be seen that Conv-J performs best on the various tasks, while Conv-S shows very competitive performance with a greatly reduced use time.
Table 2. Comparison of labelling time required
Term explanations
Natural language processing: an important branch of computer science and the field of artificial intelligence, studying theories and methods that enable efficient communication between humans and computers in natural language. Natural language processing does not usually study natural language itself; rather, it develops computer systems, particularly software systems, that can effectively realize natural language communication.
Sequence labelling: given a tag set and an input sentence, a computer program outputs the tag type of each component unit of the sentence (e.g., a Chinese character or an English word). Taking Chinese word segmentation as an example, four tags B, I, E and S are generally used, representing the beginning character of a word, a middle character, the ending character, and a single-character word, respectively. For the input "我喜欢计算机。", the correct labelling result is "S B E B I E S" (punctuation marks are usually treated as component units as well), which segments the sentence into "我/喜欢/计算机/。".
Description of the drawings
Fig. 1. The fast sequence labelling deep network structure based on one-dimensional convolution.
Fig. 2. Key local structure of the sequence labelling deep network that incorporates preceding label information.
Specific embodiment
The invention discloses a method of using a computer to automatically perform sequence labelling in natural language processing. For an input sentence, according to the tag set defined for the task, a corresponding tag type is selected for each component unit of the sentence (a character or a word), in order of appearance; this is called a sequence labelling task in the field of natural language processing. Sequence labelling can be applied to a variety of natural language processing tasks, such as Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, and named entity recognition. The concrete steps are as follows:
(1) Each component unit of the language concerned (e.g., a Chinese character or an English word) corresponds to a vector representation. This vector representation can be generated at random or pre-trained with an unsupervised method (e.g., the word2vec tool [1] for English words, or the method described in reference [2] for Chinese characters). After training, each unit can be converted into its corresponding vector representation by lookup in the vector table.
(2) Define the tag set for each sequence labelling task, i.e., determine which tags each task includes. Taking Chinese word segmentation as an example, the tag set {B, I, E, S} can be used, representing the beginning character of a word, a middle character, the ending character, and a single-character word, respectively.
(3) Prepare corpora for sequence labelling tasks in natural language processing, such as Chinese word segmentation, English shallow parsing, part-of-speech tagging, and named entity recognition. Taking Chinese word segmentation as an example, the first column is a character or punctuation mark of the sentence, and the second column is the corresponding tag. In the whole corpus, sentences are separated by a blank line:
我 S
喜 B
欢 E
计 B
算 I
机 E
。 S
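The two-column, blank-line-separated corpus format above can be read as follows. This parser is an illustrative sketch of the format described in step (3):

```python
# A minimal sketch of reading the corpus format: each line holds a unit
# and its tag, and a blank line marks a sentence boundary.

def read_corpus(lines):
    """Parse corpus lines into a list of (units, tags) sentence pairs."""
    sentences, units, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:                          # blank line: sentence boundary
            if units:
                sentences.append((units, tags))
                units, tags = [], []
            continue
        unit, tag = line.split()
        units.append(unit)
        tags.append(tag)
    if units:                                 # last sentence without trailing blank
        sentences.append((units, tags))
    return sentences
```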
(4) Train the network using the fast sequence labelling network structure (as shown in Fig. 1) or the network structure that incorporates preceding label information (as shown in Fig. 2), with a Perceptron-style algorithm or a Perceptron-style algorithm combined with Max-margin.
When using the fast sequence labelling network structure and learning algorithm based on deep learning, the fast sequence labelling network structure is as shown in Fig. 1. Specifically: each component unit of the sentence (e.g., a Chinese character or an English word) is converted into its vector representation by lookup in the vector table; the vectors of each component unit and its surrounding units are concatenated into a window feature matrix; one-dimensional convolution converts the window feature matrix into a window feature vector; the window feature vector then passes through a nonlinear transformation and a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the probability of the corresponding label; finally, combined with the label transition probability matrix, the Viterbi decoding algorithm finds the highest-probability label sequence as the labelling result.
The concrete implementation is as follows. The label of a component unit is usually related to its surrounding units, so the network uses a window model: when estimating the probability that the current unit belongs to a certain label, the unit and its surroundings are taken together as input. If the window size is set to 5, the current unit and the two units on each side form the input window. If there are not enough units on the left or right to fill the window, special padding symbols are used instead.
Each unit in an input sentence is converted into its vector representation by lookup in the vector table, and these vectors are concatenated into a feature matrix, whose number of columns equals the window size, each column being the vector representation of the corresponding unit. A one-dimensional convolution operation is then applied to the feature matrix: one-dimensional convolution means taking the dot product of each row vector of the feature matrix with a corresponding parameter vector (the convolution kernel); different rows use different convolution kernels. Under the one-dimensional convolution, the feature matrix is converted into a vector with the same dimension as a unit vector, which can be regarded as the semantic feature representation of the window, produced by the current unit under the influence of its surrounding units.
The result then passes through a linear network layer (the middle hidden layer), a nonlinear transformation using the Sigmoid or hardTanh function, and finally another linear layer, which outputs a vector whose dimension equals the number of task labels, each element representing the probability of the corresponding label.
Given a sentence, the window slides from left to right and the network outputs a matrix, in which each element f_θ(t|i) is the estimated probability that the i-th unit of the sentence belongs to label t, where θ denotes the parameters of the network. In sequence labelling tasks there are strong dependencies between adjacent labels, so a matrix A_ij is introduced to represent the probability of jumping from label i to label j (it is also contained in the parameter set θ). Given a sentence s[1:n] containing n units, a label sequence t[1:n] of the same length can be scored as:

Score(s[1:n], t[1:n], θ) = Σ_{i=1}^{n} ( A_{t[i−1], t[i]} + f_θ(t[i] | i) )  (Formula 1)

Given the network parameters, the Viterbi decoding algorithm finds the label sequence with the highest score as the labelling result.
The training method requires, on the training set, maximizing the probability of the correct labelled sequence of each sample:

θ* = argmax_θ Σ_{(s,t)} log p(t[1:n] | s[1:n], θ)  (Formula 2)

where (s, t) denotes a sample in the training set. Training uses gradient descent, and all parameters of the network are updated by:

θ ← θ + λ · ∂ log p(t[1:n] | s[1:n], θ) / ∂θ  (Formula 3)

where λ denotes the learning step.
When computing the partial derivatives on the right-hand side of Formula 3, to avoid the exponential computation exceeding the range of double-precision numbers and its high computational complexity, a Perceptron-style algorithm is used: only the direction of each parameter adjustment is computed, and its magnitude is fixed at 1, which simplifies the parameter update and speeds up training. The concrete computation is as follows. Under the current network parameters, the highest-scoring labelled sequence is compared with the correct labelled sequence; at each position where they disagree, the partial derivative at the output position of the incorrect labelled sequence is set to −1, and the partial derivative at the corresponding output position of the correct labelled sequence to +1. The same derivative computation also applies to the transition matrix A_ij. Max-margin can additionally be used when training the model parameters: the correct labelled sequence is required not only to score highest, but to exceed the highest score of any incorrect labelled sequence by a specified margin.
If the network structure and acceleration algorithm that incorporate preceding label information are used, the key part of the network structure is as shown in Fig. 2. Each component unit of the sentence (e.g., a Chinese character or an English word) and each task label are converted into vector representations by lookup in the vector table; the vectors of each component unit and its surrounding units are concatenated into a window feature matrix, and each possible label vector is concatenated with the window feature matrix in turn to produce window feature matrices containing the preceding label; one-dimensional convolution converts each such matrix into a higher-level feature vector; this vector then passes through a nonlinear transformation and a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the probability of the corresponding label; finally, combined with the label transition probability matrix, the Viterbi decoding algorithm finds the highest-probability label sequence as the labelling result.
The concrete implementation is as follows. Through the vector table, each label also corresponds to a vector representation. Each possible label vector is placed alongside the feature matrix of the current window, and a similar one-dimensional convolution then produces the corresponding semantic feature representation (under the hypothesis that the preceding unit has that label). For each possible preceding label of each component unit of the sentence, the network outputs a vector whose dimension equals the number of task labels, each element likewise representing the probability of the corresponding label. Combined with the transition matrix A_ij, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labelling result.
When computing the window semantic feature representations under the different preceding-label hypotheses, intermediate results can be shared, which speeds up the network computation. The concrete acceleration method (the actual computation steps are indicated by the numbers in Fig. 2) is: first compute the overlapping part, i.e., the intermediate result without considering the preceding label; then compute the part affected by the different label vectors; finally, add the label-dependent part to the intermediate result to obtain the final result.
(5) After the corpus is augmented or extended, the same training algorithm can be used to adjust the parameters starting from the trained network parameters, or the network can be completely retrained. The detailed training method is as described in step (4).
(6) After network training ends, given the network parameters, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labelling result.
List of references
[1] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR abs/1301.3781, 2013.
[2] Xiaoqing Zheng, Jiangtao Feng, Mengxiao Lin, Wenqiang Zhang. Context-specific and multi-prototype character representations. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI'16), 2016.
[3] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.
[4] Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. Deep learning for Chinese word segmentation and POS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'13), 2013.
[5] Wenzhe Pei, Tao Ge, and Baobao Chang. Max-margin tensor neural network for Chinese word segmentation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), 2014.
[6] Pengfei Liu, Shafiq Joty, and Helen Meng. Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'15), 2015.
Claims (1)
1. A sequence labeling method in natural language processing based on deep learning, in which a computer, given an input sentence and the tag set defined for the task, selects the corresponding tag type for each component unit of the sentence (i.e., each character or word) in order of appearance; characterized in that the concrete steps are:
(1) Associate each component unit of the language concerned with a vector representation; this vector representation may be generated randomly or pre-trained with an unsupervised method; after training, each unit is converted into its vector representation by looking it up in the vector table;
(2) Define the tag sets of the various sequence labeling tasks, determining which tags each task comprises;
(3) Prepare corpora for sequence labeling tasks in natural language processing, such as Chinese word segmentation, English shallow parsing, part-of-speech tagging, and named entity recognition;
(4) Train the network using either the fast sequence labeling network structure or the network structure incorporating preceding-label information, with a Perceptron-style algorithm or a Perceptron-style algorithm combined with Max-margin;
If network training uses the deep-learning-based fast sequence labeling network structure and learning algorithm: in the fast sequence labeling network structure, the label of a component unit depends on its surrounding units, so the network adopts a window model, i.e., when estimating the probability that the current unit takes a certain label, the current unit and its neighbors are taken as input; if the window size is set to 5, the input window consists of the current unit and the two units on each side; if there are not enough characters on the left or right to fill the window, special padding symbols are used instead;
Each unit of an input sentence is converted into its vector representation by looking it up in the vector table; the representation of each unit is generated randomly or pre-trained with an unsupervised method; the parameters stored in the vector table are also adjusted continuously during training; these vectors are then concatenated into a feature matrix whose number of columns equals the window size, each column being the vector representation of the corresponding unit;
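The vector-table lookup and window feature matrix can be sketched as below; the vocabulary, `"<PAD>"` symbol, embedding size, and function name are all illustrative assumptions, not values fixed by the claim:

```python
import numpy as np

# hypothetical vocabulary and vector table; "<PAD>" stands in for the
# special padding symbol used when the window extends past the sentence
vocab = {"<PAD>": 0, "A": 1, "B": 2, "C": 3, "D": 4}
dim, win = 8, 5                      # embedding size and window size (assumed)
rng = np.random.default_rng(0)
table = rng.normal(scale=0.1, size=(len(vocab), dim))  # the "vector table"

def feature_matrix(units, center):
    """Columns are the vectors of the units in the window around `center`."""
    half = win // 2
    ids = []
    for pos in range(center - half, center + half + 1):
        ids.append(vocab[units[pos]] if 0 <= pos < len(units) else vocab["<PAD>"])
    return table[ids].T              # shape (dim, win): one column per unit

# window around the first unit: two padding columns, then A, B, C
M = feature_matrix(["A", "B", "C", "D"], center=0)
```

In training, `table` itself would be a parameter updated by back-propagation, matching the claim's statement that the vector-table parameters are continuously adjusted.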
A one-dimensional convolution is then applied to the feature matrix; one-dimensional convolution here means taking the dot product of each row vector of the feature matrix with a corresponding parameter vector (the convolution kernel), a different kernel being used for each row; under this operation the feature matrix is converted into a vector of the same dimension as a unit vector, and this window-level feature representation can be seen as the semantic feature representation of the current unit under the influence of its surrounding units;
This vector then passes through a linear network layer, a nonlinear transformation using the Sigmoid or HardTanh function, and finally another linear layer, which outputs a vector whose length equals the number of task labels, each element representing the likelihood of the corresponding label;
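The forward pass just described (row-wise one-dimensional convolution, linear layer, Sigmoid, output layer) can be sketched as below; all shapes, parameter names, and the hidden size are illustrative assumptions:

```python
import numpy as np

def forward(M, kernels, W1, b1, W2, b2):
    """Window-network forward pass (a sketch, shapes are assumed).

    M       : (dim, win) feature matrix of the current window
    kernels : (dim, win) one convolution kernel per row of M
    W1, b1  : hidden linear layer;  W2, b2 : output layer (one score per label)
    """
    # row-wise one-dimensional convolution: each row of M is dotted with
    # the corresponding kernel row, giving a dim-vector
    h = (M * kernels).sum(axis=1)
    h = 1.0 / (1.0 + np.exp(-(W1 @ h + b1)))   # linear layer + Sigmoid
    return W2 @ h + b2                          # one score per task label

dim, win, hid, labels = 8, 5, 16, 4
rng = np.random.default_rng(1)
scores = forward(rng.normal(size=(dim, win)), rng.normal(size=(dim, win)),
                 rng.normal(size=(hid, dim)), rng.normal(size=hid),
                 rng.normal(size=(labels, hid)), rng.normal(size=labels))
```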
Given a sentence, the window slides from left to right and the network outputs a matrix whose elements f_θ(t|i) represent the estimated probability that the i-th unit of the sentence takes label t, where θ denotes the network parameters; because consecutive labels in a sequence labeling task are strongly dependent, a matrix A_ij is introduced to represent the probability of jumping from label i to label j; for a sentence s[1:n] containing n units and a label sequence t[1:n] of the same length, the score is estimated as:
Score(s[1:n], t[1:n], θ) = Σ_{i=1..n} ( A_{t[i-1], t[i]} + f_θ(t[i]|i) )  (Formula 1)
Given the network parameters, the Viterbi decoding algorithm is used to obtain the highest-scoring label sequence as the annotation result;
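The sentence-level score of Formula 1 can be computed directly; the sketch below assumes the boundary convention that the transition term is skipped for the first unit, which the claim does not spell out:

```python
import numpy as np

def sequence_score(f, A, t):
    """Score(s, t, theta) = sum_i A[t[i-1], t[i]] + f[i, t[i]]  (Formula 1).

    f : (n, L) array of network outputs f_theta(t|i)
    A : (L, L) array of transition scores A_ij
    t : label sequence of length n (the first transition term is skipped;
        this boundary handling is an assumption)
    """
    total = f[0, t[0]]
    for i in range(1, len(t)):
        total += A[t[i - 1], t[i]] + f[i, t[i]]
    return float(total)
```

Viterbi decoding then finds the t[1:n] maximizing exactly this quantity.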
The training method is to maximize, over the training set, the probability of the correct annotated sequence of each sample:
θ ↦ Σ_{(s,t)∈T} log p(t[1:n] | s[1:n], θ)  (Formula 2)
where (s, t) denotes a sample of the training set T; training uses the gradient method, and all network parameters are updated by:
θ ← θ + λ · ∂ log p(t[1:n] | s[1:n], θ) / ∂θ  (Formula 3)
where λ denotes the learning step;
When computing the partial derivative on the right side of Formula 3, a Perceptron-style algorithm is used, i.e., only the direction of the parameter adjustment is computed, and its magnitude is fixed to 1; the concrete procedure is as follows: compare the highest-scoring annotated sequence under the current network parameters with the correct annotated sequence; if they are inconsistent, then at each position where they disagree, set the partial derivative of the output position that produced the wrong label to -1, and the partial derivative of the output position of the correct label to +1; the same derivative computation also applies to the transition matrix A_ij;
The Max-margin method may additionally be used when training the model parameters, i.e., the correct annotated sequence is required not only to have the highest score, but also to exceed the highest score of any incorrect annotated sequence by a specified margin;
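The Perceptron-style derivative computation can be sketched as follows. This is a simplified illustration, assumed rather than taken from the patent: it adjusts the output scores `f` and transitions `A` directly, whereas in the real network the ±1 derivatives at the output positions would be back-propagated into the parameters θ:

```python
import numpy as np

def perceptron_update(f, A, gold, predict, lr=0.1):
    """One Perceptron-style update on a single sentence (illustrative).

    f, A    : network output scores and transition scores, updated in place
    gold    : correct label sequence
    predict : function (f, A) -> highest-scoring label sequence (e.g. Viterbi)
    Only the direction of the adjustment is computed: -1 at the wrongly
    predicted output position, +1 at the correct one.
    """
    pred = predict(f, A)
    for i, (g, p) in enumerate(zip(gold, pred)):
        if g != p:
            f[i, p] -= lr        # derivative -1 at the wrong output position
            f[i, g] += lr        # derivative +1 at the correct position
            if i > 0:            # same rule applied to the transition matrix
                A[pred[i - 1], p] -= lr
                A[gold[i - 1], g] += lr
    return pred == gold          # True if the prediction was already correct
```

A Max-margin variant would instead decode with a loss-augmented score and update only when the correct sequence fails to beat the best incorrect one by the specified margin.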
If the network structure incorporating preceding-label information and its acceleration algorithm are used for training, the concrete realization is: through the vector table, each label is also given a vector representation, and each possible label vector is placed alongside the feature matrix of the current window; a similar one-dimensional convolution is then applied to produce the corresponding semantic feature representation; the network outputs, for every possible preceding label of every sentence unit, a vector whose length equals the number of task labels, each element again representing the likelihood of the corresponding label; combined with the transition matrix A_ij, the Viterbi decoding algorithm yields the highest-scoring label sequence as the annotation result;
Intermediate results of the window-level semantic feature representations computed under different preceding-label assumptions can be shared, which accelerates network computation; the concrete acceleration method is: first compute the overlapping part, i.e., the intermediate result that does not depend on the preceding label; then compute the part affected by the different label vectors; finally add the label-dependent part to the intermediate result to obtain the final result;
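The sharing trick can be verified numerically at the level of the one-dimensional convolution: because the convolution is a sum of per-column dot products, the window columns contribute the same intermediate result under every preceding-label hypothesis, and only the label column differs. The sketch below (all names and shapes are assumptions) computes the same result both ways:

```python
import numpy as np

rng = np.random.default_rng(2)
dim, win, L = 8, 5, 4
M = rng.normal(size=(dim, win))          # feature matrix of the current window
K_win = rng.normal(size=(dim, win))      # kernels for the window columns
K_lab = rng.normal(size=dim)             # kernel column for the label vector
labels = rng.normal(size=(L, dim))       # one vector per possible label

# naive: redo the full convolution once per preceding-label hypothesis
naive = np.stack([(np.hstack([M, labels[t][:, None]]) *
                   np.hstack([K_win, K_lab[:, None]])).sum(axis=1)
                  for t in range(L)])

# accelerated: the window part overlaps across hypotheses, so compute it
# once and add only the label-dependent contribution for each hypothesis
shared = (M * K_win).sum(axis=1)         # intermediate result, label-free
fast = shared[None, :] + labels * K_lab[None, :]

assert np.allclose(naive, fast)
```

The shared window convolution is thus computed once instead of L times, which is where the speed-up comes from.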
(5) After the corpus is augmented or extended, the parameters are fine-tuned with the same training algorithm, starting from the trained network parameters, or the network is completely retrained; the concrete training method is as described in step (4);
(6) After network training finishes, given the network parameters, the Viterbi decoding algorithm is used to obtain the highest-scoring label sequence as the annotation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610950893.5A CN106547737B (en) | 2016-10-25 | 2016-10-25 | Sequence labeling method in natural language processing based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106547737A true CN106547737A (en) | 2017-03-29 |
CN106547737B CN106547737B (en) | 2020-05-12 |
Family
ID=58392799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610950893.5A Active CN106547737B (en) | 2016-10-25 | 2016-10-25 | Sequence labeling method in natural language processing based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106547737B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2950306A1 (en) * | 2014-05-29 | 2015-12-02 | Samsung Electronics Polska Spolka z organiczona odpowiedzialnoscia | A method and system for building a language model |
US9298702B1 (en) * | 2008-11-18 | 2016-03-29 | Semantic Research Inc. | Systems and methods for pairing of a semantic network and a natural language processing information extraction system |
CN105512209A (en) * | 2015-11-28 | 2016-04-20 | 大连理工大学 | Biomedicine event trigger word identification method based on characteristic automatic learning |
CN105894088A (en) * | 2016-03-25 | 2016-08-24 | 苏州赫博特医疗信息科技有限公司 | Medical information extraction system and method based on depth learning and distributed semantic features |
CN105955953A (en) * | 2016-05-03 | 2016-09-21 | 成都数联铭品科技有限公司 | Word segmentation system |
CN106022239A (en) * | 2016-05-13 | 2016-10-12 | 电子科技大学 | Multi-target tracking method based on recurrent neural network |
CN106021227A (en) * | 2016-05-16 | 2016-10-12 | 南京大学 | State transition and neural network-based Chinese chunk parsing method |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273355A (en) * | 2017-06-12 | 2017-10-20 | 大连理工大学 | A kind of Chinese word vector generation method based on words joint training |
CN107273355B (en) * | 2017-06-12 | 2020-07-14 | 大连理工大学 | Chinese word vector generation method based on word and phrase joint training |
CN107894971B (en) * | 2017-10-27 | 2019-11-26 | 北京大学 | A kind of expansible sequence labelling method neural network based |
CN107894971A (en) * | 2017-10-27 | 2018-04-10 | 北京大学 | A kind of expansible sequence labelling method based on neutral net |
CN107832302A (en) * | 2017-11-22 | 2018-03-23 | 北京百度网讯科技有限公司 | Participle processing method, device, mobile terminal and computer-readable recording medium |
CN107832301A (en) * | 2017-11-22 | 2018-03-23 | 北京百度网讯科技有限公司 | Participle processing method, device, mobile terminal and computer-readable recording medium |
CN107832301B (en) * | 2017-11-22 | 2021-09-17 | 北京百度网讯科技有限公司 | Word segmentation processing method and device, mobile terminal and computer readable storage medium |
CN108009285A (en) * | 2017-12-22 | 2018-05-08 | 重庆邮电大学 | Forest Ecology man-machine interaction method based on natural language processing |
CN108549628A (en) * | 2018-03-16 | 2018-09-18 | 北京云知声信息技术有限公司 | The punctuate device and method of streaming natural language information |
CN110232182B (en) * | 2018-04-10 | 2023-05-16 | 蔚来控股有限公司 | Semantic recognition method and device and voice dialogue system |
CN110232182A (en) * | 2018-04-10 | 2019-09-13 | 蔚来汽车有限公司 | Method for recognizing semantics, device and speech dialogue system |
CN110399614A (en) * | 2018-07-26 | 2019-11-01 | 北京京东尚科信息技术有限公司 | System and method for the identification of true product word |
CN109635157A (en) * | 2018-10-30 | 2019-04-16 | 北京奇艺世纪科技有限公司 | Model generating method, video searching method, device, terminal and storage medium |
CN109976807B (en) * | 2019-01-14 | 2022-11-25 | 深圳游禧科技有限公司 | Key package identification method based on software operation network |
CN109976807A (en) * | 2019-01-14 | 2019-07-05 | 浙江工商大学 | A kind of critical packet recognition methods based on software operational network |
CN110047463B (en) * | 2019-01-31 | 2021-03-02 | 北京捷通华声科技股份有限公司 | Voice synthesis method and device and electronic equipment |
CN110047463A (en) * | 2019-01-31 | 2019-07-23 | 北京捷通华声科技股份有限公司 | A kind of phoneme synthesizing method, device and electronic equipment |
CN110245353A (en) * | 2019-06-20 | 2019-09-17 | 腾讯科技(深圳)有限公司 | Natural language representation method, device, equipment and storage medium |
CN110245353B (en) * | 2019-06-20 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Natural language expression method, device, equipment and storage medium |
WO2021017268A1 (en) * | 2019-07-30 | 2021-02-04 | 平安科技(深圳)有限公司 | Double-architecture-based sequence labeling method, device, and computer device |
CN110852386A (en) * | 2019-11-13 | 2020-02-28 | 精硕科技(北京)股份有限公司 | Data classification method and device, computer equipment and readable storage medium |
CN110852386B (en) * | 2019-11-13 | 2023-05-02 | 北京秒针人工智能科技有限公司 | Data classification method, apparatus, computer device and readable storage medium |
CN112989801A (en) * | 2021-05-11 | 2021-06-18 | 华南师范大学 | Sequence labeling method, device and equipment |
CN112989801B (en) * | 2021-05-11 | 2021-08-13 | 华南师范大学 | Sequence labeling method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN106547737B (en) | 2020-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106547737A (en) | Based on the sequence labelling method in the natural language processing of deep learning | |
CN111612103B (en) | Image description generation method, system and medium combined with abstract semantic representation | |
CN109783817B (en) | Text semantic similarity calculation model based on deep reinforcement learning | |
CN107563498B (en) | Image description method and system based on visual and semantic attention combined strategy | |
CN107480143B (en) | Method and system for segmenting conversation topics based on context correlation | |
CN108446271B (en) | Text emotion analysis method of convolutional neural network based on Chinese character component characteristics | |
CN111145728B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
CN107526834B (en) | Word2vec improvement method for training correlation factors of united parts of speech and word order | |
CN111709243B (en) | Knowledge extraction method and device based on deep learning | |
CN110705294A (en) | Named entity recognition model training method, named entity recognition method and device | |
CN109710744B (en) | Data matching method, device, equipment and storage medium | |
JP7291183B2 (en) | Methods, apparatus, devices, media, and program products for training models | |
CN109635124A (en) | A kind of remote supervisory Relation extraction method of combination background knowledge | |
CN112990296B (en) | Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation | |
CN106547735A (en) | The structure and using method of the dynamic word or word vector based on the context-aware of deep learning | |
CN110879938A (en) | Text emotion classification method, device, equipment and storage medium | |
CN110826298B (en) | Statement coding method used in intelligent auxiliary password-fixing system | |
CN113705237A (en) | Relation extraction method and device fusing relation phrase knowledge and electronic equipment | |
CN112489622A (en) | Method and system for recognizing voice content of multi-language continuous voice stream | |
CN114254645A (en) | Artificial intelligence auxiliary writing system | |
CN110245353B (en) | Natural language expression method, device, equipment and storage medium | |
CN113282721A (en) | Visual question-answering method based on network structure search | |
CN110610006B (en) | Morphological double-channel Chinese word embedding method based on strokes and fonts | |
CN109670171B (en) | Word vector representation learning method based on word pair asymmetric co-occurrence | |
Fernandes et al. | Entropy-guided feature generation for structured learning of Portuguese dependency parsing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||