CN106547737A - Sequence labeling method in natural language processing based on deep learning - Google Patents

Sequence labeling method in natural language processing based on deep learning Download PDF

Info

Publication number
CN106547737A
CN106547737A
Authority
CN
China
Prior art keywords
label
network
vector
training
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610950893.5A
Other languages
Chinese (zh)
Other versions
CN106547737B (en)
Inventor
郑骁庆
陈易
林孟潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201610950893.5A priority Critical patent/CN106547737B/en
Publication of CN106547737A publication Critical patent/CN106547737A/en
Application granted granted Critical
Publication of CN106547737B publication Critical patent/CN106547737B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention belongs to the technical field of computer natural language processing, and specifically relates to a sequence labeling method in natural language processing based on deep learning. The invention can be used for sequence labeling tasks in various natural languages, including Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, and named entity recognition. Using deep learning, for an input sentence, a computer program outputs the tag type of each constituent unit in the sentence. The key elements of the sequence labeling method include: a fast sequence labeling network structure and learning algorithm based on deep learning, a network structure and acceleration algorithm incorporating preceding-label information, and the way these key techniques are combined. A system implemented on this basis has a small parameter scale and runs fast, making it very suitable for environments with limited computing resources; it can be deployed on mobile computing platforms with relatively limited resources, such as mobile phones, and can significantly improve system response time and user satisfaction.

Description

Sequence labeling method in natural language processing based on deep learning
Technical field
The invention belongs to the technical field of computer natural language processing, and in particular relates to a sequence labeling method in natural language processing.
Background technology
Deep learning is a recent breakthrough in artificial intelligence research. It ended a situation in which artificial intelligence had gone as long as ten years without breakthrough progress, and it is rapidly making an impact in industry. Deep learning differs from narrow artificial intelligence systems (functional simulations oriented toward particular tasks): as a general-purpose artificial intelligence technology, it can cope with a wide variety of situations and problems. It has been applied with great success in fields such as image recognition and speech recognition, and has also achieved notable results in natural language processing (mainly for English). Deep learning is currently the most effective way to realize artificial intelligence, and also the implementation that has achieved the greatest results.
Compared with conventional techniques, systems implemented with deep learning also have the advantages of a small parameter scale and fast running speed, making them very suitable for environments with limited computing resources.
In the field of natural language processing, for sequence labeling problems (including Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, named entity recognition, and other tasks), existing deep-network methods can already reach performance similar to traditional methods, but their models still contain many parameters, their running time is still long, and labeling performance needs further improvement. To address these problems, the present invention proposes a new fast sequence labeling method based on deep learning, which substantially reduces the time required to train and use the labeling network while incorporating preceding tag types to improve labeling accuracy.
Summary of the invention
It is an object of the present invention to propose a sequence labeling method for natural language processing with high labeling accuracy and short running time.
The sequence labeling method in natural language processing provided by the present invention uses a computer to read an input sentence and, according to the tag set defined for the task, selects the corresponding tag type for each constituent unit in the sentence (a character or a word) in order of appearance. The method can be used for sequence labeling tasks in various natural languages, including Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, and named entity recognition.
Sequence labeling means that, given the tag set for a task and an input sentence, a computer program outputs the tag type of each constituent unit in the sentence (e.g., a Chinese character or an English word). Taking Chinese word segmentation as an example, four tags, B, I, E and S, are generally used, representing the starting character of a word, a middle character, the ending character, and a single character forming a word by itself. If the input is "我喜欢计算机。" ("I like computers."), the correct labeling result is "S B E B I E S" (punctuation marks are generally also treated as constituent units), segmenting the sentence into "我/喜欢/计算机/。". The characteristics of the sequence labeling method of the present invention are fast labeling speed, low system configuration requirements (suitable for devices with limited computing and storage resources), and high accuracy.
The sequence labeling method in natural language processing provided by the present invention comprises the following steps:
(1) Train a vector representation for each constituent unit of the language concerned (e.g., a Chinese character or an English word). This vector representation can be generated randomly or pre-trained with an unsupervised method (e.g., the word2vec tool [1] can be used for English words, and the method described in reference [2] for Chinese characters). After training, each unit is converted into its corresponding vector representation by looking it up in the vector table.
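As an illustration of this lookup step, the following minimal Python/NumPy sketch builds a randomly initialized vector table and converts units into vectors. The class and all names are hypothetical, not an interface defined by the patent; the random rows stand in for vectors that could instead come from unsupervised pre-training as described above.

```python
import numpy as np

class VectorTable:
    """Hypothetical sketch of the unit-to-vector lookup table."""
    def __init__(self, units, dim=100, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {u: i for i, u in enumerate(units)}
        # Random initialization; in practice these rows could come from
        # unsupervised pre-training (e.g. word2vec [1]) and are further
        # adjusted during supervised training.
        self.vectors = rng.uniform(-0.1, 0.1, size=(len(units), dim))

    def lookup(self, unit):
        # Unseen units fall back to the special padding row 0.
        return self.vectors[self.index.get(unit, 0)]

table = VectorTable(["<PAD>", "我", "喜", "欢", "计", "算", "机", "。"])
v = table.lookup("计")   # the 100-dimensional vector for the character 计
```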
(2) Define the tag sets of the various sequence labeling tasks, determining which tags each sequence labeling task includes. Taking Chinese word segmentation as an example, a tag set containing B, I, E and S can be used, representing the beginning character of a word, a middle character, the ending character, and a single character forming a word by itself.
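For concreteness, a small sketch (the function name is illustrative) showing how the B/I/E/S labels of a segmented Chinese sentence are derived; it reproduces the "S B E B I E S" example used in this document:

```python
def bies_tags(words):
    """Map a segmented sentence (a list of words) to per-character
    B/I/E/S labels as defined above."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # a single character forming a word by itself
        else:
            tags.extend(["B"] + ["I"] * (len(w) - 2) + ["E"])
    return tags

# "我 / 喜欢 / 计算机 / 。"  ->  ['S', 'B', 'E', 'B', 'I', 'E', 'S']
print(bies_tags(["我", "喜欢", "计算机", "。"]))
```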
(3) Prepare corpora for sequence labeling tasks in natural language processing such as Chinese word segmentation, English shallow parsing, part-of-speech tagging, and named entity recognition.
(4) Train the network using the fast sequence labeling network structure (shown in Fig. 1) or the network structure incorporating preceding-label information (shown in Fig. 2), with the Perceptron-style algorithm or with the Perceptron-style algorithm combined with the Max-margin method.
If the network is trained with the fast sequence labeling network structure and learning algorithm based on deep learning, the fast sequence labeling network structure is as shown in Fig. 1. Each constituent unit in the sentence (e.g., a Chinese character or an English word) is converted into its corresponding vector representation by looking it up in the vector table; the vectors of each constituent unit and its surrounding units are concatenated into a window feature matrix; a one-dimensional convolution converts the window feature matrix into a window feature vector representation; the window feature vector then passes through a nonlinear transformation followed by a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the likelihood of the corresponding label; finally, combining the label transition probability matrix, the Viterbi decoding algorithm finds the most probable label sequence as the labeling result.
The concrete implementation is as follows. The label of a constituent unit is usually related to its surrounding units, so the network adopts a window model: when estimating the probability that the current unit belongs to a certain label, the unit and its surroundings are taken as input. If the window size is set to 5, the current unit together with the two units on its left and the two on its right forms the input window. If there are not enough units on the left or right to fill the window, special padding symbols are used instead.
Each unit in an input sentence is converted into its corresponding vector representation by looking it up in the vector table. The representation of each unit can be generated randomly or pre-trained with an unsupervised method (e.g., the word2vec tool [1] can be used for English words, and the method described in reference [2] for Chinese characters). The parameters stored in the vector table can also be adjusted continually during training. These vectors are then concatenated into a feature matrix whose number of columns equals the window size, each column being the vector representation of the corresponding unit.
A one-dimensional convolution operation is then applied to the feature matrix. One-dimensional convolution means that each row vector of the feature matrix is dotted with a corresponding parameter vector (convolution kernel), with different rows using different kernels. Under this one-dimensional convolution, the feature matrix is converted into a vector with the same dimension as a unit vector; this vector can be regarded as the semantic feature representation of the current unit under the influence of the surrounding units in the window. Using one-dimensional convolution not only reduces the model's parameters but also shortens the time needed to train and use the model. For example, compared with the methods described in references [3] and [4], the number of parameters required by the model drops from d × w × h to d × w, where d is the dimension of the unit (character or word) vectors, w is the window size, and h is the number of hidden neurons in the middle layer.
After passing through a linear network layer (the middle hidden layer), a nonlinear transformation is applied using the Sigmoid or hardTanh function; finally, another linear layer outputs a vector whose length equals the number of task labels, each element representing the likelihood of the corresponding label.
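The forward computation of the last three paragraphs can be sketched as follows, a minimal NumPy illustration under assumed sizes (all names are hypothetical, and random weights stand in for trained parameters). Note that the one-dimensional convolution uses only the d × w parameters of K, whereas a dense first layer would need d × w × h:

```python
import numpy as np

d, w, h, n_labels = 100, 5, 50, 4   # assumed sizes: vector dim, window, hidden units, labels
rng = np.random.default_rng(0)
K  = rng.uniform(-0.1, 0.1, (d, w))                  # conv kernels, one per row: d*w parameters
W1 = rng.uniform(-0.1, 0.1, (h, d)); b1 = np.zeros(h)                 # hidden linear layer
W2 = rng.uniform(-0.1, 0.1, (n_labels, h)); b2 = np.zeros(n_labels)   # output linear layer

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def label_scores(F):
    """F: d x w window feature matrix (each column is one unit's vector).
    Returns one score per label for the window's centre unit."""
    v = (F * K).sum(axis=1)     # 1-D convolution: row i of F dotted with kernel row i
    hid = sigmoid(W1 @ v + b1)  # linear layer followed by a nonlinearity (Sigmoid here)
    return W2 @ hid + b2        # final linear layer: one value per label
```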
Given a sentence, as the window slides from left to right the network outputs a matrix; each element f_θ(t|i) of the matrix represents the estimated probability that the i-th unit of the sentence belongs to label t, where θ denotes the parameters of the network. In sequence labeling tasks there is a strong dependence between consecutive labels, so a matrix A_ij is introduced to represent the probability of jumping from label i to label j (it is also included in the parameter set θ). Given a sentence s[1:n] containing n units, a score can be computed for any label sequence t[1:n] of the same length:
Score(s[1:n], t[1:n], θ) = Σ_{i=1}^{n} ( A_{t[i-1], t[i]} + f_θ(t[i] | i) )    (Formula 1)
Given the network parameters, the Viterbi decoding algorithm can be used to find the highest-scoring label sequence as the labeling result.
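A minimal sketch of this decoding step, assuming the additive scoring of Formula 1 (array names are illustrative):

```python
import numpy as np

def viterbi(scores, A):
    """scores: n x T matrix of f_theta(t|i); A: T x T transition scores,
    A[i, j] for moving from label i to label j. Returns the label sequence
    maximizing Formula 1 (the first position uses only the network score)."""
    n, T = scores.shape
    dp = scores[0].copy()                 # best score of a path ending in each label
    back = np.zeros((n, T), dtype=int)    # argmax pointers for backtracking
    for i in range(1, n):
        cand = dp[:, None] + A + scores[i][None, :]   # prev-label x next-label grid
        back[i] = cand.argmax(axis=0)
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```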
The training method requires that, on the training set, the probability of each sample's correct label sequence is maximized:
max_θ Σ_{(s,t)∈T} log p(t | s, θ),  where  p(t | s, θ) = exp(Score(s, t, θ)) / Σ_{t'} exp(Score(s, t', θ))    (Formula 2)
where (s, t) denotes a sample in the training set. Training uses gradient descent, and all network parameters are updated with the following formula:
θ ← θ + λ · ∂ log p(t | s, θ) / ∂θ    (Formula 3)
where λ denotes the learning step size.
When computing the partial derivatives on the right side of Formula 3, in order to avoid the exponential computations exceeding the range of double-precision numbers and to reduce the computational complexity, a Perceptron-style algorithm is used: only the direction of each parameter adjustment is computed, with its magnitude fixed at 1, which simplifies the parameter adjustment and speeds up training. The concrete procedure is as follows: under the current network parameters, the highest-scoring label sequence is compared with the correct label sequence; where they disagree, the partial derivative at the output position of the incorrect label sequence is set to -1, and that at the corresponding position of the correct label sequence to +1. The same partial-derivative computation also applies to the transfer matrix A_ij. In addition, the Max-margin method is used when training the model parameters: the correct label sequence is required not only to have the highest score, but its score must also exceed the highest score of any incorrect label sequence by a specified threshold.
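One plausible reading of the combined Perceptron-style and Max-margin step is sketched below, reusing the viterbi() sketch above: the margin is imposed by cost-augmented decoding, the +/-1 directions for the network outputs would be backpropagated through the network, and the transition matrix is adjusted directly. The patent's exact formulation may differ:

```python
import numpy as np

def perceptron_margin_grads(scores, A, gold, margin=1.0):
    """scores: n x T network outputs; A: T x T transitions; gold: correct
    label sequence. Returns +/-1 gradient directions for the outputs and
    for A, following the Perceptron-style rule described above."""
    n, T = scores.shape
    aug = scores + margin                 # Max-margin: every incorrect label
    aug[np.arange(n), gold] -= margin     # must lose by at least `margin`
    pred = viterbi(aug, A)                # best (cost-augmented) sequence

    g_out = np.zeros_like(scores)
    g_A = np.zeros_like(A)
    for i in range(n):
        if pred[i] != gold[i]:            # only where the sequences disagree
            g_out[i, gold[i]] += 1.0      # +1 at the correct output position
            g_out[i, pred[i]] -= 1.0      # -1 at the incorrect one
        if i > 0 and (pred[i-1], pred[i]) != (gold[i-1], gold[i]):
            g_A[gold[i-1], gold[i]] += 1.0   # same rule for the transfer matrix
            g_A[pred[i-1], pred[i]] -= 1.0
    return g_out, g_A   # g_out is backpropagated; A is adjusted directly
```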
If the network is trained with the network structure and acceleration algorithm incorporating preceding-label information, the key part of the network structure is as shown in Fig. 2. Each constituent unit in the sentence (e.g., a Chinese character or an English word) and each task label are converted into corresponding vector representations by looking them up in vector tables; the vectors of each constituent unit and its surrounding units are concatenated into a window feature matrix, and each possible label vector is concatenated in turn with the window feature matrix to produce a window feature matrix containing the preceding label; a one-dimensional convolution converts this matrix into a higher-level feature vector representation; this vector then passes through a nonlinear transformation followed by a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the likelihood of the corresponding label; finally, combining the label transition probability matrix, the Viterbi decoding algorithm finds the most probable label sequence as the labeling result.
The concrete implementation is as follows. Through a vector table, each label is also given a vector representation, and each possible label vector is placed side by side with the feature matrix of the current window; a similar one-dimensional convolution is then applied to produce the corresponding semantic feature representation (under the hypothesis that the preceding unit carries that label). For each possible preceding label of every constituent unit of a sentence, the network outputs a vector whose length equals the number of task labels, each element again representing the likelihood of the corresponding label. Combining the transfer matrix A_ij, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result.
When computing the window semantic feature representations under the different assumed preceding labels, intermediate results can be shared, which speeds up the network computation. The concrete acceleration method is (the actual computation steps are shown by the numbers in Fig. 2): first compute the overlapping part, i.e., the intermediate result obtained without considering the preceding label; then compute the part affected by the different label vectors; finally, add the label-dependent part to the intermediate result to obtain the final result.
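The sharing of intermediate results can be sketched as follows (a hypothetical NumPy illustration): the convolution over the window columns is computed once, and only the contribution of the label vector is recomputed for each of the T possible preceding labels:

```python
import numpy as np

d, w, T = 100, 5, 4                       # assumed sizes
rng = np.random.default_rng(0)
K_win = rng.uniform(-0.1, 0.1, (d, w))    # kernel columns for the window units
k_lab = rng.uniform(-0.1, 0.1, d)         # kernel column for the appended label vector
L = rng.uniform(-0.1, 0.1, (T, d))        # one vector per possible preceding label

def conv_all_preceding_labels(F):
    """F: d x w window feature matrix. Returns a T x d array of window
    feature vectors, one per assumed preceding label."""
    shared = (F * K_win).sum(axis=1)      # step 1: overlapping part, computed once
    label_part = L * k_lab                # step 2: label-dependent part, T x d
    return shared[None, :] + label_part   # step 3: add to obtain the final results
```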
In sequence labeling tasks of natural language processing, the tag type of the current unit is related not only to its context (the surrounding units) but also to its preceding tag type. For example, in the Chinese string "联盟" ("alliance"), if the label of "联" is B (the beginning of a word), the label of "盟" may be E (the end of a word), i.e., "联盟" is a word, or it may be I (the middle of a word), as in "联盟党"; but if the label of "联" is I, the label of "盟" is most probably E, as in "南联盟" (the Federal Republic of Yugoslavia). The same situation arises in English part-of-speech tagging; for example, "work" can be a noun or a verb. In the phrase "that work", if "that" is tagged as a determiner, "work" is likely a noun, whereas if "that" is tagged as a relative pronoun, "work" is likely a verb. Considering the preceding tag type in the model can therefore improve the accuracy of all kinds of sequence labeling tasks.
(5) After new corpora are added or existing ones extended, the parameters can be fine-tuned with the same training algorithm starting from the trained network parameters, or the network can be completely retrained. The detailed training method is as described in step (4).
(6) After network training finishes, given the network parameters, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result.
The above sequence labeling method based on deep learning is characterized by:
(1) In the deep network for natural language sequence labeling, one-dimensional convolution is used to produce the semantic feature representation of a window, reducing the number of parameters of the network model and shortening the training and running time of the network;
(2) A deep network structure and acceleration algorithm incorporating preceding-label information for sequence labeling tasks;
(3) A deep network training algorithm combining the Perceptron-style algorithm with Max-margin, which both improves the training effect and speeds up the computation of parameter adjustments, thereby reducing the time needed to train and re-customize the network;
(4) A suggested network configuration with character or word vector dimensions of 50-300, a window size of 3 or 9, and Sigmoid or hardTanh as the nonlinear layer's activation function.
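For concreteness, a hypothetical configuration sketch reflecting point (4); the keys are illustrative, not an interface defined by the patent:

```python
# Suggested settings from point (4); the chosen values lie in the stated ranges.
suggested_config = {
    "unit_vector_dim": 100,     # suggested range: 50-300
    "window_size": 5,           # suggested 3 or 9; 5 is used in this document's examples
    "nonlinearity": "Sigmoid",  # Sigmoid or hardTanh
}
```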
Effects of the invention
After training on training sets containing representative domain samples, the model in the system developed with the sequence labeling method based on deep learning disclosed by the present invention achieves the performance shown in Table 1 on the test sets:
Table 1. Comparison of model labeling performance
Table 1 also compares the performance of currently typical network models. Conv-S denotes the sequence labeling method based on deep learning disclosed by the present invention (without considering the preceding label), and Conv-J denotes the network results incorporating preceding-label information. English part-of-speech tagging uses the accuracy metric, while the other three tasks use the F1 metric. F1 is computed as 2PR/(P + R), where P is precision and R is recall. Table 2 compares the running speed of each model: for every model other than Conv-S, it lists the required time as a multiple of that of the Conv-S model. From Tables 1 and 2 it can be seen that Conv-J performs best across the various tasks, while Conv-S, with a greatly reduced running time, shows very competitive performance.
Table 2. Comparison of time required for labeling
Explanation of terms
Natural language processing: an important branch of computer science and artificial intelligence that studies theories and methods enabling effective communication between humans and computers in natural language. Natural language processing is usually not the study of natural language itself, but the development of computer systems, especially software systems, that can effectively realize natural language communication;
Sequence labeling: according to a given tag set, for an input sentence, a computer program outputs the tag type of each constituent unit in the sentence (e.g., a Chinese character or an English word). Taking Chinese word segmentation as an example, four tags, B, I, E and S, are generally used, representing the starting character of a word, a middle character, the ending character, and a single character forming a word by itself. If the input is "我喜欢计算机。", the correct labeling result is "S B E B I E S" (punctuation marks are generally also treated as constituent units), segmenting the sentence into "我/喜欢/计算机/。".
Description of the drawings
Fig. 1. The fast sequence labeling deep network structure based on one-dimensional convolution.
Fig. 2. Key local structure of the sequence labeling deep network incorporating preceding-label information.
Specific embodiment
The invention discloses a method for automatically performing sequence labeling in natural language processing with a computer. For an input sentence, according to the tag set defined for the task, the corresponding tag type is selected for each constituent unit in the sentence (a character or a word) in order of appearance; this is called a sequence labeling task in the field of natural language processing. Sequence labeling tasks can serve various natural language processing tasks such as Chinese word segmentation, English shallow parsing, Chinese and English part-of-speech tagging, and named entity recognition. The concrete steps are as follows:
(1) Assign a vector representation to each constituent unit of the language concerned (e.g., a Chinese character or an English word). This vector representation can be generated randomly or pre-trained with an unsupervised method (e.g., the word2vec tool [1] can be used for English words, and the method described in reference [2] for Chinese characters); after training, each unit can be converted into its corresponding vector representation by looking it up in the vector table.
(2) Define the tag sets of the various sequence labeling tasks, determining which tags each sequence labeling task includes. Taking Chinese word segmentation as an example, a tag set containing B, I, E and S can be used, representing the beginning character of a word, a middle character, the ending character, and a single character forming a word by itself.
(3) Prepare corpora for sequence labeling tasks in natural language processing such as Chinese word segmentation, English shallow parsing, part-of-speech tagging, and named entity recognition. Taking Chinese word segmentation as an example, the first column is a character or punctuation mark of the sentence and the second column is its corresponding label; in the whole corpus, adjacent sentences are separated by a blank line:
我 S
喜 B
欢 E
计 B
算 I
机 E
。 S
(4) Train the network using the fast sequence labeling network structure (shown in Fig. 1) or the network structure incorporating preceding-label information (shown in Fig. 2), with the Perceptron-style algorithm or with the Perceptron-style algorithm combined with the Max-margin method.
If the fast sequence labeling network structure and learning algorithm based on deep learning are used, the fast sequence labeling network structure is as shown in Fig. 1. Specifically: each constituent unit in the sentence (e.g., a Chinese character or an English word) is converted into its corresponding vector representation by looking it up in the vector table; the vectors of each constituent unit and its surrounding units are concatenated into a window feature matrix; a one-dimensional convolution converts the window feature matrix into a window feature vector representation; the window feature vector then passes through a nonlinear transformation followed by a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the likelihood of the corresponding label; finally, combining the label transition probability matrix, the Viterbi decoding algorithm finds the most probable label sequence as the labeling result.
The concrete implementation is as follows. The label of a constituent unit is usually related to its surrounding units, so the network adopts a window model: when estimating the probability that the current unit belongs to a certain label, the unit and its surroundings are taken as input. If the window size is set to 5, the current unit together with the two units on its left and the two on its right forms the input window. If there are not enough units on the left or right to fill the window, special padding symbols are used instead.
Each unit in an input sentence is converted into its corresponding vector representation by looking it up in the vector table; these vectors are then concatenated into a feature matrix whose number of columns equals the window size, each column being the vector representation of the corresponding unit. A one-dimensional convolution operation is then applied to the feature matrix; one-dimensional convolution means that each row vector of the feature matrix is dotted with a corresponding parameter vector (convolution kernel), with different rows using different kernels. Under this one-dimensional convolution, the feature matrix is converted into a vector with the same dimension as a unit vector; this vector can be regarded as the semantic feature representation of the current unit under the influence of the surrounding units in the window.
After passing through a linear network layer (the middle hidden layer), a nonlinear transformation is applied using the Sigmoid or hardTanh function; finally, another linear layer outputs a vector whose length equals the number of task labels, each element representing the likelihood of the corresponding label.
Given a sentence, as the window slides from left to right the network outputs a matrix; each element f_θ(t|i) of the matrix represents the estimated probability that the i-th unit of the sentence belongs to label t, where θ denotes the parameters of the network. In sequence labeling tasks there is a strong dependence between consecutive labels, so a matrix A_ij is introduced to represent the probability of jumping from label i to label j (it is also included in the parameter set θ). Given a sentence s[1:n] containing n units, a score can be computed for any label sequence t[1:n] of the same length:
Score(s[1:n], t[1:n], θ) = Σ_{i=1}^{n} ( A_{t[i-1], t[i]} + f_θ(t[i] | i) )    (Formula 1)
Given the network parameters, the Viterbi decoding algorithm can be used to find the highest-scoring label sequence as the labeling result.
The training method requires that, on the training set, the probability of each sample's correct label sequence is maximized:
max_θ Σ_{(s,t)∈T} log p(t | s, θ),  where  p(t | s, θ) = exp(Score(s, t, θ)) / Σ_{t'} exp(Score(s, t', θ))    (Formula 2)
where (s, t) denotes a sample in the training set. Training uses gradient descent, and all network parameters are updated with the following formula:
θ ← θ + λ · ∂ log p(t | s, θ) / ∂θ    (Formula 3)
where λ denotes the learning step size.
When computing the partial derivatives on the right side of Formula 3, in order to avoid the exponential computations exceeding the range of double-precision numbers and to reduce the computational complexity, a Perceptron-style algorithm is used: only the direction of each parameter adjustment is computed, with its magnitude fixed at 1, which simplifies the parameter adjustment and speeds up training. The concrete procedure is as follows: under the current network parameters, the highest-scoring label sequence is compared with the correct label sequence; where they disagree, the partial derivative at the output position of the incorrect label sequence is set to -1, and that at the corresponding position of the correct label sequence to +1. The same partial-derivative computation also applies to the transfer matrix A_ij. In addition, the Max-margin method is used when training the model parameters: the correct label sequence is required not only to have the highest score, but its score must also exceed the highest score of any incorrect label sequence by a specified threshold.
If the network structure and acceleration algorithm incorporating preceding-label information are used, the key part of the network structure is as shown in Fig. 2. Each constituent unit in the sentence (e.g., a Chinese character or an English word) and each task label are converted into corresponding vector representations by looking them up in vector tables; the vectors of each constituent unit and its surrounding units are concatenated into a window feature matrix, and each possible label vector is concatenated in turn with the window feature matrix to produce a window feature matrix containing the preceding label; a one-dimensional convolution converts this matrix into a higher-level feature vector representation; this vector then passes through a nonlinear transformation followed by a linear transformation, producing a vector whose dimension equals the number of task labels, each element representing the likelihood of the corresponding label; finally, combining the label transition probability matrix, the Viterbi decoding algorithm finds the most probable label sequence as the labeling result.
The concrete implementation is as follows. Through a vector table, each label is also given a vector representation, and each possible label vector is placed side by side with the feature matrix of the current window; a similar one-dimensional convolution is then applied to produce the corresponding semantic feature representation (under the hypothesis that the preceding unit carries that label). For each possible preceding label of every constituent unit of a sentence, the network outputs a vector whose length equals the number of task labels, each element again representing the likelihood of the corresponding label. Combining the transfer matrix A_ij, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result.
When computing the window semantic feature representations under the different assumed preceding labels, intermediate results can be shared, which speeds up the network computation. The concrete acceleration method is (the actual computation steps are shown by the numbers in Fig. 2): first compute the overlapping part, i.e., the intermediate result obtained without considering the preceding label; then compute the part affected by the different label vectors; finally, add the label-dependent part to the intermediate result to obtain the final result.
(5) After new corpora are added or existing ones extended, the parameters can be fine-tuned with the same training algorithm starting from the trained network parameters, or the network can be completely retrained. The detailed training method is as described in step (4).
(6) After network training finishes, given the network parameters, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result.
List of references
[1] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR abs/1301.3781, 2013.
[2] Xiaoqing Zheng, Jiangtao Feng, Mengxiao Lin, and Wenqiang Zhang. Context-specific and multi-prototype character representations. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI'16), 2016.
[3] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493-2537, 2011.
[4] Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. Deep learning for Chinese word segmentation and POS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'13), 2013.
[5] Wenzhe Pei, Tao Ge, and Baobao Chang. Max-margin tensor neural network for Chinese word segmentation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), 2014.
[6] Pengfei Liu, Shafiq Joty, and Helen Meng. Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'15), 2015.

Claims (1)

1. A sequence labeling method in natural language processing based on deep learning, in which a computer reads an input sentence and, according to the tag set defined for the task, selects the corresponding tag type for each constituent unit in the sentence, i.e., each character or word, in order of appearance; characterized in that the concrete steps are:
(1) Assign a vector representation to each constituent unit of the language concerned; this vector representation can be generated randomly or pre-trained with an unsupervised method; after training, each unit is converted into its corresponding vector representation by looking it up in the vector table;
(2) Define the tag sets of the various sequence labeling tasks, determining which tags each sequence labeling task includes;
(3) Prepare corpora for sequence labeling tasks in natural language processing such as Chinese word segmentation, English shallow parsing, part-of-speech tagging, and named entity recognition;
(4) Use the fast sequence labeling network structure or the network structure incorporating preceding-label information, and train the network with the Perceptron-style algorithm or with the Perceptron-style algorithm combined with the Max-margin method;
If the network is trained with the fast sequence labeling network structure and learning algorithm based on deep learning: in the fast sequence labeling network structure, the label of a constituent unit is related to its surrounding units, so the network adopts a window model, i.e., when estimating the probability that the current unit belongs to a certain label, the unit and its surroundings are taken as input; if the window size is set to 5, the current unit together with the two units on its left and the two on its right forms the input window; if there are not enough units on the left or right to fill the window, special padding symbols are used instead;
Each unit in an input sentence is converted into its corresponding vector representation by looking it up in the vector table; the representation of each unit is generated randomly or pre-trained with an unsupervised method; the parameters stored in the vector table are also adjusted continually during training; these vectors are then concatenated into a feature matrix whose number of columns equals the window size, each column being the vector representation of the corresponding unit;
A one-dimensional convolution operation is then applied to the feature matrix; one-dimensional convolution means that each row vector of the feature matrix is dotted with a corresponding parameter vector, i.e., a convolution kernel, with different rows using different kernels; under this one-dimensional convolution, the feature matrix is converted into a vector with the same dimension as a unit vector, which can be regarded as the semantic feature representation of the current unit under the influence of the surrounding units;
After passing through a linear network layer, a nonlinear transformation is applied using the Sigmoid or hardTanh function; finally, another linear layer outputs a vector whose length equals the number of task labels, each element representing the likelihood of the corresponding label;
Given a sentence, as the window slides from left to right the network outputs a matrix; each element f_θ(t|i) of the matrix represents the estimated probability that the i-th unit of the sentence belongs to label t, where θ denotes the parameters of the network; in sequence labeling tasks there is a strong dependence between consecutive labels, so a matrix A_ij is introduced to represent the probability of jumping from label i to label j; given a sentence s[1:n] containing n units, a score is computed for a label sequence t[1:n] of the same length:
Score(s[1:n], t[1:n], θ) = Σ_{i=1}^{n} ( A_{t[i-1], t[i]} + f_θ(t[i] | i) )    (Formula 1)
Given the network parameters, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result;
The training method requires that, on the training set, the probability of each sample's correct label sequence is maximized:
max_θ Σ_{(s,t)∈T} log p(t | s, θ),  where  p(t | s, θ) = exp(Score(s, t, θ)) / Σ_{t'} exp(Score(s, t', θ))    (Formula 2)
where (s, t) denotes a sample in the training set; training uses gradient descent, and all network parameters are updated with the following formula:
θ ← θ + λ · ∂ log p(t | s, θ) / ∂θ    (Formula 3)
where λ denotes the learning step size;
When computing the partial derivatives on the right side of Formula 3, the Perceptron-style algorithm is used, i.e., only the direction of each parameter adjustment is computed, with its magnitude fixed at 1; the concrete procedure is as follows: under the current network parameters, the highest-scoring label sequence is compared with the correct label sequence; where they disagree, the partial derivative at the output position of the incorrect label sequence is set to -1 and that at the corresponding position of the correct label sequence to +1; the same partial-derivative computation also applies to the transfer matrix A_ij; in addition, the Max-margin method is used when training the model parameters, i.e., the correct label sequence is required not only to have the highest score, but its score must also exceed the highest score of any incorrect label sequence by a specified threshold;
If the network is trained with the network structure and acceleration algorithm incorporating preceding-label information, the concrete implementation is:
Through a vector table, each label is also given a vector representation, and each possible label vector is placed side by side with the feature matrix of the current window; a similar one-dimensional convolution is then applied to produce the corresponding semantic feature representation; for each possible preceding label of every constituent unit of a sentence, the network outputs a vector whose length equals the number of task labels, each element again representing the likelihood of the corresponding label; combining the transfer matrix A_ij, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result;
When computing the window semantic feature representations under the different assumed preceding labels, intermediate results can be shared, which speeds up the network computation; the concrete acceleration method is: first compute the overlapping part, i.e., the intermediate result obtained without considering the preceding label; then compute the part affected by the different label vectors; finally, add the label-dependent part to the intermediate result to obtain the final result;
(5) After new corpora are added or existing ones extended, the parameters are fine-tuned with the same training algorithm starting from the trained network parameters, or the network is completely retrained; the concrete training method is as described in step (4);
(6) After network training finishes, given the network parameters, the Viterbi decoding algorithm finds the highest-scoring label sequence as the labeling result.
CN201610950893.5A 2016-10-25 2016-10-25 Sequence labeling method in natural language processing based on deep learning Active CN106547737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610950893.5A CN106547737B (en) 2016-10-25 2016-10-25 Sequence labeling method in natural language processing based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610950893.5A CN106547737B (en) 2016-10-25 2016-10-25 Sequence labeling method in natural language processing based on deep learning

Publications (2)

Publication Number Publication Date
CN106547737A true CN106547737A (en) 2017-03-29
CN106547737B CN106547737B (en) 2020-05-12

Family

ID=58392799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610950893.5A Active CN106547737B (en) 2016-10-25 2016-10-25 Sequence labeling method in natural language processing based on deep learning

Country Status (1)

Country Link
CN (1) CN106547737B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273355A (en) * 2017-06-12 2017-10-20 大连理工大学 A kind of Chinese word vector generation method based on words joint training
CN107832302A (en) * 2017-11-22 2018-03-23 北京百度网讯科技有限公司 Participle processing method, device, mobile terminal and computer-readable recording medium
CN107832301A (en) * 2017-11-22 2018-03-23 北京百度网讯科技有限公司 Participle processing method, device, mobile terminal and computer-readable recording medium
CN107894971A (en) * 2017-10-27 2018-04-10 北京大学 A kind of expansible sequence labelling method based on neutral net
CN108009285A (en) * 2017-12-22 2018-05-08 重庆邮电大学 Forest Ecology man-machine interaction method based on natural language processing
CN108549628A (en) * 2018-03-16 2018-09-18 北京云知声信息技术有限公司 The punctuate device and method of streaming natural language information
CN109635157A (en) * 2018-10-30 2019-04-16 北京奇艺世纪科技有限公司 Model generating method, video searching method, device, terminal and storage medium
CN109976807A (en) * 2019-01-14 2019-07-05 浙江工商大学 A kind of critical packet recognition methods based on software operational network
CN110047463A (en) * 2019-01-31 2019-07-23 北京捷通华声科技股份有限公司 A kind of phoneme synthesizing method, device and electronic equipment
CN110232182A (en) * 2018-04-10 2019-09-13 蔚来汽车有限公司 Method for recognizing semantics, device and speech dialogue system
CN110245353A (en) * 2019-06-20 2019-09-17 腾讯科技(深圳)有限公司 Natural language representation method, device, equipment and storage medium
CN110399614A (en) * 2018-07-26 2019-11-01 北京京东尚科信息技术有限公司 System and method for the identification of true product word
CN110852386A (en) * 2019-11-13 2020-02-28 精硕科技(北京)股份有限公司 Data classification method and device, computer equipment and readable storage medium
WO2021017268A1 (en) * 2019-07-30 2021-02-04 平安科技(深圳)有限公司 Double-architecture-based sequence labeling method, device, and computer device
CN112989801A (en) * 2021-05-11 2021-06-18 华南师范大学 Sequence labeling method, device and equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2950306A1 (en) * 2014-05-29 2015-12-02 Samsung Electronics Polska Spolka z organiczona odpowiedzialnoscia A method and system for building a language model
US9298702B1 (en) * 2008-11-18 2016-03-29 Semantic Research Inc. Systems and methods for pairing of a semantic network and a natural language processing information extraction system
CN105512209A (en) * 2015-11-28 2016-04-20 大连理工大学 Biomedicine event trigger word identification method based on characteristic automatic learning
CN105894088A (en) * 2016-03-25 2016-08-24 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on depth learning and distributed semantic features
CN105955953A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 Word segmentation system
CN106022239A (en) * 2016-05-13 2016-10-12 电子科技大学 Multi-target tracking method based on recurrent neural network
CN106021227A (en) * 2016-05-16 2016-10-12 南京大学 State transition and neural network-based Chinese chunk parsing method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9298702B1 (en) * 2008-11-18 2016-03-29 Semantic Research Inc. Systems and methods for pairing of a semantic network and a natural language processing information extraction system
EP2950306A1 (en) * 2014-05-29 2015-12-02 Samsung Electronics Polska Spolka z organiczona odpowiedzialnoscia A method and system for building a language model
CN105512209A (en) * 2015-11-28 2016-04-20 大连理工大学 Biomedicine event trigger word identification method based on characteristic automatic learning
CN105894088A (en) * 2016-03-25 2016-08-24 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on depth learning and distributed semantic features
CN105955953A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 Word segmentation system
CN106022239A (en) * 2016-05-13 2016-10-12 电子科技大学 Multi-target tracking method based on recurrent neural network
CN106021227A (en) * 2016-05-16 2016-10-12 南京大学 State transition and neural network-based Chinese chunk parsing method

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273355A (en) * 2017-06-12 2017-10-20 大连理工大学 A kind of Chinese word vector generation method based on words joint training
CN107273355B (en) * 2017-06-12 2020-07-14 大连理工大学 Chinese word vector generation method based on word and phrase joint training
CN107894971B (en) * 2017-10-27 2019-11-26 北京大学 A kind of expansible sequence labelling method neural network based
CN107894971A (en) * 2017-10-27 2018-04-10 北京大学 A kind of expansible sequence labelling method based on neutral net
CN107832302A (en) * 2017-11-22 2018-03-23 北京百度网讯科技有限公司 Participle processing method, device, mobile terminal and computer-readable recording medium
CN107832301A (en) * 2017-11-22 2018-03-23 北京百度网讯科技有限公司 Participle processing method, device, mobile terminal and computer-readable recording medium
CN107832301B (en) * 2017-11-22 2021-09-17 北京百度网讯科技有限公司 Word segmentation processing method and device, mobile terminal and computer readable storage medium
CN108009285A (en) * 2017-12-22 2018-05-08 重庆邮电大学 Forest Ecology man-machine interaction method based on natural language processing
CN108549628A (en) * 2018-03-16 2018-09-18 北京云知声信息技术有限公司 The punctuate device and method of streaming natural language information
CN110232182B (en) * 2018-04-10 2023-05-16 蔚来控股有限公司 Semantic recognition method and device and voice dialogue system
CN110232182A (en) * 2018-04-10 2019-09-13 蔚来汽车有限公司 Method for recognizing semantics, device and speech dialogue system
CN110399614A (en) * 2018-07-26 2019-11-01 北京京东尚科信息技术有限公司 System and method for the identification of true product word
CN109635157A (en) * 2018-10-30 2019-04-16 北京奇艺世纪科技有限公司 Model generating method, video searching method, device, terminal and storage medium
CN109976807B (en) * 2019-01-14 2022-11-25 深圳游禧科技有限公司 Key package identification method based on software operation network
CN109976807A (en) * 2019-01-14 2019-07-05 浙江工商大学 A kind of critical packet recognition methods based on software operational network
CN110047463B (en) * 2019-01-31 2021-03-02 北京捷通华声科技股份有限公司 Voice synthesis method and device and electronic equipment
CN110047463A (en) * 2019-01-31 2019-07-23 北京捷通华声科技股份有限公司 A kind of phoneme synthesizing method, device and electronic equipment
CN110245353A (en) * 2019-06-20 2019-09-17 腾讯科技(深圳)有限公司 Natural language representation method, device, equipment and storage medium
CN110245353B (en) * 2019-06-20 2022-10-28 腾讯科技(深圳)有限公司 Natural language expression method, device, equipment and storage medium
WO2021017268A1 (en) * 2019-07-30 2021-02-04 平安科技(深圳)有限公司 Double-architecture-based sequence labeling method, device, and computer device
CN110852386A (en) * 2019-11-13 2020-02-28 精硕科技(北京)股份有限公司 Data classification method and device, computer equipment and readable storage medium
CN110852386B (en) * 2019-11-13 2023-05-02 北京秒针人工智能科技有限公司 Data classification method, apparatus, computer device and readable storage medium
CN112989801A (en) * 2021-05-11 2021-06-18 华南师范大学 Sequence labeling method, device and equipment
CN112989801B (en) * 2021-05-11 2021-08-13 华南师范大学 Sequence labeling method, device and equipment

Also Published As

Publication number Publication date
CN106547737B (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN106547737A (en) Based on the sequence labelling method in the natural language processing of deep learning
CN111612103B (en) Image description generation method, system and medium combined with abstract semantic representation
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
CN107563498B (en) Image description method and system based on visual and semantic attention combined strategy
CN107480143B (en) Method and system for segmenting conversation topics based on context correlation
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
CN111145728B (en) Speech recognition model training method, system, mobile terminal and storage medium
CN107526834B (en) Word2vec improvement method for training correlation factors of united parts of speech and word order
CN111709243B (en) Knowledge extraction method and device based on deep learning
CN110705294A (en) Named entity recognition model training method, named entity recognition method and device
CN109710744B (en) Data matching method, device, equipment and storage medium
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
CN109635124A (en) A kind of remote supervisory Relation extraction method of combination background knowledge
CN112990296B (en) Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation
CN106547735A (en) The structure and using method of the dynamic word or word vector based on the context-aware of deep learning
CN110879938A (en) Text emotion classification method, device, equipment and storage medium
CN110826298B (en) Statement coding method used in intelligent auxiliary password-fixing system
CN113705237A (en) Relation extraction method and device fusing relation phrase knowledge and electronic equipment
CN112489622A (en) Method and system for recognizing voice content of multi-language continuous voice stream
CN114254645A (en) Artificial intelligence auxiliary writing system
CN110245353B (en) Natural language expression method, device, equipment and storage medium
CN113282721A (en) Visual question-answering method based on network structure search
CN110610006B (en) Morphological double-channel Chinese word embedding method based on strokes and fonts
CN109670171B (en) Word vector representation learning method based on word pair asymmetric co-occurrence
Fernandes et al. Entropy-guided feature generation for structured learning of Portuguese dependency parsing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant