CN110008467A - Burmese dependency parsing method based on transfer learning - Google Patents
Burmese dependency parsing method based on transfer learning
- Publication number
- CN110008467A (application number CN201910158572.5A)
- Authority
- CN
- China
- Prior art keywords
- burmese
- corpus
- syntactic analysis
- english
- dependency syntax
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/211 — Natural language analysis; Parsing; Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/242 — Natural language analysis; Lexical tools; Dictionaries
- G06N3/04 — Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a transfer-learning-based dependency parsing method for Burmese and belongs to the field of natural language processing. First, an English dependency parsing model is trained on the dependency-parsing corpus of English, a resource-rich language. Next, on the basis of the trained English model, the network parameters are transferred to the low-resource setting following the idea of transfer learning. Finally, a low-quality Burmese dependency-parsing corpus is added to fine-tune the model, yielding a Burmese dependency parsing model. The method effectively improves the performance of dependency parsing for low-resource languages.
Description
Technical field
The present invention relates to a transfer-learning-based dependency parsing method for Burmese and belongs to the field of natural language processing.
Background art
Syntactic analysis aims to convert a sentence from a sequence of words into a graph structure (usually a tree, according to some grammatical formalism) that captures the syntactic relations inside the sentence (subject, predicate, object, etc.). It is one of the key problems of natural language processing and effectively supports tasks such as information extraction, sentiment analysis and machine translation.
Current mainstream dependency parsing methods fall into two classes: graph-based and transition-based. Graph-based methods treat dependency parsing as finding a maximum spanning tree in a complete directed graph, where each edge scores the likelihood of a particular syntactic relation between two words. Transition-based methods build a dependency tree through a sequence of transition actions such as SHIFT and REDUCE, and the learning objective is to find the optimal action sequence. Compared with graph-based methods, transition-based parsing has lower algorithmic complexity and therefore higher parsing efficiency; because it can exploit richer features, its parsing accuracy is comparable to that of graph-based methods. A minimal sketch of such a transition system is given below.
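As a concrete illustration of the transition-based scheme just described (not part of the original patent text; all function and variable names are hypothetical), the following Python sketch applies SHIFT, LEFT-ARC and RIGHT-ARC actions, driven by a simple oracle over gold heads, to build a dependency tree:

```python
# Illustrative sketch: a minimal arc-standard transition system.
# Token 0 is ROOT; gold_heads maps token index -> head index.
def parse_with_oracle(words, gold_heads):
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), {}

    def attached_all_children(tok):
        return all(arcs.get(dep) == tok
                   for dep in range(1, len(words) + 1)
                   if gold_heads.get(dep) == tok)

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            if gold_heads.get(s2) == s1 and s2 != 0:                    # LEFT-ARC
                arcs[s2] = s1
                stack.pop(-2)
                continue
            if gold_heads.get(s1) == s2 and attached_all_children(s1):  # RIGHT-ARC
                arcs[s1] = s2
                stack.pop()
                continue
        if not buffer:
            break                                                       # done / non-projective
        stack.append(buffer.pop(0))                                     # SHIFT
    return arcs

# "She eats fish": gold heads {1: 2, 2: 0, 3: 2}
print(parse_with_oracle(["She", "eats", "fish"], {1: 2, 2: 0, 3: 2}))
```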
Traditional transition-based methods represent the parser state, i.e. the basis for classification, by extracting a series of manually defined features, such as the word and part of speech at the top of the stack, the first word and its part of speech in the buffer, and the leftmost or rightmost words of the partially built dependency tree; these are known as core features. To improve classification accuracy, various combination features must also be defined by hand. For example, Zhang and Nivre (2011) gave a carefully optimized set of feature templates comprising 20 core features and 72 combination features.
Chen and Manning (2014) at Stanford University were the first to successfully apply deep learning to dependency parsing. The nonlinear activation function of a deep network implicitly achieves the feature-combination effect of traditional methods, avoiding the tedious manual design of combination features, and ultimately reaches accuracy comparable to traditional methods. At the same time, because the approach does not need to combine features explicitly, which is extremely time-consuming, and with computational techniques such as precomputation, it also greatly accelerates dependency parsing.
These methods are all supervised and require high-quality annotated corpora, but Burmese is a low-resource language with no available annotated data, and manual annotation is too costly. Some existing approaches for low-resource languages construct a dependency treebank by bilingual mapping, but the resulting trees contain considerable noise (grammatical differences between the languages, words that cannot be aligned). Transfer learning has achieved good results in low-resource natural language processing. The invention therefore proposes a transfer-learning-based dependency parsing method for Burmese: a baseline model is trained on abundant English resources, and on the basis of this model some low-quality Burmese dependency-parsing corpus is added to train a Burmese dependency parsing model.
Summary of the invention
The present invention provides a transfer-learning-based dependency parsing method for Burmese, which improves the training of a Burmese dependency parsing model when only low-resource, low-quality data is available.
The technical scheme is as follows: first, a Burmese dependency-parsing corpus is obtained by English-Burmese mapping, and the Burmese and English corpora are mapped into the same semantic space; then an English dependency parsing model is trained with the bilingual word vectors, and, following the idea of transfer learning, the Burmese dependency-parsing corpus is added and the model parameters are adjusted to obtain the Burmese dependency parsing model.
The specific steps are as follows:
Step1, construct a Burmese dependency-parsing corpus from an English-Burmese bilingual parallel corpus;
Step1.1, segment the English and Burmese sides of the bilingual parallel corpus into words;
Step1.2, establish English-Burmese word correspondences;
Step1.3, parse the English corpus with the Stanford neural-network parser to obtain an English dependency-parsing corpus;
Step1.4, using the word correspondences from Step1.2, map the English dependency analyses onto the Burmese corpus to obtain the Burmese dependency-parsing corpus.
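The projection in Step1.4 can be pictured with the following minimal sketch (hypothetical data structures, not the patent's exact procedure): English heads are carried over to Burmese tokens through a word alignment, and unaligned tokens are simply skipped, which is one source of the noise discussed in the background section.

```python
# Illustrative sketch: project English dependency heads onto a Burmese
# sentence through a one-to-one word alignment.
def project_dependencies(en_heads, en2my):
    """en_heads: English token index -> head index (0 = ROOT).
    en2my: alignment, English token index -> Burmese token index."""
    my_heads = {}
    for en_dep, en_head in en_heads.items():
        if en_dep not in en2my:
            continue                          # no Burmese counterpart: arc is dropped
        my_dep = en2my[en_dep]
        if en_head == 0:
            my_heads[my_dep] = 0              # ROOT stays ROOT
        elif en_head in en2my:
            my_heads[my_dep] = en2my[en_head]
    return my_heads

# "She eats fish" aligned to a three-token Burmese sentence (SOV order):
print(project_dependencies({1: 2, 2: 0, 3: 2}, {1: 1, 2: 3, 3: 2}))
# -> {1: 3, 3: 0, 2: 3}
```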
Step2, map the Burmese corpus and the English corpus into the same semantic space: find a linear transformation matrix W that realizes $\arg\min_W \sum_i \|X_i W - Z_i\|^2$, where $X_i$ and $Z_i$ denote the embeddings of the Burmese and English words of the i-th dictionary entry. The linear transformation W maps the Burmese and English corpora into the same semantic space, and the word vectors of this shared space are then used as the input of the dependency parsing model (a minimal sketch follows).
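A minimal sketch of the Step2 objective, assuming X and Z are row-wise embedding matrices of the aligned Burmese and English dictionary entries (toy random data is used here in place of real embeddings):

```python
# W = argmin_W sum_i ||X_i W - Z_i||^2, solved as unconstrained least squares
# (before the orthogonality constraint of Step2.1 is imposed).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))      # Burmese embeddings of dictionary entries
Z = rng.normal(size=(1000, 100))      # corresponding English embeddings

W, *_ = np.linalg.lstsq(X, Z, rcond=None)
mapped = X @ W                        # Burmese vectors in the shared (English) space
```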
Step2.1, constrain the mapping matrix to be orthogonal.
Mikolov et al. fit the data by minimizing the distance in the embedding space; in doing so the matrix W can overfit part of the data and lose other regularities. So that W learns global information rather than only a portion of it, monolingual invariance is required: dot products should be preserved after the mapping, which prevents the quality of the mapped Burmese word vectors from degrading. A regularization constraint is therefore added requiring W to be an orthogonal matrix, i.e. $W^{\top}W = I$. Orthogonality strengthens the global character of W, so the bilingual mapping is learned better: constraining W to be orthogonal effectively constrains its learning, making W capture the information as a whole rather than only a part of it (see the closed-form sketch below).
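Under the orthogonality constraint of Step2.1 the same least-squares problem has a closed-form solution, the orthogonal Procrustes solution $W = UV^{\top}$ from the SVD of $X^{\top}Z$; a minimal sketch, assuming the same row-wise embedding matrices as above:

```python
# Orthogonal Procrustes solution: minimize sum_i ||X_i W - Z_i||^2
# subject to W^T W = I.
import numpy as np

def orthogonal_mapping(X, Z):
    U, _, Vt = np.linalg.svd(X.T @ Z)
    W = U @ Vt                                        # orthogonal by construction
    assert np.allclose(W.T @ W, np.eye(W.shape[1]), atol=1e-6)
    return W
```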
Step2.2, length normalization and maximizing cosine similarity.
The bilingual word-vector mapping above cannot guarantee that every dimension of a vector has a comparable magnitude; when one dimension is particularly large or small, anomalies arise during computation, because the dimensions of a vector are not weighted equally, so the dimensions need to be brought into the same range. Normalizing the word embeddings of both languages to unit vectors guarantees that all training examples contribute equally to the optimization objective. As long as W is orthogonal, this is equivalent to maximizing the sum of the cosine similarities of the dictionary entries, the measure commonly used for similarity: $\arg\max_W \sum_i \cos(X_i W, Z_i)$.
This normalization ensures that, when aligning the bilingual dictionary, each update of the transfer matrix takes the global picture into account instead of being dominated by the correspondence of a few words, so that the full information in Burmese and English is learned.
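A small sketch of the Step2.2 normalization, assuming the same row-wise embedding matrices as above: every word vector is scaled to unit length, and with an orthogonal W the alignment quality can then be read off as the summed cosine similarity of the dictionary entries.

```python
# Length normalization: unit-length rows, then the objective is the
# summed cosine similarity of the dictionary entries.
import numpy as np

def unit_normalize(E, eps=1e-12):
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + eps)

def summed_cosine(X, Z, W):
    Xn, Zn = unit_normalize(X @ W), unit_normalize(Z)
    return float(np.sum(Xn * Zn))     # sum_i cos(X_i W, Z_i)
```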
Step2.3, update the bilingual dictionary iteratively.
The dictionary is expanded by iteration: each round solves for the correspondences, uses the correspondence matrix W to find words in the dictionary that are not yet matched, expands the dictionary, and recomputes W in the next round to obtain the updated dictionary. During matching, if the similarity of a bilingual word-vector pair is below 80%, the pair is not matched; only pairs satisfying this condition are considered correspondences and are added to the bilingual dictionary for the next round of training. Training continues until the bilingual dictionary no longer grows after a certain number of rounds, at which point the model is considered converged and training ends, or it stops after a fixed number of iterations is exceeded. During this self-learning process, unmatched words in the dictionary are marked 0 and matched words are marked 1, introducing a new matrix D in which $D_{ij}=1$ means the bilingual word pair is matched successfully and 0 means it is not. The goal is then to find the optimal mapping matrix W that minimizes the sum of squared Euclidean distances between the mapped source embeddings $X_i W$ and the target embeddings $Z_j$ over the dictionary entries $D_{ij}$:
$\arg\min_W \sum_i \sum_j D_{ij}\,\|X_i W - Z_j\|^2$
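The self-learning loop of Step2.3 can be sketched as follows (the 80% cutoff and the convergence test follow the description above; matrix shapes and function names are assumptions, not the patent's exact implementation):

```python
# D is the 0/1 match matrix: D[i, j] = 1 when Burmese entry i is paired
# with English entry j.
import numpy as np

def _unit(E):
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)

def self_learning(X, Z, D, n_iter=10, threshold=0.8):
    Xn, Zn = _unit(X), _unit(Z)
    W = np.eye(X.shape[1])
    for _ in range(n_iter):
        # 1) best orthogonal W for the current dictionary D
        #    (minimizes sum_ij D_ij ||X_i W - Z_j||^2)
        U, _, Vt = np.linalg.svd(Xn.T @ D @ Zn)
        W = U @ Vt
        # 2) re-induce the dictionary: keep only pairs whose cosine
        #    similarity reaches the threshold (80% in the description)
        sims = (Xn @ W) @ Zn.T
        best = sims.argmax(axis=1)
        keep = sims[np.arange(len(best)), best] >= threshold
        new_D = np.zeros_like(D)
        new_D[np.arange(len(best))[keep], best[keep]] = 1
        if np.array_equal(new_D, D):
            break                     # dictionary stopped changing: converged
        D = new_D
    return W, D
```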
Step3, train the dependency parsing model on the Stanford neural network using the English dependency-parsing corpus from Step1.
First, the input vectors required for training the Stanford neural-network parser and the English dependency-parsing corpus with annotated dependencies are provided. The input vectors are the word vector, the POS-tag vector and the feature-tag vector; the word vector is the shared-space vector obtained in Step2, while the POS-tag and feature-tag vectors are randomly initialized. The hidden layer uses $h = (W_1^{w}x_w + W_1^{t}x_t + W_1^{l}x_l + b_1)^3$ to train $[W_1^{w}, W_1^{t}, W_1^{l}]$, where $x_w$, $x_t$, $x_l$ denote the word vector, POS-tag vector and feature-tag vector fed to the hidden layer, $W_1^{w}$, $W_1^{t}$, $W_1^{l}$ are the corresponding weight matrices, and $b_1$ is the bias. The output layer uses the activation $p = \mathrm{softmax}(W_2 h)$, where $W_2$ is the output weight matrix.
Step3.1, $[x_w, x_t, x_l]$ is the input of the model, where $x_w \in \mathbb{R}^{d}$ is a d-dimensional word vector, and $x_t$, $x_l$ are the POS-tag vector and the feature-tag vector respectively;
Step3.2, the activation function of the hidden layer is $h = (W_1^{w}x_w + W_1^{t}x_t + W_1^{l}x_l + b_1)^3$, where $b_1 \in \mathbb{R}^{d_h}$ is the bias and $d_h$ is the number of hidden-layer nodes;
Step3.3, the output layer computes $p = \mathrm{softmax}(W_2 h)$, where $W_2$ is the output-layer weight matrix (a numpy sketch of this scoring network is given below).
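A minimal numpy sketch of the Step3 scoring network, with the cube activation and softmax output given above; the dimensions are illustrative, and the feature extraction that builds $x_w$, $x_t$ and $x_l$ from parser configurations is omitted:

```python
# Toy forward pass: cube activation, softmax over transitions.
import numpy as np

def forward(x_w, x_t, x_l, params):
    W1w, W1t, W1l, b1, W2 = params
    h = (W1w @ x_w + W1t @ x_t + W1l @ x_l + b1) ** 3   # cube activation
    logits = W2 @ h
    logits = logits - logits.max()                      # numerical stability
    return np.exp(logits) / np.exp(logits).sum()        # p = softmax(W2 h)

d, d_h, n_transitions = 50, 200, 3                      # illustrative sizes
rng = np.random.default_rng(0)
params = (rng.normal(scale=0.1, size=(d_h, d)),         # W1^w
          rng.normal(scale=0.1, size=(d_h, d)),         # W1^t
          rng.normal(scale=0.1, size=(d_h, d)),         # W1^l
          np.zeros(d_h),                                 # b1
          rng.normal(scale=0.1, size=(n_transitions, d_h)))   # W2
p = forward(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d), params)
print(p, p.sum())                                       # probabilities summing to 1
```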
Step4, obtain the Burmese dependency parsing model by transfer learning with shared parameters: use the Burmese dependency-parsing corpus obtained in Step1 and the shared-space Burmese word vectors obtained in Step2 to adjust the parameters of the English dependency parsing model from Step3, and thus train the Burmese dependency parsing model.
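A sketch of the Step4 parameter transfer under stated assumptions: the Burmese parser starts from the English parser's weights (laid out as in the sketch above) and is fine-tuned on the projected Burmese corpus. For brevity only the output layer is updated here, whereas the method adjusts all shared parameters; `burmese_batch` is a hypothetical list of (x_w, x_t, x_l, gold_transition) examples.

```python
# Parameter transfer and fine-tuning on the Burmese corpus.
import copy
import numpy as np

def finetune(english_params, burmese_batch, lr=0.01, epochs=5):
    W1w, W1t, W1l, b1, W2 = copy.deepcopy(english_params)   # share/copy English weights
    for _ in range(epochs):
        for x_w, x_t, x_l, gold in burmese_batch:
            h = (W1w @ x_w + W1t @ x_t + W1l @ x_l + b1) ** 3
            logits = W2 @ h
            p = np.exp(logits - logits.max())
            p /= p.sum()
            grad = p.copy()
            grad[gold] -= 1.0                    # d(cross-entropy)/d(logits)
            W2 -= lr * np.outer(grad, h)         # adjust only W2 here, for brevity
    return W1w, W1t, W1l, b1, W2
```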
Beneficial effects of the present invention:
The present invention obtains the Burmese dependency-parsing corpus by mapping and uses the idea of transfer learning to transfer and adjust parameters, optimizing the Burmese dependency parsing model and effectively improving the performance of Burmese dependency parsing.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the model-training diagram of the present invention;
Fig. 3 is the training diagram of the dependency parsing model with shared network parameters in the present invention.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings and specific examples.
Embodiment 1: a transfer-learning-based dependency parsing method for Burmese, whose flow is shown in Fig. 1. First, a Burmese dependency-parsing corpus is built from an English-Burmese parallel treebank (20,028 aligned English-Burmese sentence pairs): pairwise English-Burmese word alignments are used to map the already-built English dependency analyses onto the Burmese sentences, yielding a mapped low-quality Burmese dependency-parsing corpus of 1,799 entries together with 17,688 low-quality entries, 20,028 in total.
Burmese and English are mapped into the same semantic space by the linear transformation matrix W, and the word vectors of this shared space are used as the word-vector input of the dependency parsing model. The English dependency-parsing corpus described above is used to train the dependency parsing model on the Stanford neural network; the training process is shown in Fig. 2. The low-quality Burmese dependency-parsing corpus and the shared-space Burmese word vectors are then used to adjust the parameters of the dependency parsing model, yielding the Burmese dependency parsing model.
Fig. 3 shows the neural-network model with shared parameters; Source Data and Target Data are the English dependency-parsing corpus (source corpus) and the Burmese dependency-parsing corpus (target corpus), respectively.
The dimensionality of the word vectors affects model training differently; the results are shown in Table 1.
Table 1. Effect of word-vector dimensionality on model training
As Table 1 shows, the experimental results indicate that increasing the word-vector dimensionality improves the model: UAS rises by about two percentage points and LAS by about one percentage point.
In this embodiment the proposed method is compared with two other methods, specifically: the effect of a Burmese dependency parsing model trained on the Stanford neural-network model with embedded word2vec vectors, the effect of a model trained with the shared-semantic-space word vectors, and the effect of the model trained with both the idea of transfer learning and the shared-semantic-space word vectors. The results are shown in Table 2: the proposed method outperforms ordinary Stanford neural-network training, with UAS improving by about half a percentage point and LAS by about one percentage point.
Table 2. Comparison of results of different methods
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; various changes can be made within the knowledge of a person skilled in the art without departing from the concept of the invention.
Claims (2)
1. A transfer-learning-based dependency parsing method for Burmese, characterized in that: first, a Burmese dependency-parsing corpus is obtained by English-Burmese mapping, and the Burmese and English corpora are mapped into the same semantic space; then an English dependency parsing model is trained with the bilingual word vectors, and, following the idea of transfer learning, the Burmese dependency-parsing corpus is added and the model parameters are adjusted to obtain the Burmese dependency parsing model.
2. The transfer-learning-based dependency parsing method for Burmese according to claim 1, characterized in that the specific steps of the method are as follows:
Step1, construct a Burmese dependency-parsing corpus from an English-Burmese bilingual parallel corpus;
Step1.1, segment the English and Burmese sides of the bilingual parallel corpus into words;
Step1.2, establish English-Burmese word correspondences;
Step1.3, parse the English corpus with the Stanford neural-network parser to obtain an English dependency-parsing corpus;
Step1.4, using the word correspondences from Step1.2, map the English dependency analyses onto the Burmese corpus to obtain the Burmese dependency-parsing corpus;
Step2, map the Burmese corpus and the English corpus into the same semantic space: find a linear transformation matrix W that realizes $\arg\min_W \sum_i \|X_i W - Z_i\|^2$, where $X_i$ and $Z_i$ denote the embeddings of the Burmese and English words of the i-th dictionary entry; the linear transformation W maps the Burmese and English corpora into the same semantic space, and the word vectors of this shared space are used as the input of the dependency parsing model;
Step3, train the dependency parsing model on the Stanford neural network using the English dependency-parsing corpus from Step1;
first, the input vectors required for training the Stanford neural-network parser and the English dependency-parsing corpus with annotated dependencies are provided; the input vectors are the word vector, the POS-tag vector and the feature-tag vector; the word vector is the shared-space vector obtained in Step2, while the POS-tag and feature-tag vectors are randomly initialized; the hidden layer uses $h = (W_1^{w}x_w + W_1^{t}x_t + W_1^{l}x_l + b_1)^3$ to train $[W_1^{w}, W_1^{t}, W_1^{l}]$, where $x_w$, $x_t$, $x_l$ denote the word vector, POS-tag vector and feature-tag vector fed to the hidden layer, $W_1^{w}$, $W_1^{t}$, $W_1^{l}$ are the corresponding weight matrices, and $b_1$ is the bias; the output layer uses the activation $p = \mathrm{softmax}(W_2 h)$, where $W_2$ is the output weight matrix;
Step4, obtain the Burmese dependency parsing model by transfer learning with shared parameters: use the Burmese dependency-parsing corpus obtained in Step1 and the shared-space Burmese word vectors obtained in Step2 to adjust the parameters of the English dependency parsing model from Step3, and thus train the Burmese dependency parsing model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910158572.5A CN110008467A (en) | 2019-03-04 | 2019-03-04 | A kind of interdependent syntactic analysis method of Burmese based on transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110008467A true CN110008467A (en) | 2019-07-12 |
Family
ID=67166340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910158572.5A Pending CN110008467A (en) | 2019-03-04 | 2019-03-04 | A kind of interdependent syntactic analysis method of Burmese based on transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008467A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102760121A (en) * | 2012-06-28 | 2012-10-31 | 中国科学院计算技术研究所 | Dependence mapping method and system |
CN104991890A (en) * | 2015-07-15 | 2015-10-21 | 昆明理工大学 | Method for constructing Vietnamese dependency tree bank on basis of Chinese-Vietnamese vocabulary alignment corpora |
CN106250367A (en) * | 2016-07-27 | 2016-12-21 | 昆明理工大学 | The method building the interdependent treebank of Vietnamese based on the Nivre algorithm improved |
CN107894982A (en) * | 2017-10-25 | 2018-04-10 | 昆明理工大学 | A kind of method based on the card Chinese word alignment language material structure interdependent treebank of Kampuchean |
Non-Patent Citations (10)
Title |
---|
CHAO XING ET AL: "Normalized word embedding and orthogonal transform for bilingual word translation", 《HUMAN LANGUAGE TECHNOLOGIES: THE 2015 ANNUAL CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ACL》 * |
GUO J ET AL: "A representation learning framework for multi-source transfer parsing", 《THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
JIANG GUO ET AL: "Cross-lingual Dependency Parsing Based on Distributed Representations", 《PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING》 * |
MIKEL ARTETXE ET AL: "Learning principled bilingual mappings of word embeddings while preserving monolingual invariance", 《PROCEEDINGS OF THE 2016 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 * |
SAMUEL L. SMITH ET AL: "Offline bilingual word vectors, orthogonal transformations and the inverted softmax", 《HTTPS://ARXIV.ORG/ABS/1702.03859》 * |
TAL SCHUSTER ET AL: "Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing", 《HTTPS://ARXIV.53YU.COM/ABS/1902.09492V1》 * |
Li Fajie et al.: "Constructing a Vietnamese dependency treebank with the help of Chinese-Vietnamese word-aligned corpora", Journal of Chinese Information Processing *
Li Ying et al.: "Research on converting Vietnamese phrase-structure trees to dependency trees", Journal of Frontiers of Computer Science and Technology *
Guo Jiang: "Cross-lingual and cross-task natural language analysis based on distributed representations", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Gao Guoji: "Cross-lingual text classification based on cross-lingual distributed representations", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110377918A (en) * | 2019-07-15 | 2019-10-25 | 昆明理工大学 | Merge the more neural machine translation method of the Chinese-of syntax analytic tree |
CN110377918B (en) * | 2019-07-15 | 2020-08-28 | 昆明理工大学 | Chinese-transcendental neural machine translation method fused with syntactic parse tree |
CN110489753A (en) * | 2019-08-15 | 2019-11-22 | 昆明理工大学 | Improve the corresponding cross-cutting sensibility classification method of study of neuromechanism of feature selecting |
CN110489753B (en) * | 2019-08-15 | 2022-06-14 | 昆明理工大学 | Neural structure corresponding learning cross-domain emotion classification method for improving feature selection |
CN110705253A (en) * | 2019-08-29 | 2020-01-17 | 昆明理工大学 | Burma language dependency syntax analysis method and device based on transfer learning |
CN110738057A (en) * | 2019-09-05 | 2020-01-31 | 中山大学 | text style migration method based on grammatical constraint and language model |
CN110738057B (en) * | 2019-09-05 | 2023-10-24 | 中山大学 | Text style migration method based on grammar constraint and language model |
CN111046946A (en) * | 2019-12-10 | 2020-04-21 | 昆明理工大学 | Burma language image text recognition method based on CRNN |
US11741318B2 (en) | 2021-03-25 | 2023-08-29 | Nec Corporation | Open information extraction from low resource languages |
CN114757167A (en) * | 2022-05-11 | 2022-07-15 | 昆明理工大学 | Vietnamese dependency syntactic analysis method based on parameter migration |
CN114925708A (en) * | 2022-05-24 | 2022-08-19 | 昆明理工大学 | Thai Chinese neural machine translation method fusing unsupervised dependency syntax |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110008467A (en) | A kind of interdependent syntactic analysis method of Burmese based on transfer learning | |
CN110334219B (en) | Knowledge graph representation learning method based on attention mechanism integrated with text semantic features | |
CN109635280A (en) | A kind of event extraction method based on mark | |
CN106383816B (en) | The recognition methods of Chinese minority area place name based on deep learning | |
CN111325029B (en) | Text similarity calculation method based on deep learning integrated model | |
CN109753660B (en) | LSTM-based winning bid web page named entity extraction method | |
CN107562792A (en) | A kind of question and answer matching process based on deep learning | |
CN109800437A (en) | A kind of name entity recognition method based on Fusion Features | |
CN108920445A (en) | A kind of name entity recognition method and device based on Bi-LSTM-CRF model | |
CN111274790B (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
CN111222318B (en) | Trigger word recognition method based on double-channel bidirectional LSTM-CRF network | |
Yuan-jie et al. | Web service classification based on automatic semantic annotation and ensemble learning | |
CN107832458A (en) | A kind of file classification method based on depth of nesting network of character level | |
CN113672718B (en) | Dialogue intention recognition method and system based on feature matching and field self-adaption | |
CN108021557A (en) | Irregular entity recognition method based on deep learning | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN115329088B (en) | Robustness analysis method of graph neural network event detection model | |
CN113488196A (en) | Drug specification text named entity recognition modeling method | |
Zhao et al. | Synchronously improving multi-user English translation ability by using AI | |
CN112699685A (en) | Named entity recognition method based on label-guided word fusion | |
CN115062109A (en) | Entity-to-attention mechanism-based entity relationship joint extraction method | |
CN110852089A (en) | Operation and maintenance project management method based on intelligent word segmentation and deep learning | |
US10339223B2 (en) | Text processing system, text processing method and storage medium storing computer program | |
CN113204975A (en) | Sensitive character wind identification method based on remote supervision | |
CN115687610A (en) | Text intention classification model training method, recognition device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190712 |
RJ01 | Rejection of invention patent application after publication |