CN110196980A - Domain transfer based on a convolutional network in the Chinese word segmentation task - Google Patents
Domain transfer based on a convolutional network in the Chinese word segmentation task
- Publication number
- CN110196980A CN110196980A CN201910487638.5A CN201910487638A CN110196980A CN 110196980 A CN110196980 A CN 110196980A CN 201910487638 A CN201910487638 A CN 201910487638A CN 110196980 A CN110196980 A CN 110196980A
- Authority
- CN
- China
- Prior art keywords
- character
- input
- vector
- source domain
- convolutional layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a domain transfer method based on a convolutional network in the Chinese word segmentation task. Building on a convolutional neural network for Chinese word segmentation, it introduces an attention mechanism into Maximum Mean Discrepancy (MMD), a method traditionally used to measure the distributional difference between domains. During training of the neural network, the attention mechanism learns which sentence information is most helpful for the domain transfer task, so that the MMD method can be applied effectively to sequence labeling tasks. At the same time, during the MMD computation, sentences capable of positive transfer receive larger weights, while unhelpful or negatively transferring sentences receive very small weights. This realizes more efficient domain transfer, reduces the need for manually annotated corpora, and relieves the labor and time pressure that corpus annotation imposes on natural language processing (NLP) research.
Description
Technical field
The present invention relates to the field of Internet technology, and more particularly to domain transfer based on a convolutional network in the Chinese word segmentation task.
Background technique
With the development of computer technology, computing power has gradually increased, and machine learning and deep learning have advanced further. Natural language processing is gradually being applied to many scenarios: text classification is used to mine user preferences from film reviews and shopping comments, summarization is used to condense articles such as news, and machine translation enables simultaneous interpretation, among others. Many application scenarios need these techniques, and as the number of domestic Internet users grows, the information they generate keeps increasing. For such massive data, automatic text processing becomes ever more significant. Because natural language processing is irreplaceable and extremely efficient for text processing, it has attracted broad social attention. Domestically, Chinese text processing is closely bound up with daily life; Chinese word segmentation, as a foundational task of natural language processing, is crucial to the development of other natural language processing tasks.
The Chinese word segmentation task splits a Chinese sentence or paragraph into words, so that higher-level natural language processing tasks on Chinese can exploit the additional information carried by words to improve performance. Segmentation is necessary because, in modern Chinese, a word carrying a specific meaning is usually composed of two or more characters and cannot be understood from individual characters alone; the same character often has different meanings in different words. Therefore, before performing other natural language processing tasks on Chinese text, segmentation must be carried out first. Relatively low-level tasks such as part-of-speech tagging and named entity recognition depend on segmentation especially heavily, and the accuracy of Chinese word segmentation directly affects the performance of these tasks.
The Chinese word segmentation task uses an algorithm to let a computer automatically process Chinese text and split it into words. Traditional methods for Chinese word segmentation include forward maximum matching, backward maximum matching, segmentation algorithms with probabilistic disambiguation, conditional random fields, structured perceptrons, and maximum entropy models. Among deep learning methods developed in recent years, feed-forward networks, long short-term memory networks, and convolutional neural networks have all been applied to the Chinese word segmentation task and achieve high accuracy on several large corpora.
Neural network methods require large amounts of annotated data. However, existing large-scale corpora cover only the news domain; there are almost no large corpora for the patent, literature, or medical domains, so existing neural network techniques struggle to reach high accuracy in those domains. In recent years, domain transfer methods have therefore been applied to the Chinese word segmentation task, aiming to use existing large-scale annotated corpora to improve segmentation accuracy in domains with no or only a small amount of annotated data. In domain transfer, the corpus with large-scale annotation is called the source-domain data, and the corpus with no or only a small amount of annotation is called the target-domain data. Domain transfer using unannotated target-domain data is called unsupervised domain transfer; domain transfer using a small amount of annotated target-domain data is called semi-supervised domain transfer.
Among existing domain transfer techniques for Chinese word segmentation, one group of methods is dictionary-based, using trained word and character vectors to realize domain transfer; another group modifies the model to represent transferable information directly, realizing domain transfer by extracting transferable feature information from the large-scale annotated corpus.
As shown in Figure 1, the prior-art article "Learning Transferable Features with Deep Adaptation Networks" proposes solving the domain transfer problem in image classification with a Deep Adaptation Network:
First, the network is pre-trained on an image dataset containing a large amount of image data; second, the network is fine-tuned using source-domain labeled data, or source-domain labeled data together with a small amount of target-domain labeled data. During fine-tuning, the first three convolutional layers, which extract general features, have their parameters frozen; the following two convolutional layers, which extract domain-specific features, are fine-tuned; and the last three fully connected layers are adapted via MK-MMD (Multi-kernel Maximum Mean Discrepancy).
As shown in Fig. 2, the prior-art article "Neural Aggregation Network for Video Face Recognition" proposes solving face recognition in video with an attention mechanism:
First, each frame of the video is passed through a convolutional neural network, GoogLeNet, which generates a 128-dimensional face feature for each frame. Second, the features are fed into the first attention module, which contains a learnable parameter q and outputs aggregated features as follows:
e_k = q^T f_k    Formula (2-1)
where f_k is the CNN-extracted feature of each frame, q is the learnable kernel, and e_k is the unnormalized weight of each frame;
a_k = exp(e_k) / Σ_j exp(e_j)    Formula (2-2)
where a_k is the normalized weight of each frame;
r = Σ_k a_k f_k    Formula (2-3)
where r, the aggregated feature weighted by the attention kernel, is independent of the order of the frames.
Third proceeds immediately to second attention module by the aggregation features of an attention module, into
The further characteristic aggregation of row, the learning to assess of second attention module are calculated as follows:
q1=tanh (Wr0+ b) formula (2-4)
Wherein, W is weight matrix, and b is bias term, and the two is all the parameter that can learn, and tanh is that tanh is non-linear
Function, r0Indicate the output of first attention module, q1The core of second attention module, aggregation features r1Meter
The same formula of calculation process (2-1), formula (2-2) and formula (2-3).
Finally, realizing identification mission by mean comparisons' loss function training network.
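The aggregation in Formulas (2-1) to (2-3) can be sketched in plain Python as follows; this is an illustrative simplification with made-up toy features, not the paper's implementation (which uses learned 128-dimensional GoogLeNet features):

```python
import math

def attention_aggregate(frames, q):
    """Aggregate per-frame features into one vector via attention kernel q.
    Implements Formulas (2-1)-(2-3): e_k = q . f_k, a = softmax(e), r = sum_k a_k f_k."""
    e = [sum(qi * fi for qi, fi in zip(q, f)) for f in frames]   # (2-1)
    m = max(e)
    exp_e = [math.exp(v - m) for v in e]                          # numerically stable softmax
    s = sum(exp_e)
    a = [v / s for v in exp_e]                                    # (2-2)
    dim = len(frames[0])
    r = [sum(a[k] * frames[k][d] for k in range(len(frames)))     # (2-3)
         for d in range(dim)]
    return r, a

# toy example: three 2-D "frame features", kernel favoring the first axis
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
r, a = attention_aggregate(frames, q=[1.0, 0.0])
```

The second module would then compute q_1 = tanh(W r + b) and call the same aggregation again with q_1. Note that r is invariant to the order of the frames, matching the remark after Formula (2-3).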
The inventor found in the course of research that the prior art of "Learning Transferable Features with Deep Adaptation Networks" and "Neural Aggregation Network for Video Face Recognition" has the following problems:
1. Both target image recognition tasks; they are not suited to sequence labeling in natural language processing, and thus not to the Chinese word segmentation task.
2. Only the traditional MMD method is considered; the possibility that, during the MMD computation, some source-domain samples are irrelevant or even counterproductive is not taken into account.
These technical problems give the prior art the following shortcomings:
1. Directly applying the traditional MMD method to the Chinese word segmentation task performs poorly.
2. Because source-domain samples differ, it is difficult, during the MMD computation, to draw only on the samples beneficial to the target domain.
Summary of the invention
To solve the above technical problems, the present invention provides a domain transfer method based on a convolutional network in the Chinese word segmentation task. Building on improvements to the traditional MMD method, it can be applied to sequence labeling tasks in natural language processing; at the same time, by adding an attention module, the model can adaptively select, in the MMD computation, the source-domain samples beneficial for transfer to the target data and suppress noise, improving the effect of domain transfer on the Chinese word segmentation task.
The present invention provides a domain transfer method based on a convolutional network in the Chinese word segmentation task. When training without annotated target-domain corpus data, the method comprises:
Step 1: divide the corpus into source-domain data and target-domain data, and pad the domain containing fewer sentences, by cycling through it, until it matches the sentence count of the other domain;
Step 2: map the Chinese characters of the source domain and the target domain to vector representations using the same dictionary, so that the input text to be segmented is numericalized into a matrix formed by concatenating the character vectors;
Step 3: feed the numerical matrix into the feature convolutional layers to extract the source-domain and target-domain feature representations;
Step 4: feed the extracted source-domain feature representation into the attention module to compute a weight vector; the dot product of the weight vector with the extracted source-domain feature representation gives the weighted source-domain feature representation;
Step 5: use the weighted source-domain feature representation and the extracted target-domain feature representation as the two inputs of the MMD computation to obtain the MMD result;
Step 6: feed the extracted source-domain feature representation into the classification convolutional layer to obtain the predicted label probability of each character;
Step 7: feed the label probabilities of each character, together with the true label probabilities, into a conditional random field (Conditional Random Field, CRF) to compute the likelihood;
Step 8: take as the loss function the weighted sum of the negative log-likelihood and the MMD result, with the MMD result acting as a regularization term, and update each layer's weights of the network via the back-propagation (Back Propagation, BP) algorithm.
Further, when the training data contain a small amount of annotated target-domain corpus data, Step 6 is replaced as follows:
Step 6: feed the extracted source-domain feature representation, together with the small amount of target-domain feature representations carrying true labels, into the classification convolutional layer to obtain the predicted label probability of each character.
Further, outside training, Chinese word segmentation replaces Steps 1 to 8 as follows:
Step 1: take the target-domain data to be segmented as the input of the neural network;
Step 2: map the Chinese characters of the target-domain data to be segmented to vector representations, using the same dictionary as in training;
Step 3: feed the vector representations into the feature convolutional layers to extract the feature representations;
Step 4: feed the feature representations into the classification convolutional layer to obtain the predicted label probability of each character;
Step 5: decode the predicted label probabilities of each character with the Viterbi algorithm to obtain the optimal label sequence and complete the segmentation.
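Step 5's Viterbi decoding can be sketched as follows; a minimal illustration in which emission and transition scores are supplied directly (the function name and score layout are invented for this sketch, and the CRF transition matrix plays the role of `trans`):

```python
def viterbi(emit, trans):
    """Find the best-scoring label sequence.
    emit: list of dicts, emit[t][label] = score of label at position t.
    trans: dict, trans[(prev, cur)] = transition score."""
    labels = list(emit[0])
    # best running score per label, plus backpointers for each later position
    score = [{l: emit[0][l] for l in labels}]
    back = []
    for t in range(1, len(emit)):
        cur, ptr = {}, {}
        for l in labels:
            prev = max(labels, key=lambda p: score[-1][p] + trans[(p, l)])
            cur[l] = score[-1][prev] + trans[(prev, l)] + emit[t][l]
            ptr[l] = prev
        score.append(cur)
        back.append(ptr)
    # backtrack from the best final label
    best = max(labels, key=lambda l: score[-1][l])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With the {B, M, E, S} tagset, the returned path is the optimal label sequence from which the segmentation is read off.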
Further, in Step 2, mapping the Chinese characters of the source domain and the target domain to vector representations using the same dictionary comprises:
With a randomly initialized mapping dictionary, the word embedding method randomly initializes one identical dense vector for each distinct character, and each Chinese character of the corpus data is then mapped to its dense vector representation through the mapping dictionary;
With a pre-trained mapping dictionary, the bag-of-words models Skip-Gram or Continuous Bag-of-Words (CBOW) are trained to obtain vector representations containing some lexical information, and each Chinese character of the corpus data is mapped to its dense vector representation through the mapping dictionary.
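The randomly initialized mapping dictionary described above can be sketched as follows; a simplified illustration (the helper names, the toy dimension, and the use of Python's `random` module are assumptions of this sketch, not the patent's implementation):

```python
import random

def build_char_dict(corpus, dim=8, seed=0):
    """Number each distinct character and give it one dense vector;
    index 0 is reserved for characters unseen in the training corpus."""
    rng = random.Random(seed)
    char2id = {"<UNK>": 0}
    for sent in corpus:
        for ch in sent:
            if ch not in char2id:
                char2id[ch] = len(char2id)
    # one dense vector per entry; identical characters share one vector by construction
    vectors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in char2id]
    return char2id, vectors

def lookup(ch, char2id, vectors):
    """Map a character to its dense vector; unknown characters use the UNK slot."""
    return vectors[char2id.get(ch, 0)]
```

A sentence is then numericalized by concatenating `lookup(ch, …)` for each character, giving the input matrix of Step 2.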
Further, in Step 3, the numerical matrix is fed into the feature convolutional layers to extract the feature representations, computed as:
y = f(m ⊛ x + b)
where m ∈ R^(d×w) is a convolution kernel of window size w, with d equal to the number of rows of the input matrix x; ⊛ denotes the convolution operation; x is the numerical matrix or the output of the previous feature convolutional layer; b is a bias term; f is the rectified linear unit (Rectified Linear Unit, ReLU); and y is a vector of dimension n, the feature extracted by the feature convolutional layer;
where the ReLU function is as follows:
f(x) = max(0, x)
where, when the input is a vector or matrix, x ranges over the elements of the vector or matrix.
Further, in Step 4, the extracted source-domain feature representation is fed into the attention module to compute a weight vector, and the dot product of the weight vector with the extracted source-domain feature representation gives the weighted source-domain feature representation, computed as:
ŷ = g(k ⊙ y) ⊙ y
where k ∈ R^(i×l×d) is the weight matrix; i is the number of sentences input to the neural network; l is the fixed length of the input sentences; d is the dimension of the feature representation; ⊙ denotes the dot (element-wise) product; y is the concatenation of the i feature representations extracted by the feature convolutional layers; g first averages the second and third dimensions of the dot-product result and then applies a softmax; and ŷ is the weighted source-domain feature representation;
where the softmax is computed as follows:
softmax(x)_i = exp(x_i) / Σ_j exp(x_j)
where x is a vector and x_i is its i-th element.
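One plausible reading of this sentence-level weighting, assuming g averages each sentence's dot-product result over the length and feature dimensions before the softmax, can be sketched as follows (toy shapes and values; not the patent's implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def weight_sentences(y, k):
    """y, k: i x l x d nested lists (sentences x length x feature dim).
    Returns per-sentence weights a and the weighted features y_hat."""
    scores = []
    for s in range(len(y)):
        # element-wise product, then average over length and feature dims (the role of g)
        vals = [y[s][t][f] * k[s][t][f]
                for t in range(len(y[s])) for f in range(len(y[s][t]))]
        scores.append(sum(vals) / len(vals))
    a = softmax(scores)                     # one weight per source-domain sentence
    y_hat = [[[a[s] * v for v in row] for row in y[s]] for s in range(len(y))]
    return a, y_hat
```

Sentences whose features align with the learned weight matrix k receive larger weights, which is how the module can emphasize source sentences helpful for transfer.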
Further, in Step 5, the MMD is computed as:
MMD²(p, q) = (1/n_s²) Σ_{i,j} k(x_i^s, x_j^s) − (2/n_s²) Σ_{i,j} k(x_i^s, x_j^t) + (1/n_s²) Σ_{i,j} k(x_i^t, x_j^t)
where MMD²(p, q) denotes the MMD result; p and q are the distributions of the two domains' data; n_s is the number of source-domain data inputs; k(·, ·) denotes the Gaussian kernel function; and x^s and x^t denote the weighted source-domain features ŷ and the target-domain features y, respectively;
where the Gaussian kernel is computed as follows:
k(x, z) = exp(−‖x − z‖² / (2σ²))
where x and z are the two inputs of the Gaussian kernel and σ is the Gaussian kernel bandwidth.
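The Gaussian-kernel MMD above can be sketched with a simple empirical estimator over two equal-sized feature sets; an illustrative simplification (single fixed bandwidth, plain Python lists) rather than the patent's implementation:

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    """Squared MMD estimate between equal-sized sample sets xs (source) and xt (target)."""
    n = len(xs)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / n ** 2
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in xt for b in xt) / n ** 2
    k_st = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xt) / n ** 2
    return k_ss - 2 * k_st + k_tt
```

When the two sample sets are identical the estimate is zero, and it grows as the two feature distributions move apart, which is what makes it usable as a regularization term.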
Further, in Step 6, the computation that produces the predicted label probabilities of each character in the classification convolutional layer is the same as in the feature convolutional layers, except that the ReLU nonlinearity f used by the feature convolutional layers is replaced by a softmax computation.
Further, in Step 7, the likelihood is computed as:
p(y* | S) = exp(score(S, y*)) / Σ_{y′} exp(score(S, y′))
where S and y* denote the input sentence and the true label sequence of the sentence; score maps the predicted label probabilities of each character, through a transition matrix, to the predicted probability of the whole sentence; and y′ ranges over all possible label sequences of the sentence;
where the score function is computed as follows:
score(S, y) = Σ_{i=1}^{n} (A_{y_{i−1}, y_i} + s(y_i | S))
where A_{ij} is the transition matrix, s(·) is the label probability of a single character predicted by the neural network, and n is the sequence length.
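The CRF score and likelihood can be sketched as follows; an illustrative brute-force version that enumerates all label sequences for the partition function (a real CRF uses the forward algorithm, and this sketch's handling of the first position, with no incoming transition, is an assumption):

```python
import math
from itertools import product

def sentence_score(emissions, A, labels):
    """score(S, y) = sum_i A[y_{i-1}][y_i] + s(y_i | S).
    emissions[t][l] is the per-character label score; A is the transition matrix."""
    total = emissions[0][labels[0]]
    for i in range(1, len(labels)):
        total += A[labels[i - 1]][labels[i]] + emissions[i][labels[i]]
    return total

def log_likelihood(emissions, A, gold):
    """log p(y* | S) = score(S, y*) - log sum_{y'} exp(score(S, y'))."""
    n_labels = len(A)
    n = len(emissions)
    log_z = math.log(sum(
        math.exp(sentence_score(emissions, A, list(y)))
        for y in product(range(n_labels), repeat=n)))
    return sentence_score(emissions, A, gold) - log_z
```

The training objective of Step 8 then takes the negative of this log-likelihood and adds the weighted MMD term.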
Further, in Step 8, the loss function is computed as follows:
L = −log p(y* | S) + λ · MMD²(p, q)
where λ denotes the weight of the MMD regularization term.
The domain transfer method based on a convolutional network in the Chinese word segmentation task provided by the present invention applies the traditional MMD method to a sequence labeling task, realizing feature-level domain transfer in Chinese word segmentation and helping broaden the application of domain transfer methods to sequence labeling tasks in natural language processing. By adding an attention module, the model can adaptively select the source-domain samples used in the MMD computation and suppress noise, realizing more efficient feature-level domain transfer. By exploiting existing large-scale annotated data, the method improves Chinese word segmentation accuracy on small corpora and relieves the pressure of manual corpus annotation.
Detailed description of the drawings
Fig. 1 is a schematic diagram of the Deep Adaptation Network used to solve image recognition tasks;
Fig. 2 is a schematic diagram of the Neural Aggregation Network;
Fig. 3 is the flow chart of Embodiment One;
Fig. 4 is the flow chart of the domain transfer method based on a convolutional network in the Chinese word segmentation task provided by the present invention.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present invention, the technical scheme in the embodiments of the invention is described below clearly and completely in conjunction with the accompanying drawings. Evidently, the described embodiments are only a part of the embodiments of the invention, not all of them; all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention. The abbreviations and key terms appearing in this embodiment are defined as follows:
BP: Back Propagation;
CBOW: Continuous Bag-of-Words;
CNN: Convolutional Neural Network;
CRF: Conditional Random Field;
CTB: Chinese Treebank (Penn Chinese Treebank);
LSTM: Long Short-Term Memory neural network;
ME: Maximum Entropy model;
MMD: Maximum Mean Discrepancy;
MK-MMD: Multi-Kernel Maximum Mean Discrepancy;
NLP: Natural Language Processing;
NN: Neural Network;
PKU: Peking University open corpus;
ReLU: Rectified Linear Unit, an activation function.
Embodiment one
Referring to Figs. 3 and 4, which show the domain transfer method based on a convolutional network in the Chinese word segmentation task provided by the present invention. Specifically, when training without annotated target-domain corpus data, the method comprises:
Step 1: divide the corpus into source-domain data and target-domain data, and pad the domain containing fewer sentences, by cycling through it, until it matches the sentence count of the other domain;
In this embodiment, the maximum sentence length is set to 128; the large-scale annotated source-domain corpora are PKU, CTB5, and CTB7; the target-domain corpora are a patent corpus, a corpus of the novel Zhuxian, and a medical-forum corpus. Meanwhile, in each training batch, half of the sentences are source-domain data and the other half are target-domain data.
Step 2: map the Chinese characters of the source domain and the target domain to vector representations using the same dictionary, so that the input text to be segmented is numericalized into a matrix formed by concatenating the character vectors;
Further, the pre-trained mapping dictionary uses the bag-of-words models Skip-Gram or Continuous Bag-of-Words (CBOW) to obtain vector representations containing some lexical information, with each Chinese character of the corpus data mapped to its dense vector representation through the mapping dictionary, comprising:
pre-training character vectors on the large body of Wikipedia text; building the mapping dictionary by finding all distinct characters and numbering each, so that identical characters share the same vector representation, different characters have different vector representations, and one additional vector represents all characters that never occur in the training corpus, for unknown characters; and introducing a dropout mechanism when training the network, which randomly zeroes a part of the parameters.
In this embodiment, character vectors are pre-trained with Skip-Gram, and the mapped vector dimension of each character is set to 200. This step maps characters to dense (non-sparse) vector representations through a mapping dictionary: the training corpus is first traversed to find all distinct characters, and each is numbered; assuming there are M characters, a matrix of 200 rows (the mapping dimension) and M+1 columns is built, in which identical characters share the same vector, different characters have different vectors, and, beyond the M characters, one additional vector represents all characters not occurring in the training corpus, for unknown characters. In this step, the invention introduces a dropout mechanism that randomly zeroes a part of the parameters when training the network; this avoids overfitting and effectively provides a method of combining many different neuronal structures.
Step 3: feed the numerical matrix into the feature convolutional layers to extract the source-domain and target-domain feature representations;
Further, in Step 3, the numerical matrix is fed into the feature convolutional layers to extract the feature representations, computed as:
y = f(m ⊛ x + b)
where m ∈ R^(d×w) is a convolution kernel of window size w, with d equal to the number of rows of the input matrix x; ⊛ denotes the convolution operation; x is the numerical matrix or the output of the previous feature convolutional layer; b is a bias term; f is the rectified linear unit (Rectified Linear Unit, ReLU); and y is a vector of dimension n, the feature extracted by the feature convolutional layer;
In this embodiment, the dimension of each character feature extracted by the feature convolutional layers is 200, the convolution kernel size is set to 3, and the number of feature convolutional layers is set to 4; meanwhile, the source-domain and target-domain feature representations are computed by shared convolutional layers.
Step 4: feed the extracted source-domain feature representation into the attention module to compute a weight vector; the dot product of the weight vector with the extracted source-domain feature representation gives the weighted source-domain feature representation;
Further, in Step 4, this is computed as:
ŷ = g(k ⊙ y) ⊙ y
where k ∈ R^(i×l×d) is the weight matrix; i is the number of sentences input to the neural network; l is the fixed length of the input sentences; d is the dimension of the feature representation; ⊙ denotes the dot (element-wise) product; y is the concatenation of the i feature representations extracted by the feature convolutional layers; g first averages the second and third dimensions of the dot-product result and then applies a softmax; and ŷ is the weighted source-domain feature representation;
In this embodiment, the number of sentences input to the neural network is set to 16.
Step 5: use the weighted source-domain feature representation and the extracted target-domain feature representation as the two inputs of the MMD computation to obtain the MMD result;
Further, in Step 5, the MMD is computed as:
MMD²(p, q) = (1/n_s²) Σ_{i,j} k(x_i^s, x_j^s) − (2/n_s²) Σ_{i,j} k(x_i^s, x_j^t) + (1/n_s²) Σ_{i,j} k(x_i^t, x_j^t)
where MMD²(p, q) denotes the MMD result; p and q are the distributions of the two domains' data; n_s is the number of source-domain data inputs; k(·, ·) denotes the Gaussian kernel function; and x^s and x^t denote the weighted source-domain features ŷ and the target-domain features y, respectively.
Step 6: feed the extracted source-domain feature representation into the classification convolutional layer to obtain the predicted label probability of each character;
Further, in Step 6, the computation that produces the predicted label probabilities of each character in the classification convolutional layer is the same as in the feature convolutional layers, except that the ReLU nonlinearity f is replaced by a softmax computation;
In this embodiment, each character has four labels, {B, M, E, S}, where B denotes the beginning character of a word, M a middle character of a word, E the ending character of a word, and S a single-character word; the classification convolutional layer therefore outputs a feature of dimension 4 for each character.
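Once a {B, M, E, S} label sequence has been decoded, recovering the words is mechanical; a small illustrative helper (the function name is invented for this sketch):

```python
def labels_to_words(chars, labels):
    """Group characters into words according to BMES tags:
    B = word begin, M = word middle, E = word end, S = single-character word."""
    words, current = [], ""
    for ch, tag in zip(chars, labels):
        if tag == "S":
            if current:              # tolerate a malformed sequence
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        else:                        # "E": close the current word
            words.append(current + ch)
            current = ""
    if current:
        words.append(current)
    return words
```

For example, characters tagged B E S B E yield three words: a two-character word, a single-character word, and another two-character word.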
Step 7: feed the label probabilities of each character, together with the true label probabilities, into a conditional random field (Conditional Random Field, CRF) to compute the likelihood;
Further, in Step 7, the likelihood is computed as:
p(y* | S) = exp(score(S, y*)) / Σ_{y′} exp(score(S, y′))
where S and y* denote the input sentence and the true label sequence of the sentence; score maps the predicted label probabilities of each character, through a transition matrix, to the predicted probability of the whole sentence; and y′ ranges over all possible label sequences of the sentence.
Step 8: take as the loss function the weighted sum of the negative log-likelihood and the MMD result, with the MMD result acting as a regularization term, and update each layer's weights of the network via the back-propagation (Back Propagation, BP) algorithm;
Further, in Step 8, the loss function is computed as follows:
L = −log p(y* | S) + λ · MMD²(p, q)
where λ denotes the weight of the MMD regularization term.
Further, when the training data contain a small amount of annotated target-domain corpus data, Step 6 is replaced as follows:
Step 6: feed the extracted source-domain feature representation, together with the small amount of target-domain feature representations carrying true labels, into the classification convolutional layer to obtain the predicted label probability of each character.
Further, outside training, Chinese word segmentation replaces Steps 1 to 8 as follows:
Step 1: take the target-domain data to be segmented as the input of the neural network;
Step 2: map the Chinese characters of the target-domain data to be segmented to vector representations, using the same dictionary as in training;
Step 3: feed the vector representations into the feature convolutional layers to extract the feature representations;
Step 4: feed the feature representations into the classification convolutional layer to obtain the predicted label probability of each character;
Step 5: decode the predicted label probabilities of each character with the Viterbi algorithm to obtain the optimal label sequence and complete the segmentation.
In a preferred embodiment, as shown in Fig. 3, each character of a sentence is first mapped to a dense vector of dimension n, and convolution extracts the feature of each character in the sentence. During training, the convolutional features are divided into source domain and target domain; the attention module computes a weight for each source-domain sentence; the source-domain features multiplied by these weights form the first input of the MMD computation, and the target-domain features form the second input; the MMD module computes the MMD value, which, multiplied by the regularization weight, serves as the regularization term of the loss function. The source-domain features pass through the classification convolutional layer to obtain the predicted probability of each label for each character, and the CRF computes the log-likelihood; the negative log-likelihood added to the MMD regularization term gives the final loss function, and the model updates its parameters via the BP algorithm with the objective of minimizing the loss. Outside training, the predicted label probabilities of each character obtained from the classification convolutional layer are passed directly to the Viterbi algorithm, which computes the final label sequence and completes the segmentation.
By building on the classical MMD method, Embodiment 1 of the present invention realizes feature-level domain migration in the Chinese word segmentation task and extends the domain-migration methods available for Chinese word segmentation. Introducing the attention module into the MMD computation lets the model autonomously select the source-domain samples that benefit the target-domain data, suppresses noise during domain migration, achieves more efficient migration, relieves the pressure of large-scale corpus annotation, and improves the accuracy of domain migration on small labeled data sets.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited thereto; any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A convolutional-network-based domain migration method for the Chinese word segmentation task, characterized in that, when training without labeled target-domain corpus data, the method comprises:
Step 1: divide the corpus into source-domain data and target-domain data, and pad the domain containing fewer sentences, by cycling through it, until it matches the sentence count of the other domain;
Step 2: map the Chinese characters of the source domain and the target domain to vector representations using the same dictionary; the input text to be segmented is thereby numericalized into the numerical matrix formed by concatenating the character vectors;
Step 3: input the numerical matrix into the feature convolutional layer to extract the source-domain feature representation and the target-domain feature representation;
Step 4: input the extracted source-domain feature representation into the attention module to compute a weight vector, and take the element-wise product of the weight vector and the extracted source-domain feature representation to obtain the weighted source-domain feature representation;
Step 5: take the weighted source-domain feature representation and the extracted target-domain feature representation as the two inputs of the MMD computation to obtain the MMD result;
Step 6: input the extracted source-domain feature representation into the classification convolutional layer to obtain the prediction label probability of each character;
Step 7: input the label probability of each character together with the true label into the conditional random field (Conditional Random Field, CRF) to calculate the likelihood probability;
Step 8: the loss function is the weighted sum of the negative log-likelihood and the MMD result, with the MMD result serving as a regularization term; compute and update each layer's weights of the network by the back-propagation algorithm (Back Propagation, BP).
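Step 1 of the claim — padding the smaller corpus by cycling through it until both domains have equal sentence counts — can be sketched as follows; the function name and toy corpora are illustrative:

```python
from itertools import cycle, islice

def equalize_by_cycling(source, target):
    """Repeat the smaller corpus cyclically until both have the same length."""
    n = max(len(source), len(target))
    return (list(islice(cycle(source), n)),
            list(islice(cycle(target), n)))

src = ["s1", "s2", "s3", "s4", "s5"]   # 5 source-domain sentences
tgt = ["t1", "t2"]                     # only 2 target-domain sentences
src_eq, tgt_eq = equalize_by_cycling(src, tgt)
print(tgt_eq)  # → ['t1', 't2', 't1', 't2', 't1']
```

Equal sample counts simplify the MMD estimate of claim 7, since both inputs then contain the same number of feature vectors.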
2. The method of claim 1, characterized in that, when the training data contains a small amount of labeled target-domain corpus, step 6 is replaced as follows:
Step 6: input both the extracted source-domain feature representations and the small amount of target-domain feature representations carrying true labels into the classification convolutional layer to obtain the prediction label probability of each character.
3. The method of claim 1, characterized in that, in the non-training case, when performing Chinese word segmentation, steps 1 to 8 are replaced as follows:
Step 1: take the target-domain data to be segmented as the input of the neural network;
Step 2: map the Chinese characters of the data to be segmented to vector representations using the same dictionary as in training;
Step 3: input the vector representations into the feature convolutional layer to extract the feature representations;
Step 4: input the feature representations into the classification convolutional layer to obtain the prediction label probability of each character;
Step 5: decode the prediction label probabilities of the characters with the Viterbi algorithm to obtain the optimal label sequence and complete the segmentation.
4. The method of claim 1, characterized in that, in step 2, mapping the Chinese characters of the source domain and the target domain to vector representations using the same dictionary comprises:
for a randomly initialized mapping dictionary, using the word-embedding method to randomly initialize an identical dense vector representation for each identical character, and then mapping each Chinese character of the corpus data to its dense vector representation through the mapping dictionary;
for a pre-trained mapping dictionary, using the bag-of-words models Skip-Gram or Continuous Bag-of-Words (CBOW) to train vector representations that carry word information, and then mapping each Chinese character of the corpus data to its dense vector representation through the mapping dictionary.
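The randomly initialized variant of claim 4 can be sketched as below; the embedding dimension and toy corpus are illustrative assumptions, and the pre-trained variant would simply fill the same dictionary with Skip-Gram or CBOW vectors instead of random ones:

```python
import numpy as np

def build_embedding(chars, dim=8, seed=0):
    """Map each distinct character to one randomly initialized dense vector."""
    rng = np.random.default_rng(seed)
    return {c: rng.standard_normal(dim) for c in sorted(set(chars))}

def embed(sentence, table):
    """Numericalize a sentence: stack character vectors into a matrix."""
    return np.stack([table[c] for c in sentence])

corpus = "今天天气好"                 # 5 characters
table = build_embedding(corpus, dim=8)
mat = embed(corpus, table)
print(mat.shape)                     # one row per character: (5, 8)
# identical characters (the two 天) share one vector, as the claim requires
```

Both domains look characters up in this one table, which is what lets the downstream convolutional layers compare source- and target-domain features in a shared space.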
5. The method of claim 1, characterized in that, in step 3, the numerical matrix is input into the feature convolutional layer and the feature representation is extracted by the calculation y = f(m ⊛ x + b), wherein m ∈ R^(d×w) is a convolution kernel of window size w, with d equal to the number of rows of the input matrix x, ⊛ denotes the convolution operation, x is the numerical matrix or the output of the previous feature convolutional layer, b is a bias term, f is the rectified linear unit (Rectified Linear Unit, ReLU), and y is a vector of dimension n, i.e., the feature extracted by the feature convolutional layer.
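Claim 5's per-window computation y = f(m ⊛ x + b) can be sketched as a width-w 1-D convolution over the character-vector matrix; the shapes and values below are illustrative, not the patent's:

```python
import numpy as np

def feature_conv(x, m, b):
    """1-D convolution over character positions followed by ReLU.

    x: (d, n) input matrix with d rows and n character positions
    m: (d, w) convolution kernel of window size w
    b: scalar bias
    returns y: vector of length n - w + 1
    """
    d, n = x.shape
    _, w = m.shape
    y = np.array([np.sum(m * x[:, t:t + w]) + b   # m "convolved" with window t
                  for t in range(n - w + 1)])
    return np.maximum(y, 0.0)                     # f = ReLU

x = np.arange(12, dtype=float).reshape(3, 4)      # d=3 rows, n=4 characters
m = np.ones((3, 2))                               # window size w=2
y = feature_conv(x, m, b=-20.0)
print(y)  # → [ 7. 13. 19.]
```

In practice many such kernels run in parallel, and stacking their outputs yields the per-character feature representation used by the later steps.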
6. The method of claim 1, characterized in that, in step 4, the extracted source-domain feature representation is input into the attention module, a weight vector is computed, and the element-wise product of the weight vector and the extracted source-domain feature representation yields the weighted source-domain feature representation, calculated as ŷ = g(k ⊙ y) ⊙ y, wherein k ∈ R^(i×l×d) is a weight matrix, i denotes the number of sentences input to the neural network, l denotes the fixed length of an input sentence, d is the dimension of the feature representation, ⊙ denotes the element-wise product, y is the concatenation of the i feature representations extracted by the feature convolutional layer, g denotes first averaging the second and third dimensions of the element-wise product and then applying a softmax, and ŷ is the weighted source-domain feature representation.
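Claim 6's attention weighting can be sketched as follows. The reading that averaging the second and third dimensions produces one scalar score per sentence, with softmax taken across the i sentences, is an interpretation of the claim, not a verbatim implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_weight(y, k):
    """Weighted features ŷ = g(k ⊙ y) ⊙ y.

    y, k: (i, l, d) feature representations and weight matrix; g averages
    the length (l) and feature (d) dimensions, then applies softmax over
    the i sentences, so each sentence gets one attention weight.
    """
    scores = (k * y).mean(axis=(1, 2))   # one scalar per sentence
    w = softmax(scores)                  # attention weights, sum to 1
    return w[:, None, None] * y          # rescale each sentence's features

rng = np.random.default_rng(0)
y = rng.standard_normal((4, 5, 3))       # i=4 sentences, l=5, d=3
k = rng.standard_normal((4, 5, 3))
y_hat = attention_weight(y, k)
print(y_hat.shape)                       # same shape as y: (4, 5, 3)
```

Down-weighting unhelpful source sentences before the MMD computation is what lets the model "autonomously select" source samples, as the description puts it.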
7. The method of claim 1, characterized in that, in step 5, since step 1 pads both domains to the same number n_s of inputs, the MMD is calculated as MMD(p, q) = (1/n_s²) Σ_{i,j} [ k(x_i^s, x_j^s) − 2 k(x_i^s, x_j^t) + k(x_i^t, x_j^t) ], wherein MMD(p, q) denotes the MMD result, p and q are respectively the distributions of the two domains' data, n_s denotes the total number of source-domain data inputs, k(·, ·) denotes the Gaussian kernel function, and x^s and x^t respectively denote the weighted source-domain features ŷ and the target-domain features y.
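Claim 7's MMD with a Gaussian kernel can be sketched as below, using the biased estimator over equally sized samples as produced by step 1; the kernel bandwidth is an illustrative choice, since the patent does not fix one:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all row pairs."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    """Squared MMD between source features xs and target features xt."""
    kss = gaussian_kernel(xs, xs, sigma).mean()   # source-source term
    ktt = gaussian_kernel(xt, xt, sigma).mean()   # target-target term
    kst = gaussian_kernel(xs, xt, sigma).mean()   # cross term
    return kss + ktt - 2 * kst

rng = np.random.default_rng(1)
same = mmd2(rng.standard_normal((50, 4)), rng.standard_normal((50, 4)))
far  = mmd2(rng.standard_normal((50, 4)), rng.standard_normal((50, 4)) + 3.0)
print(same < far)  # shifted distributions give a larger MMD
```

Minimizing this quantity pulls the (attention-weighted) source features toward the target features, which is the feature-level domain migration the invention claims.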
8. The method of claim 1, characterized in that, in step 6, the process by which the classification convolutional layer extracts the prediction label probability of each character is identical to that of the feature convolutional layer, except that the ReLU nonlinearity f used by the feature convolutional layer is replaced with a softmax computation.
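Claim 8's only change relative to the feature convolutional layer is swapping ReLU for softmax, so each character receives a probability distribution over labels. A sketch, modelling the layer as a window-size-1 convolution (i.e., a per-character linear map) with an illustrative 4-label BMES output:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def classify(features, weights, bias):
    """Per-character label probabilities: softmax(features @ weights + bias).

    features: (T, d) one feature vector per character
    weights:  (d, L) one column per label (e.g. B, M, E, S)
    bias:     (L,)
    """
    return softmax(features @ weights + bias)

rng = np.random.default_rng(2)
feats = rng.standard_normal((6, 8))        # 6 characters, d=8 features
probs = classify(feats, rng.standard_normal((8, 4)), np.zeros(4))
print(probs.shape)                         # (6, 4)
print(np.allclose(probs.sum(axis=1), 1))   # each row is a distribution
```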
9. The method of claim 1, characterized in that, in step 7, the likelihood probability is calculated as P(y* | S) = exp(score(S, y*)) / Σ_{y′} exp(score(S, y′)), wherein S and y* respectively denote the input sentence and its true label sequence, score denotes combining the prediction label probabilities of the characters through a transition matrix into the prediction score of the sentence, and y′ ranges over all possible label sequences of the sentence.
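Claim 9's sequence likelihood can be sketched with brute-force enumeration of all label sequences; this is feasible only for tiny examples (a real CRF computes the partition sum with the forward algorithm), and the scores below are illustrative:

```python
import itertools
import numpy as np

def seq_score(emissions, transitions, labels):
    """score(S, y): per-character scores plus transition scores."""
    s = sum(emissions[t, y] for t, y in enumerate(labels))
    s += sum(transitions[a, b] for a, b in zip(labels, labels[1:]))
    return s

def crf_log_likelihood(emissions, transitions, gold):
    """log P(y*|S) with the partition sum enumerated exhaustively."""
    T, L = emissions.shape
    scores = [seq_score(emissions, transitions, y)
              for y in itertools.product(range(L), repeat=T)]
    log_z = np.log(np.sum(np.exp(scores)))   # log partition function
    return seq_score(emissions, transitions, gold) - log_z

rng = np.random.default_rng(3)
emis = rng.standard_normal((3, 4))    # 3 characters, 4 labels
trans = rng.standard_normal((4, 4))
ll = crf_log_likelihood(emis, trans, gold=(0, 2, 3))
print(ll < 0)  # a log-probability over many competing sequences
```

Training maximizes this log-likelihood, i.e., minimizes its negative, which is the first term of the loss in claim 10.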
10. The method of claim 1, characterized in that, in step 8, the loss function is calculated as L = −log P(y*|S) + λ · MMD, wherein λ denotes the weight of the MMD regularization term.
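Claim 10's objective simply combines the two quantities computed above; a one-line sketch, where λ is a hyperparameter the patent leaves unspecified:

```python
def total_loss(log_likelihood, mmd_value, lam=0.1):
    """L = -log P(y*|S) + λ · MMD, the training objective of claim 10."""
    return -log_likelihood + lam * mmd_value

print(total_loss(-2.0, 0.5, lam=0.1))  # → 2.05
```

Back-propagating this scalar through the CRF, classification layer, attention module, and feature convolutional layers updates all network weights at once, as step 8 requires.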
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910487638.5A CN110196980B (en) | 2019-06-05 | 2019-06-05 | Domain migration on Chinese word segmentation task based on convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110196980A true CN110196980A (en) | 2019-09-03 |
CN110196980B CN110196980B (en) | 2020-08-04 |
Family
ID=67754062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910487638.5A Active CN110196980B (en) | 2019-06-05 | 2019-06-05 | Domain migration on Chinese word segmentation task based on convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110196980B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107967253A (en) * | 2017-10-27 | 2018-04-27 | 北京大学 | A kind of low-resource field segmenter training method and segmenting method based on transfer learning |
CN108197117A (en) * | 2018-01-31 | 2018-06-22 | 厦门大学 | A kind of Chinese text keyword extracting method based on document subject matter structure with semanteme |
CN109740148A (en) * | 2018-12-16 | 2019-05-10 | 北京工业大学 | A kind of text emotion analysis method of BiLSTM combination Attention mechanism |
CN109753566A (en) * | 2019-01-09 | 2019-05-14 | 大连民族大学 | The model training method of cross-cutting sentiment analysis based on convolutional neural networks |
Non-Patent Citations (5)
Title |
---|
ZUYI BAO et al.: "Neural Domain Adaptation with Contextualized Character Embedding for Chinese Word Segmentation", Springer International *
ZUYI BAO et al.: "Neural Regularized Domain Adaptation for Chinese Word Segmentation", Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing *
LIU Yude: "Research on Chinese Word Segmentation Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *
SONG Peng et al.: "Cross-corpus speech emotion recognition based on feature transfer learning", Journal of Tsinghua University *
GAO Jun et al.: "A domain adaptation learning framework based on locally weighted means", Acta Automatica Sinica *
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750974A (en) * | 2019-09-20 | 2020-02-04 | 成都星云律例科技有限责任公司 | Structured processing method and system for referee document |
CN110750974B (en) * | 2019-09-20 | 2023-04-25 | 成都星云律例科技有限责任公司 | Method and system for structured processing of referee document |
CN110765775A (en) * | 2019-11-01 | 2020-02-07 | 北京邮电大学 | Self-adaptive method for named entity recognition field fusing semantics and label differences |
CN110765775B (en) * | 2019-11-01 | 2020-08-04 | 北京邮电大学 | Self-adaptive method for named entity recognition field fusing semantics and label differences |
CN111127336B (en) * | 2019-11-18 | 2023-05-02 | 复旦大学 | Image signal processing method based on self-adaptive selection module |
CN111127336A (en) * | 2019-11-18 | 2020-05-08 | 复旦大学 | Image signal processing method based on self-adaptive selection module |
CN111008271B (en) * | 2019-11-20 | 2022-06-24 | 佰聆数据股份有限公司 | Neural network-based key information extraction method and system |
CN111008271A (en) * | 2019-11-20 | 2020-04-14 | 佰聆数据股份有限公司 | Neural network-based key information extraction method and system |
CN111178149A (en) * | 2019-12-09 | 2020-05-19 | 中国资源卫星应用中心 | Automatic remote sensing image water body extraction method based on residual pyramid network |
CN111178149B (en) * | 2019-12-09 | 2023-09-29 | 中国四维测绘技术有限公司 | Remote sensing image water body automatic extraction method based on residual pyramid network |
CN111091004B (en) * | 2019-12-18 | 2023-08-25 | 上海风秩科技有限公司 | Training method and training device for sentence entity annotation model and electronic equipment |
CN111091004A (en) * | 2019-12-18 | 2020-05-01 | 上海风秩科技有限公司 | Training method and training device for sentence entity labeling model and electronic equipment |
CN111767718A (en) * | 2020-07-03 | 2020-10-13 | 北京邮电大学 | Chinese grammar error correction method based on weakened grammar error feature representation |
CN111767718B (en) * | 2020-07-03 | 2021-12-07 | 北京邮电大学 | Chinese grammar error correction method based on weakened grammar error feature representation |
CN111984791A (en) * | 2020-09-02 | 2020-11-24 | 南京信息工程大学 | Long text classification method based on attention mechanism |
CN111984791B (en) * | 2020-09-02 | 2023-04-25 | 南京信息工程大学 | Attention mechanism-based long text classification method |
CN112580343A (en) * | 2020-11-03 | 2021-03-30 | 北京字节跳动网络技术有限公司 | Model generation method, question and answer quality judgment method, device, equipment and medium |
CN112415408A (en) * | 2020-11-10 | 2021-02-26 | 南昌济铃新能源科技有限责任公司 | Power battery SOC estimation method |
CN113076750B (en) * | 2021-04-26 | 2022-12-16 | 华南理工大学 | Cross-domain Chinese word segmentation system and method based on new word discovery |
CN113076750A (en) * | 2021-04-26 | 2021-07-06 | 华南理工大学 | Cross-domain Chinese word segmentation system and method based on new word discovery |
CN114429129A (en) * | 2021-12-22 | 2022-05-03 | 南京信息工程大学 | Literature mining and material property prediction method |
CN114580412A (en) * | 2021-12-29 | 2022-06-03 | 西安工程大学 | Clothing entity identification method based on field adaptation |
CN114580412B (en) * | 2021-12-29 | 2024-06-04 | 西安工程大学 | Clothing entity identification method based on field adaptation |
Also Published As
Publication number | Publication date |
---|---|
CN110196980B (en) | 2020-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110196980A (en) | A kind of field migration based on convolutional network in Chinese word segmentation task | |
Abid et al. | Sentiment analysis through recurrent variants latterly on convolutional neural network of Twitter | |
CN108763326B (en) | Emotion analysis model construction method of convolutional neural network based on feature diversification | |
CN107145483B (en) | A kind of adaptive Chinese word cutting method based on embedded expression | |
Zhang et al. | Neural networks incorporating dictionaries for Chinese word segmentation | |
Dong et al. | Character-based LSTM-CRF with radical-level features for Chinese named entity recognition | |
Manoharan | Capsule network algorithm for performance optimization of text classification | |
Prusa et al. | Improving deep neural network design with new text data representations | |
CN109766524B (en) | Method and system for extracting combined purchasing recombination type notice information | |
CN110008338B (en) | E-commerce evaluation emotion analysis method integrating GAN and transfer learning | |
CN110765775B (en) | Self-adaptive method for named entity recognition field fusing semantics and label differences | |
US20240177047A1 | Knowledge graph pre-training method based on structural context information |
CN108628823A (en) | In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training | |
CN107729309A (en) | A kind of method and device of the Chinese semantic analysis based on deep learning | |
Zhuang et al. | Natural language processing service based on stroke-level convolutional networks for Chinese text classification | |
CN110263325A (en) | Chinese automatic word-cut | |
CN112699685A (en) | Named entity recognition method based on label-guided word fusion | |
Boudad et al. | Exploring the use of word embedding and deep learning in arabic sentiment analysis | |
Naqvi et al. | Roman Urdu news headline classification empowered with machine learning | |
Su et al. | Low‐Rank Deep Convolutional Neural Network for Multitask Learning | |
Liu et al. | Research on advertising content recognition based on convolutional neural network and recurrent neural network | |
Huang et al. | Multi-view opinion mining with deep learning | |
Wang et al. | Joint Character‐Level Convolutional and Generative Adversarial Networks for Text Classification | |
Hu et al. | Scalable frame resolution for efficient continuous sign language recognition | |
Wu et al. | Conditional consistency regularization for semi-supervised multi-label image classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||