CN115935959A - Method for labeling low-resource agglutinative language sequences - Google Patents

Method for labeling low-resource agglutinative language sequences

Info

Publication number
CN115935959A
CN115935959A (application CN202211612122.7A)
Authority
CN
China
Prior art keywords
word
representation
language
character
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211612122.7A
Other languages
Chinese (zh)
Inventor
刘畅
哈里旦木·阿布都克里木
阿布都克力木·阿布力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang University Of Finance & Economics
Original Assignee
Xinjiang University Of Finance & Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang University Of Finance & Economics filed Critical Xinjiang University Of Finance & Economics
Priority to CN202211612122.7A priority Critical patent/CN115935959A/en
Publication of CN115935959A publication Critical patent/CN115935959A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method for labeling low-resource agglutinative language sequences. The method first performs data augmentation on the target-language data set, then preliminarily extracts the multilingual features of the target text, and finally uses the W2NER framework to further extract the multilingual features of the target text and perform prediction, generating a predicted label sequence. The method quickly obtains information such as morphemes, parts of speech, times, places and persons, significantly reduces classification time and lowers labor cost. The invention provides effective technical support for sequence labeling tasks such as morphological segmentation, named entity recognition and part-of-speech tagging of Central Asian low-resource agglutinative languages such as Uyghur, Kazakh and Kirghiz; it can be fully applied to morphological and part-of-speech analysis and information extraction for these languages, serves downstream tasks such as machine translation, question answering and sentiment analysis, and has reference and popularization value for sequence labeling of other low-resource languages in China.

Description

Method for labeling low-resource agglutinative language sequences
Technical Field
The invention relates to a sequence labeling method in the field of natural language processing, and in particular to a sequence labeling method suitable for minority-language information processing.
Background
Sequence labeling is a complex natural language understanding task: the target label sequence of a sentence must be predicted in order to classify words or morphemes, extract textual information, and so on. Sequence labeling methods fall into three main categories, namely rule/dictionary-based, statistics-based, and deep-learning-based methods; existing work focuses mainly on deep learning.
mBERT is the multilingual version of BERT (Bidirectional Encoder Representations from Transformers) and is mainly used for multilingual natural language understanding tasks. mBERT is pre-trained on 104 languages, which are represented in the same semantic space. BERT is based on the encoder portion of the Transformer architecture and is pre-trained with Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM is a self-supervised method that learns semantic information by replacing original tokens with random tokens or the [MASK] token. The NSP mechanism judges whether two input sentences are adjacent, but given massive pre-training text it has been found to reduce model performance on most understanding tasks. mBERT makes effective use of multilingual features, can process multiple languages simultaneously, and performs well on low-resource language understanding tasks.
XLM-R (Cross-lingual Language Model - RoBERTa) is based mainly on the RoBERTa model; it adopts a larger-scale multilingual corpus, contains more multilingual features, removes the NSP mechanism, and clearly outperforms mBERT on multilingual downstream tasks. However, low-resource agglutinative languages account for only a small proportion of XLM-R's pre-training corpus, so XLM-R's feature extraction for these languages is limited.
CINO (Chinese mINOrity pre-trained language model) performs secondary pre-training on Chinese minority-language corpora (eight languages, including Uyghur and Kazakh) on top of XLM-R, effectively improving the model's comprehension of low-resource agglutinative languages; however, it does not fully extract morphemes or the internal relations of word sequences.
W2NER (unified Named Entity Recognition as Word-Word relation classification) is a framework that unifies multiple named entity recognition task types. It pays closer attention to the adjacency relations among entity words and achieves state-of-the-art results on several English and Chinese named entity recognition benchmarks, but it lacks multilingual features and demands a relatively large amount of data.
In general, existing sequence labeling work on low-resource agglutinative languages rarely considers the relations between characters or between words. Owing to the scarcity of labels, the sequence labeling task is even harder for agglutinative languages with few data resources and complex grammar, such as Uyghur.
Disclosure of Invention
In order to alleviate the data shortage of low-resource agglutinative languages and to fully consider the relations between characters or between words, the invention provides a method suitable for low-resource agglutinative language sequence labeling tasks.
In order to realize the purpose of the invention, the invention specifically adopts the following technical scheme:
a method for labeling low-resource glue word sequences, which is characterized by comprising the following steps:
enhancing the data, and adding training set data of other similar languages of the same task into a training set of the deep learning model;
primarily extracting multilingual features of a target text by utilizing a Chinese mINOrity Language Pre-training Language Model (CINO);
using W 2 NER framework further extracts objectsAnd predicting the multi-language characteristics of the text, acquiring the relation between characters or words, and generating a prediction tag sequence.
Preferably, the Chinese minority pre-trained language model is used only for character representation or word representation, without adjusting the model parameters, and preliminarily extracting the multilingual features with the Chinese minority pre-trained language model comprises:
segmenting the target text with the SentencePiece toolkit, representing each input sentence by a number of tokens, mapping each token to a real-valued vector according to the generated dictionary, and then obtaining a preliminary multilingual character representation or word representation for each input sentence X = {x_1, x_2, ..., x_N} through a max-pooling mechanism, where x_i (i = 1, 2, ..., N) denotes the i-th character or word and N is the number of characters or words in the input sentence.
Preferably, using the W2NER framework to understand the relations between characters or between words and to generate the predicted label sequence comprises:
acquiring context information from the input sentence with a Bidirectional Long Short-Term Memory (Bi-LSTM) network to obtain the final multilingual character representation or word representation;
passing the character or word representation into a convolution layer, generating a character-pair or word-pair representation through Conditional Layer Normalization (CLN), a BERT-style grid representation, a Multi-Layer Perceptron (MLP) and multi-granularity dilated convolution, and further generating and refining the character-pair or word-pair grid representation for the subsequent classification of relations between characters or between words;
adopting a joint prediction layer composed of a biaffine prediction layer and an MLP prediction layer to score relations from different perspectives and to jointly reason over the relations contained in all character pairs or word pairs;
treating the relations between characters or between words as a directed graph, decoding with the NNW (Next-Neighboring-Word) and THW (Tail-Head-Word) mechanisms to obtain prediction probabilities, and finally obtaining the predicted label sequence from these probabilities.
Preferably, the conditional layer normalization expands the character or word representation from 2 dimensions to 3 dimensions, yielding a character-pair or word-pair grid representation.
Preferably, the BERT-style grid representation concatenates the character-pair or word-pair grid representation information, the character-pair or word-pair position information and the grid region information, in the manner of a BERT input representation, to obtain a position-region-aware representation of the grid; a multi-layer perceptron then reduces the dimensionality to enrich the character-pair or word-pair grid representation.
Preferably, the multi-granularity dilated convolution captures interactions between characters or words at different distances.
Preferably, the input of the biaffine prediction layer comes from the bidirectional long short-term memory network, and its output is a character-pair or word-pair relation score computed by a classifier; the input of the MLP prediction layer comes from the character-pair or word-pair grid representation of the convolution layer, and the relation score is computed by a multi-layer perceptron; the outputs of the biaffine and MLP prediction layers are summed and fed into a Softmax function to compute the final character-pair or word-pair relation score.
The invention provides a method for labeling low-resource agglutinative language sequences. First, the target-language data set is augmented, providing more multilingual features; additional semantic information is learned through the similarity between languages, which alleviates the shortage of labeled data in the target language and eases the training of the downstream W2NER framework, whose parameters are numerous and hard to train. Then, CINO performs a preliminary extraction of multilingual features from the target text to obtain more accurate character or word representations of the target language and the augmentation languages; CINO is used only for character or word representation and its parameters are not adjusted, which reduces the computational requirement and can alleviate negative transfer among languages (the so-called curse of multilinguality) and the forgetting of low-resource languages. Finally, the W2NER framework further extracts and predicts the multilingual features of the target text, deeply mining the complex relations between characters or between words of the low-resource agglutinative language, and generates a predicted label sequence. The three parts cooperate and build on each other layer by layer, converting massive unstructured low-resource agglutinative language text into structured data, effectively classifying the characters, words and phrases of the target text, and quickly obtaining information such as morphemes, parts of speech, times, places and persons, which reduces classification time and labor cost. The invention provides effective technical support for sequence labeling tasks such as morphological segmentation, named entity recognition and part-of-speech tagging of Central Asian low-resource agglutinative languages such as Uyghur, Kazakh and Kirghiz; it can be fully applied to morphological and part-of-speech analysis and information extraction for these languages, serves downstream tasks such as machine translation, question answering and sentiment analysis, and has reference and popularization value for sequence labeling of other low-resource languages in China.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flowchart of the method for labeling low-resource agglutinative language sequences according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram of the deep learning model of the method for labeling low-resource agglutinative language sequences according to the second embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples. It should be understood that the specific examples described herein are for illustration and description only and are not intended to limit the invention.
Example one
The present embodiment provides a method for labeling low-resource agglutinative language sequences, the main flow of which is shown in FIG. 1. The method comprises:
performing data augmentation: adding training-set data of the same task in other similar languages to the training set of the deep learning model;
preliminarily extracting the multilingual features of the target text using the Chinese minority pre-trained language model CINO;
using the W2NER framework to further extract and predict the multilingual features of the target text, obtaining the relations between characters or between words and generating a predicted label sequence.
The above steps are further described in detail below:
1. Data augmentation
Training-set data of the same task in other similar languages is added to the training set of the deep learning model, expanding the model's data set and realizing data augmentation. Under low-resource conditions, augmentation provides more multilingual features; the model learns additional semantic information through the similarity between languages, which alleviates the shortage of labeled data in the target language and thereby eases the training of the downstream W2NER framework, whose parameters are numerous and hard to train.
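By way of illustration only, this augmentation step can be sketched as a simple concatenation of training sets; the CoNLL-style file format and the file names below are assumptions of this sketch, not requirements of the invention:

```python
from pathlib import Path

def load_conll(path):
    """Read a CoNLL-style file: one 'token<TAB>label' pair per line,
    with blank lines separating sentences."""
    sentences, current = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
        else:
            token, label = line.split("\t")
            current.append((token, label))
    if current:
        sentences.append(current)
    return sentences

# Target language (Uyghur) plus related-language data for the same task.
# The concrete file names are hypothetical.
train = load_conll("ug_train.conll")                # target language
for aug in ["kk_train.conll", "tr_train.conll"]:    # e.g. Kazakh, Turkish
    train += load_conll(aug)                        # simple concatenation
```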
2. Preliminary extraction of multilingual features with CINO
The target text is segmented with the SentencePiece toolkit; each input sentence is represented by a number of tokens, and each token is mapped to a real-valued vector according to the generated dictionary. Each input sentence X = {x_1, x_2, ..., x_N} then yields a preliminary multilingual character representation or word representation through a max-pooling mechanism, where x_i (i = 1, 2, ..., N) denotes the i-th character or word and N is the number of characters or words in the sentence.
Compared with other pre-trained language models, CINO contains more general semantic information about low-resource agglutinative languages, so more accurate character or word representations of the target language and the augmentation languages are easy to obtain. Unlike the pre-train/fine-tune paradigm common in the low-resource language field, the invention uses CINO only for character or word representation and does not adjust the model parameters; this reduces the computational requirement and can alleviate negative transfer among languages (the curse of multilinguality) and the forgetting of low-resource languages.
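A minimal sketch of this frozen feature-extraction step follows; the Hugging Face checkpoint name hfl/cino-large-v2 and the word-to-subword alignment via word_ids() are assumptions of the sketch, not prescribed by the invention:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("hfl/cino-large-v2")  # SentencePiece-based
model = AutoModel.from_pretrained("hfl/cino-large-v2").eval()   # frozen: no fine-tuning

@torch.no_grad()  # parameters are not adjusted
def word_representations(words):
    """Max-pool CINO subword vectors into one vector per word."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state[0]        # (num_subwords, dim)
    reps = []
    for i in range(len(words)):
        # indices of the subword tokens belonging to word i
        idx = [j for j, w in enumerate(enc.word_ids()) if w == i]
        reps.append(hidden[idx].max(dim=0).values)    # max pooling
    return torch.stack(reps)                          # (N, dim)
```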
3. The W2NER framework is divided into an encoding layer, a convolution layer and a joint prediction layer. It further extracts features from, and makes predictions over, the multilingual character or word representations obtained from CINO, deeply mining the complex relations between characters or between words of the low-resource agglutinative language and generating a predicted label sequence.
In this embodiment characters and words are processed in the same way; the following description takes words as the example.
Using the W2NER framework to further extract and predict the multilingual features of the target text, obtain the relations between words and generate the predicted label sequence specifically comprises the following steps:
and 3.1, acquiring context information from the input sentence by adopting Bi-LSTM to obtain a final word expression. The word representation may be expressed as
Figure BDA0003997691820000041
Wherein h is i Is x i Is expressed by the word(s) in->
Figure BDA0003997691820000042
Is a set of real numbers, d h Is h i Dimension (d) of (a).
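A minimal PyTorch sketch of this encoding step (the input dimension and hidden size are illustrative assumptions):

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Bi-LSTM over the CINO word representations."""
    def __init__(self, in_dim=1024, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True,
                            bidirectional=True)  # forward + backward context

    def forward(self, x):            # x: (batch, N, in_dim)
        h, _ = self.lstm(x)          # h: (batch, N, 2*hidden) = final word reps
        return h
```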
3.2. The word representations are passed into the convolution layer, where the word-pair grid representation is refined through CLN, the BERT-style grid representation, an MLP and multi-granularity dilated convolution, for the subsequent classification of relations between words.
CLN expands the word representation from 2 dimensions to 3 dimensions, producing the 3-dimensional word-pair grid representation S ∈ ℝ^(N×N×d_h). Element S_ij of this tensor is the representation of the word pair (x_i, x_j) and is computed as shown in Equation 1:

S_ij = CLN(h_i, h_j) = γ_ij ⊙ ((h_j − μ) / σ) + λ_ij    (Equation 1)

where γ_ij and λ_ij are the gain parameter and the bias of the layer normalization (both generated from the condition h_i), μ and σ denote the mean and standard deviation over the elements of h_j, and ⊙ denotes the Hadamard product.
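A sketch of conditional layer normalization consistent with Equation 1; generating γ and λ from the conditioning vector h_i with linear layers is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class CLN(nn.Module):
    """Conditional Layer Normalization: normalize h_j, with gain/bias
    predicted from the conditioning vector h_i (Equation 1)."""
    def __init__(self, dim):
        super().__init__()
        self.gain = nn.Linear(dim, dim)   # gamma_ij = W_g h_i + b_g
        self.bias = nn.Linear(dim, dim)   # lambda_ij = W_b h_i + b_b

    def forward(self, h):                 # h: (N, d_h)
        hi = h.unsqueeze(1)               # condition,  (N, 1, d_h)
        hj = h.unsqueeze(0)               # normalized, (1, N, d_h)
        mu = hj.mean(-1, keepdim=True)
        sigma = hj.std(-1, keepdim=True)
        # broadcasting yields the word-pair grid S of shape (N, N, d_h)
        return self.gain(hi) * (hj - mu) / (sigma + 1e-6) + self.bias(hi)
```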
The BERT-style grid representation concatenates the word-pair grid representation information S, the word-pair position (distance) information I^d and the grid region information I^t, in the manner of a BERT input representation, to obtain a position-region-aware representation of the grid; a multi-layer perceptron MLP then reduces the dimensionality, enriching the word-pair grid representation, as shown in Equation 2:

G = MLP([S, I^d, I^t])    (Equation 2)
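A sketch consistent with Equation 2; the embedding sizes and the construction of the distance and region indices are assumptions:

```python
import torch
import torch.nn as nn

class GridEnrich(nn.Module):
    """Concatenate the grid reps with distance and region embeddings,
    then reduce the dimensionality with an MLP (Equation 2)."""
    def __init__(self, d_h, d_emb=20, d_out=128, max_dist=512):
        super().__init__()
        self.dist = nn.Embedding(2 * max_dist, d_emb)  # relative position i - j
        self.region = nn.Embedding(3, d_emb)           # lower / diagonal / upper
        self.mlp = nn.Sequential(nn.Linear(d_h + 2 * d_emb, d_out), nn.GELU())
        self.max_dist = max_dist

    def forward(self, S):                              # S: (N, N, d_h)
        N = S.size(0)
        idx = torch.arange(N)
        d = (idx[:, None] - idx[None, :]).clamp(-self.max_dist,
                                                self.max_dist - 1) + self.max_dist
        t = torch.sign(idx[:, None] - idx[None, :]) + 1  # region id in {0, 1, 2}
        G = torch.cat([S, self.dist(d), self.region(t)], dim=-1)
        return self.mlp(G)                             # (N, N, d_out)
```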
The multi-granularity dilated convolution models the relations between words at different distances, as shown in Equation 3:

R_l = GELU(DC_l(G))    (Equation 3)

where l is the dilation rate (l = 1, 2, 3), GELU is the GELU activation function, and DC_l is the dilated convolution with rate l. The final word-pair grid representation is the concatenation of the three outputs, as shown in Equation 4:

R = [R_1, R_2, R_3] ∈ ℝ^(N×N×3d_G)    (Equation 4)

where d_G is the dimension of G.
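A sketch of the multi-granularity dilated convolution of Equations 3-4 (the channel sizes are illustrative):

```python
import torch
import torch.nn as nn

class MultiGranularityDC(nn.Module):
    """Parallel 2-D convolutions with dilation rates 1, 2, 3 over the
    word-pair grid; the outputs are concatenated (Equations 3-4)."""
    def __init__(self, d_in=128, d_out=64):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(d_in, d_out, kernel_size=3, dilation=l, padding=l)
            for l in (1, 2, 3))  # padding=l keeps the grid size unchanged

    def forward(self, G):                    # G: (N, N, d_in)
        x = G.permute(2, 0, 1).unsqueeze(0)  # (1, d_in, N, N)
        R = [torch.nn.functional.gelu(c(x)) for c in self.convs]
        return torch.cat(R, dim=1)[0].permute(1, 2, 0)  # (N, N, 3*d_out)
```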
3.3. The joint prediction layer is divided into a biaffine prediction layer and an MLP prediction layer; it scores relations from different perspectives and jointly reasons over the relations among all word pairs. The biaffine branch mainly acts as a residual connection, taking its input from the Bi-LSTM. The relation score of the word pair (x_i, x_j) is then computed by the classifier, as shown in Equations 5-7:

m_i = MLP(h_i)    (Equation 5)
n_j = MLP(h_j)    (Equation 6)
y′_ij = m_i^T A n_j + B [m_i; n_j] + b    (Equation 7)

where A, B and b are trainable parameters, and m_i and n_j are the representations of the target word and of the other words in the same sentence.
The MLP prediction layer takes the word-pair grid representation R_ij from the convolution layer as input and computes a relation score with an MLP, as shown in Equation 8:

y″_ij = MLP(R_ij)    (Equation 8)

Finally, the outputs of the biaffine and MLP prediction layers are summed and fed into a Softmax function to compute the final relation score of the word pair (x_i, x_j), as shown in Equation 9:

y_ij = Softmax(y′_ij + y″_ij)    (Equation 9)
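A sketch of the joint prediction layer of Equations 5-9 (the dimensions and the number of relation types are assumptions):

```python
import torch
import torch.nn as nn

class JointPredictor(nn.Module):
    """Biaffine scores from the Bi-LSTM states plus MLP scores from the
    word-pair grid, summed and softmax-normalized (Equations 5-9)."""
    def __init__(self, d_h=512, d_biaf=128, d_grid=192, n_rel=4):
        super().__init__()
        self.mlp_m = nn.Sequential(nn.Linear(d_h, d_biaf), nn.GELU())
        self.mlp_n = nn.Sequential(nn.Linear(d_h, d_biaf), nn.GELU())
        self.A = nn.Parameter(torch.randn(n_rel, d_biaf, d_biaf) * 0.01)
        self.B = nn.Linear(2 * d_biaf, n_rel)          # includes the bias b
        self.mlp_grid = nn.Linear(d_grid, n_rel)

    def forward(self, h, R):              # h: (N, d_h), R: (N, N, d_grid)
        m, n = self.mlp_m(h), self.mlp_n(h)            # Equations 5-6
        biaf = torch.einsum("id,rde,je->ijr", m, self.A, n)  # m_i^T A n_j
        N = h.size(0)
        pair = torch.cat([m.unsqueeze(1).expand(N, N, -1),
                          n.unsqueeze(0).expand(N, N, -1)], dim=-1)
        y1 = biaf + self.B(pair)          # Equation 7
        y2 = self.mlp_grid(R)             # Equation 8
        return torch.softmax(y1 + y2, dim=-1)          # Equation 9
```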
3.4. The relations between words are regarded as a directed word graph; decoding with the NNW and THW mechanisms yields prediction probabilities, from which the predicted label sequence is finally obtained.
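As a rough illustration of the decoding idea only (the relation-id encoding below is an assumption, and full W2NER decoding also handles branching paths): NNW links chain the words of one segment together, and a THW link from the tail back to the head closes the segment and carries its label:

```python
def decode(y, id2label, NNW=1):
    """y[i][j]: predicted relation id for the word pair (i, j).
    THW relations are assumed to be encoded as ids >= 2, one per label."""
    spans = []
    n = len(y)
    for head in range(n):
        chain = [head]
        while True:
            tail = chain[-1]
            if y[tail][head] >= 2:                  # THW: tail points back to head
                spans.append((head, tail, id2label[y[tail][head]]))
            nxt = [j for j in range(tail + 1, n) if y[tail][j] == NNW]
            if not nxt:
                break
            chain.append(nxt[0])                    # follow one NNW link
    return spans
```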
3.5. To bring the output closer to the target label sequence of each sentence, only a log-likelihood loss function is set during training of the deep learning model, and the parameters are adjusted accordingly, as shown in Equation 10:

L = −(1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{r∈R} ŷ^r_ij log y^r_ij    (Equation 10)

where ŷ^r_ij is the gold label indicating whether the word pair (x_i, x_j) holds the predefined relation r, R denotes the set of predefined relations, and y^r_ij is the predicted probability that (x_i, x_j) holds relation r.
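A sketch of this loss over the predicted relation grid (the shapes follow the sketches above):

```python
import torch

def grid_nll(probs, gold):
    """Negative log-likelihood over the N x N relation grid (Equation 10).
    probs: (N, N, n_rel) softmax output; gold: (N, N) long tensor of relation ids."""
    n = probs.size(0)
    logp = torch.log(probs.clamp_min(1e-12))
    return -logp.gather(-1, gold.unsqueeze(-1)).sum() / (n * n)
```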
4. Steps 2 and 3 are the common steps for training the model and for testing or actually applying it. In the training stage, the model undergoes multiple rounds of training and parameter adjustment on the augmented training set using the loss function of step 3.5; each round yields a model with different parameters, the optimal model is selected on the validation set of the target-language data set, and the model is finally evaluated on the test set of the target-language data set, or used to predict on target-language text in practical applications.
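A minimal sketch of this train/select/evaluate loop; the evaluate helper and the epoch count are assumptions:

```python
import copy

def train_select(model, optimizer, train_loader, dev_loader, epochs=20):
    """Multiple training rounds; keep the parameters that score best
    on the validation set of the target language."""
    best_score, best_state = float("-inf"), None
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(batch)          # assumed to return the Equation-10 loss
            loss.backward()
            optimizer.step()
        score = evaluate(model, dev_loader)  # hypothetical F1/accuracy helper
        if score > best_score:
            best_score = score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```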
Example two
This embodiment provides a method for labeling low-resource agglutinative language sequences. Taking the Uyghur morphological segmentation, named entity recognition and part-of-speech tagging tasks as examples, and with reference to FIG. 2, the three specific parts (data augmentation, CINO multilingual feature extraction, and generation of the predicted label sequence with the W2NER framework) are described in detail.
1. Data set
The Uyghur data sets used in the experiments come from the morphological segmentation data set THUUMS (whose text mainly comes from the Tianshan network), the named entity recognition data set WikiAnn, and the part-of-speech tagging data set Universal Dependencies; the data set of each task is divided into a training set, a validation set and a test set. THUUMS is labeled in units of characters with morpheme boundaries as the research target; the labels are b(egin), m(iddle), e(nd) and s(ingle), meaning the beginning, middle and ending characters of a morpheme and a single character that forms a morpheme by itself. WikiAnn is labeled in units of words with named entity extraction as the research target; the labels are O (none), LOC (location), PER (person) and ORG (organization). Universal Dependencies is labeled in units of words to distinguish parts of speech as the research target; the labels are NOUN (common noun), PUNCT (punctuation), ADP (adposition), NUM (numeral), SYM (symbol), SCONJ (subordinating conjunction), ADJ (adjective), PART (particle), DET (determiner), CCONJ (coordinating conjunction), PROPN (proper noun), PRON (pronoun), X (other), ADV (adverb), INTJ (interjection), VERB (verb) and AUX (auxiliary). The training and validation sets are used in the training stage of the deep learning model, in which parameters are adjusted automatically and the optimal version is selected; the test set is used to test the performance of the model (in actual application it can be replaced by any target text in the same language).
2. Data augmentation
Training-set data of the same tasks in other similar languages is added to the training set of the deep learning model. The augmentation data come from WikiAnn (Kazakh, Turkish, Azerbaijani, Kirghiz and Uzbek) and Universal Dependencies (Kazakh and Turkish). The Uyghur morphological segmentation task has sufficient data, so no augmentation is applied to that task. Augmentation provides more multilingual features under low-resource conditions; the model learns additional semantic information through the similarity between languages, alleviating the shortage of labeled data in the target language.
3. Features are extracted with the multilingual feature extractor, comprising the steps:
A1: segment the target text with the SentencePiece toolkit to obtain tokens;
A2: map each token to a real-valued vector according to the generated dictionary;
A3: obtain the preliminary multilingual word representations through a max-pooling mechanism.
4. The W2NER framework further extracts the word-sequence features; it mainly comprises an encoding layer, a convolution layer and a joint prediction layer.
In this embodiment characters and words are processed in the same way; the following description takes words as the example.
Encoding layer: the multilingual word representations obtained in step A3 are fed into the Bi-LSTM, and the final word representations are obtained from the context information of each sentence.
The convolution layer comprises the steps:
C1: expand the word representation from 2 dimensions to 3 dimensions through CLN to obtain the word-pair grid representation;
C2: through the BERT-style grid representation, concatenate the word-pair grid representation information, the word-pair position information and the grid region information in the manner of a BERT input representation, obtaining a position-region-aware representation of the grid;
C3: reduce the dimensionality with a multi-layer perceptron, enriching the word-pair grid representation;
C4: capture the interactions between words at different distances through multi-granularity dilated convolution.
Joint prediction layer: a joint prediction layer composed of a biaffine prediction layer and an MLP prediction layer scores relations from different perspectives and jointly reasons over the relations contained in all word pairs; the relations between words are regarded as a directed graph, decoding with the NNW and THW mechanisms yields prediction probabilities, and the predicted label sequence is finally obtained from these probabilities. This specifically comprises the steps:
D1: the biaffine predictor mainly acts as a residual connection; its input comes from the Bi-LSTM and its output is the word-pair relation score computed by the classifier;
D2: the MLP predictor computes the word-pair relation score from another perspective; its input is the word-pair grid representation from the convolution layer, and the score is computed by a multi-layer perceptron;
D3: the results of D1 and D2 are summed and fed into a Softmax function to compute the final word-pair relation score;
D4: decoding;
D5: the predicted label sequence is obtained on the basis of the deep abstract representation.
Step D4 comprises:
E1: the NNW mechanism computes the relations between entity words;
E2: the THW mechanism identifies the boundaries of each entity;
E3: the relations between words are regarded as a directed word graph, prediction is made through the NNW and THW mechanisms, and the predicted label sequence is finally obtained from the prediction probabilities.
5. In the training stage the model undergoes multiple rounds of training and parameter adjustment on the augmented training set with the log-likelihood loss function; each round yields a model with different parameters, the optimal model is selected on the validation set of the Uyghur data set, and the model is finally evaluated on the test set of the Uyghur data set.
For the method of this embodiment, the ablation experiment results are shown in Table 1, the example prediction effect in Table 2, and the example prediction results in Table 3.
TABLE 1 (ablation experiment results; reproduced only as an image in the original publication)
TABLE 2

Model           | Morphological segmentation (macro-F1) | Named entity recognition (micro-F1) | Part-of-speech tagging (accuracy)
mBERT-uncased   | 97.57                                  | 66.67                               | 83.85
XLM-R-Large     | 97.21                                  | 67.75                               | 88.81
CINO-Large-v2   | 97.53                                  | 71.15                               | 89.49
W2NER           | 97.57                                  | 44.44                               | 57.31
The invention   | 98.10                                  | 79.11                               | 91.00
TABLE 3 (example prediction results; reproduced only as an image in the original publication)
It can be seen that the method for labeling low-resource agglutinative language sequences of this embodiment makes full use of the multilingual features and of the relations between characters or between words, and its prediction performance is clearly superior to that of the existing models. Removing any component causes a significant drop in performance.

Claims (7)

1. A method for labeling low-resource agglutinative language sequences, characterized by comprising the following steps:
performing data augmentation: adding training-set data of the same task in other similar languages to the training set of the deep learning model;
preliminarily extracting the multilingual features of the target text using the Chinese minority pre-trained language model;
using the W2NER framework to further extract and predict the multilingual features of the target text, obtaining the relations between characters or between words and generating a predicted label sequence.
2. The method for labeling low-resource agglutinative language sequences according to claim 1, wherein the Chinese minority pre-trained language model is used only for character representation or word representation without adjusting the model parameters, and preliminarily extracting the multilingual features with the Chinese minority pre-trained language model comprises:
segmenting the target text with the SentencePiece toolkit, representing each input sentence by a number of tokens, mapping each token to a real-valued vector according to the generated dictionary, and then obtaining a preliminary multilingual character representation or word representation for each input sentence X = {x_1, x_2, ..., x_N} through a max-pooling mechanism, where x_i (i = 1, 2, ..., N) denotes the i-th character or word and N is the number of characters or words in the input sentence.
3. The method for labeling low-resource agglutinative language sequences according to claim 1, wherein using the W2NER framework to understand the relations between characters or between words and to generate the predicted label sequence comprises:
acquiring context information from the input sentence with a bidirectional long short-term memory network to obtain the final multilingual character representation or word representation;
passing the character or word representation into a convolution layer, generating a character-pair or word-pair representation through conditional layer normalization, a BERT-style grid representation, a multi-layer perceptron and multi-granularity dilated convolution, and further generating and refining the character-pair or word-pair grid representation for the subsequent classification of relations between characters or between words;
adopting a joint prediction layer composed of a biaffine prediction layer and an MLP prediction layer to score relations from different perspectives and to jointly reason over the relations contained in all character pairs or word pairs;
treating the relations between characters or between words as a directed graph, decoding with the NNW and THW mechanisms to obtain prediction probabilities, and finally obtaining the predicted label sequence from these probabilities.
4. The method for labeling low-resource agglutinative language sequences according to claim 3, wherein the conditional layer normalization expands the character or word representation from 2 dimensions to 3 dimensions, yielding a character-pair or word-pair grid representation.
5. The method for labeling low-resource agglutinative language sequences according to claim 3, wherein the BERT-style grid representation concatenates the character-pair or word-pair grid representation information, the character-pair or word-pair position information and the grid region information as the BERT input representation to obtain a position-region-aware representation of the grid, and a multi-layer perceptron then reduces the dimensionality to enrich the character-pair or word-pair grid representation.
6. The method for labeling low-resource agglutinative language sequences according to claim 3, wherein the multi-granularity dilated convolution captures interactions between characters or words at different distances.
7. The method for labeling low-resource agglutinative language sequences according to claim 3, wherein the input of the biaffine prediction layer comes from the bidirectional long short-term memory network and its output is a character-pair or word-pair relation score computed by a classifier; the input of the MLP prediction layer comes from the character-pair or word-pair grid representation of the convolution layer, and the relation score is computed by a multi-layer perceptron; the outputs of the biaffine and MLP prediction layers are summed and fed into a Softmax function to compute the final character-pair or word-pair relation score.
CN202211612122.7A 2022-12-14 2022-12-14 Method for labeling low-resource agglutinative language sequences Pending CN115935959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211612122.7A CN115935959A (en) Method for labeling low-resource agglutinative language sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211612122.7A CN115935959A (en) Method for labeling low-resource agglutinative language sequences

Publications (1)

Publication Number Publication Date
CN115935959A true CN115935959A (en) 2023-04-07

Family

ID=86551807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211612122.7A Pending CN115935959A (en) Method for labeling low-resource agglutinative language sequences

Country Status (1)

Country Link
CN (1) CN115935959A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738345A (en) * 2023-08-15 2023-09-12 腾讯科技(深圳)有限公司 Classification processing method, related device and medium
CN116738345B (en) * 2023-08-15 2024-03-01 腾讯科技(深圳)有限公司 Classification processing method, related device and medium
CN116977436A (en) * 2023-09-21 2023-10-31 小语智能信息科技(云南)有限公司 Burmese text image recognition method and device based on Burmese character cluster characteristics
CN116977436B (en) * 2023-09-21 2023-12-05 小语智能信息科技(云南)有限公司 Burmese text image recognition method and device based on Burmese character cluster characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination