CN109783809A - Method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus - Google Patents
Method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus
- Publication number
- CN109783809A CN109783809A CN201811577667.2A CN201811577667A CN109783809A CN 109783809 A CN109783809 A CN 109783809A CN 201811577667 A CN201811577667 A CN 201811577667A CN 109783809 A CN109783809 A CN 109783809A
- Authority
- CN
- China
- Prior art keywords
- sentence
- alignment
- corpus
- chinese
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus, belonging to the technical field of natural language processing and machine learning. The document-level aligned corpus is first cleaned with regular expressions in Python to remove noise, and the result is used as input. Because the sentence ordering of the Lao and Chinese texts is consistent, the document-level corpus can first be processed into individually aligned sentences, which are then split out. The aligned sentences are then word-segmented, and the segmented text is used as the input of an LSTM. By retaining the intermediate outputs of the LSTM encoder and training a model that selectively attends to these inputs when producing the output sequence, parallel sentence pairs are extracted from the bilingual corpus. The invention is of research significance for the extraction of Lao parallel sentence pairs.
Description
Technical field
The present invention relates to a method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus, and in particular to such a method based on LSTM (Long Short-Term Memory) networks. It belongs to the technical field of natural language processing and machine learning.
Background technique
Bilingual corpora are an important foundational resource for research fields such as statistical machine translation, cross-language retrieval, and bilingual dictionary construction; their quantity and quality strongly influence, and often determine, the final results of related tasks. Mining parallel sentence pairs is a key technology for building bilingual corpora and therefore has significant research value. In many cases bilingual text can be obtained, but it is generally not aligned at the sentence level; for example, some corpora are aligned by paragraph or by whole article. In such cases, corpora that are not sentence-aligned must be reorganized into a sentence-aligned format before parallel sentence pairs can be extracted.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus, which can effectively improve the accuracy of sentence alignment.
The technical solution adopted by the present invention is a method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus, comprising the following steps:
Step 1: Clean the Chinese-Lao bilingual corpus of noise using regular expressions in Python, then divide the aligned segments into datasets: the aligned training set accounts for 90% and the shuffled test set for 10%.
Step 2: From the sentences of the training and test sets, count the distinct tokens and the number of occurrences of each, and compute word vectors for the sentences via word embedding.
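Step 2's token statistics and embedding lookup can be sketched as follows; the vocabulary-indexing scheme, embedding dimension, and the use of fixed random vectors in place of a trained embedding table are illustrative assumptions, not details taken from the patent:

```python
from collections import Counter

import numpy as np

def build_vocab(segmented_sentences):
    """Step 2 statistics: count each distinct token and how often it
    occurs across the training and test sentences."""
    counts = Counter(t for sent in segmented_sentences for t in sent)
    index = {t: i + 1 for i, (t, _) in enumerate(counts.most_common())}
    return counts, index  # id 0 is reserved for unknown tokens

def embed_sentence(tokens, index, dim=8, seed=0):
    """Toy word-embedding lookup: fixed random vectors stand in for the
    trained embedding table of the patent's word-embedding step."""
    table = np.random.default_rng(seed).normal(size=(len(index) + 1, dim))
    return np.stack([table[index.get(t, 0)] for t in tokens])
```

In practice the embedding table would be learned jointly with the LSTM rather than drawn once at random.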
Step 3: Use the word vectors obtained in Step 2 as the input of the LSTM, which at this point serves as the encoder; the encoder performs similarity calculations against the initialization vector of the LSTM.
Step 4: Pass each word vector through the encoder and a softmax function to obtain the semantic coding C of each sentence's word vectors, forming a vector sequence.
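Steps 3-4 can be sketched in miniature as follows; dot-product scoring stands in for the LSTM's learned similarity, and the query vector plays the role of the initialization vector Z0 described later in the patent:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def semantic_coding(hidden_states, query):
    """Score each encoder hidden state h_t against a query state,
    normalise the scores with softmax, and sum the weighted states
    into one semantic coding vector C (an illustrative simplification
    of Steps 3-4)."""
    weights = softmax(hidden_states @ query)
    return weights @ hidden_states
```

With a query strongly aligned to one hidden state, the coding vector collapses onto that state, which is the selective behaviour the softmax provides.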
Step 5: Use the vector sequence obtained in Step 4 as the initial input of the decoder, which incorporates an attention mechanism. At each decoding step, a subset of the vector sequence of the semantic coding C is selectively chosen for further processing; in the decoder, the output at each time step serves as the input at the next, so that every output makes full use of the information carried by the input sequence, and so on until the end.
Step 6: Through the similarity calculation between the encoder and the decoder, obtain the sentence word vectors with the highest similarity and reconstruct sentences from them, thereby extracting the Chinese-Lao bilingually aligned sentences from the document-level aligned corpus.
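The pairing decision of Step 6 can be sketched with cosine similarity over sentence vectors; the similarity measure and the threshold value are illustrative assumptions (the patent specifies neither):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def extract_aligned(lao_vecs, zh_vecs, threshold=0.5):
    """For every Lao sentence vector, pick the Chinese sentence vector
    with the highest similarity and keep the pair if it clears the
    (illustrative) threshold - a toy version of Step 6."""
    pairs = []
    for i, u in enumerate(lao_vecs):
        sims = [cosine(u, v) for v in zh_vecs]
        j = int(np.argmax(sims))
        if sims[j] >= threshold:
            pairs.append((i, j))
    return pairs
```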
Specifically, the aligned segments described in Step 1 are the aligned document-level corpus after noise removal.
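The noise removal and dataset division of Step 1 can be sketched as follows; the regular-expression patterns, the shuffle seed, and the helper names are illustrative assumptions rather than the patent's actual code:

```python
import random
import re

def clean_line(line):
    """Regular-expression noise removal for one corpus line (Step 1):
    strips markup remnants, control characters, and extra whitespace.
    The exact patterns used in the patent are not given; these are
    typical choices."""
    line = re.sub(r"<[^>]+>", "", line)       # drop markup noise
    line = re.sub(r"[\x00-\x1f]", " ", line)  # drop control characters
    return re.sub(r"\s+", " ", line).strip()

def split_dataset(aligned_pairs, train_ratio=0.9, seed=42):
    """Divide the cleaned aligned segments: 90% aligned training set,
    10% test set, the test set then shuffled out of order."""
    pairs = [(clean_line(a), clean_line(b)) for a, b in aligned_pairs]
    pairs = [(a, b) for a, b in pairs if a and b]
    cut = int(len(pairs) * train_ratio)
    train, test = pairs[:cut], pairs[cut:]
    random.Random(seed).shuffle(test)
    return train, test
```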
Specifically, Step 2 performs sentence segmentation on the initial document-level aligned corpus with Python code, realizes word segmentation of individual Lao and Chinese sentences, and counts the number of words.
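A minimal segmentation stand-in is sketched below; the patent does not name its segmenters, so character-level splitting (a common Chinese fallback; production pipelines would use a word segmenter such as jieba) and zero-width-space splitting for Lao (whose script has no inter-word spaces, though corpora often carry U+200B at word boundaries) are assumptions:

```python
def segment_chinese(sentence):
    """Character-level segmentation as an illustrative fallback for
    Chinese; a trained word segmenter would normally be used."""
    return [ch for ch in sentence if not ch.isspace()]

def segment_lao(sentence):
    """Split Lao text on zero-width spaces (U+200B) and ordinary
    whitespace if present, falling back to characters otherwise."""
    parts = [p for p in sentence.replace("\u200b", " ").split() if p]
    return parts if len(parts) > 1 else list(sentence.strip())
```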
Specifically, Step 3 proceeds as follows: the separated sentences are word-segmented and, after word embedding, input into the LSTM; the hidden layer then yields the hidden-state information h1, h2, ...; the hidden state of the encoder at the first time step is assumed to be Z0 (the initialization variable); Z0 is then used with h1, h2, ... to perform similarity calculations, yielding a10, a20, a30, ..., aij for each time step, where the subscript i of a indexes the hidden-layer information in the encoder and the subscript j indexes the initialization variable of the neural network.
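One LSTM time step, producing the hidden states h1, h2, ... named above, can be sketched in NumPy; the parameter layout (input, forget, cell, and output gates stacked in W, U, b) follows the standard LSTM formulation, and trained values would be substituted for the placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM time step: compute the gate pre-activations,
    update the cell state, and emit the next hidden state h_t."""
    z = W @ x + U @ h_prev + b  # stacked gate pre-activations
    H = h_prev.size
    i, f, g, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
    h = sigmoid(o) * np.tanh(c)                        # new hidden state
    return h, c
```

Running this step over a sentence's word vectors yields the sequence h1, h2, ... that the encoder retains.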
Specifically, in Step 5 each decoding step of the decoder takes an input formed as a weighted sum of the information h1, h2, ..., ht of all hidden layers of the input sequence; that is, every time the next word is predicted, all the hidden-layer information of the input sequence is read through to determine which input words are most relevant to the current prediction. With the attention mechanism, at each decoding step a context vector Ci is input, and the new hidden-layer state Si is obtained as a nonlinear function of the previous state Si-1, the previous output Yi, and Ci, as in formula (1). Here Ci is the weighted sum of the encoder output states at every time step, computed as in formula (2); Si-1 and Yi are respectively the previous decoder state and the previously output prediction; hj is the encoder output state at each time step; and aij is the weight of hj for input i at each decoder step:
Si = F(Si-1, Yi, Ci)   (1)
Ci = Σj aij·hj   (2)
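Formulas (1) and (2) can be sketched as one decoder step with attention; the dot-product scoring form, the tanh layer realizing the nonlinear function F, and the parameter matrices Wa and Ws are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_decoder_step(s_prev, y_prev, H, Wa, Ws):
    """One decoder step following formulas (1) and (2): attention
    weights a_ij come from scoring the previous decoder state against
    every encoder state h_j (rows of H), C_i is their weighted sum, and
    S_i = F(S_{i-1}, Y_i, C_i) is realised here by a tanh layer."""
    a = softmax(H @ (Wa @ s_prev))  # a_ij over the encoder states
    c = a @ H                       # formula (2): C_i = sum_j a_ij h_j
    s = np.tanh(Ws @ np.concatenate([s_prev, y_prev, c]))  # formula (1)
    return s, a
```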
Specifically, in Step 6, after the similarity calculation, sentences are reconstructed from the word vectors, thereby completing the extraction of the Chinese-Lao bilingually aligned sentences from the document-level aligned corpus.
The beneficial effects of the present invention are:
(1) In this LSTM-based method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus, the accuracy of Chinese-Lao sentence extraction is improved compared with a plain encoder-decoder model.
(2) The method uses the LSTM algorithm, which yields a better improvement in feature-extraction effectiveness than other algorithms.
(3) The method incorporates the grammatical features of both Lao and Chinese, which can be recognized automatically through deep learning; compared with manual identification it is faster, generalizes better, and saves time and labor.
Detailed description of the invention
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the basic structure diagram of the LSTM used in the present invention to train word vectors;
Fig. 3 is a schematic diagram of the encoder-decoder model with the attention mechanism of the present invention;
Fig. 4 is a schematic diagram of word-vector calculation by the attention model of the present invention.
Specific embodiment
Embodiment 1: as shown in Figs. 1-4, a method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus comprises the following steps:
Step 1: Clean the Chinese-Lao bilingual corpus of noise using regular expressions in Python, then divide the aligned segments into datasets: the aligned training set accounts for 90% and the shuffled test set for 10%.
Step 2: From the sentences of the training and test sets, count the distinct tokens and the number of occurrences of each, and compute word vectors for the sentences via word embedding.
Step 3: Use the word vectors obtained in Step 2 as the input of the LSTM, which at this point serves as the encoder; the encoder performs similarity calculations against the initialization vector of the LSTM.
Step 4: Pass each word vector through the encoder and a softmax function to obtain the semantic coding C of each sentence's word vectors, forming a vector sequence.
Step 5: Use the vector sequence obtained in Step 4 as the initial input of the decoder, which incorporates an attention mechanism. At each decoding step, a subset of the vector sequence of the semantic coding C is selectively chosen for further processing; in the decoder, the output at each time step serves as the input at the next, so that every output makes full use of the information carried by the input sequence, and so on until the end.
Step 6: Through the similarity calculation between the encoder and the decoder, obtain the sentence word vectors with the highest similarity and reconstruct sentences from them, thereby extracting the Chinese-Lao bilingually aligned sentences from the document-level aligned corpus.
Further, the aligned segments described in Step 1 are the aligned document-level corpus after noise removal.
Further, Step 2 performs sentence segmentation on the initial document-level aligned corpus with Python code, realizes word segmentation of individual Lao and Chinese sentences, and counts the number of words.
Further, Step 3 proceeds as follows: the separated sentences are word-segmented and, after word embedding, input into the LSTM; the hidden layer then yields the hidden-state information h1, h2, ...; the hidden state of the encoder at the first time step is assumed to be Z0 (the initialization variable); Z0 is then used with h1, h2, ... to perform similarity calculations, yielding a10, a20, a30, ..., aij for each time step, where the subscript i of a indexes the hidden-layer information in the encoder and the subscript j indexes the initialization variable of the neural network.
Further, in Step 5 each decoding step of the decoder takes an input formed as a weighted sum of the information h1, h2, ..., ht of all hidden layers of the input sequence; that is, every time the next word is predicted, all the hidden-layer information of the input sequence is read through to determine which input words are most relevant to the current prediction. With the attention mechanism, at each decoding step a context vector Ci is input, and the new hidden-layer state Si is obtained as a nonlinear function of the previous state Si-1, the previous output Yi, and Ci, as in formula (1); Ci is the weighted sum of the encoder output states at every time step, computed as in formula (2); Si-1 and Yi are respectively the previous decoder state and the previously output prediction; hj is the encoder output state at each time step; and aij is the weight of hj for input i at each decoder step:
Si = F(Si-1, Yi, Ci)   (1)
Ci = Σj aij·hj   (2)
Further, in Step 6, after the similarity calculation, sentences are reconstructed from the word vectors, thereby completing the extraction of the Chinese-Lao bilingually aligned sentences from the document-level aligned corpus.
As the most important language resource in natural language research, bilingual corpora have seen significant progress in acquisition and processing as research on language information processing has deepened. The present invention mainly integrates Lao linguistic features into the algorithm model and improves recognition accuracy by fusing several models, using an attention mechanism and taking an LSTM as the encoder-decoder. The document-level aligned corpus is first cleaned with regular expressions in Python to remove noise and used as input. Since the sentence ordering of Lao and Chinese is consistent, the document-level corpus can first be processed into individually aligned sentences, which are then split out. The aligned sentences are then word-segmented and fed to the LSTM; by retaining the intermediate outputs of the LSTM encoder and training a model that selectively attends to these inputs when producing the output sequence, parallel sentence pairs are extracted from the bilingual corpus. The present invention is of research significance for the extraction of Lao parallel sentence pairs.
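The overall flow summarized above can be sketched end to end; whitespace segmentation, random embeddings, mean pooling, and cosine similarity are illustrative stand-ins for the trained LSTM encoder-decoder of the patent:

```python
import re

import numpy as np

def pipeline(lao_doc, zh_doc, dim=16, seed=0):
    """Toy end-to-end flow: regex cleaning, sentence splitting, token
    counting, embedding lookup, sentence vectors, and highest-similarity
    pairing. Mean-pooled random embeddings stand in for the LSTM."""
    clean = lambda t: re.sub(r"\s+", " ", t).strip()
    lao = [clean(s) for s in lao_doc.split("\n") if clean(s)]
    zh = [clean(s) for s in zh_doc.split("\n") if clean(s)]
    vocab = {}
    for s in lao + zh:
        for tok in s.split():
            vocab.setdefault(tok, len(vocab) + 1)
    table = np.random.default_rng(seed).normal(size=(len(vocab) + 1, dim))
    vec = lambda s: table[[vocab.get(t, 0) for t in s.split()]].mean(axis=0)
    L = np.stack([vec(s) for s in lao])
    Z = np.stack([vec(s) for s in zh])
    L /= np.linalg.norm(L, axis=1, keepdims=True)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    sims = L @ Z.T  # cosine similarity matrix
    return [(i, int(sims[i].argmax())) for i in range(len(lao))]
```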
The embodiments of the present invention have been explained in detail above with reference to the drawings, but the present invention is not limited to the above embodiments; within the scope of knowledge possessed by those of ordinary skill in the art, various changes may also be made without departing from the concept of the present invention.
Claims (6)
1. A method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus, characterized by comprising the following steps:
Step 1: clean the Chinese-Lao bilingual corpus of noise using regular expressions in Python, then divide the aligned segments into datasets, wherein the aligned training set accounts for 90% and the shuffled test set for 10%;
Step 2: from the sentences of the training and test sets, count the distinct tokens therein and the number of occurrences of each, and compute word vectors of the sentences via word embedding;
Step 3: use the word vectors obtained in Step 2 as the input of the LSTM, the LSTM at this point serving as the encoder, and perform similarity calculations between the encoder and the initialization vector of the LSTM;
Step 4: pass each word vector through the encoder and a softmax function to obtain the semantic coding C of each sentence's word vectors, forming a vector sequence;
Step 5: use the vector sequence obtained in Step 4 as the initial input of the decoder, which incorporates an attention mechanism; at each decoding step, selectively choose a subset of the vector sequence of the semantic coding C for further processing, such that in the decoder the output at each time step serves as the input at the next time step and every output makes full use of the information carried by the input sequence, and so on until the end;
Step 6: through the similarity calculation between the encoder and the decoder, obtain the sentence word vectors with the highest similarity and reconstruct sentences from them, thereby extracting the Chinese-Lao bilingually aligned sentences from the document-level aligned corpus.
2. The method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus according to claim 1, characterized in that the aligned segments described in Step 1 are the aligned document-level corpus after noise removal.
3. The method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus according to claim 1, characterized in that Step 2 performs sentence segmentation on the initial document-level aligned corpus with Python code, realizes word segmentation of individual Lao and Chinese sentences, and counts the number of words.
4. The method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus according to claim 1, characterized in that Step 3 proceeds as follows: the separated sentences are word-segmented and, after word embedding, input into the LSTM; the hidden layer then yields the hidden-state information h1, h2, ...; the hidden state of the encoder at the first time step is assumed to be Z0 (the initialization variable); Z0 is then used with h1, h2, ... to perform similarity calculations, yielding a10, a20, a30, ..., aij for each time step, wherein the subscript i of a indexes the hidden-layer information in the encoder and the subscript j indexes the initialization variable of the neural network.
5. The method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus according to claim 4, characterized in that in Step 5 each decoding step of the decoder takes an input formed as a weighted sum of the information h1, h2, ..., ht of all hidden layers of the input sequence; that is, every time the next word is predicted, all the hidden-layer information of the input sequence is read through to determine which input words are most relevant to the current prediction; with the attention mechanism, at each decoding step a context vector Ci is input, and the new hidden-layer state Si is obtained as a nonlinear function of the previous state Si-1, the previous output Yi, and Ci, as in formula (1), wherein Ci is the weighted sum of the encoder output states at every time step, computed as in formula (2), Si-1 and Yi are respectively the previous decoder state and the previously output prediction, hj is the encoder output state at each time step, and aij is the weight of hj for input i at each decoder step:
Si = F(Si-1, Yi, Ci)   (1)
Ci = Σj aij·hj   (2).
6. The method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus according to claim 1, characterized in that in Step 6, after the similarity calculation, sentences are reconstructed from the word vectors, thereby completing the extraction of the Chinese-Lao bilingually aligned sentences from the document-level aligned corpus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811577667.2A CN109783809B (en) | 2018-12-22 | 2018-12-22 | Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811577667.2A CN109783809B (en) | 2018-12-22 | 2018-12-22 | Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109783809A true CN109783809A (en) | 2019-05-21 |
CN109783809B CN109783809B (en) | 2022-04-12 |
Family
ID=66498083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811577667.2A Active CN109783809B (en) | 2018-12-22 | 2018-12-22 | Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109783809B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110362820A (en) * | 2019-06-17 | 2019-10-22 | 昆明理工大学 | A kind of bilingual parallel sentence extraction method of old man based on Bi-LSTM algorithm |
CN110414009A (en) * | 2019-07-09 | 2019-11-05 | 昆明理工大学 | The remote bilingual parallel sentence pairs abstracting method of English based on BiLSTM-CNN and device |
CN110717341A (en) * | 2019-09-11 | 2020-01-21 | 昆明理工大学 | Method and device for constructing old-Chinese bilingual corpus with Thai as pivot |
CN112232090A (en) * | 2020-09-17 | 2021-01-15 | 昆明理工大学 | Chinese-crossing parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM |
CN112287688A (en) * | 2020-09-17 | 2021-01-29 | 昆明理工大学 | English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features |
WO2021017025A1 (en) * | 2019-07-29 | 2021-02-04 | 东北大学 | Method for automatically generating python codes from natural language |
CN113095091A (en) * | 2021-04-09 | 2021-07-09 | 天津大学 | Chapter machine translation system and method capable of selecting context information |
CN113705168A (en) * | 2021-08-31 | 2021-11-26 | 苏州大学 | Cross-level attention mechanism-based chapter neural machine translation method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391885A (en) * | 2014-11-07 | 2015-03-04 | 哈尔滨工业大学 | Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training |
CN105022728A (en) * | 2015-07-13 | 2015-11-04 | 广西达译商务服务有限责任公司 | Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method |
CN107967262A (en) * | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | A kind of neutral net covers Chinese machine translation method |
JP2018072979A (en) * | 2016-10-26 | 2018-05-10 | 株式会社エヌ・ティ・ティ・データ | Parallel translation sentence extraction device, parallel translation sentence extraction method and program |
CN108549629A (en) * | 2018-03-19 | 2018-09-18 | 昆明理工大学 | A kind of combination similarity and scheme matched old-Chinese bilingual sentence alignment schemes |
CN109062897A (en) * | 2018-07-26 | 2018-12-21 | 苏州大学 | Sentence alignment method based on deep neural network |
- 2018
- 2018-12-22 CN CN201811577667.2A patent/CN109783809B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391885A (en) * | 2014-11-07 | 2015-03-04 | 哈尔滨工业大学 | Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training |
CN105022728A (en) * | 2015-07-13 | 2015-11-04 | 广西达译商务服务有限责任公司 | Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method |
JP2018072979A (en) * | 2016-10-26 | 2018-05-10 | 株式会社エヌ・ティ・ティ・データ | Parallel translation sentence extraction device, parallel translation sentence extraction method and program |
CN107967262A (en) * | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | A kind of neutral net covers Chinese machine translation method |
CN108549629A (en) * | 2018-03-19 | 2018-09-18 | 昆明理工大学 | A kind of combination similarity and scheme matched old-Chinese bilingual sentence alignment schemes |
CN109062897A (en) * | 2018-07-26 | 2018-12-21 | 苏州大学 | Sentence alignment method based on deep neural network |
Non-Patent Citations (2)
Title |
---|
让子强: 《汉老双语句子对齐方法研究》 [Research on Chinese-Lao bilingual sentence alignment methods], 《中国优秀硕士论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology series] * |
贾善崇 et al.: 《融入多特征的汉-老双语对齐方法》 [A Chinese-Lao bilingual alignment method incorporating multiple features], 《中国水运》 [China Water Transport] * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110362820A (en) * | 2019-06-17 | 2019-10-22 | 昆明理工大学 | A kind of bilingual parallel sentence extraction method of old man based on Bi-LSTM algorithm |
CN110362820B (en) * | 2019-06-17 | 2022-11-01 | 昆明理工大学 | Bi-LSTM algorithm-based method for extracting bilingual parallel sentences in old and Chinese |
CN110414009A (en) * | 2019-07-09 | 2019-11-05 | 昆明理工大学 | The remote bilingual parallel sentence pairs abstracting method of English based on BiLSTM-CNN and device |
WO2021017025A1 (en) * | 2019-07-29 | 2021-02-04 | 东北大学 | Method for automatically generating python codes from natural language |
CN110717341A (en) * | 2019-09-11 | 2020-01-21 | 昆明理工大学 | Method and device for constructing old-Chinese bilingual corpus with Thai as pivot |
CN110717341B (en) * | 2019-09-11 | 2022-06-14 | 昆明理工大学 | Method and device for constructing old-Chinese bilingual corpus with Thai as pivot |
CN112232090A (en) * | 2020-09-17 | 2021-01-15 | 昆明理工大学 | Chinese-crossing parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM |
CN112287688A (en) * | 2020-09-17 | 2021-01-29 | 昆明理工大学 | English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features |
CN112287688B (en) * | 2020-09-17 | 2022-02-11 | 昆明理工大学 | English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features |
CN113095091A (en) * | 2021-04-09 | 2021-07-09 | 天津大学 | Chapter machine translation system and method capable of selecting context information |
CN113705168A (en) * | 2021-08-31 | 2021-11-26 | 苏州大学 | Cross-level attention mechanism-based chapter neural machine translation method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109783809B (en) | 2022-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109783809A (en) | A method of alignment sentence is extracted from Laos-Chinese chapter grade alignment corpus | |
CN109657239B (en) | Chinese named entity recognition method based on attention mechanism and language model learning | |
CN107168945B (en) | Bidirectional cyclic neural network fine-grained opinion mining method integrating multiple features | |
CN111241294B (en) | Relationship extraction method of graph convolution network based on dependency analysis and keywords | |
CN109871535A (en) | A kind of French name entity recognition method based on deep neural network | |
CN110083826A (en) | A kind of old man's bilingual alignment method based on Transformer model | |
CN108984526A (en) | A kind of document subject matter vector abstracting method based on deep learning | |
CN109948152A (en) | A kind of Chinese text grammer error correcting model method based on LSTM | |
CN109522411A (en) | A kind of writing householder method neural network based | |
CN109284400A (en) | A kind of name entity recognition method based on Lattice LSTM and language model | |
CN110414009B (en) | Burma bilingual parallel sentence pair extraction method and device based on BilSTM-CNN | |
CN108491372B (en) | Chinese word segmentation method based on seq2seq model | |
CN112231472B (en) | Judicial public opinion sensitive information identification method integrated with domain term dictionary | |
CN112183064B (en) | Text emotion reason recognition system based on multi-task joint learning | |
CN114861600B (en) | NER-oriented Chinese clinical text data enhancement method and device | |
CN110188175A (en) | A kind of question and answer based on BiLSTM-CRF model are to abstracting method, system and storage medium | |
CN113239663B (en) | Multi-meaning word Chinese entity relation identification method based on Hopkinson | |
CN110046356A (en) | Label is embedded in the application study in the classification of microblogging text mood multi-tag | |
CN107894975A (en) | A kind of segmenting method based on Bi LSTM | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN110134950A (en) | A kind of text auto-collation that words combines | |
Han et al. | MAF‐CNER: A Chinese Named Entity Recognition Model Based on Multifeature Adaptive Fusion | |
CN112434514A (en) | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment | |
CN113536799B (en) | Medical named entity recognition modeling method based on fusion attention | |
CN114564953A (en) | Emotion target extraction model based on multiple word embedding fusion and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||