CN109783809A - Method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus - Google Patents

Info

Publication number
CN109783809A
Authority
CN
China
Prior art keywords
sentence
alignment
corpus
chinese
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811577667.2A
Other languages
Chinese (zh)
Other versions
CN109783809B (en)
Inventor
周兰江
贾善崇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201811577667.2A
Publication of CN109783809A
Application granted
Publication of CN109783809B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus, belonging to the technical field of natural language processing and machine learning. The chapter-level aligned corpus is first processed with regular expressions in Python to remove noisy data and is then used as input. Because Lao and Chinese sentences appear in the same order, the chapter-level corpus can first be processed into individual aligned sentences, which are then split apart. The aligned sentences are segmented into words, and the segmented text is used as the input to an LSTM. By retaining the intermediate outputs of the LSTM encoder over the input sequence, a model is trained to attend selectively to these inputs and to associate the output sequence with them when it produces output, thereby extracting parallel sentence pairs from the bilingual corpus. The invention has research value for the extraction of Lao parallel sentence pairs.

Description

Method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus
Technical field
The present invention relates to a method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus, in particular a method based on LSTM (Long Short-Term Memory network), and belongs to the technical field of natural language processing and machine learning.
Background art
Bilingual corpora are an important basic resource for research fields such as statistical machine translation, cross-language retrieval, and bilingual dictionary construction; their quantity and quality largely influence, and can even determine, the final results of related tasks. Mining parallel sentence pairs is a key technique for building bilingual corpora and therefore has significant research value. In many cases bilingual text can be obtained, but it is usually not aligned sentence by sentence; some of it is aligned by paragraph or by whole article. In such cases the corpus that is not sentence-aligned must be rearranged into a sentence-aligned format so that parallel sentence pairs can be extracted.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus, which extracts aligned sentences from a Chinese-Lao aligned corpus and can effectively improve the accuracy of sentence alignment.
The technical solution adopted by the present invention is a method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus, comprising the following steps:
Step 1: The Chinese-Lao bilingual corpus is first denoised with regular expressions in Python code. The resulting aligned segments are then divided into data sets, with the aligned training set accounting for 90% and the shuffled test set accounting for 10%;
Step 2: From the sentences of the training set and the test set, the distinct words and the number of occurrences of each word are counted, and the word vectors of each sentence are computed by word embedding;
Step 3: The word vectors obtained in Step 2 are used as the input to the LSTM, which here serves as the encoder part; the encoder part performs a similarity calculation using the initialization vector of the LSTM;
Step 4: Each word vector is passed through the encoder part and a softmax function to obtain the semantic encoding C of the word vectors of each sentence, forming a vector sequence;
Step 5: The vector sequence obtained in Step 4 is used as the initial input to the decoder part, which includes an attention mechanism. At each decoding step, the decoder selectively picks a subset of the vector sequence of semantic encodings C for further processing. The output at each time step of the decoder part becomes the input of the next time step, so that every output can make full use of the information carried by the input sequence, and so on until the end of the sequence;
Step 6: The similarity between the encoder part and the decoder part is computed to obtain the sentence word vectors with the highest similarity. The sentences formed from these word vectors are the Chinese-Lao aligned sentences extracted from the chapter-level aligned corpus.
Specifically, the aligned segments in Step 1 are the aligned chapter corpus after noise processing.
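The noise processing and the 90%/10% division of Step 1 could be sketched as follows. This is only an illustration: the patent does not list the regular expressions it uses, so the three patterns (`TAG_RE`, `URL_RE`, `SPACE_RE`) and the helper names `clean_segment` and `split_corpus` are assumptions, as is the reading of the "shuffled test set" as a test set whose Chinese side has been shuffled.

```python
import random
import re

# Hypothetical noise patterns; the patent does not say which ones it uses.
TAG_RE = re.compile(r"<[^>]+>")        # leftover HTML/XML tags
URL_RE = re.compile(r"https?://\S+")   # bare URLs
SPACE_RE = re.compile(r"\s+")          # runs of whitespace


def clean_segment(text):
    """Remove tags and URLs, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def split_corpus(pairs, train_ratio=0.9, seed=0):
    """Clean each (lao, chinese) pair and split the corpus 90/10.

    Training pairs stay aligned. For the held-out portion the Chinese
    side is shuffled (an assumed reading of the shuffled test set).
    """
    cleaned = [(clean_segment(lo), clean_segment(zh)) for lo, zh in pairs]
    cut = int(len(cleaned) * train_ratio)
    train, held_out = cleaned[:cut], cleaned[cut:]
    lao_side = [lo for lo, _ in held_out]
    zh_side = [zh for _, zh in held_out]
    random.Random(seed).shuffle(zh_side)
    return train, (lao_side, zh_side)
```

Fixing the seed keeps the split reproducible between runs, which makes it easier to compare alignment accuracy across model variants.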
Specifically, Step 2 is implemented in Python: the initial chapter-level aligned corpus is split into sentences, each Lao sentence and each Chinese sentence is segmented into words by code, and the words are counted.
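A minimal sketch of the word segmentation and counting in Step 2 is given below. The patent does not name a segmenter, so forward maximum matching against a word list is swapped in here purely for illustration; `fmm_segment`, `build_vocab`, and the `lexicon` argument are hypothetical names.

```python
from collections import Counter


def fmm_segment(sentence, lexicon, max_len=4):
    """Forward maximum matching: greedily take the longest lexicon entry.

    Characters that start no lexicon entry become single-character tokens.
    """
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_vocab(sentences, lexicon):
    """Count each distinct word and assign ids by descending frequency."""
    counts = Counter()
    for sentence in sentences:
        counts.update(fmm_segment(sentence, lexicon))
    vocab = {word: idx for idx, (word, _) in enumerate(counts.most_common())}
    return counts, vocab
```

The resulting ids are what an embedding layer would look up to produce the word vectors; the same routine applies to the Lao side given a Lao word list.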
Specifically, Step 3 proceeds as follows:
The separated sentences are input and segmented into words. After word embedding they are fed into the LSTM, whose hidden layer yields the hidden states h_1, h_2, .... The hidden state of the encoder part at the first time step is taken to be Z_0 (the initial variable). Similarity is then computed between Z_0 and h_1, h_2, ..., giving the values a_{10}, a_{20}, a_{30}, ..., a_{ij} for each time step, where the first subscript i of a is the index of the hidden state in the encoder and the second subscript j is the index of the initial variable of the neural network.
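The similarity calculation between the initial variable and the hidden states can be illustrated with a small sketch. The patent does not state which similarity function is used, so the dot product is an assumption here, and the softmax normalization is borrowed from Step 4; the function names are hypothetical.

```python
import math


def dot(u, v):
    """Dot product, used here as the (assumed) similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(z, hidden_states):
    """Score every hidden state h_j against z, then normalize to weights."""
    return softmax([dot(z, h) for h in hidden_states])
```

With `z = [1.0, 0.0]` and hidden states `[[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]]`, the first state receives the largest weight because it points in the same direction as `z`.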
Specifically, in Step 5 each decoding step of the decoder stage receives an input formed by a weighted sum over the hidden states h_1, h_2, ..., h_t of the whole input sequence. That is, every time the next word is predicted, the hidden states of the entire input sequence are read through to decide which words of the input sequence are most relevant to the word being predicted. The attention mechanism means that, in the decoder stage, a context vector C_i is input at each step, and the new hidden state S_i is obtained from a nonlinear function of the previous state S_{i-1}, Y_i, and C_i, as in formula (1). C_i is the weighted sum of the encoder output states over all time steps, computed as in formula (2). S_{i-1} and Y_i are, respectively, the previous state of the decoder stage and the previously output prediction; h_j is the encoder output state at time step j, and a_{ij} is the weight of h_j for input i of the decoder stage.
S_i = F(S_{i-1}, Y_i, C_i)    (1)
C_i = \sum_{j=1}^{t} a_{ij} h_j    (2)
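The two quantities described for Step 5 can be sketched in a few lines. The weighted sum follows the verbal definition of C_i given in the text. The function F is not specified in the patent, so an elementwise tanh of the summed inputs is used below purely as a stand-in; in a trained model F would be an LSTM cell with learned parameters.

```python
import math


def context_vector(weights, hidden_states):
    """Weighted sum of encoder states: C_i = sum_j a_ij * h_j."""
    dim = len(hidden_states[0])
    return [
        sum(a * h[k] for a, h in zip(weights, hidden_states))
        for k in range(dim)
    ]


def decoder_step(prev_state, prev_output, context):
    """S_i = F(S_{i-1}, Y_i, C_i) with a stand-in F (assumption).

    F here is tanh(S_{i-1} + Y_i + C_i), applied elementwise.
    """
    return [
        math.tanh(s + y + c)
        for s, y, c in zip(prev_state, prev_output, context)
    ]
```

Because the weights are recomputed at every decoding step, each new state S_i can draw on a different part of the input sentence.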
Specifically, in Step 6, after the similarity calculation, the sentences are assembled from the word vectors, which completes the extraction of Chinese-Lao aligned sentences from the chapter-level aligned corpus.
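One way to picture the final selection in Step 6 is to compare a vector for each Lao sentence against a vector for each Chinese candidate and keep the best-scoring candidate. The sketch below assumes cosine similarity over fixed-length sentence vectors and an optional `threshold`; neither is stated in the patent, and `best_matches` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def best_matches(lao_vecs, zh_vecs, threshold=0.0):
    """Return (lao_index, zh_index, score) for each Lao sentence's best match."""
    pairs = []
    for i, lao in enumerate(lao_vecs):
        scores = [cosine(lao, zh) for zh in zh_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs
```

Raising `threshold` trades recall for precision: fewer pairs are returned, but those that remain are more likely to be true translations.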
The beneficial effects of the present invention are:
(1) Compared with a plain encoder-decoder model, the LSTM-based method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus improves the accuracy of Chinese-Lao extraction.
(2) The method uses the LSTM algorithm, which improves feature extraction compared with other algorithms.
(3) The method incorporates the grammatical features of Lao and of Chinese, which deep learning identifies automatically; compared with manual identification it is faster, generalizes better, and saves time and labor.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the basic structure diagram of the LSTM used in the present invention to train word vectors;
Fig. 3 is a schematic diagram of the encoder-decoder model with the attention mechanism of the present invention;
Fig. 4 is a schematic diagram of how the attention model of the present invention computes word vectors.
Detailed description of the embodiments
Embodiment 1: As shown in Figs. 1-4, a method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus comprises the following steps:
Step 1: The Chinese-Lao bilingual corpus is first denoised with regular expressions in Python code. The resulting aligned segments are then divided into data sets, with the aligned training set accounting for 90% and the shuffled test set accounting for 10%;
Step 2: From the sentences of the training set and the test set, the distinct words and the number of occurrences of each word are counted, and the word vectors of each sentence are computed by word embedding;
Step 3: The word vectors obtained in Step 2 are used as the input to the LSTM, which here serves as the encoder part; the encoder part performs a similarity calculation using the initialization vector of the LSTM;
Step 4: Each word vector is passed through the encoder part and a softmax function to obtain the semantic encoding C of the word vectors of each sentence, forming a vector sequence;
Step 5: The vector sequence obtained in Step 4 is used as the initial input to the decoder part, which includes an attention mechanism. At each decoding step, the decoder selectively picks a subset of the vector sequence of semantic encodings C for further processing. The output at each time step of the decoder part becomes the input of the next time step, so that every output can make full use of the information carried by the input sequence, and so on until the end of the sequence;
Step 6: The similarity between the encoder part and the decoder part is computed to obtain the sentence word vectors with the highest similarity. The sentences formed from these word vectors are the Chinese-Lao aligned sentences extracted from the chapter-level aligned corpus.
Further, the aligned segments in Step 1 are the aligned chapter corpus after noise processing.
Further, Step 2 is implemented in Python: the initial chapter-level aligned corpus is split into sentences, each Lao sentence and each Chinese sentence is segmented into words by code, and the words are counted.
Further, Step 3 proceeds as follows:
The separated sentences are input and segmented into words. After word embedding they are fed into the LSTM, whose hidden layer yields the hidden states h_1, h_2, .... The hidden state of the encoder part at the first time step is taken to be Z_0 (the initial variable). Similarity is then computed between Z_0 and h_1, h_2, ..., giving the values a_{10}, a_{20}, a_{30}, ..., a_{ij} for each time step, where the first subscript i of a is the index of the hidden state in the encoder and the second subscript j is the index of the initial variable of the neural network.
Further, in Step 5 each decoding step of the decoder stage receives an input formed by a weighted sum over the hidden states h_1, h_2, ..., h_t of the whole input sequence. That is, every time the next word is predicted, the hidden states of the entire input sequence are read through to decide which words of the input sequence are most relevant to the word being predicted. The attention mechanism means that, in the decoder stage, a context vector C_i is input at each step, and the new hidden state S_i is obtained from a nonlinear function of the previous state S_{i-1}, Y_i, and C_i, as in formula (1). C_i is the weighted sum of the encoder output states over all time steps, computed as in formula (2). S_{i-1} and Y_i are, respectively, the previous state of the decoder stage and the previously output prediction; h_j is the encoder output state at time step j, and a_{ij} is the weight of h_j for input i of the decoder stage.
S_i = F(S_{i-1}, Y_i, C_i)    (1)
C_i = \sum_{j=1}^{t} a_{ij} h_j    (2)
Further, in Step 6, after the similarity calculation, the sentences are assembled from the word vectors, which completes the extraction of Chinese-Lao aligned sentences from the chapter-level aligned corpus.
Bilingual corpora are among the most important language resources in natural language research; as research on language information processing has deepened, the acquisition and processing of corpora have advanced considerably. The present invention mainly incorporates Lao linguistic features into the algorithm model, combines several models to improve recognition accuracy, uses an attention mechanism, and uses LSTM as the encoder-decoder. The chapter-level aligned corpus is first processed with regular expressions in Python to remove noisy data and is then used as input. Because Lao and Chinese sentences appear in the same order, the chapter-level corpus can first be processed into individual aligned sentences, which are then split apart. The aligned sentences are segmented into words, and the segmented text is used as the input to the LSTM. By retaining the intermediate outputs of the LSTM encoder over the input sequence, a model is trained to attend selectively to these inputs and to associate the output sequence with them when it produces output, thereby extracting parallel sentence pairs from the bilingual corpus. The invention has research value for the extraction of Lao parallel sentence pairs.
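The observation that Lao and Chinese sentences appear in the same order suggests a simple first pass before any model is involved: split each chapter into sentences and pair them by position. The sketch below assumes the sentence-ending marks shown in `pair_by_order` (ASCII marks for the Lao side, full-width marks for the Chinese side); the patent does not specify its splitting rules, and both helper names are hypothetical.

```python
import re


def split_sentences(text, enders):
    """Split text after any of the given sentence-ending characters."""
    cls = re.escape(enders)
    pattern = re.compile(rf"[^{cls}]+[{cls}]?")
    return [s.strip() for s in pattern.findall(text) if s.strip()]


def pair_by_order(lao_chapter, zh_chapter):
    """Pair sentences by position when both sides have the same count."""
    lao = split_sentences(lao_chapter, ".!?")      # assumed Lao enders
    zh = split_sentences(zh_chapter, "。！？")      # assumed Chinese enders
    if len(lao) != len(zh):
        return []  # counts differ: position alone cannot align this chapter
    return list(zip(lao, zh))
```

Chapters whose sentence counts differ are returned empty here; those are the cases where a learned similarity model is needed to decide which sentences correspond.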
The embodiments of the present invention have been explained in detail above with reference to the drawings, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.

Claims (6)

1. A method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus, characterized by comprising the following steps:
Step 1: the Chinese-Lao bilingual corpus is first denoised with regular expressions in Python code, and the resulting aligned segments are then divided into data sets, with the aligned training set accounting for 90% and the shuffled test set accounting for 10%;
Step 2: from the sentences of the training set and the test set, the distinct words and the number of occurrences of each word are counted, and the word vectors of each sentence are computed by word embedding;
Step 3: the word vectors obtained in Step 2 are used as the input to the LSTM, which here serves as the encoder part, and the encoder part performs a similarity calculation using the initialization vector of the LSTM;
Step 4: each word vector is passed through the encoder part and a softmax function to obtain the semantic encoding C of the word vectors of each sentence, forming a vector sequence;
Step 5: the vector sequence obtained in Step 4 is used as the initial input to the decoder part, which includes an attention mechanism; at each decoding step, a subset of the vector sequence of semantic encodings C is selectively picked for further processing; the output at each time step of the decoder part becomes the input of the next time step, so that every output can make full use of the information carried by the input sequence, and so on until the end of the sequence;
Step 6: the similarity between the encoder part and the decoder part is computed to obtain the sentence word vectors with the highest similarity, and the sentences formed from these word vectors are the Chinese-Lao aligned sentences extracted from the chapter-level aligned corpus.
2. The method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus according to claim 1, characterized in that the aligned segments in Step 1 are the aligned chapter corpus after noise processing.
3. The method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus according to claim 1, characterized in that Step 2 is implemented in Python: the initial chapter-level aligned corpus is split into sentences, each Lao sentence and each Chinese sentence is segmented into words by code, and the words are counted.
4. The method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus according to claim 1, characterized in that Step 3 proceeds as follows:
the separated sentences are input and segmented into words; after word embedding they are fed into the LSTM, whose hidden layer yields the hidden states h_1, h_2, ...; the hidden state of the encoder part at the first time step is taken to be Z_0 (the initial variable); similarity is then computed between Z_0 and h_1, h_2, ..., giving the values a_{10}, a_{20}, a_{30}, ..., a_{ij} for each time step, where the first subscript i of a is the index of the hidden state in the encoder and the second subscript j is the index of the initial variable of the neural network.
5. The method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus according to claim 4, characterized in that in Step 5 each decoding step of the decoder stage receives an input formed by a weighted sum over the hidden states h_1, h_2, ..., h_t of the whole input sequence; that is, every time the next word is predicted, the hidden states of the entire input sequence are read through to decide which words of the input sequence are most relevant to the word being predicted; the attention mechanism means that, in the decoder stage, a context vector C_i is input at each step, and the new hidden state S_i is obtained from a nonlinear function of the previous state S_{i-1}, Y_i, and C_i, as in formula (1); C_i is the weighted sum of the encoder output states over all time steps, computed as in formula (2); S_{i-1} and Y_i are, respectively, the previous state of the decoder stage and the previously output prediction; h_j is the encoder output state at time step j, and a_{ij} is the weight of h_j for input i of the decoder stage;
S_i = F(S_{i-1}, Y_i, C_i)    (1)
C_i = \sum_{j=1}^{t} a_{ij} h_j    (2)
6. The method for extracting aligned sentences from a Lao-Chinese chapter-level aligned corpus according to claim 1, characterized in that in Step 6, after the similarity calculation, the sentences are assembled from the word vectors, which completes the extraction of Chinese-Lao aligned sentences from the chapter-level aligned corpus.
CN201811577667.2A 2018-12-22 2018-12-22 Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus Active CN109783809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811577667.2A CN109783809B (en) 2018-12-22 2018-12-22 Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811577667.2A CN109783809B (en) 2018-12-22 2018-12-22 Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus

Publications (2)

Publication Number Publication Date
CN109783809A true CN109783809A (en) 2019-05-21
CN109783809B CN109783809B (en) 2022-04-12

Family

ID=66498083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811577667.2A Active CN109783809B (en) 2018-12-22 2018-12-22 Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus

Country Status (1)

Country Link
CN (1) CN109783809B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391885A (en) * 2014-11-07 2015-03-04 哈尔滨工业大学 Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training
CN105022728A (en) * 2015-07-13 2015-11-04 广西达译商务服务有限责任公司 Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method
CN107967262A (en) * 2017-11-02 2018-04-27 内蒙古工业大学 A kind of neutral net covers Chinese machine translation method
JP2018072979A (en) * 2016-10-26 2018-05-10 株式会社エヌ・ティ・ティ・データ Parallel translation sentence extraction device, parallel translation sentence extraction method and program
CN108549629A (en) * 2018-03-19 2018-09-18 昆明理工大学 A kind of combination similarity and scheme matched old-Chinese bilingual sentence alignment schemes
CN109062897A (en) * 2018-07-26 2018-12-21 苏州大学 Sentence alignment method based on deep neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
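让子强: "Research on Chinese-Lao bilingual sentence alignment methods" (汉老双语句子对齐方法研究), China Master's Theses Full-text Database, Information Science and Technology series (《中国优秀硕士论文全文数据库 信息科技辑》) *
贾善崇 et al.: "A Chinese-Lao bilingual alignment method incorporating multiple features" (融入多特征的汉-老双语对齐方法), China Water Transport (《中国水运》) *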

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362820A (en) * 2019-06-17 2019-10-22 昆明理工大学 A kind of bilingual parallel sentence extraction method of old man based on Bi-LSTM algorithm
CN110362820B (en) * 2019-06-17 2022-11-01 昆明理工大学 Bi-LSTM algorithm-based method for extracting bilingual parallel sentences in old and Chinese
CN110414009A (en) * 2019-07-09 2019-11-05 昆明理工大学 The remote bilingual parallel sentence pairs abstracting method of English based on BiLSTM-CNN and device
WO2021017025A1 (en) * 2019-07-29 2021-02-04 东北大学 Method for automatically generating python codes from natural language
CN110717341A (en) * 2019-09-11 2020-01-21 昆明理工大学 Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN110717341B (en) * 2019-09-11 2022-06-14 昆明理工大学 Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN112232090A (en) * 2020-09-17 2021-01-15 昆明理工大学 Chinese-crossing parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM
CN112287688A (en) * 2020-09-17 2021-01-29 昆明理工大学 English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features
CN112287688B (en) * 2020-09-17 2022-02-11 昆明理工大学 English-Burmese bilingual parallel sentence pair extraction method and device integrating pre-training language model and structural features
CN113095091A (en) * 2021-04-09 2021-07-09 天津大学 Chapter machine translation system and method capable of selecting context information
CN113705168A (en) * 2021-08-31 2021-11-26 苏州大学 Cross-level attention mechanism-based chapter neural machine translation method and system

Also Published As

Publication number Publication date
CN109783809B (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN109783809A (en) A method of alignment sentence is extracted from Laos-Chinese chapter grade alignment corpus
CN109657239B (en) Chinese named entity recognition method based on attention mechanism and language model learning
CN107168945B (en) Bidirectional cyclic neural network fine-grained opinion mining method integrating multiple features
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN109871535A (en) A kind of French name entity recognition method based on deep neural network
CN110083826A (en) A kind of old man's bilingual alignment method based on Transformer model
CN108984526A (en) A kind of document subject matter vector abstracting method based on deep learning
CN109948152A (en) A kind of Chinese text grammer error correcting model method based on LSTM
CN109522411A (en) A kind of writing householder method neural network based
CN109284400A (en) A kind of name entity recognition method based on Lattice LSTM and language model
CN110414009B (en) Burma bilingual parallel sentence pair extraction method and device based on BilSTM-CNN
CN108491372B (en) Chinese word segmentation method based on seq2seq model
CN112231472B (en) Judicial public opinion sensitive information identification method integrated with domain term dictionary
CN112183064B (en) Text emotion reason recognition system based on multi-task joint learning
CN114861600B (en) NER-oriented Chinese clinical text data enhancement method and device
CN110188175A (en) A kind of question and answer based on BiLSTM-CRF model are to abstracting method, system and storage medium
CN113239663B (en) Multi-meaning word Chinese entity relation identification method based on Hopkinson
CN110046356A (en) Label is embedded in the application study in the classification of microblogging text mood multi-tag
CN107894975A (en) A kind of segmenting method based on Bi LSTM
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN110134950A (en) A kind of text auto-collation that words combines
Han et al. MAF‐CNER: A Chinese Named Entity Recognition Model Based on Multifeature Adaptive Fusion
CN112434514A (en) Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment
CN113536799B (en) Medical named entity recognition modeling method based on fusion attention
CN114564953A (en) Emotion target extraction model based on multiple word embedding fusion and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant