CN112232090A - Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM
- Publication number
- CN112232090A (application CN202010978713.0A)
- Authority
- CN
- China
- Prior art keywords
- chinese
- tree
- parallel
- crossing
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM. The method first pre-trains Chinese-Vietnamese bilingual word vectors and maps them into the same semantic space. Because Chinese and Vietnamese sentence structures differ, each sentence sequence is converted into a dependency tree by dependency parsing, and a Tree-LSTM captures the syntactic structure information of the sentence. Part-of-speech information of the Chinese and Vietnamese sentences is spliced into the sentence semantic vectors as feature vectors, which are input to a fully connected layer to train a Chinese-Vietnamese parallel sentence pair classifier. The invention uses deep learning to learn sentence representations automatically from large amounts of data, which removes the heavy manual effort that traditional parallel sentence pair extraction spends on feature design. The method also addresses the effect of structural differences between Chinese and Vietnamese on extraction performance, and improves the accuracy of the parallel sentence pair extraction model.
Description
Technical Field
The invention relates to a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM, and belongs to the technical field of natural language processing.
Background
Parallel corpora are an important resource for machine translation research. In recent years, exchange and cooperation between China and Vietnam have become increasingly close, machine translation is becoming an important tool for facilitating that cooperation, and parallel corpora therefore have considerable application prospects. Vietnamese is a typical low-resource language, and Chinese-Vietnamese parallel sentence pairs are scarce. Extracting parallel sentence pairs from the large volume of Chinese-Vietnamese comparable corpora can alleviate the data sparsity problem in Chinese-Vietnamese machine translation. Traditional parallel sentence pair extraction methods rarely consider the syntactic structure of sentences, so a method that fuses syntactic structure and Tree-LSTM is of real value for extracting Chinese-Vietnamese parallel sentence pairs from the Chinese-Vietnamese comparable corpus in Wikipedia.
Disclosure of Invention
The invention provides a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM. It removes the heavy manual effort that traditional parallel sentence pair extraction spends on feature design, and it addresses the effect of structural differences between Chinese and Vietnamese on the performance of the extraction model. The method uses deep learning to represent sentence semantics while taking the differences in Chinese and Vietnamese syntactic structure into account, so as to improve the accuracy of the Chinese-Vietnamese parallel sentence pair extraction model.
The technical scheme of the invention is as follows: a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM comprises the following steps:
Step1, collecting Chinese-Vietnamese parallel corpora for training the parallel sentence pair extraction model, and the Chinese-Vietnamese comparable corpus of Wikipedia as the extraction source, and dividing the collected parallel corpora into a training corpus and a test corpus; Scrapy is used as the crawler tool to simulate user operation and crawl Chinese-Vietnamese parallel sentence pairs according to the XPath of the page data elements; at the same time the Chinese and Vietnamese Wikipedia dump data sets, which contain all Chinese and Vietnamese Wikipedia data, are downloaded, and the Chinese-Vietnamese comparable corpus is extracted by ID alignment;
Step2, training Chinese and Vietnamese monolingual word vectors on the Chinese and Vietnamese monolingual corpora, and training Chinese-Vietnamese bilingual word vectors using a bilingual dictionary;
Step2.1, training Chinese and Vietnamese monolingual word vectors separately with fastText on the Chinese and Vietnamese monolingual corpora;
Step2.2, segmenting the collected bilingual corpus into words, and constructing a Chinese-Vietnamese bilingual dictionary to serve as the supervision labels for bilingual word vector training;
Step2.3, training the Chinese-Vietnamese bilingual word vectors using MUSE.
Bilingual word vector training brings the word vectors of Chinese and Vietnamese words with the same meaning close together in the bilingual semantic space, while the distances between the original word vectors within each language are not changed. The invention uses MUSE to train the Chinese-Vietnamese bilingual word vectors, as shown in formula (1):
$$\arg\min_{W} \sum_{i} \left\lVert X_{i*} W - Y_{i*} \right\rVert^{2} \tag{1}$$
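As a rough illustration of formula (1), the sketch below only evaluates the objective for a given candidate mapping W on a toy dictionary; it does not perform the search that MUSE carries out. It assumes that row i of X and row i of Y hold the source-language and target-language vectors of the i-th dictionary entry. The function names `map_row` and `mapping_loss` and the 2-dimensional toy vectors are hypothetical and are not part of the original method.

```python
def map_row(x, W):
    """Return the row vector x @ W for a list x and a list-of-rows matrix W."""
    cols = len(W[0])
    return [sum(x[k] * W[k][j] for k in range(len(x))) for j in range(cols)]


def mapping_loss(X, Y, W):
    """Sum over dictionary entries of the squared distance ||X_i W - Y_i||^2."""
    total = 0.0
    for x, y in zip(X, Y):
        mapped = map_row(x, W)
        total += sum((m - t) ** 2 for m, t in zip(mapped, y))
    return total


# Toy data (hypothetical): each target vector is the source vector rotated by 90 degrees.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]

identity = [[1.0, 0.0], [0.0, 1.0]]
rotation = [[0.0, 1.0], [-1.0, 0.0]]  # maps (1, 0) to (0, 1) and (0, 1) to (-1, 0)

loss_identity = mapping_loss(X, Y, identity)
loss_rotation = mapping_loss(X, Y, rotation)
```

On this toy data the rotation aligns every pair exactly, so its loss is zero, while leaving the vectors unmapped gives a positive loss. A rotation also keeps the distances between the source vectors unchanged, which is the property described for the bilingual training.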
Step3, converting the Chinese and Vietnamese sentence sequences into dependency trees using dependency parsing, and taking the trees as the input of the Tree-LSTM model;
In Step3, Chinese and Vietnamese are both isolating languages whose main grammatical devices are word order and function words. Their word orders are similar: the core constituents follow the same order in both languages, namely subject-verb-object (SVO). The grammatical difference lies mainly in the position of modifiers (attributives and adverbials). Chinese uses a modifier-head structure, in which modifiers precede the head word, and the same holds for multi-layer modification, for example the Chinese sentence "She is the most beautiful girl that I have seen." Vietnamese uses a head-modifier structure, in which modifiers follow the head word, and the same holds for multi-layer modification; the Vietnamese counterpart of the example orders its parts as "(she is) (girl) (the most beautiful) (I have seen)". To extract the syntactic information of Chinese and Vietnamese, the Chinese dependency parsing tool provided by Stanford University is used to parse Chinese and generate a syntactic dependency tree, and a tool based on the Vietnamese dependency treebank VnDT is used to parse Vietnamese and generate a syntactic dependency tree; the generated dependency trees are used as the input of the Tree-LSTM model.
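The conversion from a sentence sequence to a tree can be sketched as follows, assuming the parser output has been reduced to one head index per token (a common dependency format, not something the text specifies). The function names, the toy tokens and the toy head indices are hypothetical; a real parse would come from the Stanford or VnDT-based tools named above.

```python
def build_children(heads):
    """Convert a head-index list into a (root, children) pair.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    """
    children = {i: [] for i in range(1, len(heads) + 1)}
    root = None
    for position, head in enumerate(heads, start=1):
        if head == 0:
            root = position
        else:
            children[head].append(position)
    return root, children


def bottom_up_order(root, children):
    """Return node indices so that every child precedes its parent."""
    order = []

    def visit(node):
        for child in children[node]:
            visit(child)
        order.append(node)

    visit(root)
    return order


# Hypothetical toy parse: tokens 1 and 2 both attach to token 3, which is the root.
tokens = ["she", "is", "girl"]
heads = [3, 3, 0]
root, children = build_children(heads)
order = bottom_up_order(root, children)
```

The bottom-up order is the order in which a tree-structured encoder can visit the nodes, since each node needs the states of its children before it can be processed.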
Step4, tagging the part of speech of each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus, converting the parts of speech into vectors, and splicing them into the sentence semantic vectors;
Step5, applying the element-wise product and element-wise difference to the final Chinese and Vietnamese sentence semantic vectors to capture their differences, and inputting the result into a fully connected layer for supervised training.
As a further scheme of the invention, in Step4, part-of-speech tagging is performed separately on each word of the Chinese and Vietnamese sentences in the parallel corpus; the parts of speech are then converted into vectors and spliced into the sentence semantic vectors to generate the input vectors of the final extraction model.
In Step5, the matching information of the final sentence semantic vectors, which contain part-of-speech and syntactic structure information, is captured by their element-wise product and absolute element-wise difference, and a fully connected layer computes the probability that the two sentences are translations of each other, thereby performing supervised training.
The invention has the beneficial effects that:
1. encoding sentences with the Tree-LSTM model, which fuses sentence structure information, gives clearly better results than the Bi-LSTM based model, and using part-of-speech information as auxiliary information further improves overall model performance;
2. the method uses deep learning to learn sentence representations automatically from large amounts of data, which removes the heavy manual effort that traditional parallel sentence pair extraction spends on feature design; it also addresses the effect of structural differences between Chinese and Vietnamese on extraction performance and improves the accuracy of the parallel sentence pair extraction model. Experiments show that the method outperforms the baseline model on precision, recall and F value.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an illustration of the present invention converting sequences into tree structures.
Detailed Description
Example 1: as shown in FIGS. 1-2, a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM comprises:
Step1, collecting Chinese-Vietnamese parallel corpora for training the parallel sentence pair extraction model, and the Chinese-Vietnamese comparable corpus of Wikipedia as the extraction source, and dividing the collected parallel corpora into a training corpus and a test corpus; Scrapy is used as the crawler tool to simulate user operation and crawl Chinese-Vietnamese parallel sentence pairs according to the XPath of the page data elements; at the same time the Chinese and Vietnamese Wikipedia dump data sets, which contain all Chinese and Vietnamese Wikipedia data, are downloaded, and the Chinese-Vietnamese comparable corpus is extracted by ID alignment;
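The ID-based alignment in Step1 can be pictured with a minimal sketch. It assumes each Wikipedia dump has already been reduced to a dictionary from a shared article ID to article text; how that ID is obtained is not specified in the text, and the function name `align_by_id`, the toy IDs and the toy texts are hypothetical.

```python
def align_by_id(zh_articles, vi_articles):
    """Pair articles that share an ID; articles without a counterpart are skipped."""
    shared_ids = sorted(set(zh_articles) & set(vi_articles))
    return [(zh_articles[key], vi_articles[key]) for key in shared_ids]


# Toy dumps (hypothetical IDs and texts): only ID "A2" occurs in both languages.
zh_articles = {"A1": "zh text one", "A2": "zh text two"}
vi_articles = {"A2": "vi text two", "A3": "vi text three"}

comparable_pairs = align_by_id(zh_articles, vi_articles)
```

Each resulting pair is a comparable document pair, not yet a parallel sentence pair; the trained classifier is what later selects parallel sentences from such documents.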
Step2, training monolingual word vectors on the Chinese and Vietnamese monolingual corpora with fastText, obtaining a Chinese-Vietnamese bilingual dictionary to supervise the training of the bilingual word vectors, and training with MUSE according to the following formula:
$$\arg\min_{W} \sum_{i} \left\lVert X_{i*} W - Y_{i*} \right\rVert^{2} \tag{1}$$
Step3, as shown in FIG. 2, using the Chinese dependency parsing tool provided by Stanford University and a tool based on the Vietnamese dependency treebank VnDT, the Chinese and Vietnamese sentences of each parallel sentence pair in the parallel corpus are converted into a Chinese dependency tree and a Vietnamese dependency tree respectively; FIG. 2 shows the conversion of the sequential structure of a sentence into the tree structure that serves as the input of the Tree-LSTM model.
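The Tree-LSTM is computed as follows:
$$f_{jk} = \sigma\left(W_f x_j + U_f h_k + b_f\right) \tag{4}$$
$$c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k \tag{7}$$
$$h_j = o_j \odot \tanh(c_j) \tag{8}$$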
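Formulas (4), (7) and (8) match the form of the Child-Sum Tree-LSTM cell. The sketch below is a minimal pure-Python version under that assumption: the gates that do not appear in the text (input gate, output gate and candidate update) are assumed to take the standard Child-Sum form, computed from the node input and the sum of the children's hidden states. All function names, the parameter dictionary layout and the constant toy weights are hypothetical.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-rows matrices W, U and vectors x, h, b."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def node_update(x, child_states, p):
    """One Tree-LSTM step for a node; child_states is a list of (h_k, c_k) pairs."""
    size = len(p["b_i"])
    # Assumed standard form: gates other than the forget gate see the summed child states.
    h_sum = [sum(h[d] for h, _ in child_states) for d in range(size)]
    i = sigmoid(affine(p["W_i"], x, p["U_i"], h_sum, p["b_i"]))
    o = sigmoid(affine(p["W_o"], x, p["U_o"], h_sum, p["b_o"]))
    u = [math.tanh(a) for a in affine(p["W_u"], x, p["U_u"], h_sum, p["b_u"])]
    c = [i_d * u_d for i_d, u_d in zip(i, u)]
    for h_k, c_k in child_states:
        # Formula (4): one forget gate per child, computed from that child's h_k.
        f = sigmoid(affine(p["W_f"], x, p["U_f"], h_k, p["b_f"]))
        # Formula (7): add the gated memory cell of each child.
        c = [c_d + f_d * ck_d for c_d, f_d, ck_d in zip(c, f, c_k)]
    # Formula (8): hidden state of the node.
    h = [o_d * math.tanh(c_d) for o_d, c_d in zip(o, c)]
    return h, c


def encode(node, children, inputs, p):
    """Encode the subtree rooted at node, processing children before the parent."""
    states = [encode(child, children, inputs, p) for child in children[node]]
    return node_update(inputs[node], states, p)


def constant_params(size, value):
    """Toy parameters: every weight equals value and every bias is zero."""
    matrix = [[value] * size for _ in range(size)]
    p = {}
    for gate in "ifou":
        p["W_" + gate] = matrix
        p["U_" + gate] = matrix
        p["b_" + gate] = [0.0] * size
    return p


# Toy tree: node 3 is the root with leaf children 1 and 2; inputs are toy word vectors.
children = {1: [], 2: [], 3: [1, 2]}
inputs = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
h_root, c_root = encode(3, children, inputs, constant_params(2, 0.1))
```

The hidden state returned for the root node plays the role of the sentence vector. In a trained model the weights would be learned and the inputs would be the bilingual word vectors.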
Step4, performing part-of-speech tagging on each word of the Chinese-Vietnamese parallel sentence pairs in the parallel corpus, converting each part of speech into a vector, and splicing the vectors into the sentence semantic vector.
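The text does not say how the per-word part-of-speech vectors are combined before splicing, so the following sketch makes a loud assumption: each tag is a one-hot vector, the one-hot vectors of a sentence are averaged, and the result is appended to the sentence semantic vector. The tag set, the function names and the toy values are all hypothetical.

```python
def pos_vector(tags, tagset):
    """Average the one-hot vectors of the tags, giving each tag's relative frequency."""
    counts = [0.0] * len(tagset)
    for tag in tags:
        counts[tagset.index(tag)] += 1.0
    return [count / len(tags) for count in counts]


def splice(sentence_vector, tags, tagset):
    """Append the part-of-speech vector to the sentence semantic vector."""
    return sentence_vector + pos_vector(tags, tagset)


# Toy example (hypothetical tag set and values).
tagset = ["N", "V", "A"]
tags = ["N", "V", "N", "A"]
sentence_vector = [0.2, -0.1]
spliced = splice(sentence_vector, tags, tagset)
```

A learned embedding per tag would be an equally plausible reading; the one-hot average is used here only because it keeps the example self-contained.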
Step5, for the final sentence vectors containing part-of-speech and syntactic structure information, their matching information is captured by the element-wise product and the absolute element-wise difference, and a fully connected layer computes the probability that the two sentences are translations of each other:
$$p(y_j \mid c_j) = \sigma\left(W_c h_j + c\right) \tag{20}$$
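Step5 can be sketched as below, assuming the element-wise product and the absolute difference are concatenated into one feature vector and passed through a single sigmoid unit in the spirit of formula (20). The function names, the weights and the toy sentence vectors are hypothetical; in the method the weights are learned by supervised training.

```python
import math


def pair_features(h_zh, h_vi):
    """Concatenate the element-wise product and the absolute element-wise difference."""
    product = [a * b for a, b in zip(h_zh, h_vi)]
    abs_diff = [abs(a - b) for a, b in zip(h_zh, h_vi)]
    return product + abs_diff


def parallel_probability(h_zh, h_vi, weights, bias):
    """Sigmoid of a linear layer over the pair features."""
    features = pair_features(h_zh, h_vi)
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Toy weights (hypothetical): reward agreement, penalise difference.
weights = [1.0, 1.0, -1.0, -1.0]
bias = 0.0

p_mismatch = parallel_probability([0.5, -0.5], [0.5, 0.5], weights, bias)
p_match = parallel_probability([1.0, 1.0], [1.0, 1.0], weights, bias)
```

With these toy weights, identical sentence vectors score above 0.5 and diverging ones score below it, which is the behaviour the matching features are meant to expose to the classifier.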
Step6, comparative experiments are conducted using precision (P), recall (R) and F value (F) as evaluation metrics.
To verify the effectiveness of the model, the following experiments were set up. To verify the influence of the pre-trained bilingual word vectors on the extraction method, the baseline model was tested directly, as shown in Table 1:
Table 1: Effect of pre-trained bilingual word vectors on the baseline model
Table 1 shows that the pre-trained bilingual word vectors bring words with the same meaning close to each other without changing the original word vectors, and that all of the resulting classification metrics improve by a small margin.
On the basis of the pre-trained word vectors, and in order to account for the structural differences between the two languages, the Bi-LSTM based model was then replaced by a Tree-LSTM model based on the dependency syntax tree; the experimental results are shown in Table 2.
Table 2: Influence of syntactic structure information and part-of-speech information on the model
The experimental results show that with the threshold ρ set to 0.7 the model is more easily confused when making its judgments, whereas with ρ set to 0.9 all three evaluation metrics improve markedly. In addition, encoding sentences with the Tree-LSTM model, which fuses sentence structure information, gives clearly better results than the Bi-LSTM based model, and part-of-speech information used as auxiliary information further improves overall model performance.
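The three evaluation metrics and the role of the threshold ρ can be illustrated with a small sketch. It assumes a sentence pair is accepted as parallel when its predicted probability is at least ρ, and that the F value is the harmonic mean of precision and recall; neither detail is stated in the text. The function name and the toy probabilities and labels are hypothetical and are not the experimental data.

```python
def evaluate(probabilities, labels, threshold):
    """Return (precision, recall, f_value) for pairs accepted at the given threshold."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for guess, gold in zip(predicted, labels) if guess and gold)
    fp = sum(1 for guess, gold in zip(predicted, labels) if guess and not gold)
    fn = sum(1 for guess, gold in zip(predicted, labels) if not guess and gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy predictions (hypothetical): two truly parallel pairs and two non-parallel pairs.
probabilities = [0.95, 0.8, 0.75, 0.3]
labels = [True, True, False, False]

metrics_at_07 = evaluate(probabilities, labels, 0.7)
metrics_at_09 = evaluate(probabilities, labels, 0.9)
```

On this toy data, raising the threshold trades recall for precision; the actual behaviour on the patent's corpora is what Tables 1 and 2 report.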
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (6)
1. A Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM, characterized in that the method comprises the following steps:
Step1, collecting Chinese-Vietnamese parallel corpora for training the parallel sentence pair extraction model, and the Chinese-Vietnamese comparable corpus of Wikipedia as the extraction source, and dividing the collected parallel corpora into a training corpus and a test corpus;
Step2, training Chinese and Vietnamese monolingual word vectors on the Chinese and Vietnamese monolingual corpora, and training Chinese-Vietnamese bilingual word vectors using a bilingual dictionary;
Step3, converting the Chinese and Vietnamese sentence sequences into dependency trees using dependency parsing, and taking the trees as the input of the Tree-LSTM model;
Step4, tagging the part of speech of each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus, converting the parts of speech into vectors, and splicing them into the sentence semantic vectors;
Step5, applying the element-wise product and element-wise difference to the final Chinese and Vietnamese sentence semantic vectors to capture their differences, and inputting the result into a fully connected layer for supervised training.
2. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step1, Scrapy is used as the crawler tool to simulate user operation and crawl Chinese-Vietnamese parallel sentence pairs according to the XPath of the page data elements; at the same time the Chinese and Vietnamese Wikipedia dump data sets, which contain all Chinese and Vietnamese Wikipedia data, are downloaded, and the Chinese-Vietnamese comparable corpus is extracted by ID alignment.
3. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that the specific steps of Step2 are as follows:
Step2.1, training Chinese and Vietnamese monolingual word vectors separately with fastText on the Chinese and Vietnamese monolingual corpora;
Step2.2, segmenting the collected bilingual corpus into words, and constructing a Chinese-Vietnamese bilingual dictionary to serve as the supervision labels for bilingual word vector training;
Step2.3, training the Chinese-Vietnamese bilingual word vectors using MUSE.
4. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step3, to extract the syntactic information of Chinese and Vietnamese, the Chinese dependency parsing tool provided by Stanford University is used to parse Chinese and generate a syntactic dependency tree, and a tool based on the Vietnamese dependency treebank VnDT is used to parse Vietnamese and generate a syntactic dependency tree; the generated dependency trees are used as the input of the Tree-LSTM model.
5. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step4, part-of-speech tagging is performed separately on each word of the Chinese and Vietnamese sentences in the parallel corpus; the parts of speech are then converted into vectors and spliced into the sentence semantic vectors to generate the input vectors of the final extraction model.
6. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step5, the matching information of the final sentence semantic vectors, which contain part-of-speech and syntactic structure information, is captured by their element-wise product and absolute element-wise difference, and a fully connected layer computes the probability that the two sentences are translations of each other, thereby performing supervised training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010978713.0A CN112232090A (en) | 2020-09-17 | 2020-09-17 | Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010978713.0A CN112232090A (en) | 2020-09-17 | 2020-09-17 | Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112232090A (en) | 2021-01-15 |
Family
ID=74107018
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010978713.0A Pending CN112232090A (en) | 2020-09-17 | 2020-09-17 | Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112232090A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118095302A (en) * | 2024-04-26 | 2024-05-28 | 四川交通运输职业学校 | Auxiliary translation method and system based on computer |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108628829A (en) * | 2018-04-23 | 2018-10-09 | 苏州大学 | Automatic treebank method for transformation based on tree-like Recognition with Recurrent Neural Network and system |
CN109783809A (en) * | 2018-12-22 | 2019-05-21 | 昆明理工大学 | A method of alignment sentence is extracted from Laos-Chinese chapter grade alignment corpus |
CN110362820A (en) * | 2019-06-17 | 2019-10-22 | 昆明理工大学 | A kind of bilingual parallel sentence extraction method of old man based on Bi-LSTM algorithm |
CN110377918A (en) * | 2019-07-15 | 2019-10-25 | 昆明理工大学 | Merge the more neural machine translation method of the Chinese-of syntax analytic tree |
CN110414009A (en) * | 2019-07-09 | 2019-11-05 | 昆明理工大学 | The remote bilingual parallel sentence pairs abstracting method of English based on BiLSTM-CNN and device |
Non-Patent Citations (1)
Title |
---|
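Yang Tongsheng (杨通胜): "Research on neural machine translation of academic papers based on multi-branch trees", China Master's and Doctoral Theses Full-text Database (Master's), Philosophy and Humanities volume * |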
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210115 |