CN112232090A - Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM

Publication number
CN112232090A
CN112232090A CN202010978713.0A CN202010978713A CN112232090A CN 112232090 A CN112232090 A CN 112232090A CN 202010978713 A CN202010978713 A CN 202010978713A CN 112232090 A CN112232090 A CN 112232090A
Authority
CN
China
Prior art keywords
chinese
tree
parallel
vietnamese
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010978713.0A
Other languages
Chinese (zh)
Inventor
高盛祥
张迎晨
余正涛
朱浩东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-09-17
Filing date
2020-09-17
Publication date
2021-01-15
Application filed by Kunming University of Science and Technology
Priority to CN202010978713.0A
Publication of CN112232090A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/951 - Indexing; Web crawling techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/237 - Lexical tools
    • G06F40/242 - Dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/253 - Grammatical analysis; Style critique
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM. The method first pre-trains Chinese-Vietnamese bilingual word vectors, mapping the word vectors of both languages into the same semantic space. Because Chinese and Vietnamese sentence structures differ, each sentence sequence is converted into a dependency tree structure by dependency parsing, and a Tree-LSTM captures the syntactic structure information of the sentence. Part-of-speech information of the Chinese and Vietnamese sentences is then spliced into the sentence semantic vectors to form feature vectors, which are fed into a fully connected layer to train a Chinese-Vietnamese parallel sentence pair classifier. The invention uses deep learning to learn sentence representations automatically from large amounts of data, which avoids the heavy manual effort that traditional parallel sentence pair extraction spends on feature design. The method also addresses the effect of Chinese-Vietnamese structural differences on extraction model performance and improves the accuracy of the parallel sentence pair extraction model.

Description

Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM
Technical Field
The invention relates to a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM, and belongs to the technical field of natural language processing.
Background
Parallel corpora are an important resource for machine translation research. In recent years, exchange and cooperation between China and Vietnam have grown ever closer, and machine translation is becoming an important tool for enabling cooperation between the two countries, so Chinese-Vietnamese parallel corpora have considerable application prospects. Vietnamese is a typical low-resource language, and Chinese-Vietnamese parallel sentence pair data are scarce. Using parallel sentence pair extraction to obtain Chinese-Vietnamese parallel sentence pairs from large Chinese-Vietnamese comparable corpora can alleviate the data sparsity problem in Chinese-Vietnamese machine translation. Traditional parallel sentence pair extraction methods rarely consider the syntactic structure information of sentences, so a Chinese-Vietnamese parallel sentence pair extraction method that fuses syntactic structure and Tree-LSTM is of real value for extracting Chinese-Vietnamese parallel sentence pairs from the Chinese-Vietnamese comparable corpus in Wikipedia.
Disclosure of Invention
The invention provides a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM. It addresses the heavy manual effort that traditional parallel sentence pair extraction spends on feature design, and it also addresses the effect of Chinese-Vietnamese structural differences on extraction model performance. The method uses deep learning to represent sentence semantics while taking the differences between Chinese and Vietnamese syntactic structure into account, so as to improve the accuracy of the Chinese-Vietnamese parallel sentence pair extraction model.
The technical scheme of the invention is as follows: a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM comprises the following steps:
Step1, collecting a Chinese-Vietnamese parallel corpus for training the parallel sentence pair extraction model and a Chinese-Vietnamese comparable corpus from Wikipedia as the extraction source, and dividing the collected parallel corpus into a training corpus and a test corpus; Scrapy is used as the crawler tool to simulate user operation and crawl Chinese-Vietnamese parallel sentence pairs according to the XPath of page data elements; at the same time, the Chinese and Vietnamese Wikipedia dump datasets, which contain all Chinese and Vietnamese Wikipedia data, are downloaded, and the Chinese-Vietnamese comparable corpus is extracted by ID alignment;
Step2, training Chinese and Vietnamese monolingual word vectors on the Chinese and Vietnamese monolingual corpora, and training Chinese-Vietnamese bilingual word vectors with a bilingual dictionary;
Step2.1, training the Chinese and Vietnamese monolingual word vectors separately with fastText on the Chinese and Vietnamese monolingual corpora;
Step2.2, segmenting the collected bilingual corpus into words, and constructing a Chinese-Vietnamese bilingual dictionary as the labels for the bilingual word vector training task;
Step2.3, training the Chinese-Vietnamese bilingual word vectors with MUSE.
Bilingual word vector training brings the word vectors of Chinese and Vietnamese words with the same meaning close together in the bilingual semantic space, while the distances among the monolingual word vectors themselves are left unchanged. The invention uses MUSE to train the Chinese-Vietnamese bilingual word vectors, as shown in the formula:
$$\arg\min_{W} \sum_{i} \lVert X_i W - Y_i \rVert^{2} \tag{1}$$
Step3, converting the Chinese and Vietnamese sentence sequences into dependency tree structures using dependency parsing models, and taking the trees as the input of the Tree-LSTM model;
In Step3, Chinese and Vietnamese are both isolating languages, whose main grammatical means are word order and function words. The word orders of the two languages are similar: the main sentence constituents follow the same order, and both languages have a subject-verb-object (SVO) structure. The grammatical difference lies mainly in the order of modifiers (attributives and adverbials). Chinese uses a modifier-head order, i.e. modifiers precede the head word, and the same holds for multi-layer modification, for example the Chinese sentence "she is the most beautiful girl that I have seen." Vietnamese uses a head-modifier order, i.e. modifiers follow the head word, and the same holds for multi-layer modification, for example the Vietnamese sentence "[…] là (she is) […] gái (girl) xinh […] (the most beautiful) mà […] (I have seen).", where […] marks words that survive only as images in the source. To extract the grammatical information of Chinese and Vietnamese, the Chinese dependency parsing tool provided by Stanford University is used to parse Chinese sentences into dependency trees, and the Vietnamese dependency treebank (VnDT) tool is used to parse Vietnamese sentences into dependency trees; the generated dependency trees serve as the input of the Tree-LSTM model.
Step4, tagging the part of speech of each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus, converting the parts of speech into vectors, and splicing these vectors into the sentence semantic vectors;
Step5, capturing the differences between the final Chinese and Vietnamese semantic vectors with the element-wise product and the element-wise difference, and feeding the result into a fully connected layer for supervised training.
As a further scheme of the invention, in Step4, each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus is tagged with its part of speech, the parts of speech are converted into vectors, and these vectors are spliced into the sentence semantic vectors to produce the input vectors of the final extraction model.
In Step5, the matching information of the final sentence semantic vectors, which contain part-of-speech and syntactic structure information, is captured with the element-wise product and the absolute element-wise difference, and a fully connected layer computes the probability that the two sentences are translations of each other, thereby performing supervised training.
The invention has the following beneficial effects:
1. When sentences are encoded with the Tree-LSTM model that fuses sentence structure information, the results are clearly better than those of the Bi-LSTM-based model, and part-of-speech information used as auxiliary information improves overall model performance;
2. The method uses deep learning to learn sentence representations automatically from large amounts of data, which avoids the heavy manual effort that traditional parallel sentence pair extraction spends on feature design; it also addresses the effect of Chinese-Vietnamese structural differences on extraction model performance and improves the accuracy of the parallel sentence pair extraction model. Experiments show that the method outperforms the baseline model on all three metrics: precision, recall and F-value.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an illustration of the present invention converting sequences into tree structures.
Detailed Description
Example 1: as shown in FIGS. 1-2, a Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM comprises:
Step1, collecting a Chinese-Vietnamese parallel corpus for training the parallel sentence pair extraction model and a Chinese-Vietnamese comparable corpus from Wikipedia as the extraction source, and dividing the collected parallel corpus into a training corpus and a test corpus; Scrapy is used as the crawler tool to simulate user operation and crawl Chinese-Vietnamese parallel sentence pairs according to the XPath of page data elements; at the same time, the Chinese and Vietnamese Wikipedia dump datasets, which contain all Chinese and Vietnamese Wikipedia data, are downloaded, and the Chinese-Vietnamese comparable corpus is extracted by ID alignment;
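The ID alignment mentioned in Step1 can be pictured with a minimal sketch. The dictionaries `zh_articles` and `vi_articles`, their keys, and the helper `align_by_id` are hypothetical names with toy data, not part of the patent; the sketch only assumes that articles on the same topic share an identifier across the two dumps.

```python
def align_by_id(zh_articles, vi_articles):
    """Pair Chinese and Vietnamese articles that share the same ID."""
    shared_ids = sorted(set(zh_articles) & set(vi_articles))
    return [(i, zh_articles[i], vi_articles[i]) for i in shared_ids]


# Toy records standing in for parsed Wikipedia dump entries (hypothetical data).
zh_articles = {"Q1": "中文条目一", "Q2": "中文条目二", "Q4": "中文条目四"}
vi_articles = {"Q2": "Bài viết hai", "Q3": "Bài viết ba", "Q4": "Bài viết bốn"}

# Only articles present in both dumps form the comparable corpus.
comparable = align_by_id(zh_articles, vi_articles)
for article_id, zh_text, vi_text in comparable:
    print(article_id, zh_text, vi_text)
```

Articles that exist in only one language are dropped, so the comparable corpus contains only document pairs from which sentence pairs can later be scored.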
Step2, training monolingual word vectors on the Chinese and Vietnamese monolingual corpora with fastText, obtaining a Chinese-Vietnamese bilingual dictionary to supervise the training of the bilingual word vectors, and training with MUSE according to the formula:
$$\arg\min_{W} \sum_{i} \lVert X_i W - Y_i \rVert^{2} \tag{1}$$
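Formula (1) can be illustrated with a small numerical sketch. It assumes the usual reading of this objective: each X_i is a source-language word vector, Y_i is the vector of its dictionary translation, and W is the mapping matrix. The sketch minimizes the objective with plain, unconstrained gradient descent on toy 2-dimensional vectors. It is not the MUSE implementation, which is understood to solve the supervised case with an orthogonal Procrustes step; all function names here are hypothetical.

```python
def matmul_vec(x, W):
    """Row vector x (length d) times matrix W (d x d)."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) for j in range(len(W[0]))]


def objective(X, Y, W):
    """Sum over dictionary pairs of ||x_i W - y_i||^2, as in formula (1)."""
    total = 0.0
    for x, y in zip(X, Y):
        r = matmul_vec(x, W)
        total += sum((a - b) ** 2 for a, b in zip(r, y))
    return total


def fit_mapping(X, Y, lr=0.1, steps=200):
    """Unconstrained gradient descent on formula (1), starting from identity."""
    d = len(X[0])
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for x, y in zip(X, Y):
            r = matmul_vec(x, W)
            for k in range(d):
                for j in range(d):
                    # d/dW[k][j] of ||xW - y||^2 is 2 * x[k] * (r[j] - y[j])
                    grad[k][j] += 2.0 * x[k] * (r[j] - y[j])
        for k in range(d):
            for j in range(d):
                W[k][j] -= lr * grad[k][j]
    return W


# Toy dictionary: each target vector is the source vector rotated by 90 degrees.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]

W = fit_mapping(X, Y)
print(objective(X, Y, W))
```

On this toy dictionary the learned W approaches the rotation matrix that maps every X_i onto its Y_i, so the objective falls from its starting value toward zero.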
Step3, as shown in FIG. 2, the Chinese dependency parsing tool provided by Stanford University and the Vietnamese dependency treebank (VnDT) tool are used to convert the Chinese and Vietnamese sentences of each parallel sentence pair in the parallel corpus into a Chinese dependency tree and a Vietnamese dependency tree; FIG. 2 illustrates the conversion of the sequential structure of a sentence into the tree structure used as the input of the Tree-LSTM model.
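The conversion from a token sequence to a tree can be sketched as follows. The sketch assumes the parser output is available as a list of head indices (one per token, 0 for the root), which is a common dependency format but is not specified in the text; `build_tree`, `postorder` and the example heads are hypothetical.

```python
def build_tree(heads):
    """Turn a head-index list into a children map.

    heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
    """
    children = {i: [] for i in range(1, len(heads) + 1)}
    root = None
    for child, head in enumerate(heads, start=1):
        if head == 0:
            root = child
        else:
            children[head].append(child)
    return root, children


def postorder(root, children):
    """Visit children before parents, the order a Tree-LSTM needs."""
    order = []

    def visit(node):
        for c in children[node]:
            visit(c)
        order.append(node)

    visit(root)
    return order


# Hypothetical parse of a four-token sentence: token 2 is the root,
# tokens 1 and 4 depend on token 2, and token 3 depends on token 4.
heads = [2, 0, 4, 2]
root, children = build_tree(heads)
print(root, children, postorder(root, children))
```

The post-order list gives a valid evaluation order: every node's children are processed before the node itself, so their hidden states are available when the parent is computed.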
The Tree-LSTM:
[Equations (2) and (3) appear only as images in the source]
$$f_{jk} = \sigma(W_f x_j + U_f h_k + b_f) \tag{4}$$
[Equations (5) and (6) appear only as images in the source]
$$c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k \tag{7}$$
$$h_j = o_j \odot \tanh(c_j) \tag{8}$$
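Equations (4), (7) and (8) have the form of the Child-Sum Tree-LSTM. The sketch below assumes that the equations not legible in the source follow that standard formulation: the child hidden states are summed, and the input gate, output gate and candidate update are computed from that sum. This is an assumption, not a statement of the patent's exact equations, and the parameter layout (`p`, a dict of `(W, U, b)` triples) is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(params, x, h):
    """Compute W x + U h + b for small dense matrices stored as lists of rows."""
    W, U, b = params
    return [
        sum(w * xv for w, xv in zip(w_row, x))
        + sum(u * hv for u, hv in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def tree_lstm_node(x, child_states, p):
    """One Child-Sum Tree-LSTM step for node j.

    x: input vector of node j; child_states: list of (h_k, c_k) pairs;
    p: dict mapping gate name ('i', 'f', 'o', 'u') to a (W, U, b) triple.
    """
    m = len(p["i"][2])  # hidden size
    # Sum of child hidden states (assumed form of the missing equation).
    h_sum = [sum(h[d] for h, _ in child_states) for d in range(m)]
    i = [sigmoid(z) for z in affine(p["i"], x, h_sum)]
    o = [sigmoid(z) for z in affine(p["o"], x, h_sum)]
    u = [math.tanh(z) for z in affine(p["u"], x, h_sum)]
    c = [i[d] * u[d] for d in range(m)]
    for h_k, c_k in child_states:
        # One forget gate per child, as in equation (4).
        f = [sigmoid(z) for z in affine(p["f"], x, h_k)]
        for d in range(m):
            c[d] += f[d] * c_k[d]  # equation (7)
    h = [o[d] * math.tanh(c[d]) for d in range(m)]  # equation (8)
    return h, c


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Toy parameters: all weights zero, candidate bias 1, so gates equal 0.5.
params = {g: (zeros(2, 2), zeros(2, 2), [0.0, 0.0]) for g in "ifo"}
params["u"] = (zeros(2, 2), zeros(2, 2), [1.0, 1.0])

leaf_h, leaf_c = tree_lstm_node([0.3, -0.2], [], params)
root_h, root_c = tree_lstm_node([0.1, 0.4], [(leaf_h, leaf_c)], params)
print(root_h, root_c)
```

Applied bottom-up over the nodes of a dependency tree, the hidden state at the root can serve as the sentence vector.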
Step4, tagging the part of speech of each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus, converting each part of speech into a vector, and splicing these vectors into the sentence semantic vector.
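One way to read the splicing in Step4 is sketched below. The text does not say how the per-word part-of-speech vectors are combined before being attached to the sentence vector, so the averaging used here is an assumption, and `pos_table`, `pos_feature` and `splice` are hypothetical names with toy values.

```python
def pos_feature(tags, pos_table):
    """Average the embeddings of a sentence's POS tags (pooling is an assumption)."""
    dim = len(next(iter(pos_table.values())))
    total = [0.0] * dim
    for tag in tags:
        for d, value in enumerate(pos_table[tag]):
            total[d] += value
    return [value / len(tags) for value in total]


def splice(sentence_vec, tags, pos_table):
    """Concatenate the sentence semantic vector with the pooled POS vector."""
    return list(sentence_vec) + pos_feature(tags, pos_table)


# Toy 2-dimensional POS embeddings and a toy 3-dimensional sentence vector.
pos_table = {"N": [1.0, 0.0], "V": [0.0, 1.0]}
sentence_vec = [0.2, -0.1, 0.7]

feature = splice(sentence_vec, ["N", "V", "N", "V"], pos_table)
print(feature)
```

The resulting vector keeps the semantic dimensions unchanged and appends the part-of-speech dimensions after them.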
Step5, for the final Chinese and Vietnamese sentence vectors containing part-of-speech and syntactic structure information [vector symbols appear only as an image in the source], their matching information is captured with the element-wise product and the absolute element-wise difference, and the probability that the two sentences are translations of each other is computed with the fully connected layer.
[Three equations appear only as images in the source]
$$p(y_j \mid c_j) = \sigma(W_c h_j + c) \tag{20}$$
[One equation appears only as an image in the source]
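The matching step in Step5 can be sketched as follows. The sketch builds the element-wise product and absolute element-wise difference of the two sentence vectors and passes them through a single sigmoid unit. The source applies the sigmoid to an intermediate hidden vector whose defining equations are not legible, so collapsing this into one linear layer is a simplification; the weights, bias and function names are hypothetical.

```python
import math


def match_features(h_zh, h_vi):
    """Element-wise product followed by absolute element-wise difference."""
    product = [a * b for a, b in zip(h_zh, h_vi)]
    abs_diff = [abs(a - b) for a, b in zip(h_zh, h_vi)]
    return product + abs_diff


def parallel_probability(h_zh, h_vi, weights, bias):
    """Sigmoid over a linear layer on the matching features (simplified)."""
    feats = match_features(h_zh, h_vi)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy sentence vectors and toy weights: reward agreement, penalize difference.
h_zh = [0.5, -1.0]
h_vi = [0.5, 1.0]
weights = [1.0, 1.0, -1.0, -1.0]

feats = match_features(h_zh, h_vi)
prob = parallel_probability(h_zh, h_vi, weights, bias=0.0)
print(feats, prob)
```

When the two vectors are identical the difference features are all zero, so only the product features contribute to the score.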
Step6, comparative experiments are conducted using precision (P), recall (R) and F-value (F) as evaluation metrics.
[The formulas for P, R and F appear only as images in the source]
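The three metrics in Step6 can be computed as in the sketch below. It assumes the standard definitions (precision as the share of predicted parallel pairs that are correct, recall as the share of true parallel pairs that are found, and F as their harmonic mean), since the source formulas are not legible. The labels are toy data and `precision_recall_f` is a hypothetical helper.

```python
def precision_recall_f(predicted, gold):
    """Compute precision, recall and F-value for binary labels (1 = parallel)."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_value = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Toy classifier decisions and gold labels for six candidate sentence pairs.
predicted = [1, 1, 1, 0, 0, 1]
gold = [1, 0, 1, 1, 0, 1]

p, r, f = precision_recall_f(predicted, gold)
print(p, r, f)
```

In this toy case there are three true positives, one false positive and one false negative, so precision and recall coincide.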
To verify the effectiveness of the model, the following experiments were set up. To verify the influence of the pre-trained bilingual word vectors on the extraction method, the baseline model was tested directly, as shown in the table below:
Table 1: Effect of pre-trained bilingual word vectors on the baseline model
[Table 1 appears only as an image in the source]
As the table above shows, the pre-trained bilingual word vectors bring matching words closer to one another without changing the original word vectors, and the final classification results all improve slightly.
On top of the pre-trained word vectors, to study the effect of the differences between the two languages, the Bi-LSTM-based model is replaced with a Tree-LSTM model based on dependency trees; the experimental results are shown in Table 2 below.
Table 2: Influence of syntactic structure information and part-of-speech information on the model
[Table 2 appears only as an image in the source]
The experimental results show that with the threshold ρ set to 0.7 the model is more easily confused when making its decision, whereas with ρ set to 0.9 all three evaluation metrics improve markedly. In addition, encoding sentences with the Tree-LSTM model that fuses sentence structure information is clearly better than using the Bi-LSTM-based model, and part-of-speech information, used as auxiliary information, further improves overall model performance.
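The role of the threshold ρ can be illustrated with a short sketch. It assumes that a candidate pair is accepted as parallel when the classifier's probability is at least ρ; the text does not state the exact comparison, and the scores and the helper `extract_pairs` are hypothetical.

```python
def extract_pairs(scored_pairs, rho):
    """Keep candidate pairs whose predicted probability is at least rho."""
    return [(zh, vi) for zh, vi, prob in scored_pairs if prob >= rho]


# Toy classifier scores for three candidate sentence pairs.
scored = [
    ("zh-1", "vi-1", 0.95),
    ("zh-2", "vi-2", 0.82),
    ("zh-3", "vi-3", 0.65),
]

at_07 = extract_pairs(scored, 0.7)
at_09 = extract_pairs(scored, 0.9)
print(at_07, at_09)
```

A higher threshold accepts fewer pairs, keeping only those the classifier is most confident about.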
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (6)

1. A Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM, characterized in that the method comprises the following steps:
Step1, collecting a Chinese-Vietnamese parallel corpus for training the parallel sentence pair extraction model and a Chinese-Vietnamese comparable corpus from Wikipedia as the extraction source, and dividing the collected parallel corpus into a training corpus and a test corpus;
Step2, training Chinese and Vietnamese monolingual word vectors on the Chinese and Vietnamese monolingual corpora, and training Chinese-Vietnamese bilingual word vectors with a bilingual dictionary;
Step3, converting the Chinese and Vietnamese sentence sequences into dependency tree structures using dependency parsing models, and taking the trees as the input of the Tree-LSTM model;
Step4, tagging the part of speech of each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus, converting the parts of speech into vectors, and splicing these vectors into the sentence semantic vectors;
Step5, capturing the differences between the final Chinese and Vietnamese semantic vectors with the element-wise product and the element-wise difference, and feeding the result into a fully connected layer for supervised training.
2. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step1, Scrapy is used as the crawler tool to simulate user operation and crawl Chinese-Vietnamese parallel sentence pairs according to the XPath of page data elements; at the same time, the Chinese and Vietnamese Wikipedia dump datasets, which contain all Chinese and Vietnamese Wikipedia data, are downloaded, and the Chinese-Vietnamese comparable corpus is extracted by ID alignment.
3. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that the specific steps of Step2 are as follows:
Step2.1, training the Chinese and Vietnamese monolingual word vectors separately with fastText on the Chinese and Vietnamese monolingual corpora;
Step2.2, segmenting the collected bilingual corpus into words, and constructing a Chinese-Vietnamese bilingual dictionary as the labels for the bilingual word vector training task;
Step2.3, training the Chinese-Vietnamese bilingual word vectors with MUSE.
4. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step3, to extract the grammatical information of Chinese and Vietnamese, the Chinese dependency parsing tool provided by Stanford University is used to parse Chinese sentences into dependency trees, and the Vietnamese dependency treebank (VnDT) tool is used to parse Vietnamese sentences into dependency trees; the generated dependency trees serve as the input of the Tree-LSTM model.
5. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step4, each word in the Chinese-Vietnamese parallel sentence pairs of the parallel corpus is tagged with its part of speech, the parts of speech are converted into vectors, and these vectors are spliced into the sentence semantic vectors to produce the input vectors of the final extraction model.
6. The Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM according to claim 1, characterized in that: in Step5, the matching information of the final sentence semantic vectors, which contain part-of-speech and syntactic structure information, is captured with the element-wise product and the absolute element-wise difference, and a fully connected layer computes the probability that the two sentences are translations of each other, thereby performing supervised training.
CN202010978713.0A 2020-09-17 2020-09-17 Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM Pending CN112232090A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010978713.0A CN112232090A (en) 2020-09-17 2020-09-17 Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010978713.0A CN112232090A (en) 2020-09-17 2020-09-17 Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM

Publications (1)

Publication Number Publication Date
CN112232090A true CN112232090A (en) 2021-01-15

Family

ID=74107018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010978713.0A Pending CN112232090A (en) 2020-09-17 2020-09-17 Chinese-Vietnamese parallel sentence pair extraction method fusing syntactic structure and Tree-LSTM

Country Status (1)

Country Link
CN (1) CN112232090A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118095302A (en) * 2024-04-26 2024-05-28 四川交通运输职业学校 Auxiliary translation method and system based on computer

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628829A (en) * 2018-04-23 2018-10-09 苏州大学 Automatic treebank conversion method and system based on tree-structured recurrent neural network
CN109783809A (en) * 2018-12-22 2019-05-21 昆明理工大学 Method for extracting aligned sentences from a Lao-Chinese document-level aligned corpus
CN110362820A (en) * 2019-06-17 2019-10-22 昆明理工大学 Lao-Chinese bilingual parallel sentence extraction method based on Bi-LSTM algorithm
CN110377918A (en) * 2019-07-15 2019-10-25 昆明理工大学 Chinese-Vietnamese neural machine translation method fusing syntactic parse tree
CN110414009A (en) * 2019-07-09 2019-11-05 昆明理工大学 English-Burmese bilingual parallel sentence pair extraction method and device based on BiLSTM-CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨通胜: "基于多分支树的学术论文神经机器翻译研究", 《中国优秀博硕士论文全文数据库(硕士)哲学与人文科学辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210115)