CN109885686A - A multilingual text classification method fusing topic information and BiLSTM-CNN - Google Patents

A multilingual text classification method fusing topic information and BiLSTM-CNN

Info

Publication number
CN109885686A
CN109885686A (application CN201910127535.8A)
Authority
CN
China
Prior art keywords
multilingual
text
languages
neural network
topic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910127535.8A
Other languages
Chinese (zh)
Inventor
崔荣一
孟先艳
赵亚慧
易志伟
田明杰
徐凯斌
杨飞扬
王琪
黄政豪
金国哲
张振国
胡荣
王大千
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanbian University
Original Assignee
Yanbian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanbian University filed Critical Yanbian University
Priority to CN201910127535.8A priority Critical patent/CN109885686A/en
Publication of CN109885686A publication Critical patent/CN109885686A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The present invention relates to the field of text classification in natural language processing, and in particular to a multilingual text classification method that fuses topic information with a BiLSTM-CNN model. The method proceeds as follows: first, Chinese, English, and Korean parallel corpora are collected and a parallel corpus is constructed; the text of each language in the corpus is preprocessed; word vectors for each language are trained with word-embedding techniques; a topic vector is extracted for each language's text with a topic model; and a neural network model suited to multiple languages is built, topic information is fused in, and multilingual text representations are produced. This text classification method overcomes the language barrier, is highly adaptable, meets the needs of multilingual text classification, and is practical.

Description

A multilingual text classification method fusing topic information and BiLSTM-CNN
Technical field
The present invention relates to the field of text classification in natural language processing, and in particular to a multilingual text classification method that fuses topic information with a BiLSTM-CNN model.
Background technique
With the rapid development of the Internet, more and more Internet data exists in text form, and with ongoing internationalization, multilingual text data has become increasingly common. People are no longer content with text information in a single language; the demand for multilingual text information keeps growing, and users urgently want to find the information they need quickly and effectively in multilingual text data. Multilingual text classification, as a research direction of natural language processing, is an effective way to cope with the growth of multilingual text information.
The objective of multilingual text classification is to extend existing automatic text classification techniques from a single language to multiple languages without manual intervention. As globalization has progressed, research on multilingual text classification has received wide attention and development; at present there are four main approaches.
Dictionary-based methods. These use a bilingual dictionary and are simple to implement. For example, Olsson et al. translated English training documents into Czech documents by means of a probabilistic bilingual dictionary to perform cross-language text classification. However, this approach cannot handle polysemy.
Corpus-based methods. These divide into parallel corpora and comparable corpora: a parallel corpus describes the same information in different languages, while a comparable corpus describes information on the same subject in different languages, with documents aligned by the topic discussed. However, these methods require a highly developed corpus with comprehensive coverage, which severely constrains the experimental conditions and hinders extension.
Machine-translation-based methods. Documents in multiple languages are translated by machine translation tools into a single language model for classification. This approach is fairly simple but depends heavily on the accuracy of the machine translation, which reduces effectiveness.
Word-embedding-based methods. These build feature representation models based on deep learning and train multilingual word vectors. By exploiting context, they capture semantic information accurately and yield concrete feature representations.
A main difficulty of multilingual text classification is multilingual text representation. The invention therefore proposes a new multilingual text representation and neural network model that overcome the language problem.
Summary of the invention
In view of the problems identified in the background above, the invention discloses a multilingual text classification method that fuses topic information with BiLSTM-CNN and is able to overcome the language problem.
A multilingual text classification method fusing topic information and BiLSTM-CNN, comprising the following steps:
1) collect Chinese, English, and Korean parallel corpora and construct a parallel corpus;
2) preprocess the text of each language in the corpus;
3) train word vectors for each language using word-embedding techniques;
4) extract a topic vector for each language's text using a topic model;
5) build a neural network model suited to multiple languages, fuse in the topic information, and produce multilingual text representations.
Preferably, when constructing the multilingual parallel corpus in step 1), scientific and technical literature abstracts in 13 categories and in the three languages Chinese, English, and Korean are collected, and a content-aligned multilingual parallel corpus is constructed;
Preferably, the processing of each language's text in step 2) proceeds as follows:
S1: for the Chinese corpus, build a scientific and technical dictionary containing terms from biology, medicine, and physics, and add it to the segmentation dictionary as a segmentation preference to improve the Chinese word segmentation;
S2: for the English corpus, extract the stem of each English word, i.e., reduce each word to its stem representation;
S3: for the Korean corpus, remove terminal endings and conjunctions, leaving nouns and predicates;
Preferably, the word vectors of each language trained in step 3) are obtained with the CBOW model of Word2vec and have dimension 220;
Preferably, the topic vectors in step 4) are extracted by latent semantic analysis, applied separately to the text of each language;
Preferably, the neural network model suited to multiple languages built in step 5) is divided into three sub-models: a Chinese, an English, and a Korean neural network model. Each sub-model has the same neural network structure, while training on the text of a different language yields different model parameters; the three sub-models are finally cascaded into the complete neural network model, realizing multilingual text classification;
Preferably, the structure of the topic-information-fusing neural network in step 5) is divided into an input layer, a BiLSTM layer, a CNN layer, a fully connected layer, and an output layer.
Beneficial effects:
1) The invention effectively solves the language problem without relying on external resources: each language trains its own neural network model, the semantic information of each language is exploited accurately, effective feature representations are obtained, and the method has a degree of generality.
2) The invention represents text with the features produced by the model combination, capturing text information along both the temporal and the spatial dimension and thus expressing text semantics more accurately.
3) The invention makes full use of topic information: the topic vector of each language is extracted, and combining topic information with semantic information improves the accuracy of the text model.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1: overall flow diagram of the invention;
Fig. 2: text preprocessing flow chart of the invention;
Fig. 3: neural network model of the invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
The environment of this example is configured as follows: Windows operating system, CPU frequency 3.30 GHz, 16 GB of memory, Python as the programming language, TensorFlow as the deep learning framework, and PyCharm as the integrated development environment.
As shown in Fig. 1, the specific implementation steps of the algorithm are as follows:
Step 1: first, collect Chinese, English, and Korean parallel corpora and construct a parallel corpus;
Step 2: preprocess the text of each language in the corpus;
Step 3: train word vectors for each language using word-embedding techniques;
Step 4: extract a topic vector for each language's text using a topic model;
Step 5: build a neural network model suited to multiple languages, fuse in the topic information, and produce multilingual text representations.
In step 1 above, Chinese, English, and Korean scientific and technical literature abstracts are compiled, 32,688 texts per language and 98,064 texts in total, divided into 13 categories, to construct the multilingual parallel corpus.
In step 2 above, the collected texts are preprocessed. Because the texts cover three languages, preprocessing is done per language; the specific steps are shown in Fig. 2:
Step 2.1: for the Chinese corpus, remove stop words and segment the text; build a scientific and technical dictionary containing professional terms from biology, medicine, physics, and similar fields, and add it to the segmentation dictionary as a segmentation preference to improve the Chinese word segmentation.
Step 2.2: for the English corpus, convert capital letters to lower case and extract the stem of each English word, i.e., reduce each word to its stem representation.
Step 2.3: for the Korean corpus, remove terminal endings, conjunctions, and the like, leaving nouns and predicates.
Through the above steps, the multilingual texts in the corpus are preprocessed and the experimental text set is constructed.
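As an illustration, the English branch of the preprocessing above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: the stop-word set and suffix list below are illustrative stand-ins (a real system would use a complete stop-word list and a full stemmer such as Porter's, and the Chinese and Korean branches would rely on a word segmenter and a morphological analyzer, which are not shown).

```python
import re

# Illustrative resources -- NOT the patent's actual dictionaries (assumptions).
EN_STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "are", "is"}
EN_SUFFIXES = ("ing", "edly", "ed", "es", "s")  # crude stand-in for a real stemmer


def preprocess_english(text):
    """Lowercase, tokenize, drop stop words, and crudely reduce words to stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = []
    for tok in tokens:
        if tok in EN_STOPWORDS:
            continue
        for suf in EN_SUFFIXES:
            # Only strip a suffix when enough of the word remains.
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]
                break
        stems.append(tok)
    return stems
```

For example, `preprocess_english("The cats are running in the gardens")` drops the stop words and reduces the remaining tokens to crude stems.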
In step 3, each language trains its own word vectors with the CBOW model of Word2vec, ignoring words whose frequency in the text is below 10. The CBOW model predicts the centre word from the surrounding words and consists of three layers:
Input layer: the word vectors of the 2c context words of the centre word;
Projection layer: their sum x_w = v(w_{t-c}) + ... + v(w_{t+c});
Output layer: a Huffman tree whose leaf nodes are the words of the corpus and whose non-leaf nodes are virtual nodes not assigned real words.
Its learning objective is to maximize the log-likelihood function L = Σ_{w∈C} log p(w | Context(w)), where w ranges over the words of the corpus C; when the objective function reaches its maximum, the corresponding word vectors are well trained.
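A minimal numpy sketch of CBOW training is given below. Note the hedges: it uses a full softmax over the vocabulary for clarity rather than the Huffman-tree (hierarchical) softmax described above, toy dimensions rather than 220, and illustrative function names and hyperparameters; it is not the patent's implementation.

```python
import numpy as np


def train_cbow(corpus, dim=8, window=2, lr=0.05, epochs=150, seed=0):
    """Minimal CBOW: predict the centre word from the average of its context
    vectors, maximizing the log-likelihood (full softmax for clarity)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    W_in = rng.normal(0, 0.1, (len(vocab), dim))   # input (context) embeddings
    W_out = rng.normal(0, 0.1, (dim, len(vocab)))  # output weights
    losses = []
    for _ in range(epochs):
        total = 0.0
        for sent in corpus:
            for t, centre in enumerate(sent):
                lo, hi = max(0, t - window), min(len(sent), t + window + 1)
                ctx = [idx[sent[j]] for j in range(lo, hi) if j != t]
                if not ctx:
                    continue
                h = W_in[ctx].mean(axis=0)          # projection: context average
                scores = h @ W_out
                p = np.exp(scores - scores.max())
                p /= p.sum()
                total += -np.log(p[idx[centre]])    # negative log-likelihood
                p[idx[centre]] -= 1.0               # softmax gradient
                W_out -= lr * np.outer(h, p)
                W_in[ctx] -= lr * (W_out @ p) / len(ctx)
        losses.append(total)
    return {w: W_in[idx[w]] for w in vocab}, losses
```

On a toy corpus the recorded loss decreases over epochs, mirroring the maximization of the log-likelihood objective.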
In step 4, each language's text topic vectors are extracted separately by the latent semantic analysis method; the steps are as follows:
analyze the document collection and build the term-document matrix;
apply singular value decomposition to the term-document matrix;
from the decomposed matrices, extract the document-topic matrix, whose rows serve as the topic vectors of the individual texts.
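The three LSA steps above can be sketched with a plain SVD, assuming numpy; the helper name and the toy topic count are illustrative (the patent's topic vectors are 220-dimensional).

```python
import numpy as np


def lsa_topic_vectors(docs, k=2):
    """Latent semantic analysis: build a term-document count matrix, take its
    SVD, and read off a k-dimensional topic vector for each document."""
    vocab = sorted({w for d in docs for w in d})
    A = np.zeros((len(vocab), len(docs)))      # term-document matrix
    for j, d in enumerate(docs):
        for w in d:
            A[vocab.index(w), j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Each column of Vt, scaled by its singular value, is a document's
    # representation in the k-dimensional latent topic space.
    return (np.diag(s[:k]) @ Vt[:k]).T         # shape: (n_docs, k)
```

Documents sharing vocabulary end up with similar topic vectors, while documents with disjoint vocabulary land in orthogonal latent directions.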
In step 5, the neural network model that fuses in topic information is built; the model is shown in Fig. 3.
As Fig. 3 shows, the model is divided into three sub-models: the Chinese, the English, and the Korean neural network model. Each sub-model has the same neural network structure, while training on the text of a different language yields different model parameters. The three sub-models are cascaded at the end to form the complete neural network model, which realizes multilingual text classification.
The neural network structure shown in Fig. 3 is divided into an input layer, a BiLSTM layer, a CNN layer, a fully connected layer, and an output layer. The meaning of each layer is as follows:
The input layer is formed by splicing the word vector and the topic vector: x = [w ; θ], where w is the 220-dimensional word vector obtained by Word2vec training and θ is the topic vector extracted by latent semantic analysis, of the same dimension as the word vector;
The BiLSTM layer is a bidirectional long short-term memory network comprising two LSTMs, one forward and one backward. The output of the BiLSTM layer at time t concatenates the output h_t^f of the forward LSTM with the output h_t^b of the backward LSTM, that is, O_t = [h_t^f ; h_t^b].
The number of hidden-layer neurons in the BiLSTM is set to 150; the function of the BiLSTM layer is to capture the word-order information of the text.
The CNN layer consists of a convolutional layer, a normalization layer, an activation layer, and a pooling layer.
The convolution kernels of the convolutional layer have sizes 3, 4, and 5, with 128 kernels of each size.
The normalization layer uses batch normalization; for a mini-batch {x_1, ..., x_m} the computation is μ = (1/m) Σ_i x_i, σ² = (1/m) Σ_i (x_i − μ)², x̂_i = (x_i − μ) / √(σ² + ε), y_i = γ x̂_i + β.
The activation layer uses the ReLU function, f(x) = max(0, x).
The pooling stage uses a max-pooling strategy, which reduces the estimated-mean shift caused by convolutional-layer parameter errors and preserves more local information.
The results obtained for the three languages after the neural network processing are cascaded and fed to a softmax function for class prediction.
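The data flow through these layers can be traced at the shape level, assuming numpy. The sketch below uses random weights and is not a trainable model (the patent's implementation uses TensorFlow); it only shows how the 440-dimensional spliced input, the 300-dimensional BiLSTM output per time step, the 3×128 pooled convolution features, and the cascaded softmax input fit together.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_word, d_topic, hidden, n_classes = 30, 220, 220, 150, 13

# Input layer: splice of word vector and topic vector at each time step
x = rng.normal(size=(T, d_word + d_topic))            # (30, 440)

# BiLSTM stand-in: forward and backward hidden sequences, concatenated per step
h_fwd = rng.normal(size=(T, hidden))
h_bwd = rng.normal(size=(T, hidden))
O = np.concatenate([h_fwd, h_bwd], axis=1)            # O_t = [h_f ; h_b] -> (30, 300)

# CNN layer: kernels of widths 3/4/5 with 128 filters each, ReLU, max-pool over time
pooled = []
for width in (3, 4, 5):
    W = rng.normal(size=(width * O.shape[1], 128))
    windows = np.stack([O[i:i + width].ravel() for i in range(T - width + 1)])
    feat = np.maximum(windows @ W, 0.0)               # convolution + ReLU
    pooled.append(feat.max(axis=0))                   # max-pooling over time
cnn_out = np.concatenate(pooled)                      # (384,)

# One such vector per language sub-model; the three branches are cascaded,
# then a fully connected layer maps to the 13 classes and softmax predicts.
z = np.concatenate([cnn_out, cnn_out, cnn_out])       # stand-in for 3 branches
W_fc = rng.normal(size=(z.size, n_classes))
logits = z @ W_fc
p = np.exp(logits - logits.max())
p /= p.sum()                                          # softmax over 13 classes
```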
The parallel corpus is divided into a training set and a test set by the method of ten-fold cross validation for experimental verification.
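A minimal sketch of the ten-fold split in plain Python (illustrative; in practice a library routine such as scikit-learn's KFold would normally be used):

```python
def ten_fold_splits(n_samples, k=10):
    """Partition sample indices into k folds; each fold serves once as the
    test set while the remaining folds form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


splits = list(ten_fold_splits(100))  # e.g. 100 samples -> 10 train/test splits
```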
Through the above steps, the neural network is trained on the training set. To prevent overfitting, a dropout mechanism is applied in the fully connected layer, ignoring some neurons with a certain probability; the dropout value in this experiment is 0.5. An L2 regularization mechanism is also introduced, whose principle is as follows:
C_0 represents the original loss function; the L2 regularization term is the sum of the squares of all parameters w divided by the size n of the training set, and λ is the regularization coefficient, i.e., C = C_0 + (λ/2n) Σ_w w².
The other parameters are set as follows: batch size 128, 200 epochs, learning rate 1e-3.
The performance of the method of the invention is verified on the test set; accuracy and cross entropy are chosen as evaluation indices.
Accuracy is defined as the ratio, on a given test set, of the number of samples the classifier classifies correctly to the total number of samples.
Cross entropy, a common evaluation function in deep learning, reflects how close the probability distribution output by the model is to the true sample distribution. It is defined as H(y, p) = −Σ_i y_i log p_i, where y is the true sample value and p is the class probability predicted by the model.
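The two evaluation indices can be sketched directly, assuming numpy; the function names are illustrative.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples the classifier labels correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def cross_entropy(y_onehot, p, eps=1e-12):
    """Mean cross entropy between true one-hot labels and predicted class
    probabilities: H = -(1/N) * sum_i sum_c y_ic * log(p_ic)."""
    p = np.clip(p, eps, 1.0)  # guard against log(0)
    return float(-(np.asarray(y_onehot) * np.log(p)).sum(axis=1).mean())
```

A perfect prediction yields a cross entropy of 0; a uniform guess over two classes yields ln 2 ≈ 0.693.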
Embodiment 1
This embodiment selects single-language text sets from the multilingual text classification corpus established in step 1 and experimentally verifies the effectiveness of the sub-models. In the parameter settings of this example, the embedding size is 220 dimensions, the number of hidden-layer neurons is likewise set to 150, the number of topics is set to 220, and the batch size is 64. The comparison model is TextCNN, consisting of a convolutional layer, an activation layer, a pooling layer, and a fully connected layer; the experiment verifies that the sub-model can improve text classification accuracy.
Embodiment 2
This embodiment is basically the same as Embodiment 1, with the following difference:
this embodiment selects the multilingual text corpus established in step 1 and performs multilingual text classification. The model is expanded to the three languages; the texts of all languages are trained simultaneously and cascaded at the final neural network layer. The method can classify multilingual text accurately.
In summary, the method of this patent realizes multilingual text classification. The multilingual neural network obtained by training with this method can also classify single-language text; it overcomes the language barrier, improves the accuracy of multilingual text classification, and is scalable.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not remove the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (7)

1. A multilingual text classification method fusing topic information and BiLSTM-CNN, characterized by comprising the following steps:
1) collect Chinese, English, and Korean parallel corpora and construct a parallel corpus;
2) preprocess the text of each language in the corpus;
3) train word vectors for each language using word-embedding techniques;
4) extract a topic vector for each language's text using a topic model;
5) build a neural network model suited to multiple languages, fuse in the topic information, and produce multilingual text representations.
2. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that when the multilingual parallel corpus is constructed in step 1), scientific and technical literature abstracts in 13 categories and in the three languages Chinese, English, and Korean are collected, and a content-aligned multilingual parallel corpus is constructed.
3. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that the processing of each language's text in step 2) proceeds as follows:
S1: for the Chinese corpus, build a scientific and technical dictionary containing terms from biology, medicine, and physics, and add it to the segmentation dictionary as a segmentation preference to improve the Chinese word segmentation;
S2: for the English corpus, extract the stem of each English word, i.e., reduce each word to its stem representation;
S3: for the Korean corpus, remove terminal endings and conjunctions, leaving nouns and predicates.
4. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that the word vectors of each language trained in step 3) are obtained with the CBOW model of Word2vec and have dimension 220.
5. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that the topic vectors in step 4) are extracted by latent semantic analysis, applied separately to the text of each language.
6. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that the neural network model suited to multiple languages built in step 5) is divided into three sub-models: a Chinese, an English, and a Korean neural network model; each sub-model has the same neural network structure, while training on the text of a different language yields different model parameters, and the three sub-models are finally cascaded into the complete neural network model, realizing multilingual text classification.
7. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that the structure of the topic-information-fusing neural network in step 5) is divided into an input layer, a BiLSTM layer, a CNN layer, a fully connected layer, and an output layer.
CN201910127535.8A 2019-02-20 2019-02-20 A multilingual text classification method fusing topic information and BiLSTM-CNN Pending CN109885686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910127535.8A CN109885686A (en) 2019-02-20 2019-02-20 A multilingual text classification method fusing topic information and BiLSTM-CNN


Publications (1)

Publication Number Publication Date
CN109885686A true CN109885686A (en) 2019-06-14

Family

ID=66928567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910127535.8A Pending CN109885686A (en) 2019-02-20 2019-02-20 A multilingual text classification method fusing topic information and BiLSTM-CNN

Country Status (1)

Country Link
CN (1) CN109885686A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528536A (en) * 2016-11-14 2017-03-22 北京赛思信安技术股份有限公司 Multilingual word segmentation method based on dictionaries and grammar analysis
CN107562729A (en) * 2017-09-14 2018-01-09 云南大学 The Party building document representation method strengthened based on neutral net and theme
CN107894980A (en) * 2017-12-06 2018-04-10 陈件 A kind of multiple statement is to corpus of text sorting technique and grader
CN109299253A (en) * 2018-09-03 2019-02-01 华南理工大学 A kind of social text Emotion identification model construction method of Chinese based on depth integration neural network

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Liu Jiao, "Research on multilingual short-text classification methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology *
Zhang Qun et al., "A short-text classification method fusing word vectors and LDA", New Technology of Library and Information Service *
Li Yang et al., "Text sentiment analysis based on CNN and BiLSTM network feature fusion", Journal of Computer Applications *
Hu Chaoju et al., "Sentiment analysis based on word-vector technology and hybrid neural networks", Application Research of Computers *
Jin Baohua et al., "Classification of social network public opinion based on deep learning", Electronics World *
Chen Lei et al., "Research on a text representation model based on LF-LDA and Word2vec", Electronic Technology *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414009A (en) * 2019-07-09 2019-11-05 昆明理工大学 The remote bilingual parallel sentence pairs abstracting method of English based on BiLSTM-CNN and device
CN110717341A (en) * 2019-09-11 2020-01-21 昆明理工大学 Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN110717341B (en) * 2019-09-11 2022-06-14 昆明理工大学 Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN112685374A (en) * 2019-10-17 2021-04-20 中国移动通信集团浙江有限公司 Log classification method and device and electronic equipment
CN110781690A (en) * 2019-10-31 2020-02-11 北京理工大学 Fusion and compression method of multi-source neural machine translation model
CN111191028A (en) * 2019-12-16 2020-05-22 浙江大搜车软件技术有限公司 Sample labeling method and device, computer equipment and storage medium
CN111666754A (en) * 2020-05-28 2020-09-15 平安医疗健康管理股份有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN111666754B (en) * 2020-05-28 2023-02-03 深圳平安医疗健康科技服务有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN111797607A (en) * 2020-06-04 2020-10-20 语联网(武汉)信息技术有限公司 Sparse noun alignment method and system
CN111797607B (en) * 2020-06-04 2024-03-29 语联网(武汉)信息技术有限公司 Sparse noun alignment method and system
CN111984762B (en) * 2020-08-05 2022-12-13 中国科学院重庆绿色智能技术研究院 Text classification method sensitive to attack resistance
CN111984762A (en) * 2020-08-05 2020-11-24 中国科学院重庆绿色智能技术研究院 Text classification method sensitive to attack resistance
CN112052750A (en) * 2020-08-20 2020-12-08 南京信息工程大学 Arrhythmia classification method based on class imbalance sensing data and depth model
CN112612889A (en) * 2020-12-28 2021-04-06 中科院计算技术研究所大数据研究院 Multilingual document classification method and device and storage medium
CN112612889B (en) * 2020-12-28 2021-10-29 中科院计算技术研究所大数据研究院 Multilingual document classification method and device and storage medium
CN112765996A (en) * 2021-01-19 2021-05-07 延边大学 Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation
CN112765996B (en) * 2021-01-19 2021-08-31 延边大学 Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation
CN113076467A (en) * 2021-03-26 2021-07-06 昆明理工大学 Chinese-crossing news topic discovery method based on cross-language neural topic model
CN114492401A (en) * 2022-01-24 2022-05-13 重庆工业职业技术学院 Working method for extracting English vocabulary based on big data
CN114492401B (en) * 2022-01-24 2022-11-15 重庆工业职业技术学院 Working method for extracting English vocabulary based on big data
CN115017921A (en) * 2022-03-10 2022-09-06 延边大学 Chinese-oriented neural machine translation method based on multi-granularity characterization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination