CN109885686A - A multilingual text classification method fusing topic information and BiLSTM-CNN - Google Patents
Abstract
The present invention relates to the technical field of text classification in natural language processing, and in particular to a multilingual text classification method fusing topic information with a BiLSTM-CNN. The method is implemented as follows: first, Chinese-English-Korean multilingual parallel corpora are collected to construct a parallel corpus; the text of each language in the corpus is preprocessed; word vectors are trained for each language using word-embedding techniques; topic vectors are extracted for the text of each language using a topic model; a neural network model suitable for multiple languages is built, and topic information is fused into it to obtain multilingual text representations. The text classification method overcomes the language barrier, is highly adaptable, meets the demands of multilingual text classification, and is practical.
Description
Technical field
The present invention relates to the technical field of text classification in natural language processing, and in particular to a multilingual text classification method fusing topic information with a BiLSTM-CNN.
Background technique
With the rapid development of the Internet, more and more Internet data exists in text form, and with ongoing internationalization, multilingual text data is increasingly common. People are no longer content with text information in a single language, and the demand for multilingual text information keeps growing: users urgently wish to find the information they need from multilingual text data quickly and effectively. Multilingual text classification, as a research direction of natural language processing, is an effective means of handling the growth of multilingual text information.
The objective of multilingual text classification is to extend existing automatic text classification techniques from a single language to multiple languages without manual intervention. With the progress of globalization, research on multilingual text classification has received wide attention and development; at present there are mainly four kinds of methods.
Dictionary-based methods. These adopt a bilingual-dictionary strategy and are simple and easy to implement. For example, Olsson et al. translated English training documents into Czech documents by means of a probabilistic bilingual dictionary in order to perform cross-language text classification. However, this kind of method cannot handle polysemy.
Corpus-based methods. These are divided into parallel corpora and comparable corpora: a parallel corpus describes the same information in different languages, while a comparable corpus describes information on the same subject in different languages, with documents aligned by the topic discussed. However, this kind of method requires a highly developed corpus with comprehensive coverage, which imposes significant limitations on experimental conditions and hinders extension.
Machine-translation-based methods. Documents in multiple languages are translated into a single unified language by machine translation tools and then classified. This approach is fairly simple, but it depends heavily on the accuracy of the machine translation, which reduces its effectiveness.
Word-embedding-based methods. These build feature representation models based on deep learning and train multilingual word vectors. Combined with context, they capture semantic information accurately, yielding highly specific feature representations.
A main difficulty of multilingual text classification is multilingual text representation. The present invention therefore proposes a new multilingual text representation and neural network model to overcome the language barrier.
Summary of the invention
In view of the problems mentioned in the background above, the invention discloses a multilingual text classification method fusing topic information with a BiLSTM-CNN, which is able to overcome the language barrier.
A multilingual text classification method fusing topic information and BiLSTM-CNN comprises the following steps:
1) collecting Chinese-English-Korean multilingual parallel corpora to construct a parallel corpus;
2) preprocessing the text of each language in the corpus;
3) training word vectors for each language using word-embedding techniques;
4) extracting topic vectors for the text of each language using a topic model;
5) building a neural network model suitable for multiple languages, fusing topic information into it, and performing multilingual text representation.
Preferably, when constructing the multilingual parallel corpus in step 1), scientific literature abstracts in 13 categories across the three languages Chinese, English, and Korean are collected to build a content-aligned multilingual parallel corpus.
Preferably, when processing the text of each language in step 2), the detailed procedure is as follows:
S1: for the Chinese corpus, build a scientific and technical dictionary containing professional terms from biology, medicine, and physics, and add it to the segmentation dictionary as a segmentation preference to optimize Chinese word segmentation;
S2: for the English corpus, extract the stem of each English word, i.e., reduce each word to its stem representation;
S3: for the Korean corpus, remove terminal suffixes and conjunctions, leaving nouns and predicates.
Preferably, the word vectors of each language in step 3) are trained with the CBOW model of Word2vec to obtain word vectors of dimension 220.
Preferably, the topic vectors in step 4) are extracted by latent semantic analysis, applied separately to the text of each language.
Preferably, the neural network model suitable for multiple languages built in step 5) is divided into three submodels: a Chinese neural network model, an English neural network model, and a Korean neural network model. Each submodel has the same neural network structure, while training on the text of a different language yields different model parameters; the three submodels are cascaded at the end to obtain the complete neural network model, realizing multilingual text classification.
Preferably, the neural network structure fusing topic information in step 5) is divided into an input layer, a BiLSTM layer, a CNN layer, a fully connected layer, and an output layer.
Beneficial effects:
1) The present invention effectively overcomes the language barrier without relying on external resources: each language trains its own neural network model, the semantic information of each language is used accurately, effective feature representations are obtained, and the method has a degree of generality.
2) The present invention represents text with features produced by a combination of models, obtaining text information in both the temporal and spatial dimensions and expressing text semantics more accurately.
3) The present invention makes full use of topic information, extracting a topic vector for each language and combining topic information with semantic information to improve the accuracy of text modeling.
Detailed description of the invention
In order to explain the embodiments of the invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1: overall flow block diagram of the present invention;
Fig. 2: Text Pretreatment flow chart of the present invention;
Fig. 3: neural network model figure of the present invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The environment of this example is as follows: Windows operating system, CPU frequency 3.30 GHz, 16 GB of memory; the programming language is Python, the deep learning framework is TensorFlow, and development was done in the PyCharm integrated development environment.
As shown in Fig. 1, the specific implementation steps of the algorithm are as follows:
Step 1: first, collect Chinese-English-Korean multilingual parallel corpora and construct a parallel corpus;
Step 2: preprocess the text of each language in the corpus;
Step 3: train word vectors for each language using word-embedding techniques;
Step 4: extract topic vectors for the text of each language using a topic model;
Step 5: build a neural network model suitable for multiple languages, fuse topic information into it, and perform multilingual text representation.
In step 1 above, Chinese, English, and Korean scientific literature abstracts are compiled: 32,688 texts per language, 98,064 texts in total, divided into 13 categories, forming the multilingual parallel corpus.
In step 2 above, the collected text is preprocessed. Since the text covers three languages, it is preprocessed per language; the specific steps are shown in Fig. 2:
Step 2.1: for the Chinese corpus, remove stopwords and segment words; build a scientific and technical dictionary containing professional terms from biology, medicine, physics, etc., and add it to the segmentation dictionary as a segmentation preference to optimize Chinese word segmentation.
Step 2.2: for the English corpus, convert capital letters to lowercase and extract the stem of each English word, i.e., reduce each word to its stem representation.
Step 2.3: for the Korean corpus, remove terminal suffixes, conjunctions, and the like, leaving nouns and predicates.
Through the above steps, the multilingual text in the corpus is preprocessed and the experimental text set is constructed.
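The English branch of the preprocessing above can be sketched as follows. This is a hedged illustration, not the patent's actual pipeline: the stopword list and suffix table are small illustrative stand-ins (a real system would use a full stopword list and a proper stemmer such as Porter's).

```python
# Illustrative sketch of step 2.2: lowercase, remove stopwords, reduce
# each English word to a crude "stem" by stripping common suffixes.
EN_STOPWORDS = {"the", "a", "an", "of", "and", "to", "is"}   # illustrative subset
EN_SUFFIXES = ("ing", "edly", "ed", "es", "s")               # checked in this order

def stem(word: str) -> str:
    """Crude suffix-stripping stand-in for a real stemmer (e.g. Porter)."""
    for suf in EN_SUFFIXES:
        # only strip when a reasonably long stem remains
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess_english(text: str) -> list:
    """Lowercase, drop stopwords, and stem the remaining tokens."""
    tokens = [t.lower() for t in text.split()]
    return [stem(t) for t in tokens if t not in EN_STOPWORDS]

print(preprocess_english("The networks are classifying texts"))
```

The Chinese and Korean branches would follow the same shape, with segmentation (plus the custom sci-tech dictionary) and suffix/conjunction removal substituted for stemming.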
In step 3, each language trains its own word vectors using the CBOW model of Word2vec, ignoring words whose frequency in the text is less than 10. The CBOW model predicts the center word from its surrounding words and is a three-layer processing model:
Input layer: the word vectors v(Context(w)_1), ..., v(Context(w)_{2c}) of the 2c context words of the center word w;
Projection layer: the context vectors are summed, x_w = \sum_{i=1}^{2c} v(Context(w)_i);
Output layer: corresponds to a Huffman tree whose leaf nodes are the words in the corpus and whose non-leaf nodes are virtual nodes that do not correspond to actual words.
The learning objective is to maximize the log-likelihood function
L = \sum_{w \in C} \log p(w | Context(w)),
where w denotes any word in the corpus C; when the objective function reaches its maximum, the corresponding word vectors are well trained.
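The CBOW update can be sketched as below. This is a simplified illustration under stated assumptions: it uses a plain softmax output layer instead of the Huffman-tree hierarchical softmax that Word2vec's CBOW uses, and tiny illustrative sizes (the patent trains 220-dimensional vectors); both variants optimize the same log-likelihood objective.

```python
# Minimal CBOW sketch: predict the center word from the sum of context vectors.
import numpy as np

rng = np.random.default_rng(0)
V, D = 50, 8                                 # vocab size; embedding dim (patent: 220)
W_in = rng.normal(scale=0.1, size=(V, D))    # input embeddings v(w)
W_out = rng.normal(scale=0.1, size=(D, V))   # output weights (plain softmax, not Huffman)

def cbow_step(context_ids, center_id, lr=0.05):
    """One SGD step on -log p(center | context); returns the loss before the update."""
    x = W_in[context_ids].sum(axis=0)                 # projection layer x_w
    scores = x @ W_out
    p = np.exp(scores - scores.max()); p /= p.sum()   # softmax probabilities
    grad = p.copy(); grad[center_id] -= 1.0           # d(-log p)/d(scores)
    W_out[...] -= lr * np.outer(x, grad)              # update output weights
    W_in[context_ids] -= lr * (W_out @ grad)          # shared gradient to context words
    return -np.log(p[center_id])

loss_before = cbow_step([1, 2, 4, 5], 3)
for _ in range(50):
    loss_after = cbow_step([1, 2, 4, 5], 3)
print(loss_before, loss_after)
```

Repeated steps on the same (context, center) pair drive the loss down, which is the training signal the patent relies on.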
In step 4, latent semantic analysis is used, and each language extracts its own text topic vectors. The steps are as follows:
analyze the document collection and build the term-document matrix;
perform singular value decomposition on the term-document matrix;
from the decomposed matrices, extract the document-topic matrix, whose rows serve as the topic vectors of the individual texts.
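The three LSA steps above can be sketched directly with a truncated SVD. The tiny term-document matrix and the topic count are illustrative (the patent uses 220 topics):

```python
# LSA sketch: SVD of the term-document matrix, keeping the top k topics.
import numpy as np

# term-document matrix A: rows = terms, columns = documents (toy counts)
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
k = 2                                        # number of topics (patent: 220)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# document-topic matrix: one k-dimensional topic vector per document
doc_topic = (np.diag(s[:k]) @ Vt[:k]).T
print(doc_topic.shape)
```

Each row of `doc_topic` is the topic vector θ that is later concatenated with the word vectors at the network's input layer.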
In step 5, the neural network model incorporating topic information is built; the model is shown in Fig. 3.
As can be seen from Fig. 3, the model is divided into three submodels: a Chinese neural network model, an English neural network model, and a Korean neural network model. Each submodel has the same neural network structure, while training on the text of a different language yields different model parameters; the three submodels are cascaded at the end to obtain the complete neural network model, which achieves multilingual text classification.
The neural network structure shown in Fig. 3 is divided into an input layer, a BiLSTM layer, a CNN layer, a fully connected layer, and an output layer. The specific meaning of each layer is as follows:
The input layer is formed by concatenating the word vector and the topic vector:
x_t = [w ; θ],
where w denotes the word vector obtained by Word2vec training, of dimension 220, and θ denotes the topic vector extracted by latent semantic analysis, equal in dimension to the word vector.
The BiLSTM layer is a bidirectional long short-term memory network containing two LSTMs: a forward LSTM and a backward LSTM. The output of the BiLSTM layer cascades the forward output h_t^f and the backward output h_t^b to obtain the output at time t:
O_t = [h_t^f ; h_t^b].
The number of hidden-layer neurons in the BiLSTM is set to 150; the role of the BiLSTM layer is to capture the word-order information of the text.
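The two concatenations above can be sketched as follows. The forward and backward hidden states here are random stand-ins for real LSTM outputs; only the shapes and the cascading are the point:

```python
# Sketch of x_t = [w ; theta] and O_t = [h_fwd ; h_bwd] with the
# dimensions stated in the text (220-dim vectors, 150 hidden units).
import numpy as np

rng = np.random.default_rng(1)
dim_w, dim_theta, hidden = 220, 220, 150

w_t = rng.normal(size=dim_w)          # word vector from Word2vec
theta = rng.normal(size=dim_theta)    # topic vector from LSA
x_t = np.concatenate([w_t, theta])    # input-layer vector x_t = [w ; theta]

h_fwd = rng.normal(size=hidden)       # forward LSTM state at time t (stand-in)
h_bwd = rng.normal(size=hidden)       # backward LSTM state at time t (stand-in)
O_t = np.concatenate([h_fwd, h_bwd])  # BiLSTM output O_t = [h_fwd ; h_bwd]
print(x_t.shape, O_t.shape)
```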
The CNN layer consists of a convolutional layer, a normalization layer, an activation layer, and a pooling layer.
The convolution kernels of the convolutional layer have sizes 3, 4, and 5, with 128 kernels of each size.
The normalization layer uses Batch Normalization; for a mini-batch {x_1, ..., x_m} the computation is
μ = (1/m) \sum_i x_i,  σ² = (1/m) \sum_i (x_i − μ)²,  x̂_i = (x_i − μ) / \sqrt{σ² + ε},  y_i = γ x̂_i + β.
The activation layer uses the ReLU function:
relu(x) = max(0, x).
The pooling stage uses a max-pooling strategy, which reduces the estimation-mean shift caused by convolutional-layer parameter errors and retains more local information.
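The three CNN-layer components described above (normalization, activation, pooling) can be sketched directly from their formulas. The γ, β, ε values and the feature vector are illustrative defaults:

```python
# Sketch of the CNN-layer components: batch normalization, ReLU, max pooling.
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """y_i = gamma * (x_i - mu) / sqrt(var + eps) + beta over the batch axis."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def relu(x):
    return np.maximum(0.0, x)

def max_pool_1d(x, size=2):
    """Non-overlapping 1-D max pooling, keeping the strongest local response."""
    trimmed = x[: len(x) // size * size].reshape(-1, size)
    return trimmed.max(axis=1)

feat = np.array([0.5, -1.0, 2.0, 0.1, -0.3, 1.2])
out = max_pool_1d(relu(batch_norm(feat)))
print(out)
```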
The results obtained for the three languages after neural network processing are cascaded and input to a softmax function for class prediction.
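The final fusion step can be sketched as below. The fully connected projection and the per-language feature vectors are random stand-ins; what matters is the cascade of the three submodel outputs and the softmax over the 13 classes:

```python
# Sketch: cascade three per-language feature vectors, project with a fully
# connected layer, and apply softmax over the 13 classes.
import numpy as np

rng = np.random.default_rng(2)
zh, en, ko = (rng.normal(size=100) for _ in range(3))  # per-language features (stand-ins)
fused = np.concatenate([zh, en, ko])                   # cascade: 300-dim

W = rng.normal(scale=0.05, size=(300, 13))             # full connection to 13 classes
logits = fused @ W
p = np.exp(logits - logits.max()); p /= p.sum()        # softmax probabilities
print(p.shape, p.sum())
```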
The parallel corpus is divided into training and test sets by ten-fold cross-validation for experimental verification.
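The ten-fold split can be sketched as follows: each fold serves once as the test set while the remaining nine form the training set.

```python
# Sketch of ten-fold cross-validation index generation.
def ten_fold_indices(n, k=10):
    """Yield (train, test) index lists; every sample appears in exactly one test fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(ten_fold_indices(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```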
Through the above steps, the neural network is trained on the training set. To prevent overfitting, a Dropout mechanism is introduced at the fully connected layer, ignoring some neurons with a certain probability; the Dropout value in this experiment is 0.5. An L2 regularization mechanism is also introduced, whose principle is
C = C_0 + (λ / 2n) \sum_w w²,
where C_0 denotes the original loss function, the L2 regularization term is the sum of the squares of all parameters w divided by the size n of the training set, and λ is the regularization coefficient.
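The regularized loss C = C_0 + (λ / 2n) Σ w² computes directly; the base loss, weights, and coefficient below are illustrative values:

```python
# Sketch of the L2-regularized loss from the formula above.
def l2_loss(c0, weights, lam, n):
    """C = C0 + (lam / 2n) * sum of squared weights."""
    return c0 + lam / (2 * n) * sum(w * w for w in weights)

print(l2_loss(0.8, [0.5, -1.0, 2.0], lam=0.01, n=100))
```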
The other parameter settings are as follows: batch size 128, 200 epochs, learning rate 1e-3.
The performance of the method of the invention is verified on the test set; the evaluation metrics are accuracy and cross-entropy.
Accuracy is defined, for a given test set, as the ratio of the number of samples correctly classified by the classifier to the total number of samples.
Cross-entropy, a common loss function in deep learning, reflects how similar the probability distribution output by the model is to the true sample distribution. It is defined as
H(y, p) = − \sum_i y_i \log p_i,
where y denotes the true sample value and p the class probability obtained by model prediction.
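Both metrics compute directly from their definitions; the predictions below are illustrative:

```python
# Sketch of the two evaluation metrics: accuracy and per-sample cross-entropy.
import math

def accuracy(y_true, y_pred):
    """Fraction of samples where the predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true_onehot, p):
    """H(y, p) = -sum_i y_i * log(p_i) for one sample."""
    return -sum(y * math.log(q) for y, q in zip(y_true_onehot, p) if y > 0)

print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))      # 3 of 4 predictions correct
print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```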
Embodiment one
This embodiment selects the text set of a single language from the multilingual text classification corpus established in step 1 to experimentally verify the effectiveness of a submodel. In the parameter settings of this example, the embedding size is 220, the number of hidden-layer neurons is likewise 150, the number of topics is set to 220, and the batch size is 64. The comparison model is TextCNN, consisting of a convolutional layer, an activation layer, a pooling layer, and a fully connected layer; the experiment demonstrates that the submodel improves text classification accuracy.
Embodiment two
This embodiment is basically the same as embodiment one; the difference is:
this embodiment selects the multilingual text corpus established in step 1 and performs multilingual text classification. The model is extended to three languages, the texts of all languages are trained simultaneously, and the outputs are cascaded at the final neural network layer; this method can classify multilingual text accurately.
In summary, the method of this patent realizes multilingual text classification. The trained multilingual neural network can also classify single-language text; it overcomes the language barrier while improving the accuracy of multilingual text classification, and it is scalable.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.
Claims (7)
1. A multilingual text classification method fusing topic information and BiLSTM-CNN, characterized by comprising the following steps:
1) collecting Chinese-English-Korean multilingual parallel corpora to construct a parallel corpus;
2) preprocessing the text of each language in the corpus;
3) training word vectors for each language using word-embedding techniques;
4) extracting topic vectors for the text of each language using a topic model;
5) building a neural network model suitable for multiple languages, fusing topic information into it, and performing multilingual text representation.
2. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that: when constructing the multilingual parallel corpus in step 1), scientific literature abstracts in 13 categories across the three languages Chinese, English, and Korean are collected to build a content-aligned multilingual parallel corpus.
3. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that: when processing the text of each language in step 2), the detailed procedure is as follows:
S1: for the Chinese corpus, build a scientific and technical dictionary containing professional terms from biology, medicine, and physics, and add it to the segmentation dictionary as a segmentation preference to optimize Chinese word segmentation;
S2: for the English corpus, extract the stem of each English word, i.e., reduce each word to its stem representation;
S3: for the Korean corpus, remove terminal suffixes and conjunctions, leaving nouns and predicates.
4. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that: the word vectors of each language in step 3) are trained with the CBOW model of Word2vec to obtain word vectors of dimension 220.
5. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that: the topic vectors in step 4) are extracted by latent semantic analysis, applied separately to the text of each language.
6. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that: the neural network model suitable for multiple languages built in step 5) is divided into three submodels: a Chinese neural network model, an English neural network model, and a Korean neural network model; each submodel has the same neural network structure, while training on the text of a different language yields different model parameters; the three submodels are cascaded at the end to obtain the complete neural network model, realizing multilingual text classification.
7. The multilingual text classification method fusing topic information and BiLSTM-CNN according to claim 1, characterized in that: the neural network structure fusing topic information in step 5) is divided into an input layer, a BiLSTM layer, a CNN layer, a fully connected layer, and an output layer.
Priority Application (1)
CN201910127535.8A, filed 2019-02-20: A multilingual text classification method fusing topic information and BiLSTM-CNN.
Publication (1)
CN109885686A, published 2019-06-14 (legal status: pending).