CN110516064A - An aeronautical R&D paper classification method based on deep learning - Google Patents

An aeronautical R&D paper classification method based on deep learning

Info

Publication number
CN110516064A
Authority
CN
China
Prior art keywords
aeronautical
paper
data set
training
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910625454.0A
Other languages
Chinese (zh)
Inventor
杨丽君
王坚
凌卫青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-11
Filing date
2019-07-11
Publication date
2019-11-29
Application filed by Tongji University
Priority to CN201910625454.0A
Publication of CN110516064A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an aeronautical R&D paper classification method based on deep learning, comprising the following steps: S1: collecting aeronautical R&D paper data to obtain a paper data set; S2: performing cleaning preprocessing on the paper data set to obtain a first data set; S3: performing text preprocessing on the first data set to obtain a second data set; S4: constructing an aeronautical R&D paper classification model based on the Text-CNN text classification algorithm; S5: training the aeronautical R&D paper classification model with the second data set; S6: classifying aeronautical R&D papers with the trained classification model. Compared with existing classification techniques such as random forest and support vector machine, the classification method of the invention is fast and accurate, which helps improve the working efficiency of researchers.

Description

An aeronautical R&D paper classification method based on deep learning
Technical field
The invention belongs to the technical field of data mining and relates to an aeronautical R&D paper classification method, in particular to an aeronautical R&D paper classification method based on deep learning.
Background art
In recent years the number of academic researchers has grown steadily and research results such as papers are being published faster. The number of scientific papers is growing explosively across every discipline, which has given rise to a variety of user needs. For example, scholars need to find the latest relevant literature in their research field, so classifying the literature is highly desirable. Classifying scientific papers, that is, automatically labelling each paper with the category it belongs to, can significantly improve literature retrieval efficiency and speed up research work. Building a paper classification model also supports related research tasks such as paper matching, literature retrieval and expert recommendation. Faced with massive paper data, traditional text classification algorithms such as naive Bayes suffer from problems such as low classification efficiency and low accuracy.
Natural language processing technology has developed rapidly in recent years, and natural language processing combined with deep learning algorithms has gradually entered the field of document classification. Aeronautical R&D papers use highly specialized technical terminology, so compared with literature in other fields, less text is needed to classify them. At present there is still no dedicated classification model for screening aeronautical R&D papers, and conventional machine learning algorithms such as logistic regression, random forest, support vector machine and k-nearest neighbors do not achieve particularly good classification results in this field.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of the prior art by providing an aeronautical R&D paper classification method based on deep learning.
The object of the present invention can be achieved through the following technical solution:
An aeronautical R&D paper classification method based on deep learning, comprising the following steps:
S1: collecting aeronautical R&D paper data to obtain a paper data set;
S2: performing cleaning preprocessing on the paper data set to obtain a first data set;
S3: performing text preprocessing on the first data set to obtain a second data set;
S4: constructing an aeronautical R&D paper classification model based on the Text-CNN text classification algorithm;
S5: training the aeronautical R&D paper classification model with the second data set;
S6: classifying aeronautical R&D papers with the trained aeronautical R&D paper classification model.
Further, the paper data are collected in step S1 by using a crawler program to crawl papers from a paper library; the crawler program uses the Python programming language, the PyCharm editing environment and the Scrapy crawler framework.
Further, the cleaning preprocessing in step S2 is specifically: removing abnormal data and duplicate data from the paper data set, where the abnormal data include garbled characters.
Further, the text preprocessing in step S3 is specifically: performing jieba word segmentation and stop-word removal on the first data set and organizing the result into abstract-category form.
Further, the training process in step S5 is specifically:
S501: dividing the second data set into a training set, a validation set and a test set, and performing feature extraction on each data set;
S502: training the aeronautical R&D paper classification model on the training set to fit the model parameters; adjusting the hyperparameters during training based on the validation set; and checking the generalization ability of the trained aeronautical R&D paper classification model on the test set.
Further, the feature extraction process is: generating a vocabulary from the data set and generating a numeric matrix by One-Hot encoding.
Further, the aeronautical R&D paper classification model based on the Text-CNN text classification algorithm consists of four sequentially connected layers: an input layer, a convolutional layer, a pooling layer and a fully connected layer. The input layer takes word vectors as input, the convolutional layer and the pooling layer extract high-level features, and the fully connected layer completes the classification. The number of classes is 2, the number of convolution kernels is 128, the convolution kernel size is 5, the pooling layer is Max-pool, and the fully connected layer has 128 neurons.
The convolution operation is as follows:
$c_j = f(W \cdot X_{j:j+h-1} + b)$ (1)
where $f$ is the ReLU activation function, $c_j$ is the result of the convolution, $W$ is the weight matrix, $X_{j:j+h-1}$ is the word-vector matrix of the window spanning words $j$ to $j+h-1$, and $b$ is the bias.
The pooling operation is as follows:
$c_{max} = \max(c_j)$ (2)
where $c_{max}$ is the result of the max-pooling operation and $c_j$ ($j = 1, 2, \ldots, n-h+1$) are the results of the convolution operation.
Compared with the prior art, the present invention has the following beneficial effects:
1) The aeronautical R&D paper classification model of the invention is based on the Text-CNN text classification algorithm. Compared with conventional machine learning algorithms, TextCNN is very good at extracting shallow text features. Aeronautical R&D papers have a distinct research direction and highly recognizable domain keywords, so a very short text is enough to identify them. Using the TextCNN algorithm, which classifies short texts better, therefore gives good classification results at high speed;
2) The invention performs jieba word segmentation and stop-word removal on the text of the paper data set and organizes it into abstract-category form, which filters out redundant text information and improves the efficiency and accuracy of classification;
3) The aeronautical R&D paper classification model built by the invention can perform data-mining analysis on the key information obtained, derive keyword groups and retrieve the corresponding aeronautical R&D papers; the keywords can also serve as a stored representation of the papers, enabling accurate retrieval and storage of relevant literature in the aeronautical R&D field;
4) The aeronautical R&D paper classification model built by the invention can serve as a foundation for other algorithms and supports other research tasks in aeronautical R&D such as paper matching, literature retrieval and expert recommendation;
5) The classification method of the invention can also be applied to classifying other texts such as patents; it is highly extensible and has practical value for wider adoption.
Description of the drawings
Fig. 1 is a flow chart of the deep-learning-based aeronautical R&D paper classification;
Fig. 2 is a structure diagram of the Text-CNN convolutional neural network;
Fig. 3 is a flow diagram of aeronautical R&D paper classification based on conventional machine learning algorithms;
Fig. 4 is a combined comparison of Fig. 1 and Fig. 3;
Fig. 5 is a schematic diagram of the experimental environment;
Fig. 6 is a comparison of the classification results of the different algorithms.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and a specific operating process, but the scope of protection of the invention is not limited to the following embodiment.
The present invention provides an aeronautical R&D paper classification method based on deep learning which, as shown in Fig. 1, comprises the following steps:
S1: collecting aeronautical R&D paper data to obtain a paper data set. Specifically, a crawler program crawls papers from CNKI (China National Knowledge Infrastructure) and the data are stored in a MySQL database. The data used in this embodiment are the paper abstracts and the categories the papers belong to. The crawler program uses the Python programming language, the PyCharm editing environment and the Scrapy crawler framework.
S2: performing cleaning preprocessing on the paper data set to obtain a first data set;
The cleaning preprocessing in step S2 is specifically: removing abnormal data and duplicate data from the paper data set, where the abnormal data include garbled characters.
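The cleaning in step S2 can be sketched as follows. This is a minimal illustration, not the embodiment's actual code: the function names `is_garbled` and `clean_records`, the rule that a Unicode replacement character or a control character marks a record as garbled, and the sample abstracts are all assumptions made for the example. Only the two category names come from the results discussed in this document.

```python
import unicodedata


def is_garbled(text: str) -> bool:
    """Heuristic (assumption): replacement or control characters indicate garbling."""
    for ch in text:
        if ch == "\ufffd":
            return True
        # Category "Cc" covers control characters; ordinary whitespace is allowed.
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t\r":
            return True
    return False


def clean_records(records):
    """Drop empty, garbled and duplicate (abstract, category) records, keeping order."""
    seen = set()
    cleaned = []
    for abstract, category in records:
        abstract = abstract.strip()
        if not abstract or not category or is_garbled(abstract):
            continue
        key = (abstract, category)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Hypothetical sample records for illustration only.
raw = [
    ("航空发动机叶片的疲劳寿命分析", "aero-engine"),
    ("航空发动机叶片的疲劳寿命分析", "aero-engine"),  # duplicate
    ("飞行器气动\ufffd\ufffd优化", "aircraft"),  # garbled
    ("飞行器气动外形优化设计", "aircraft"),
]
first_data_set = clean_records(raw)
```

A real crawl would probably need a broader definition of abnormal data (for example records with a missing category field), which the source does not spell out.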
S3: performing text preprocessing on the first data set to obtain a second data set;
The text preprocessing in step S3 is specifically: performing jieba word segmentation and stop-word removal on the first data set and organizing the result into abstract-category form. The jieba segmentation toolkit splits the text of each abstract into words, and a stop-word list filters out characters without practical meaning, such as special characters in the abstract.
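The segmentation and stop-word filtering of step S3 might look like the sketch below. To keep the example free of third-party dependencies, the tokenizer is passed in as a callable and a whitespace splitter stands in for jieba; with jieba installed one would pass `jieba.lcut` instead. The function name `preprocess`, the tiny stop-word set and the sample record are assumptions for illustration.

```python
def preprocess(records, tokenize, stop_words):
    """Segment each abstract, drop stop words, and return (tokens, category) pairs."""
    second = []
    for abstract, category in records:
        tokens = [t for t in tokenize(abstract) if t.strip() and t not in stop_words]
        second.append((tokens, category))
    return second


def whitespace_tokenize(text):
    """Stand-in tokenizer (assumption); the embodiment uses jieba segmentation."""
    return text.split()


# Hypothetical stop-word list and pre-spaced sample abstract.
stop_words = {"的", "，", "。", "和"}
records = [("航空 发动机 的 叶片 疲劳 寿命 。", "aero-engine")]
second_data_set = preprocess(records, whitespace_tokenize, stop_words)
```

Passing the tokenizer as a parameter keeps the filtering logic testable without the segmentation library and makes it easy to swap in a domain-specific dictionary later.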
S4: constructing an aeronautical R&D paper classification model based on the Text-CNN text classification algorithm;
The structure of the aeronautical R&D paper classification model comprises one input layer, one convolutional layer, one pooling layer and a fully connected layer, as shown in Fig. 2.
The input layer, also called the word embedding layer, takes word vectors as input: after feature extraction the text is converted into word vectors and fed to the input layer. In this embodiment the word-vector dimension is 64, the sequence length is 600 and the number of classes is 2.
The convolutional layer and the pooling layer extract high-level features. In text classification the convolution kernel slides in one dimension. In the deep-learning-based aeronautical R&D paper classification method of the invention, the convolution kernel size kernel_size is 5 and the number of convolution kernels num_filters is 128;
The convolution operation is as follows:
$c_j = f(W \cdot X_{j:j+h-1} + b)$ (1)
where $f$ is the ReLU activation function, $c_j$ is the result of the convolution, $W$ is the weight matrix, $X_{j:j+h-1}$ is the word-vector matrix of the window spanning words $j$ to $j+h-1$, and $b$ is the bias.
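Equation (1) can be made concrete with a small pure-Python sketch of a single kernel sliding over a word-vector matrix. It assumes that the product of W and the window is the sum of element-wise products, which is the usual reading for Text-CNN but is not stated explicitly in the source. The function name `conv1d_text` and the toy matrices are illustrative.

```python
def relu(x):
    """ReLU activation: negative values become zero."""
    return x if x > 0.0 else 0.0


def conv1d_text(X, W, b):
    """Slide one kernel W (h x d) over the word-vector matrix X (n x d).

    Returns the feature map [c_1, ..., c_{n-h+1}], where each c_j is
    ReLU(sum of element-wise products of W and the window + b).
    """
    n, h = len(X), len(W)
    feature_map = []
    for j in range(n - h + 1):
        s = b
        for r in range(h):
            for k in range(len(W[r])):
                s += W[r][k] * X[j + r][k]
        feature_map.append(relu(s))
    return feature_map


# Toy sizes (assumption): n = 4 words, d = 2 dimensions, kernel height h = 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
W = [[1.0, 1.0], [1.0, -1.0]]
c = conv1d_text(X, W, b=0.0)
```

With the embodiment's settings (sequence length 600, kernel size 5) each kernel would produce 600 - 5 + 1 = 596 values, and 128 kernels would give 128 such feature maps.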
In this embodiment the pooling layer of the Text-CNN model is a max-pooling layer (Max-pool). Sentences of different lengths become fixed-length after the pooling layer and the number of model parameters is reduced, which helps improve classification efficiency.
The pooling operation is as follows:
$c_{max} = \max(c_j)$ (2)
where $c_{max}$ is the result of the max-pooling operation and $c_j$ ($j = 1, 2, \ldots, n-h+1$) are the results of the convolution operation.
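The effect of equation (2) on inputs of different length can be illustrated with a short sketch. The function name `max_over_time` and the numbers are made up for the example; the point is only that one value survives per feature map, so the output length equals the number of kernels regardless of sentence length.

```python
def max_over_time(feature_maps):
    """Keep the largest activation of each feature map (c_max = max over j of c_j)."""
    return [max(fm) for fm in feature_maps]


# Two sentences of different length, three kernels each (toy numbers).
short_sentence = [[0.1, 0.7], [0.0, 0.2], [0.5, 0.4]]
long_sentence = [[0.3, 0.9, 0.2, 0.1], [0.0, 0.0, 0.6, 0.1], [0.8, 0.1, 0.1, 0.2]]
pooled_short = max_over_time(short_sentence)
pooled_long = max_over_time(long_sentence)
```

Both pooled vectors have three entries here; with the embodiment's 128 kernels the pooled vector would have 128 entries.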
Finally, the fully connected layer completes the classification and outputs the probability of each category. In this embodiment the fully connected layer has 128 neurons, the activation function is ReLU, and the dropout keep ratio is set to 0.5 to prevent over-fitting.
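A rough sketch of such a classification head is given below. The source names the layer width, ReLU and the dropout keep ratio but does not say how probabilities are produced or where dropout is applied; the softmax output, the placement of dropout after the hidden activation, the function names and the toy weights are all assumptions.

```python
import math
import random


def dropout(vec, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in vec]


def dense(vec, weights, biases):
    """Affine layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, biases)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def softmax(logits):
    """Numerically stable softmax (assumed output activation)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled, hidden_w, hidden_b, out_w, out_b,
             keep_prob=0.5, training=False, rng=None):
    """Hidden ReLU layer, dropout during training only, then class probabilities."""
    hidden = relu(dense(pooled, hidden_w, hidden_b))
    if training:
        hidden = dropout(hidden, keep_prob, rng or random.Random(0))
    return softmax(dense(hidden, out_w, out_b))


# Toy weights (assumption): 2 pooled features, 3 hidden units, 2 classes.
pooled = [1.0, 2.0]
hidden_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
out_b = [0.0, 0.0]
probs = classify(pooled, hidden_w, hidden_b, out_w, out_b)
```

Dropout is switched off at inference time in this sketch, so repeated calls with `training=False` give the same probabilities.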
S5: training the aeronautical R&D paper classification model with the second data set, specifically:
S501: dividing the second data set into a training set, a validation set and a test set in a 6:2:2 ratio. The training set is used to fit the model parameters, the validation set is used to adjust the hyperparameters during training, and the test set is used to check the generalization ability of the model after training.
S502: generating a vocabulary from the training set, validation set and test set, generating a numeric matrix by One-Hot encoding, and feeding the numeric matrix into the convolutional neural network for training, validation and testing.
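Steps S501 and S502 can be sketched together as follows. The source does not say whether the data are shuffled before splitting or exactly what is One-Hot encoded; this sketch shuffles with a fixed seed, maps tokens to padded id sequences and One-Hot encodes the category label, which is one common reading. All function names, the padding token and the synthetic samples are assumptions.

```python
import random


def split_622(samples, seed=0):
    """Shuffle, then split into training, validation and test sets in a 6:2:2 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def build_vocab(token_lists, pad_token="<PAD>"):
    """Assign an integer id to every distinct token; id 0 is reserved for padding."""
    vocab = {pad_token: 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, seq_length):
    """Map known tokens to ids, then truncate or pad with 0 to seq_length."""
    ids = [vocab[t] for t in tokens if t in vocab][:seq_length]
    return ids + [0] * (seq_length - len(ids))


def one_hot(index, size):
    row = [0] * size
    row[index] = 1
    return row


# Ten synthetic (tokens, category) samples for illustration only.
samples = [([f"w{i}", "航空"], "aircraft" if i % 2 else "aero-engine") for i in range(10)]
train, val, test = split_622(samples)
vocab = build_vocab(tokens for tokens, _ in samples)
categories = ["aero-engine", "aircraft"]
x = encode(["航空", "w3", "unknown"], vocab, seq_length=5)
y = one_hot(categories.index("aircraft"), len(categories))
```

In the embodiment the sequence length would be 600 rather than 5. Building the vocabulary from all three splits follows the source text, although building it from the training set alone is a common alternative.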
S6: classifying aeronautical R&D papers with the trained aeronautical R&D paper classification model.
The present invention also provides an automatic classification system implementing the above classification method, comprising: a data input module, which provides a data input interface and collects aeronautical R&D paper data to obtain a paper data set; a preprocessing module, which performs cleaning preprocessing and text preprocessing on the paper data set to obtain the second data set; a training and validation module, which trains and validates the constructed aeronautical R&D paper classification model based on the Text-CNN text classification algorithm using the second data set; and an application module, which classifies aeronautical R&D papers to be classified using the trained aeronautical R&D paper classification model.
To examine the classification performance of the Text-CNN-based method, this embodiment uses comparative analysis: four conventional machine learning algorithms (logistic regression, random forest, support vector machine and k-nearest neighbors) are also used to classify aeronautical R&D papers so that the classification performance of each algorithm can be compared. The detailed process is shown in Fig. 3. Steps S1-S3 are the same as in the method of this embodiment; steps S4-S5 differ in the classification model that is built and in the way features are extracted.
The conventional machine learning algorithms implement step S4 of the method through the following step 1):
1) building the conventional machine learning classification models;
LogisticRegression, RandomForestClassifier, KNeighborsClassifier and SVC from the sklearn toolkit are used to train a logistic regression classification model, a random forest classification model, a k-nearest neighbors classification model and a support vector machine classification model respectively. After repeated training, validation and testing, aeronautical R&D paper classification models based on conventional machine learning are obtained.
The conventional machine learning algorithms implement step S5 of the method through the following step 2):
2) text feature extraction based on TF-IDF;
Term frequency-inverse document frequency (TF-IDF) consists of the term frequency TF and the inverse document frequency IDF. It is calculated as follows:
201): calculating TF
$TF_{ij} = \frac{m_{ij}}{\sum_t m_{tj}}$ (3)
where $m_{ij}$ is the number of times word $w_i$ occurs in document $d_j$, and $\sum_t m_{tj}$ is the total number of occurrences of all words in that document.
202): calculating IDF
$IDF_i = \log \frac{|D|}{|\{j : w_i \in d_j\}| + 1}$ (4)
where $|D|$ is the total number of documents and $|\{j : w_i \in d_j\}|$ is the number of documents containing word $w_i$, to which 1 is added in the denominator.
203): calculating TF-IDF
$TF\text{-}IDF = TF \times IDF$ (5)
that is, TF-IDF is the product of the term frequency TF and the inverse document frequency IDF.
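Steps 201) to 203) can be sketched in plain Python as shown below. The sketch follows the definitions given in the text, with the term count divided by the document's total word count and 1 added to the document frequency inside the logarithm. The natural logarithm, the function name `tf_idf` and the four toy documents are assumptions; library implementations such as the one in sklearn may use a different smoothing formula.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = sum(counts.values())
        weights.append({
            w: (c / total) * math.log(n_docs / (doc_freq[w] + 1))
            for w, c in counts.items()
        })
    return weights


# Toy segmented documents for illustration only.
docs = [
    ["发动机", "叶片", "叶片"],
    ["飞行器", "气动"],
    ["飞行器", "控制"],
    ["复合", "材料"],
]
w = tf_idf(docs)
```

Note that with this smoothing a word that appears in every document gets a slightly negative weight, because the logarithm's argument drops below 1.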
A comparison of the deep-learning-based aeronautical R&D paper classification method and the aeronautical R&D paper classification method based on conventional machine learning is shown in Fig. 4.
The experimental environment and toolkits used for the classification experiments of this embodiment are shown in Fig. 5: the PyCharm editing environment, the Python programming language and the deep learning framework TensorFlow.
The classification evaluation indexes used in this embodiment are precision (Pr), recall (Re) and their harmonic mean F1.
Precision Pr characterizes the correctness of the classification results, recall Re characterizes the completeness of the classification, and the harmonic mean F1 combines precision and recall.
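The formulas for these indexes did not survive in this copy of the text. The sketch below uses the standard textbook definitions, computed per category from true positives, false positives and false negatives; whether the embodiment averaged them across categories is not stated. The function name and the label lists are made up for the example.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Standard per-category precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only.
y_true = ["aircraft", "aircraft", "aircraft", "aero-engine", "aero-engine"]
y_pred = ["aircraft", "aircraft", "aero-engine", "aircraft", "aero-engine"]
pr, rec, f1 = precision_recall_f1(y_true, y_pred, positive="aircraft")
```

Calling the function once per category gives the per-category figures of the kind reported for the aircraft and aero-engine classes.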
The comprehensive index F1 is mainly used in this embodiment to evaluate and compare the classification precision of the five algorithms above. The classification results are shown in Fig. 6. For the aircraft category, the recall, precision and F1 value of the deep-learning-based aeronautical R&D paper classification method reach 97%, 98% and 97% respectively; for the two categories aero-engine and aircraft, the classification precision of Text-CNN reaches 0.95 and 0.97 respectively. Compared with conventional machine learning classification methods, the Text-CNN algorithm automatically extracts and learns more classification features and trains faster, so the Text-CNN algorithm used in this embodiment classifies aeronautical R&D papers well.
The preferred embodiment of the present invention has been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative work. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art under the concept of the invention shall fall within the scope of protection defined by the claims.

Claims (8)

1. An aeronautical R&D paper classification method based on deep learning, characterized by comprising the following steps:
S1: collecting aeronautical R&D paper data to obtain a paper data set;
S2: performing cleaning preprocessing on the paper data set to obtain a first data set;
S3: performing text preprocessing on the first data set to obtain a second data set;
S4: constructing an aeronautical R&D paper classification model based on the Text-CNN text classification algorithm;
S5: training the aeronautical R&D paper classification model with the second data set;
S6: classifying aeronautical R&D papers with the trained aeronautical R&D paper classification model.
2. The aeronautical R&D paper classification method based on deep learning according to claim 1, characterized in that collecting the paper data in step S1 is specifically: crawling papers from a paper library with a crawler program, the crawler program using the Python programming language, the PyCharm editing environment and the Scrapy crawler framework.
3. The aeronautical R&D paper classification method based on deep learning according to claim 1, characterized in that the aeronautical R&D paper data in step S1 include the abstracts of the aeronautical R&D papers and the categories to which the papers belong.
4. The aeronautical R&D paper classification method based on deep learning according to claim 1, characterized in that the cleaning preprocessing in step S2 is specifically: removing abnormal data and duplicate data from the paper data set.
5. The aeronautical R&D paper classification method based on deep learning according to claim 1, characterized in that the text preprocessing in step S3 is specifically: performing jieba word segmentation and stop-word removal on the first data set and organizing the result into abstract-category form.
6. The aeronautical R&D paper classification method based on deep learning according to claim 1, characterized in that the aeronautical R&D paper classification model in step S5 comprises a sequentially connected input layer, convolutional layer, pooling layer and fully connected layer.
7. The aeronautical R&D paper classification method based on deep learning according to claim 1, characterized in that the training process in step S5 is specifically:
S501: dividing the second data set into a training set, a validation set and a test set, and performing feature extraction on each data set;
S502: training the aeronautical R&D paper classification model on the training set to fit the model parameters; adjusting the hyperparameters during training based on the validation set; and checking the generalization ability of the trained aeronautical R&D paper classification model on the test set.
8. The aeronautical R&D paper classification method based on deep learning according to claim 7, characterized in that the feature extraction process is: generating a vocabulary from the data set and generating a numeric matrix by One-Hot encoding.
CN201910625454.0A 2019-07-11 2019-07-11 A kind of Aeronautical R&D paper classification method based on deep learning Pending CN110516064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910625454.0A CN110516064A (en) 2019-07-11 2019-07-11 A kind of Aeronautical R&D paper classification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910625454.0A CN110516064A (en) 2019-07-11 2019-07-11 A kind of Aeronautical R&D paper classification method based on deep learning

Publications (1)

Publication Number Publication Date
CN110516064A true CN110516064A (en) 2019-11-29

Family

ID=68623059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910625454.0A Pending CN110516064A (en) 2019-07-11 2019-07-11 A kind of Aeronautical R&D paper classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN110516064A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241283A (en) * 2020-01-15 2020-06-05 电子科技大学 Rapid characterization method for portrait of scientific research student
CN111651605A (en) * 2020-06-04 2020-09-11 电子科技大学 Lung cancer leading edge trend prediction method based on multi-label classification
CN113342975A (en) * 2021-06-11 2021-09-03 江苏卓易信息科技股份有限公司 Information catalog topic library classification method for data resources
CN113837240A (en) * 2021-09-03 2021-12-24 南京昆虫软件有限公司 Classification system and classification method for education department

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10095992B1 (en) * 2016-07-01 2018-10-09 Intraspexion, Inc. Using classified text, deep learning algorithms and blockchain to identify risk in low-frequency, high value situations, and provide early warning
CN108681610A (en) * 2018-05-28 2018-10-19 山东大学 Production takes turns more and chats dialogue method, system and computer readable storage medium
CN109033402A (en) * 2018-08-02 2018-12-18 上海应用技术大学 The classification method of security fields patent text
CN109062958A (en) * 2018-06-26 2018-12-21 华中师范大学 It is a kind of based on the primary school of TextRank and convolutional neural networks write a composition automatic classification method
CN109189926A (en) * 2018-08-28 2019-01-11 中山大学 A kind of construction method of technical paper corpus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10095992B1 (en) * 2016-07-01 2018-10-09 Intraspexion, Inc. Using classified text, deep learning algorithms and blockchain to identify risk in low-frequency, high value situations, and provide early warning
CN108681610A (en) * 2018-05-28 2018-10-19 山东大学 Production takes turns more and chats dialogue method, system and computer readable storage medium
CN109062958A (en) * 2018-06-26 2018-12-21 华中师范大学 It is a kind of based on the primary school of TextRank and convolutional neural networks write a composition automatic classification method
CN109033402A (en) * 2018-08-02 2018-12-18 上海应用技术大学 The classification method of security fields patent text
CN109189926A (en) * 2018-08-28 2019-01-11 中山大学 A kind of construction method of technical paper corpus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241283A (en) * 2020-01-15 2020-06-05 电子科技大学 Rapid characterization method for portrait of scientific research student
CN111241283B (en) * 2020-01-15 2023-04-07 电子科技大学 Rapid characterization method for portrait of scientific research student
CN111651605A (en) * 2020-06-04 2020-09-11 电子科技大学 Lung cancer leading edge trend prediction method based on multi-label classification
CN111651605B (en) * 2020-06-04 2022-07-05 电子科技大学 Lung cancer leading edge trend prediction method based on multi-label classification
CN113342975A (en) * 2021-06-11 2021-09-03 江苏卓易信息科技股份有限公司 Information catalog topic library classification method for data resources
CN113837240A (en) * 2021-09-03 2021-12-24 南京昆虫软件有限公司 Classification system and classification method for education department

Similar Documents

Publication Publication Date Title
CN110516064A (en) A kind of Aeronautical R&D paper classification method based on deep learning
CN109189926B (en) Construction method of scientific and technological paper corpus
CN101819601B (en) Method for automatically classifying academic documents
CN110175224B (en) Semantic link heterogeneous information network embedding-based patent recommendation method and device
Sundus et al. A deep learning approach for arabic text classification
CN102194013A (en) Domain-knowledge-based short text classification method and text classification system
CN114048305B (en) Class case recommendation method of administrative punishment document based on graph convolution neural network
CN105260437A (en) Text classification feature selection method and application thereof to biomedical text classification
CN107194617A (en) A kind of app software engineers soft skill categorizing system and method
CN109255029A (en) A method of automatic Bug report distribution is enhanced using weighted optimization training set
Basnet et al. Improving Nepali news recommendation using classification based on LSTM recurrent neural networks
Nguyen et al. An ensemble of shallow and deep learning algorithms for Vietnamese sentiment analysis
CN114265935A (en) Science and technology project establishment management auxiliary decision-making method and system based on text mining
Ali et al. A probabilistic framework for short text classification
Kundana Data Driven Analysis of Borobudur Ticket Sentiment Using Naïve Bayes.
CN112784919A (en) Intelligent manufacturing multi-mode data oriented classification method
Swami et al. Resume classifier and summarizer
Ai Predicting Titanic Survivors by Using Machine Learning
CN117235253A (en) Truck user implicit demand mining method based on natural language processing technology
Mantika et al. Sentiment Analysis on Twitter Using Naïve Bayes and Logistic Regression for the 2024 Presidential Election
Almutairi et al. A Comparative Analysis for Arabic Sentiment Analysis Models In E-Marketing Using Deep Learning Techniques
Sameh et al. Behaviour analysis voting model using social media data
Rajasekar et al. Comparison of machine learning algorithms in domain specific information extraction
Shanthi et al. Machine learning based twitter sentiment analysis on COVID-19
Shifullah et al. Classification of Hotel Reviews Using Sentiment Analysis and Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191129
