CN111797198A - Method for recognizing bad taste discussion of software architecture from text - Google Patents

Method for recognizing bad taste discussion of software architecture from text

Info

Publication number
CN111797198A
Authority
CN
China
Prior art keywords
text
data set
bad taste
software architecture
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010539516.9A
Other languages
Chinese (zh)
Inventor
梁鹏
鲁帆
田方超
李雪莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202010539516.9A
Publication of CN111797198A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/3332: Query translation
    • G06F16/3335: Syntactic pre-processing, e.g. stopword elimination, stemming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for identifying discussions of software architecture bad smells from text, comprising the following steps: 1) crawling the text of question-and-answer (Q&A) posts from a professional software development Q&A community and constructing a data set for identifying discussions of software architecture bad smells; 2) preprocessing the text in the data set to obtain simplified text content; 3) extracting text features from the text of step 2) using natural language processing techniques to obtain a feature vector data set; 4) after the features of each text are obtained, training a binary classifier on the training set; 5) using the trained classifiers to predict the documents in the test set, obtaining classification results, and evaluating how well the classifiers identify software architecture bad smells; 6) comparing the results and determining the optimal combination of feature extraction method and classifier. The invention provides an automated method for identifying discussions of software architecture bad smells that can quickly obtain the optimal combination of feature extraction method and classification model for a given configuration.

Description

Method for recognizing bad taste discussion of software architecture from text
Technical Field
The invention relates to the technical field of software engineering, and in particular to a method for identifying discussions of software architecture bad smells from text.
Background
In the era of software-defined everything, the complexity of software systems keeps increasing. Because software development is costly and existing software architectures have gradually matured, developers tend to extend and adapt existing systems to meet new requirements rather than build completely new ones. Developers are therefore also required to maintain and upgrade software applications over the long term. Throughout the life cycle of software, its code undergoes evolutionary modification. During this evolution, the software architecture may develop bad smells that have a significant negative impact on subsequent evolution. Developers need to correct the bad smells found in a system in order to maintain it. Bad smells can be divided into three categories by granularity: architecture bad smells, design bad smells, and code bad smells. All three damage software quality to different degrees. Architecture bad smells are higher-level design problems that continuously and cumulatively harm system maintenance, and refactoring them is more time-consuming and laborious than refactoring code or design bad smells. Researchers therefore need to discuss and identify the various types of bad smells. Developers and researchers study software architecture bad smells by consulting papers, books, or online resources; even when relevant examples are found, the quality of those examples limits research progress. The shortage of research and of usable cases makes software architecture bad smells difficult to study.
Therefore, the way in which software architecture bad smells are collected and identified needs to be improved: examples related to software architecture bad smells must be distinguished from unrelated ones in search results, so that developers can quickly obtain study cases and advance research on software architecture bad smells.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for identifying discussions of software architecture bad smells from text. The method addresses the problem of identifying content on a specific topic in text: it analyzes the text of Q&A posts from a professional software development Q&A community and uses automatic classification to divide the posts into those related and those unrelated to software architecture bad smells, thereby providing examples of discussions of software architecture bad smells.
The technical solution adopted by the invention to solve this problem is a method for identifying discussions of software architecture bad smells from text, comprising the following steps:
1) crawling the text of Q&A posts from a professional software development Q&A community, manually labeling the posts as related or unrelated to software architecture bad smells, and using the labeled posts as the training set and test set, thereby constructing a data set for identifying discussions of software architecture bad smells;
2) preprocessing the text in the data set to obtain simplified text content;
3) extracting text features from the text of step 2) using natural language processing techniques to obtain feature vector data sets, namely: a BoW feature vector data set, a TF-IDF feature vector data set, and a Word2Vec feature vector data set;
4) after the features of each text are obtained, dividing the data set produced by step 3) into a training set and a test set, and training binary classifiers on the training set;
the specific steps are as follows:
training an LR classifier, an RF classifier, an SVM classifier, and a KNN classifier on each of the three feature data sets obtained in step 3), which yields 12 classifiers, one for each combination of the 3 feature extraction methods and the 4 classification models; and using the trained classifiers to predict the documents in the test set to obtain classification results;
5) using the trained classifiers to predict the documents in the test set to obtain classification results, and evaluating how well the classifiers identify Q&A posts about software architecture bad smells, using the following four metrics: accuracy, precision, recall, and F1-score;
6) comparing the results, determining the optimal combination of feature extraction method and classifier, and using the classification model of that combination for identification.
According to the above scheme, step 1) specifically comprises:
step 1.1) crawling text data: first, Q&A posts are searched for in the software development Q&A community using keywords related to software architecture bad smells, all Q&A posts related to software architecture bad smells are extracted from the search results, and their URLs are recorded; then a similar number of unrelated posts are randomly drawn from the unrelated posts screened out of the search results, and their URLs are recorded, which yields a balanced data set;
step 1.2) using the URLs to crawl the title, question, and answers of each Q&A post, manually labeling each post as related or unrelated to software architecture bad smells, and storing the posts in a CSV file for use in the subsequent steps.
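Step 1.2) can be sketched with Python's standard `csv` module. The column names, the example URLs, and the label encoding (1 = related, 0 = unrelated) are assumptions made for illustration; the patent only states that the labeled title, question, and answer text is stored in a CSV file.

```python
import csv
import io

# Hypothetical labeled posts; label 1 = related to architecture bad smells, 0 = unrelated.
posts = [
    {"url": "https://example.com/q/1", "title": "Cyclic dependency between modules",
     "question": "Is this an architecture smell?", "answer": "Yes, break the cycle.", "label": 1},
    {"url": "https://example.com/q/2", "title": "How to parse JSON",
     "question": "Which library should I use?", "answer": "Use the standard one.", "label": 0},
]

FIELDS = ["url", "title", "question", "answer", "label"]  # assumed column layout


def write_posts(handle, rows):
    """Write labeled posts to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_posts(handle):
    """Read the posts back, converting the label column to int."""
    return [dict(row, label=int(row["label"])) for row in csv.DictReader(handle)]


buffer = io.StringIO()  # stands in for open("posts.csv", "w", newline="")
write_posts(buffer, posts)
buffer.seek(0)
loaded = read_posts(buffer)
```

An in-memory buffer is used here so the sketch runs without touching the file system; a real run would pass a file opened with `newline=""`.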
According to the above scheme, the preprocessing in step 2) comprises data cleaning, removal of useless words, and reduction of words to their base forms;
the data cleaning deletes useless characters and escape characters contained in the web page text;
the removal of useless words deletes words shorter than 3 letters and removes English stop words from the text;
the reduction of words to their base forms comprises stemming and lemmatization, in which the NLTK toolkit is used to reduce the inflected forms of all words in the text to their base forms.
According to the above scheme, in step 3) text features are extracted from the text of step 2) using natural language processing techniques to obtain feature vector data sets.
The specific steps are as follows:
step 3.1) processing the data set obtained in step 2) with the Bag-of-Words technique: the frequency of each word in each document of the text data set is counted, the frequencies of all words are combined into the feature vector of the document, and the feature vectors of all documents are saved as the BoW feature vector data set;
step 3.2) processing the data set obtained in step 2) with the TF-IDF technique: the TF value and the IDF value of each word in each document of the text data set are calculated and multiplied to obtain the TF-IDF value, which serves as the feature of the document, and the TF-IDF values are saved as the TF-IDF feature vector data set;
step 3.3) processing the data set obtained in step 2) with the Word2Vec technique: each word in each document of the text data set is mapped to a vector in a feature space, the vectors of all words in a text are averaged to form the feature of the document, and the feature vectors of all documents are saved as the Word2Vec feature vector data set.
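As a minimal sketch of step 3.1), the Bag-of-Words representation can be built with `collections.Counter`. The function name `bag_of_words` and the two toy documents are illustrative assumptions; the documents are assumed to be already tokenized by step 2).

```python
from collections import Counter


def bag_of_words(documents):
    """Return the sorted vocabulary and one word-count vector per document."""
    vocabulary = sorted({token for tokens in documents for token in tokens})
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)  # word -> frequency within this document
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Toy, already-preprocessed documents (hypothetical).
docs = [["lock", "module", "lock"], ["module", "service"]]
vocabulary, vectors = bag_of_words(docs)
# vocabulary is ["lock", "module", "service"]; vectors is [[2, 1, 0], [0, 1, 1]]
```

Every document gets a vector of the same length (the vocabulary size), which is what lets the vectors be stacked into a single feature data set.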
The invention has the following beneficial effect: it provides an automated technique for identifying discussions of software architecture bad smells from text, which can quickly obtain the optimal combination of feature extraction method and classification model for a given configuration.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, a method for identifying discussions of software architecture bad smells from text comprises the following steps:
Step 1, crawling the Q&A posts of a professional software development Q&A community and manually labeling the posts as related or unrelated to software architecture bad smells, thereby constructing a data set for automatically identifying discussions of software architecture bad smells;
Step 1.1, crawling the experimental data. First, the software development Q&A community is searched using 14 keywords, such as "architecture cell", "architecture base defect", "architecture vision", "architecture workflow reagent", "architecture reagent-pattern", and "architecture reagent-pattern", which returns 5950 results in total. The 50 most relevant results for each keyword (700 in total) are selected for manual screening, and duplicate Q&A posts are removed. After each post has been read carefully, all posts related to software architecture bad smells (208) are extracted and their URLs recorded; then a similar number of unrelated posts (187) are randomly drawn from the remaining 492 posts unrelated to software architecture bad smells, and their URLs are recorded, which yields a balanced data set.
Step 1.2, using the URLs to crawl the title, question, and answers of each Q&A post, manually labeling each post as related or unrelated to software architecture bad smells, and storing the posts in a CSV file for use in the subsequent steps.
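The balancing in step 1.1 amounts to keeping every related post and drawing a random sample of the unrelated ones. The sketch below uses the counts given in the text (208 related, 492 unrelated, 187 sampled); the function name, the placeholder URLs, and the fixed seed are assumptions for illustration.

```python
import random


def build_balanced_dataset(related_urls, unrelated_urls, n_unrelated, seed=0):
    """Keep all related posts and randomly sample n_unrelated unrelated posts."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(unrelated_urls, n_unrelated)  # sampling without replacement
    dataset = [(url, 1) for url in related_urls] + [(url, 0) for url in sampled]
    rng.shuffle(dataset)
    return dataset


# Placeholder URLs standing in for the recorded links.
related = [f"https://example.com/related/{i}" for i in range(208)]
unrelated = [f"https://example.com/unrelated/{i}" for i in range(492)]
dataset = build_balanced_dataset(related, unrelated, n_unrelated=187)
```

The resulting data set has 395 labeled entries, roughly 53% of which are related posts.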
Step 2, preprocessing the text in the data set, for example by removing irrelevant characters and words, to obtain relatively simplified text content;
this specifically comprises the following steps:
Step 2.1, data cleaning. Useless characters such as "……" and "/", along with the escape characters commonly found in web page text, are removed first.
Step 2.2, removal of useless words. Using the NLTK toolkit, words shorter than 3 letters are deleted and English stop words are removed from the text.
Step 2.3, stemming and lemmatization. Using the NLTK toolkit, the inflected forms of all words in the text are reduced to their base forms; for example, "locked" and "locking" are reduced to "lock".
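The three preprocessing sub-steps can be sketched in plain Python. This is a simplified stand-in, not the NLTK pipeline the text refers to: the stop-word set is a tiny illustrative subset, and `toy_lemma` is a crude, hypothetical suffix stripper used only so the example runs without external data.

```python
import html
import re

# Tiny illustrative subset; the text uses NLTK's English stop-word handling.
STOP_WORDS = {"the", "and", "are", "was", "for", "that", "this", "with", "have", "from"}


def clean(text):
    """Step 2.1: decode HTML escapes, drop tags and non-letter characters."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return text.lower()


def toy_lemma(word):
    """Step 2.3 stand-in: strip a few common suffixes (not a real lemmatizer)."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    tokens = clean(text).split()
    # Step 2.2: drop words shorter than 3 letters and stop words.
    tokens = [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]
    return [toy_lemma(t) for t in tokens]


tokens = preprocess("<p>The module was locked &amp; is locking the services...</p>")
# tokens is ["module", "lock", "lock", "service"]
```

In a real pipeline the suffix stripper would be replaced by a proper stemmer or lemmatizer, since naive stripping mangles many words (for example, it would turn "string" into "str").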
Step 3, processing the data set produced by step 2 with natural language processing techniques to extract text features;
the data set obtained in step 2 is processed with the TF-IDF feature extraction technique: the TF value and the IDF value of each word in each document of the text data set are calculated and multiplied to obtain the TF-IDF value, which serves as the feature of the document, and the TF-IDF values are saved as the feature vector data set.
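The TF-IDF computation described here can be sketched directly. The text does not say which TF and IDF variants are used, so this example assumes one common choice: TF is the word count divided by document length, and IDF is the natural logarithm of the number of documents divided by the number of documents containing the word. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(tokens)             # term frequency
            idf = math.log(n_docs / doc_freq[term])     # inverse document frequency
            row.append(tf * idf)
        vectors.append(row)
    return vocabulary, vectors


# Toy, already-preprocessed documents (hypothetical).
docs = [["cyclic", "dependency", "module"], ["module", "parse", "json"]]
vocabulary, vectors = tf_idf(docs)
```

With this variant, a word that appears in every document ("module" above) gets an IDF of 0 and so contributes nothing to any vector, while words unique to one document are weighted most heavily.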
Step 4, after the features of each text are obtained, dividing the data set produced by step 3 into a training set and a test set, and training a Random Forest binary classifier on the training set; the trained classifier is then used to predict the documents in the test set;
Step 5, using the trained classifier to predict the documents in the test set to obtain classification results, and evaluating how well the classifier identifies Q&A posts about software architecture bad smells, using the following four metrics: accuracy, precision, recall, and F1-score.
According to the calculation, the classification results of this scheme have an accuracy of 0.643, a precision of 0.645, a recall of 0.740, and an F1-score of 0.678, which shows that the method can effectively identify discussions of software architecture bad smells from text and divide Q&A posts into those related and those unrelated to software architecture bad smells, thereby providing examples of discussions of software architecture bad smells.
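The four metrics of step 5 follow from the counts of true and false positives and negatives. The sketch below uses the standard definitions for binary classification; the function name and the toy label lists are illustrative and are not data from the patent.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = related to architecture bad smells, 0 = unrelated.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Precision asks how many posts flagged as related really are related, while recall asks how many of the truly related posts were found; F1 is their harmonic mean.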
The classifier in the above scheme is suited to identifying discussions of software architecture bad smells in the text of a professional software development Q&A community. To broaden the scope of application of the technical solution, the invention further provides a method for identifying discussions of software architecture bad smells from text, comprising the following steps:
Step 1, crawling the Q&A posts of a professional software development Q&A community and manually labeling the posts as related or unrelated to software architecture bad smells, thereby constructing a data set for automatically identifying discussions of software architecture bad smells;
Step 1.1, crawling the experimental data. First, the software development Q&A community is searched using 14 keywords, such as "architecture cell", "architecture base defect", "architecture vision", "architecture workflow reagent", "architecture reagent-pattern", and "architecture reagent-pattern", which returns 5950 results in total. The 50 most relevant results for each keyword (700 in total) are selected for manual screening, and duplicate Q&A posts are removed. After each post has been read carefully, all posts related to software architecture bad smells (208 in total) are extracted and their URLs recorded; then a similar number of unrelated posts (187 in total) are randomly drawn from the screened posts unrelated to software architecture bad smells, and their URLs are recorded.
Step 1.2, using the URLs to crawl the title, question, and answers of each Q&A post, manually labeling each post as related or unrelated to software architecture bad smells, and storing the posts in a CSV file for use in the subsequent steps.
Step 2, preprocessing the text in the data set, for example by removing irrelevant characters and words, to obtain relatively simplified text content;
this specifically comprises the following steps:
Step 2.1, data cleaning. Useless characters such as "……" and "/", along with the escape characters commonly found in web page text, are removed first.
Step 2.2, removal of useless words. Using the NLTK toolkit, words shorter than 3 letters are deleted and English stop words are removed from the text.
Step 2.3, stemming and lemmatization. Using the NLTK toolkit, the inflected forms of all words in the text are reduced to their base forms; for example, "locked" and "locking" are reduced to "lock".
Step 3, processing the data set produced by step 2 with natural language processing techniques to extract text features;
this specifically comprises the following steps:
Step 3.1, processing the data set obtained in step 2 with the Bag-of-Words (BoW) technique: the frequency of each word in each document of the text data set is counted, and the frequencies of all words are combined into the feature vector of the document. The feature vectors of all documents are saved as the BoW feature vector data set.
Step 3.2, processing the data set obtained in step 2 with the TF-IDF (Term Frequency-Inverse Document Frequency) technique: the TF value and the IDF value of each word in each document of the text data set are calculated and multiplied to obtain the TF-IDF value, which serves as the feature of the document, and the TF-IDF values are saved as the TF-IDF feature vector data set.
Step 3.3, processing the data set obtained in step 2 with the Word2Vec technique: each word in each document of the text data set is mapped to a vector in a feature space, and the vectors of all words in a text are averaged to form the feature of the document. The feature vectors of all documents are saved as the Word2Vec feature vector data set.
Specifically, steps 3.1 to 3.3 each operate on the output of step 2 and are executed in parallel.
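The averaging in step 3.3 can be illustrated without a trained model. In the sketch below, `toy_embeddings` is a hand-written, hypothetical lookup table standing in for the word vectors a Word2Vec model would supply; skipping out-of-vocabulary words and returning a zero vector for documents with no known words are assumptions, since the text does not say how such cases are handled.

```python
def mean_vector(tokens, embeddings, dim):
    """Average the vectors of all known words in a document."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # assumed fallback for documents with no known words
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional embeddings; real Word2Vec vectors are much longer.
toy_embeddings = {
    "cyclic": [1.0, 0.0],
    "dependency": [0.0, 1.0],
    "module": [1.0, 1.0],
}
doc_vector = mean_vector(["cyclic", "dependency", "unknownword"], toy_embeddings, dim=2)
# doc_vector is [0.5, 0.5]
```

Averaging gives every document a fixed-length vector regardless of its word count, at the cost of discarding word order.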
Step 4, after the features of each text are obtained, dividing the data set produced by step 3 into a training set and a test set, and training binary classifiers on the training set. The trained classifiers are then used to predict the documents in the test set.
Step 4.1, using the LR (Logistic Regression) classification technique, training an LR classifier on each of the three feature data sets obtained in step 3 (the LR classifier uses default parameter values), and using the trained LR classifier to predict the classes of the documents in the test set.
Step 4.2, using the RF (Random Forest) classification technique, training an RF classifier on each of the three feature data sets obtained in step 3 (the RF classifier uses default parameter values), and using the trained RF classifier to predict the classes of the documents in the test set.
Step 4.3, using the SVM (Support Vector Machine) classification technique, training an SVM classifier on each of the three feature data sets obtained in step 3 (the SVM classifier uses default parameter values), and using the trained SVM classifier to predict the classes of the documents in the test set.
Step 4.4, using the KNN (k-Nearest Neighbors) classification technique, training a KNN classifier on each of the three feature data sets obtained in step 3 (the KNN classifier uses default parameter values), and using the trained KNN classifier to predict the classes of the documents in the test set.
Specifically, steps 4.1 to 4.4 each operate on the output of step 3 and are executed in parallel.
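Steps 4.1 to 4.4 can be sketched with scikit-learn, which is an assumption: the text names the four classifier types and says default parameters are used, but does not name a library or a train/test ratio. The toy texts, the 75/25 split, and the fixed `random_state` are illustrative only, and the Word2Vec features are left out to keep the example self-contained.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical toy posts: first six related to architecture smells, last six unrelated.
texts = [
    "cyclic dependency between modules hurts the architecture",
    "god component smell in layered architecture",
    "unstable dependency between architecture packages",
    "architecture smell scattered functionality across components",
    "hub like dependency smell in module architecture",
    "ambiguous interface smell between architecture components",
    "how to parse json in python",
    "sort a list of numbers quickly",
    "read a csv file line by line",
    "convert string to integer in java",
    "center a div with css",
    "install a package with pip",
]
labels = [1] * 6 + [0] * 6

extractors = {"BoW": CountVectorizer, "TF-IDF": TfidfVectorizer}
classifiers = {
    "LR": LogisticRegression,
    "RF": RandomForestClassifier,
    "SVM": SVC,
    "KNN": KNeighborsClassifier,
}

results = {}
for feature_name, make_extractor in extractors.items():
    # Features are extracted first and the data set is split afterwards,
    # following the order of steps 3 and 4 in the text.
    features = make_extractor().fit_transform(texts)
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, stratify=labels, random_state=0
    )
    for classifier_name, make_classifier in classifiers.items():
        model = make_classifier()  # default parameters, as stated in the text
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        results[(feature_name, classifier_name)] = f1_score(
            y_test, predictions, zero_division=0
        )
```

Reusing the same `random_state` for every feature set keeps the train/test partition identical across combinations, so the scores are comparable. Adding a third entry to `extractors` for averaged Word2Vec vectors would give the full 12 combinations.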
Step 5, using four metrics, namely accuracy, precision, recall, and F1-score, to evaluate how well the classifiers identify Q&A posts about software architecture bad smells.
Step 5.1, calculating the four evaluation metrics (accuracy, precision, recall, and F1-score) for the 12 combinations of the 3 feature extraction methods and the 4 classification models.
Step 5.2, comparing the results and determining the best feature extraction method, the best classification model, and the best combination of the two.
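The comparison in step 5.2 can be sketched as a lookup over a table of scores. The numbers in `scores` are made-up placeholders for illustration, not results from the patent, and ranking by F1-score is an assumption, since the text does not say which metric decides the best combination.

```python
from collections import defaultdict
from statistics import mean

# Placeholder F1 scores keyed by (feature extraction, classifier); not real results.
scores = {
    ("BoW", "LR"): {"f1": 0.60},
    ("BoW", "RF"): {"f1": 0.65},
    ("TF-IDF", "LR"): {"f1": 0.62},
    ("TF-IDF", "RF"): {"f1": 0.70},
}


def best_combination(scores, metric="f1"):
    """Return the (feature extraction, classifier) pair with the highest score."""
    return max(scores, key=lambda combo: scores[combo][metric])


def mean_by_extractor(scores, metric="f1"):
    """Average the metric over all classifiers for each feature extraction method."""
    grouped = defaultdict(list)
    for (extractor, _classifier), values in scores.items():
        grouped[extractor].append(values[metric])
    return {name: mean(values) for name, values in grouped.items()}


best = best_combination(scores)
averages = mean_by_extractor(scores)
```

Averaging per feature extraction method gives one way to name a best extractor independently of the classifier; the same grouping by the second key element would rank the classifiers.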
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (5)

1. A method for identifying discussions of software architecture bad smells from text, comprising the following steps:
1) crawling the text of Q&A posts from a professional software development Q&A community, manually labeling the posts as related or unrelated to software architecture bad smells, and using the labeled posts as the training set and test set, thereby constructing a data set for identifying discussions of software architecture bad smells;
2) preprocessing the text in the data set to obtain simplified text content;
3) extracting text features from the text of step 2) using natural language processing techniques to obtain feature vector data sets, namely: a BoW feature vector data set, a TF-IDF feature vector data set, and a Word2Vec feature vector data set;
4) after the features of each text are obtained, dividing the data set produced by step 3) into a training set and a test set, and training the binary classifiers of a classifier set on the training set;
5) using the trained classifiers to predict the documents in the test set to obtain classification results, and evaluating how well the classifiers identify Q&A posts about software architecture bad smells, using the following four metrics: accuracy, precision, recall, and F1-score;
6) comparing the results, determining the optimal combination of feature extraction method and classifier, and using the classification model of that combination to identify discussions of software architecture bad smells from text.
2. The method for identifying discussions of software architecture bad smells from text according to claim 1, wherein step 1) specifically comprises:
step 1.1) crawling text data: first, Q&A posts are searched for in the software development Q&A community using keywords related to software architecture bad smells, all Q&A posts related to software architecture bad smells are extracted from the search results, and their URLs are recorded; then a similar number of unrelated posts are randomly drawn from the unrelated posts screened out of the search results, and their URLs are recorded, which yields a balanced data set;
step 1.2) using the URLs to crawl the title, question, and answers of each Q&A post, manually labeling each post as related or unrelated to software architecture bad smells, and storing the posts in a CSV file for use in the subsequent steps.
3. The method for identifying discussions of software architecture bad smells from text according to claim 1, wherein the preprocessing in step 2) comprises data cleaning, removal of useless words, and reduction of words to their base forms;
the data cleaning deletes useless characters and escape characters contained in the web page text;
the removal of useless words deletes words shorter than 3 letters and removes English stop words from the text;
the reduction of words to their base forms comprises stemming and lemmatization, in which the NLTK toolkit is used to reduce the inflected forms of all words in the text to their base forms.
4. The method for identifying discussions of software architecture bad smells from text according to claim 1, wherein in step 3) text features are extracted from the text of step 2) using natural language processing techniques to obtain feature vector data sets.
The specific steps are as follows:
step 3.1) processing the data set obtained in step 2) with the Bag-of-Words technique: the frequency of each word in each document of the text data set is counted, the frequencies of all words are combined into the feature vector of the document, and the feature vectors of all documents are saved as the BoW feature vector data set;
step 3.2) processing the data set obtained in step 2) with the TF-IDF technique: the TF value and the IDF value of each word in each document of the text data set are calculated and multiplied to obtain the TF-IDF value, which serves as the feature of the document, and the TF-IDF values are saved as the TF-IDF feature vector data set;
step 3.3) processing the data set obtained in step 2) with the Word2Vec technique: each word in each document of the text data set is mapped to a vector in a feature space, the vectors of all words in a text are averaged to form the feature of the document, and the feature vectors of all documents are saved as the Word2Vec feature vector data set.
5. The method for identifying discussions of software architecture bad smells from text according to claim 1, wherein step 4) is specifically as follows:
training an LR classifier, an RF classifier, an SVM classifier, and a KNN classifier on each of the three feature data sets obtained in step 3), which yields 12 classifiers, one for each combination of the 3 feature extraction methods and the 4 classification models; and using the trained classifiers to predict the documents in the test set to obtain classification results.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010539516.9A CN111797198A (en) 2020-06-14 2020-06-14 Method for recognizing bad taste discussion of software architecture from text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010539516.9A CN111797198A (en) 2020-06-14 2020-06-14 Method for recognizing bad taste discussion of software architecture from text

Publications (1)

Publication Number Publication Date
CN111797198A true CN111797198A (en) 2020-10-20

Family

ID=72802913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010539516.9A Pending CN111797198A (en) 2020-06-14 2020-06-14 Method for recognizing bad taste discussion of software architecture from text

Country Status (1)

Country Link
CN (1) CN111797198A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170242691A1 (en) * 2016-02-18 2017-08-24 King Fahd University Of Petroleum And Minerals Apparatus and methodologies for code refactoring
CN108875741A (en) * 2018-06-15 2018-11-23 哈尔滨工程大学 Multi-scale fuzzy-based acoustic image texture feature extraction method
CN109002473A (en) * 2018-06-13 2018-12-14 天津大学 Sentiment analysis method based on word vectors and part of speech
CN109388800A (en) * 2018-09-30 2019-02-26 江苏师范大学 Short text sentiment analysis method based on windowed word vector features
CN110413746A (en) * 2019-06-25 2019-11-05 阿里巴巴集团控股有限公司 Method and device for intention recognition on user questions

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CAGATAY CATAL et al.: "A sentiment classification model based on multiple classifiers", 《APPLIED SOFT COMPUTING》 *
ROBERT DZISEVIČ et al.: "Text Classification using Different Feature Extraction Approaches", 《2019 OPEN CONFERENCE OF ELECTRICAL, ELECTRONIC AND INFORMATION SCIENCES (ESTREAM)》 *
LIU XIAOPENG et al.: "Research on feature extraction and algorithms for short text classification", 《信息技术与网络安全》 (Information Technology and Network Security) *

Similar Documents

Publication Publication Date Title
CN109101477B (en) Enterprise field classification and enterprise keyword screening method
CN113011533A (en) Text classification method and device, computer equipment and storage medium
CN108255813B (en) Text matching method based on word frequency-inverse document and CRF
CN107944014A (en) Chinese text sentiment analysis method based on deep learning
CN103995876A (en) Text classification method based on chi square statistics and SMO algorithm
CN112836509A (en) Expert system knowledge base construction method and system
CN109710725A (en) Chinese table column label restoration method and system based on text classification
CN113312476A (en) Automatic text labeling method and device and terminal
CN110910175A (en) Tourist ticket product portrait generation method
CN113590764B (en) Training sample construction method and device, electronic equipment and storage medium
CN115952292A (en) Multi-label classification method, device and computer readable medium
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN103020286A (en) Internet ranking list grasping system based on ranking website
CN106815209B (en) Uygur agricultural technical term identification method
Gelman et al. A language-agnostic model for semantic source code labeling
CN114265935A (en) Science and technology project establishment management auxiliary decision-making method and system based on text mining
CN112328469B (en) Function level defect positioning method based on embedding technology
CN117520561A (en) Entity relation extraction method and system for knowledge graph construction in helicopter assembly field
CN112487263A (en) Information processing method, system, equipment and computer readable storage medium
CN116049376B (en) Method, device and system for retrieving and replying information and creating knowledge
Suprayogi et al. Information extraction for mobile application user review
Ameri et al. Smart semi-supervised accumulation of large repositories for industrial control systems device information
CN117235253A (en) Truck user implicit demand mining method based on natural language processing technology
CN111797198A (en) Method for recognizing bad taste discussion of software architecture from text
Wibawa et al. Classification Analysis of MotoGP Comments on Media Social Twitter Using Algorithm Support Vector Machine and Naive Bayes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination