CN113011180A - Defect report severity prediction method based on description keyword extraction - Google Patents
- Publication number
- CN113011180A (application CN202110412776.4A)
- Authority
- CN
- China
- Prior art keywords
- defect
- vector
- description
- severity
- abstract
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a defect report severity prediction method based on description keyword extraction. The method selects the defect abstract, defect description and severity of the defect reports of a software project from a defect tracking system; performs word segmentation, stop-word removal and lemmatization on the defect abstract; performs string replacement, keyword extraction, word segmentation, stop-word removal and lemmatization on the defect description; trains a word vector model for the defect abstract and for the defect description, each based on the severity labels, to obtain the corresponding vectors; trains a defect report severity prediction model on these vectors using the logistic regression classification method; and uses the model to predict the severity of defect reports in the software project. The beneficial effect of the invention is that supplementing the defect abstract with keywords extracted from the defect description yields better prediction performance.
Description
Technical Field
The invention relates to the technical field of software quality assurance, and in particular to a method for predicting the severity of defect reports based on description keyword extraction.
Background
With the development of internet technology, software engineering has developed accordingly. As the number of software projects grows, software defects of varying size are inevitable, and 90% of software defects seriously affect the user experience, so tracking and managing defects in software projects is particularly important. In a defect tracking system, users submit defect reports to give feedback on the problems they encounter. The severity recorded in a defect report helps triagers assign reports sensibly and helps developers fix defects quickly, thereby reducing the manual assignment workload and speeding up defect repair.
To this end, defect report severity prediction preprocesses the text of a defect report and then predicts the value of its severity attribute. At present, the defect abstract is generally used as the training data for severity prediction; because the abstract contains little text, the performance of the prediction model is limited.
How to solve the above technical problems is the subject of the present invention.
Disclosure of Invention
To solve this problem, the invention provides a defect report severity prediction method based on description keyword extraction. Current severity prediction usually uses only the defect abstract as training data; because the abstract contains little text, model performance is limited. The method therefore supplements the defect abstract with other content from the defect report to further improve model performance.
The invention provides a defect report severity prediction method based on description keyword extraction, which comprises the following steps:
(1) selecting, from the defect tracking system in which the project is located, defect reports whose status is CLOSED and FIXED and whose severity is Blocker, Critical, Major, Minor or Trivial; downloading the data of these defect reports, the downloaded fields comprising the defect abstract, the defect description and the severity of each report; and forming a data set from the downloaded fields;
(2) sequentially performing word segmentation, stop-word removal and lemmatization on the text of the defect abstract field in the data set to obtain the corresponding token set T_s;
(3) using the token set T_s and the severity of the defect reports in the data set, training an abstract word vector model F_s with the word embedding method FastText, and using the model to represent the defect abstract as a vector, specifically: obtaining the vector of each token in the defect abstract from F_s, and summing these vectors to obtain the defect abstract vector E_s;
(4) extracting keywords from the defect description field in the data set and representing them as vectors to obtain the defect description vector E_d;
(5) merging the defect abstract vector E_s and the defect description vector E_d into the input vector E_input;
(6) training a defect report severity prediction model with the logistic regression classification method, based on the input vector E_input and the severity of the defect reports in the data set;
(7) for a new defect report, processing its defect abstract as in step (2), processing its defect description as in step (4), merging the two vectors as in step (5), and feeding the result into the defect report severity prediction model obtained in step (6) to obtain the final prediction;
in step (4), extracting keywords from the defect description field in the data set and representing them to obtain the defect description vector E_d specifically comprises:
1) replacing strings in the defect description field of the data set using regular expressions, including: matching content that contains a URL and replacing it with the string "URL", matching console output and replacing it with the string "console", and matching code fragments and replacing them with the string "code"; then performing word segmentation, stop-word removal and lemmatization to obtain the corresponding token set;
2) based on the token set, extracting the keyword set T_d of the defect description with the keyword extraction method TextRank;
3) based on the keyword set T_d and the corresponding severity in the data set, training description word vectors with the word embedding method FastText to obtain the description word vector model F_d, and using the model to represent the defect description as a vector, specifically: obtaining the vector of each keyword in the defect description from F_d, and summing these vectors to obtain the defect description vector E_d.
Compared with the prior art, the beneficial effects of the embodiments of the invention are as follows: the invention uses keywords extracted from the defect description to supplement the defect abstract data, which increases the amount of data available for training and thereby improves the performance of the defect report severity prediction model. The keywords obtained with the keyword extraction method TextRank improve model performance while compressing the information in the defect description; that is, better model performance is achieved with fewer keywords.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flowchart illustrating a method for predicting severity of defect reports based on extraction of description keywords according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments are further detailed. Of course, the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
Example 1
Referring to fig. 1, the present invention provides a method for predicting severity of a defect report based on extraction of description keywords, the method comprising the steps of:
(1) Select, from the defect tracking system in which the project is located, defect reports whose status is CLOSED and FIXED and whose severity is Blocker, Critical, Major, Minor or Trivial; download the data of these reports, the downloaded fields comprising the defect abstract, the defect description and the severity of each report; and form a data set.
This embodiment takes the Eclipse project in the Bugzilla defect tracking system as the experimental subject and downloads a data set containing the defect abstract, defect description and severity of each defect report. Only 5 severity levels are considered: the default option Normal, and Enhancement, which does not denote a true defect, are removed, because in the field of defect report severity prediction both are regarded as noisy data that do not help build the prediction model. Of the remaining 5 levels, Blocker, Critical and Major are merged into a 'severe' category and Minor and Trivial into a 'non-severe' category, and these two categories are used to train the defect report severity prediction model. The number of defect reports at each severity level is shown in Table 1.
TABLE 1 Number of defect reports at each severity level
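The filtering and label merging described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the dictionary keys `summary`, `description` and `severity` are hypothetical field names chosen for the example.

```python
from typing import Optional

# Severity levels kept by the method, mapped to the two training classes.
SEVERITY_TO_CLASS = {
    "blocker": "severe",
    "critical": "severe",
    "major": "severe",
    "minor": "non-severe",
    "trivial": "non-severe",
}


def to_binary_label(severity: str) -> Optional[str]:
    """Return 'severe' or 'non-severe', or None for levels treated as noise
    (for example Normal and Enhancement)."""
    return SEVERITY_TO_CLASS.get(severity.strip().lower())


def build_dataset(reports):
    """Keep only reports whose severity maps to one of the two classes.

    `reports` is an iterable of dicts with the hypothetical keys
    'summary', 'description' and 'severity'.
    """
    dataset = []
    for report in reports:
        label = to_binary_label(report["severity"])
        if label is not None:
            dataset.append((report["summary"], report["description"], label))
    return dataset
```

Reports labelled Normal or Enhancement simply drop out of the returned list, so the later training steps only ever see the two merged categories.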
(2) The text in the defect abstract field of the data set is subjected in turn to word segmentation, stop-word removal and lemmatization to obtain the corresponding token set T_s.
Defect report severity prediction can be modeled as a text classification problem, which first requires word segmentation, stop-word removal and lemmatization of the text to obtain the corresponding token set T_s.
Wherein:
Word segmentation: a defect report is split into a sequence of words, and content that is not a word, such as punctuation marks, is removed; this is the preliminary processing of the raw text.
Stop-word removal: words that occur frequently but carry little meaning, such as 'the', 'is', 'at', 'which' and 'on', are removed; this can improve model performance and reduce the text size.
Lemmatization: words in their various forms are reduced to their root form. In real text, a word appears in different tenses and inflections whose meanings are similar, for example "make", "makes" and "making". Keeping all of these forms increases the redundancy of the text and degrades model performance, so words must be reduced to a common form.
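The three preprocessing operations can be sketched in a few lines. The stop-word list and the lemma table below are tiny hypothetical stand-ins for the fuller resources a real pipeline would use; the source does not name a specific tokenizer or lemmatizer.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "in", "to", "of"}

# Toy lemma table standing in for a real lemmatizer (hypothetical entries).
LEMMAS = {
    "makes": "make", "making": "make", "made": "make",
    "crashes": "crash", "crashed": "crash", "crashing": "crash",
}


def preprocess(text):
    """Tokenize, drop stop words and lemmatize; returns the tokens as a list."""
    # Word segmentation: keep alphabetic runs only, which drops punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stop-word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization by table lookup; unknown words are kept unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


tokens = preprocess("The editor crashes on making a new file.")
# tokens == ["editor", "crash", "make", "new", "file"]
```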
Table 2 shows the token set T_s obtained from a defect abstract by the above steps.
TABLE 2 Word segmentation, stop-word removal and lemmatization of a defect abstract
(3) Using the token set T_s and the severity of the defect reports in the data set, an abstract word vector model F_s is trained with the word embedding method FastText, and the model is used to represent the defect abstract as a vector, specifically: the vector of each token in the defect abstract is obtained from F_s, and these vectors are summed to obtain the defect abstract vector E_s.
The word embedding method FastText is a word vector representation algorithm that incorporates character-level (substring) information and can therefore capture the morphological features of words. FastText uses hierarchical Softmax as its classifier and builds a tree that represents the categories with the Huffman tree algorithm, which reduces the number of prediction targets and the computational complexity. In this embodiment, FastText uses the parameter settings of the original paper; specifically, the n-gram window size is 3 and the word vector dimension is 10.
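To illustrate how substring information links related word forms, the sketch below extracts character n-grams with boundary markers, the scheme FastText is known for. It assumes the n-gram setting of 3 refers to character n-grams, which the source does not state explicitly; `char_ngrams` is a name introduced here for illustration only.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of `word`, with '<' and '>' as boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# "make" and "making" share their leading n-grams, so their vectors share components.
shared = set(char_ngrams("make")) & set(char_ngrams("making"))
# shared == {"<ma", "mak"}
```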
Based on the token set T_s obtained in step (2) and the severity of the defect reports in the data set, the abstract word vector model F_s is trained. Once F_s is obtained, the tokens of a defect abstract are fed into the model to obtain their vector representations, and the vectors of all tokens in the abstract are summed to obtain the defect abstract vector E_s.
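The summation step can be sketched as below. A plain dictionary `word_vectors` stands in for the trained model F_s, which is an assumption made to keep the example self-contained; the 3-dimensional vectors are made-up values, whereas the embodiment uses dimension 10.

```python
def sum_vectors(tokens, word_vectors, dim):
    """Sum the vectors of all tokens element-wise.

    Tokens missing from `word_vectors` are skipped. This is a simplification:
    the dictionary is only a stand-in for a trained word vector model.
    """
    total = [0.0] * dim
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue
        total = [a + b for a, b in zip(total, vec)]
    return total


# Hypothetical 3-dimensional vectors for two tokens.
word_vectors = {"editor": [1.0, 0.0, 2.0], "crash": [0.5, 1.0, 0.0]}
e_s = sum_vectors(["editor", "crash", "unknown"], word_vectors, dim=3)
# e_s == [1.5, 1.0, 2.0]
```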
(4) Keywords are extracted from the defect description field in the data set and represented as vectors to obtain the defect description vector E_d.
The defect description field contains more data than the defect abstract. However, a model that predicts severity from the defect description performs slightly worse than one that uses the defect abstract, because noise in the description data interferes with the final performance of the model. The keyword extraction method TextRank is therefore used to extract keywords from the defect description as supplementary data. The main steps are as follows:
1) Strings in the defect description field of the data set are replaced using regular expressions: content that contains a URL is matched and replaced with the string "URL", console output is matched and replaced with the string "console", and code fragments are matched and replaced with the string "code". Word segmentation, stop-word removal and lemmatization are then performed to obtain the corresponding token set.
2) Based on the token set, the keyword set T_d of the defect description is extracted with the keyword extraction method TextRank. TextRank is a graph-based algorithm for representing the relations within a given text. It first builds a token dictionary and a graph model from the tokens, then scores the tokens with the PageRank algorithm on that graph; the top-k tokens are taken as keywords. This captures the relations among tokens better.
3) Based on the keyword set T_d and the corresponding severity in the data set, description word vectors are trained with the word embedding method FastText to obtain the description word vector model F_d, and the model is used to represent the defect description as a vector, specifically: the vector of each keyword in the defect description is obtained from F_d, and these vectors are summed to obtain the defect description vector E_d.
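The string replacement in sub-step 1) can be sketched with Python's `re` module. The source does not give the actual patterns, so all three regular expressions below are illustrative assumptions: URLs are matched by scheme, console output is approximated by stack-trace frames and exception lines, and code fragments by lines ending in `;`, `{` or `}`.

```python
import re

# All three patterns are illustrative assumptions; the source does not give them.
URL_RE = re.compile(r"https?://\S+")
# Console output: stack-trace frames ("at pkg.Class.method(...)") or exception lines.
CONSOLE_RE = re.compile(
    r"^[ \t]*(?:at\s+[\w.$<>]+\(.*\)|[\w.]+(?:Exception|Error)\b.*)$",
    re.MULTILINE,
)
# Code fragment: a line ending in ';', '{' or '}'.
CODE_RE = re.compile(r"^.*[;{}][ \t]*$", re.MULTILINE)


def replace_strings(description):
    """Replace URLs, console output and code lines with placeholder strings."""
    text = URL_RE.sub("URL", description)
    text = CONSOLE_RE.sub("console", text)
    text = CODE_RE.sub("code", text)
    return text


sample = (
    "See https://bugs.example.org/123 for details.\n"
    "java.lang.NullPointerException: oops\n"
    "    at org.example.Editor.open(Editor.java:42)\n"
    "int x = foo();\n"
    "The editor closes."
)
cleaned = replace_strings(sample)
```

Replacing URLs first keeps the later line-based patterns from being confused by punctuation inside a link.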
Word segmentation, stop-word removal and lemmatization are the same as in step (2), and the word vector model F_d and the defect description vector E_d are obtained in a way similar to step (3). Table 3 shows how a defect description is processed to obtain the keyword set T_d.
Table 3 Defect description processing
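The keyword extraction in sub-step 2) can be sketched as PageRank over a token co-occurrence graph. This is a simplified TextRank variant written for illustration; the window size, damping factor, iteration count and `top_k` default are assumptions, since the source does not state the values it uses.

```python
from collections import defaultdict


def textrank_keywords(tokens, top_k=5, window=2, damping=0.85, iterations=50):
    """Rank tokens with PageRank on an undirected co-occurrence graph."""
    # Link every pair of distinct tokens that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, token in enumerate(tokens):
        for other in tokens[i + 1:i + window + 1]:
            if other != token:
                neighbors[token].add(other)
                neighbors[other].add(token)

    vocab = sorted(set(tokens))
    scores = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        new_scores = {}
        for w in vocab:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = ["editor", "crash", "open", "editor", "file", "editor", "save"]
top = textrank_keywords(tokens, top_k=1)
# top == ["editor"]: the token connected to the most other tokens ranks first.
```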
(5) The defect abstract vector E_s and the defect description vector E_d are merged into the input vector E_input.
The defect abstract vector E_s obtained in step (3) and the defect description vector E_d obtained in step (4) are concatenated. Since the word vector dimension is 10 in this embodiment, the concatenated vector has length 20.
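The merge is a plain concatenation, as the short sketch below shows. The vector values are placeholders; only the lengths (10 + 10 = 20) come from the embodiment.

```python
def concat_vectors(e_s, e_d):
    """Concatenate the abstract vector and the description vector into E_input."""
    return list(e_s) + list(e_d)


e_s = [0.1] * 10   # placeholder defect abstract vector, dimension 10
e_d = [0.2] * 10   # placeholder defect description vector, dimension 10
e_input = concat_vectors(e_s, e_d)
# len(e_input) == 20
```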
(6) Based on the input vector E_input and the severity of the defect reports in the data set, a defect report severity prediction model is trained with the logistic regression classification method.
Logistic regression is a classical classification algorithm in statistics and can be used for binary or multi-class problems.
For a classification problem with k classes, the model uses the following notation: x denotes the input vector and y the classification result, where y ranges over the k classes; W_k and b_k are the two parameters of the logistic regression method. In this experiment k is 2, namely the two categories 'severe' and 'non-severe'.
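For reference, multi-class logistic regression is commonly written in the softmax form below. This is the standard textbook form, given as an assumption consistent with the symbols x, y, W_k and b_k above rather than as the patent's exact formula; K is used here for the number of classes (K = 2 in this experiment) so that k can index a single class.

```latex
P(y = k \mid x) = \frac{\exp(W_k \cdot x + b_k)}{\sum_{j=1}^{K} \exp(W_j \cdot x + b_j)}, \qquad k = 1, \dots, K
```

With K = 2 this reduces to the sigmoid of the difference between the two class scores.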
Using the logistic regression classification method, the defect report severity prediction model is trained on the input vector E_input obtained in step (5) and the severity of the defect reports in the data set.
(7) For a new defect report, the defect abstract is processed as in step (2) and the defect description as in step (4); the two vectors are merged as in step (5) and fed into the defect report severity prediction model obtained in step (6) to obtain the final prediction.
When a new defect report is input, the invention first applies step (2) to the defect abstract, performing word segmentation, stop-word removal and lemmatization to obtain the corresponding token set T_s, and represents the abstract as a vector with the abstract word vector model F_s from step (3). It then applies step (4) to the defect description: strings are replaced; word segmentation, stop-word removal and lemmatization produce the corresponding token set; TextRank extracts the keyword set T_d; and the description word vector model F_d from step (4) represents the description as a vector. Finally, the two vectors are merged as in step (5) and fed into the prediction model obtained in step (6) to obtain the final prediction.
In this embodiment, experiments were performed on the data set of the Eclipse project. The data set is first split into a training set and a test set at a ratio of 8:2 in chronological order. The performance of the defect report severity prediction model is evaluated with three common metrics: F-Measure, Precision and Recall. They are calculated from the entries of the confusion matrix in Table 4.
TABLE 4 Confusion matrix
 | Predicted Positive | Predicted Negative |
---|---|---|
Actual Positive | TP | FN |
Actual Negative | FP | TN |
The confusion matrix is shown in Table 4, where TP + FP + TN + FN is the total number of samples. Precision = TP / (TP + FP) and Recall = TP / (TP + FN); F-Measure is the harmonic mean of Precision and Recall, that is, 2 × Precision × Recall / (Precision + Recall). The higher the F-Measure, the more effective the method.
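The three metrics follow directly from the confusion matrix counts, as this small sketch shows. The function name and the zero-division handling are choices made for the example, and the counts in the usage line are made-up numbers, not results from the experiment.

```python
def precision_recall_f(tp, fp, fn):
    """Compute Precision, Recall and F-Measure from confusion matrix counts.

    Returns 0.0 for a metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Made-up counts: 6 true positives, 2 false positives, 6 false negatives.
precision, recall, f_measure = precision_recall_f(tp=6, fp=2, fn=6)
# precision == 0.75, recall == 0.5, f_measure is approximately 0.6
```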
Table 5 compares the results of the invention on the Eclipse data set with two baselines, the naive Bayes method and the k-nearest-neighbor method.
TABLE 5 Experimental results
Method | F-Measure | Precision | Recall |
---|---|---|---|
Naive Bayes method | 0.660 | 0.652 | 0.681 |
k-nearest-neighbor method | 0.656 | 0.674 | 0.646 |
The invention | 0.704 | 0.738 | 0.687 |
According to the experimental results, the method outperforms both baselines on all 3 evaluation metrics. In addition, supplementing the defect abstract with keywords extracted from the defect description adds less data than using the full defect description, yet gives better model performance.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (2)
1. A defect report severity prediction method based on description keyword extraction, characterized by comprising the following steps:
S1, selecting, from the defect tracking system in which the project is located, defect reports whose status is CLOSED and FIXED and whose severity is Blocker, Critical, Major, Minor or Trivial; downloading the data of these defect reports, the downloaded fields comprising the defect abstract, the defect description and the severity of each report; and forming a data set from the data;
S2, sequentially performing word segmentation, stop-word removal and lemmatization on the text of the defect abstract field in the data set to obtain the corresponding token set T_s;
S3, using the token set T_s from step S2 and the severity of the defect reports in the data set, training an abstract word vector model F_s with the word embedding method FastText, and using the model to represent the defect abstract as a vector, specifically: obtaining the vector of each token in the defect abstract from F_s, and summing these vectors to obtain the defect abstract vector E_s;
S4, extracting keywords from the defect description field in the data set and representing them as vectors to obtain the defect description vector E_d;
S5, merging the defect abstract vector E_s and the defect description vector E_d into the input vector E_input;
S6, training a defect report severity prediction model with the logistic regression classification method, based on the input vector E_input and the severity of the defect reports in the data set;
S7, for a new defect report, processing its defect abstract as in step S2, processing its defect description as in step S4, merging the two vectors as in step S5, and feeding the result into the defect report severity prediction model obtained in step S6 to obtain the final prediction.
2. The defect report severity prediction method based on description keyword extraction according to claim 1, characterized in that in step S4, extracting keywords from the defect description field in the data set and representing them to obtain the defect description vector E_d specifically comprises:
S401, replacing strings in the defect description field of the data set using regular expressions, including: matching content that contains a URL and replacing it with the string "URL", matching console output and replacing it with the string "console", and matching code fragments and replacing them with the string "code"; then performing word segmentation, stop-word removal and lemmatization to obtain the corresponding token set;
S402, based on the token set, extracting the keyword set T_d of the defect description with the keyword extraction method TextRank;
S403, based on the keyword set T_d and the corresponding severity in the data set, training description word vectors with the word embedding method FastText to obtain the description word vector model F_d, and using the model to represent the defect description as a vector, specifically: obtaining the vector of each keyword in the defect description from F_d, and summing these vectors to obtain the defect description vector E_d.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110412776.4A CN113011180B (en) | 2021-04-16 | 2021-04-16 | Defect report severity prediction method based on description keyword extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113011180A true CN113011180A (en) | 2021-06-22 |
CN113011180B CN113011180B (en) | 2024-09-03 |
Family
ID=76389428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110412776.4A Active CN113011180B (en) | 2021-04-16 | 2021-04-16 | Defect report severity prediction method based on description keyword extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113011180B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116049693A (en) * | 2023-03-17 | 2023-05-02 | 济南市计量检定测试院 | Metering verification data management method based on medical equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100191731A1 (en) * | 2009-01-23 | 2010-07-29 | Vasile Rus | Methods and systems for automatic clustering of defect reports |
CN111177010A (en) * | 2019-12-31 | 2020-05-19 | 杭州电子科技大学 | Software defect severity identification method |
CN112306731A (en) * | 2020-11-12 | 2021-02-02 | 南通大学 | Two-stage defect-distinguishing report severity prediction method based on space word vector |
Non-Patent Citations (1)
Title |
---|
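Zheng Wei; Chen Junzheng; Wu Xiaoxue; Chen Xiang; Xia Xin: "An Empirical Study of Deep Learning-Based Security Defect Report Prediction Methods", Journal of Software (软件学报), no. 05, 15 May 2020 (2020-05-15) * |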
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116049693A (en) * | 2023-03-17 | 2023-05-02 | 济南市计量检定测试院 | Metering verification data management method based on medical equipment |
CN116049693B (en) * | 2023-03-17 | 2023-06-06 | 济南市计量检定测试院 | Metering verification data management method based on medical equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113011180B (en) | 2024-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113254599B (en) | Multi-label microblog text classification method based on semi-supervised learning | |
CN108710611B (en) | Short text topic model generation method based on word network and word vector | |
CN109165382B (en) | Similar defect report recommendation method combining weighted word vector and potential semantic analysis | |
CN111190968A (en) | Data preprocessing and content recommendation method based on knowledge graph | |
CN109840324B (en) | Semantic enhancement topic model construction method and topic evolution analysis method | |
CN112926337B (en) | End-to-end aspect level emotion analysis method combined with reconstructed syntax information | |
CN107220293B (en) | Emotion-based text classification method | |
CN116050397B (en) | Method, system, equipment and storage medium for generating long text abstract | |
CN111241271B (en) | Text emotion classification method and device and electronic equipment | |
CN111339753B (en) | Self-adaptive Chinese new word recognition method and system | |
CN111061866B (en) | Barrage text clustering method based on feature expansion and T-oBTM | |
Wang et al. | Named entity recognition method of brazilian legal text based on pre-training model | |
CN113886562A (en) | AI resume screening method, system, equipment and storage medium | |
CN115759119A (en) | Financial text emotion analysis method, system, medium and equipment | |
CN113011180B (en) | Defect report severity prediction method based on description keyword extraction | |
CN114266249A (en) | Mass text clustering method based on birch clustering | |
CN109829054A (en) | A kind of file classification method and system | |
CN106294689B (en) | A kind of method and apparatus for selecting to carry out dimensionality reduction based on text category feature | |
CN113051892A (en) | Chinese word sense disambiguation method based on transformer model | |
CN114202038B (en) | Crowdsourcing defect classification method based on DBM deep learning | |
CN102622405B (en) | Method for computing text distance between short texts based on language content unit number evaluation | |
CN112131384A (en) | News classification method and computer-readable storage medium | |
Xu et al. | An Improved Translation-Based Method for Knowledge Graph Representation | |
CN118467724B (en) | Abstract generation method and system based on financial big model | |
CN117312565B (en) | Literature author name disambiguation method based on relation fusion and representation learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||