CN116775871A - Deep learning software defect report classification method based on seBERT pre-training model - Google Patents

Deep learning software defect report classification method based on seBERT pre-training model

Info

Publication number
CN116775871A
CN116775871A (Application No. CN202310711807.5A)
Authority
CN
China
Prior art keywords
defect report
defect
data
deep learning
text
Prior art date
Legal status
Pending
Application number
CN202310711807.5A
Other languages
Chinese (zh)
Inventor
宫丽娜
曾子璇
张静宣
魏明强
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310711807.5A priority Critical patent/CN116775871A/en
Publication of CN116775871A publication Critical patent/CN116775871A/en
Pending legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a deep learning software defect report classification method based on the seBERT pre-training model, which comprises the following steps: collecting defect reports from software repositories developed on deep learning frameworks, adding a label category to each defect report, and forming sample data; merging the text data extracted from each sample and inputting it into the pre-training model for fine-tuning; the fine-tuned model takes the feature vector corresponding to the cls identifier at the start of the text data as the semantic feature of the input text, and this feature vector is input into a Softmax layer for normalization; the feature vector output by the fine-tuned pre-training model is fed into a fully connected layer, and a linear layer maps the feature vector containing semantic information to the corresponding category according to its dimension; finally, the output with the highest probability is taken as the predicted category of the defect report, completing the classification of each defect report. The application helps to improve the efficiency and accuracy of classifying deep learning software defect reports.

Description

Deep learning software defect report classification method based on seBERT pre-training model
Technical Field
The application relates to a deep learning software defect report classification method, in particular to a deep learning software defect report classification method based on a seBERT pre-training model.
Background
Deep learning software has permeated various industries and application fields, but security, quality and other vulnerabilities inevitably remain. To ensure the quality of deep learning software systems and prevent serious economic loss, techniques for identifying and predicting such defects have important engineering application value. During software development, defect reports submitted by developers and users reflect problems such as vulnerabilities and performance requirements of the software.
However, manually sorting and identifying defect reports consumes considerable manpower and time. Compared with conventional traditional software, deep learning software suffers from problems such as randomness in the training stage and dense interdependence within the neural network, which make its defects more difficult to observe and reproduce; new testing techniques are therefore needed to support its testing.
Pre-trained contextual language representation models have been successful in natural language processing and in improving the effectiveness of tag recommendation, but models pre-trained on general-domain corpora do not perform as well on certain tasks as models trained on domain-specific corpora. Moreover, the defect report texts published in public deep learning software repositories suffer from obvious class imbalance and redundancy, so a general pre-trained model alone yields poor training and prediction results.
Patent Document 1 discloses a software defect prediction device and method based on open-source community knowledge. The method constructs and trains a BP neural network and an LSTM neural network from open-source community code, first predicts whether a software defect exists using the trained BP neural network, and, if a defect is predicted, further predicts the defect type using the trained LSTM neural network, thereby improving the accuracy of code defect prediction. However, the method is not tailored to the defect characteristics of existing deep learning software, and it does not consider data processing for deep learning software defects.
In summary, existing research provides a good foundation for vulnerability prediction from software defect reports, but the capability of classifying defect reports for current deep learning software has not been fully exploited, mainly in the following respects:
1. There is no defect report classification method dedicated to deep learning software, and the accuracy of fine-grained subdivision of defect reports is poor.
2. Because deep learning software suffers from randomness in the training stage, dense interdependence within the neural network and similar problems, the resulting software defects are difficult to reproduce, the defect report data are class-imbalanced, and their content is disordered.
Reference to the literature
Patent Document 1: Chinese patent application publication No. CN111949535A, publication date: 2020.11.17.
Disclosure of the Invention
The application aims to provide a deep learning software defect report classification method based on the seBERT pre-training model, which fully considers the defect characteristics of deep learning software and the class imbalance of defect report data, and helps to improve the efficiency and accuracy of the defect report classification task for deep learning software.
In order to achieve the above purpose, the application adopts the following technical scheme:
the deep learning software defect report classification method based on the seBERT pre-training model comprises the following steps:
step 1, collecting defect reports corresponding to software repositories developed based on deep learning frameworks from the hosting platform of software projects, and adding a label category for each defect report according to information such as the title, textual description and follow-up comments in the report;
forming sample data from the defect report information and the label category corresponding to each defect report;
step 2, merging the text data extracted from each sample, inputting the merged text into the pre-training model, and fine-tuning the pre-training model; the fine-tuned model takes the feature vector corresponding to the <cls> identifier at the beginning of the text data as the semantic feature of the input text, and this feature vector is then input into a Softmax layer for normalization;
step 3, inputting the feature vector output by the fine-tuned pre-training model in step 2 into a fully connected layer, and using a linear layer to map the feature vector containing semantic information to the corresponding category according to its dimension;
and finally, taking the output with the highest probability as the final prediction category of the defect report, thereby completing the classification of each defect report.
The application has the following advantages:
As described above, the application relates to a deep learning software defect report classification method based on the seBERT pre-training model. The application uses the seBERT model, pre-trained on a software engineering corpus, to better extract the text information in defect reports submitted by users and developers, and to improve the ability to identify, from the report text, the defect category to which each defect report belongs.
In addition, when classifying defects of deep learning software, unlike the traditional scheme of labeling defect reports merely as bug or non-bug, the application analyzes the causes and classification of deep learning software bugs and divides the software defect labels into four types: Error, Deployment, Performance and Tensors & Inputs, thereby predicting the specific category of each defect report and improving the model's ability to classify and predict defect reports.
Furthermore, the defect report data of a software project exhibit an obvious class imbalance problem, which greatly affects the fine-tuning of the pre-training model. With the rise of pre-trained language models, data augmentation using a masked language model (Masked Language Model, MLM) has shown excellent performance. The application therefore adopts the BERT pre-training model and uses the MLM method in BERT to predict masked words in the data and replace them with synonyms, thereby generating new training data that is added to the original training data, so that text semantic information of each defect class can be extracted from a small amount of imbalanced data.
Drawings
FIG. 1 is a flow chart of a deep learning software defect report classification method according to an embodiment of the application.
FIG. 2 is a flow chart of a method for data augmentation of the data set based on the BERT pre-training model in an embodiment of the application.
FIG. 3 is a model diagram of a deep learning software defect report classification method according to an embodiment of the application.
Detailed Description
The application is described in further detail below with reference to the attached drawings and detailed description:
As deep learning is increasingly used in mission-critical applications, defective deep learning application software can lead to catastrophic consequences. Issue reports (i.e., defect reports) submitted by developers, maintainers and users of the software reflect the development status of the software, from which information can be extracted to provide data references for defect prediction.
The application uses the text information of software defect reports and a contextual word-embedding language model to complete the multi-class label classification of defect reports, thereby ensuring the maintainability and defect traceability of deep learning software.
The application selects the seBERT model, pre-trained on software engineering text, for fine-tuning, so that semantic information in defect reports can be better extracted and the model can obtain better results under imbalanced or insufficient data. The multi-class classification is finally completed through a feed-forward neural network, which determines the correct label of the real defect in the defect report.
As shown in fig. 1, the deep learning software defect report classification method based on the seBERT pre-training model in this embodiment includes the following steps:
step 1, collecting defect reports corresponding to software repositories developed based on deep learning frameworks from the hosting platform of software projects, and adding a label category for each defect report according to information such as the title, textual description and follow-up comments in the report.
The text information of each defect report and its corresponding label category are combined to form sample data.
The step 1 specifically comprises the following steps:
and step 1.1, screening out a mature software system which has high activity and is developed based on a deep learning framework according to the Stars number and the project development time information.
The software system data to be collected mainly comprise information such as the titles and descriptions of defect reports (issues) on the hosting platforms (such as GitHub and Gitee) of software projects.
The defect type is labeled according to the title text of the report and the follow-up replies and submissions of developers and users.
Although GitHub provides an off-the-shelf bug report label that project developers and users can add, it lacks any subdivision into specific bug types: there is only a single "bug" label to mark reports describing software bugs.
Moreover, most defect reports lack labels or are labeled incorrectly, which seriously affects later project maintenance, defect localization and other work, and invisibly creates additional labor and time costs.
There is therefore a need for re-labeling defect reports, in particular reports containing real defects.
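As an illustration of this collection step, the sketch below pulls closed issues from a repository via the public GitHub REST API. It is a minimal sketch, not part of the patent itself: the repository name shown in the usage comment, the page size, and the returned fields are assumptions for demonstration, and real projects may need an authentication token and full pagination.

```python
import requests

def fetch_closed_issues(owner: str, repo: str, per_page: int = 100):
    """Fetch closed issues (defect reports) from one GitHub repository."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    params = {"state": "closed", "per_page": per_page}
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    issues = []
    for item in resp.json():
        if "pull_request" in item:      # skip pull requests, keep only issues
            continue
        issues.append({
            "title": item.get("title", ""),
            "body": item.get("body") or "",
            "labels": [lb["name"] for lb in item.get("labels", [])],
        })
    return issues

# Example usage (repository name is illustrative only):
# reports = fetch_closed_issues("tensorflow", "tensorflow")
```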
Step 1.2, performing data filtering and text data preprocessing on the collected software system data.
Step 1.2.1, data filtering: filtering out invalid defect reports submitted to the software repository, including reports whose title or body is empty and defect reports that have not been closed.
Step 1.2.2. Text data preprocessing.
Preprocessing operations are performed on the text data contained in the collected data, including the title and body of each report: word segmentation, stop-word removal, foreign-language word removal, and removal of pictures, links and code.
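A minimal preprocessing sketch consistent with this step is shown below; the regular expressions and the small stop-word list are illustrative assumptions, not the patent's exact rules (a real pipeline would use a fuller stop-word list, e.g. NLTK's).

```python
import re

# Illustrative stop-word list; replace with a full list in practice.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in"}

def preprocess_report_text(text: str) -> list[str]:
    """Clean one defect report's text and return its remaining tokens."""
    text = re.sub(r"`{3}[\s\S]*?`{3}", " ", text)       # drop fenced code blocks
    text = re.sub(r"`[^`]*`", " ", text)                 # drop inline code
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)    # drop embedded images
    text = re.sub(r"https?://\S+", " ", text)            # drop links
    text = re.sub(r"[^A-Za-z\s]", " ", text)             # keep letters only (removes foreign-language words)
    tokens = text.lower().split()                         # simple word segmentation
    return [t for t in tokens if t not in STOP_WORDS]     # stop-word removal

# Example:
# preprocess_report_text("Model crashes, see https://example.com ![log](img.png)")
```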
Step 1.3, manually adding a specific defect category corresponding to the report according to the title text and the follow-up comments of the defect report; if the defect report does not contain a corresponding real defect, the added label is of the "other" class.
For defect reports containing real defects, the label class of the defect report will continue to be subdivided according to the content of the defect report.
Specifically, for defect reports containing real defects, the present embodiment classifies the defect reports into Error class, Deployment class, Performance class and Tensors & Inputs class according to defect type, for a total of 4 classes.
Error tags represent defects arising from code-writing problems and API usage.
Deployment tags represent software defects related to installation and hardware deployment.
Performance tags cover problems of low efficiency and poor performance in the software.
Tensors & Inputs tags represent problems caused by erroneous data types, data shapes or data formats.
These four defect type labels for defect reports containing real defects are consistent with current mainstream research, so the specific problem of each defect report can be divided completely and accurately.
Defect reports that do not contain real defects (e.g., reports that only raise requirements, user questions, file version declarations, etc.) are categorized under the "Other" label.
For example, if the report content actually describes a usage problem on the user's side rather than a defect in the software itself, or raises questions about the use and installation of the software, suggestions on its performance, etc., such reports are collectively classified into the "Other" category.
The information in all defect reports is extracted and preprocessed, the title and description data are extracted, and together with the annotated classification labels they form a labeled defect data set for subsequent model training.
Step 1.4. Because only a small proportion of the defect reports submitted to public software project repository platforms contain real defects, the category distribution is imbalanced, which seriously affects the fine-tuning of the pre-training model.
The present application therefore employs data augmentation techniques.
Specifically, an original token is replaced with the '[MASK]' token, the pre-trained BERT model is used for prediction, tokens with higher predicted probability are selected to replace the original token, and the replaced texts are added to the training data set, so that during training the pre-training model can fully extract the text semantic information of each defect class from a small amount of imbalanced data.
As shown in Fig. 2, taking the text "This is very cool" as an example, after the word "very" is masked, it is replaced by "pretty", "really" and "super", giving the following three replacement texts:
"This is pretty cool", "This is really cool", and "This is super cool".
The three replacement texts are added to the training data set to achieve data augmentation, so that during training the pre-training model can fully extract the text semantic information of each defect class from a small amount of imbalanced data.
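A minimal sketch of this MLM-based augmentation is given below, assuming the Hugging Face transformers fill-mask pipeline with the standard bert-base-uncased checkpoint; the model choice and top-k value are illustrative, not prescribed by the patent.

```python
from transformers import pipeline

# Masked-language-model pipeline built on a standard BERT checkpoint (illustrative choice).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_sentence(sentence: str, target_word: str, top_k: int = 3) -> list[str]:
    """Mask one word and let BERT propose high-probability replacements."""
    masked = sentence.replace(target_word, fill_mask.tokenizer.mask_token, 1)
    predictions = fill_mask(masked, top_k=top_k)
    # Each prediction carries the full sentence with the mask filled in.
    return [p["sequence"] for p in predictions]

# Example from the description: masking "very" in "This is very cool"
# may yield "this is pretty cool", "this is really cool", "this is super cool".
augmented = augment_sentence("This is very cool", "very")
```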
Step 2, merging the text data extracted from each sample, inputting the merged text into the pre-training model, and fine-tuning the pre-training model; the fine-tuned model takes the feature vector corresponding to the <cls> identifier at the beginning of the text data as the semantic feature of the input text, and this feature vector is then input into a Softmax layer for normalization.
As shown in Fig. 3, the concatenated issue text is converted, by querying a word-vector table, into embedding vectors of the corresponding dimension (t1 … tn in Fig. 3), which serve as the input to the seBERT model.
seBERT outputs the feature corresponding to the <cls> token (C in Fig. 3) together with the features corresponding to each word in the text (T1 … Tn in Fig. 3), and the extracted issue features are input into the fully connected layer to obtain the probabilities that the issue belongs to the different categories.
The application uses seBERT, which performs excellently in the natural language processing field, as the language model: it is first trained in a self-supervised manner on large-scale unlabeled text and then fine-tuned on the downstream task, and the fine-tuned model can complete various downstream tasks.
Unlike models pre-trained on a generic corpus, the seBERT model is trained from scratch on data from the software engineering domain and has been demonstrated to achieve higher performance and efficiency on some software engineering related downstream tasks.
Step 2.1, selecting seBERT as the pre-training model; the model shows strong performance in the task of report (issue) type prediction, outperforming smaller models. In particular, the corpus used in seBERT's pre-training is drawn from the software engineering domain, so the model can effectively process the defect report text submitted in deep learning software repositories.
Step 2.2, for each closed defect report, merging the extracted text data, namely the title and body of the defect report, as the input of the seBERT model, and fine-tuning the seBERT model so that the fine-tuned model better fits the downstream task of label classification.
By updating the parameters of the original pre-training model, the fine-tuned seBERT model better fits the downstream label classification task, and it outputs the feature vector corresponding to <cls> for the subsequent classification task.
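As a hedged illustration of this step, the sketch below wraps a BERT-compatible encoder (a stand-in for the seBERT checkpoint, whose exact identifier is left as a placeholder) with a linear head over the <cls> feature; the five output labels follow the categories defined in step 1.3, and the use of the transformers AutoModel loader is an assumption of this sketch.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class DefectReportClassifier(nn.Module):
    """Sketch: BERT-style encoder plus a linear classification head over the <cls> feature."""

    def __init__(self, checkpoint: str, num_labels: int = 5):
        super().__init__()
        # `checkpoint` is a placeholder for a BERT-compatible seBERT checkpoint.
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_feature = outputs.last_hidden_state[:, 0, :]   # feature of the leading <cls> token
        return self.classifier(cls_feature)                # logits over the defect categories

# tokenizer = AutoTokenizer.from_pretrained("<sebert-checkpoint>")  # placeholder name
# model = DefectReportClassifier("<sebert-checkpoint>", num_labels=5)
```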
To further address the class imbalance in the training set, a cross-entropy loss function (cross-entropy loss) is used during fine-tuning. In deep learning, the cross-entropy loss is a commonly used loss function for classification problems. It measures the difference between the model's prediction and the actual result and is one of the key quantities used to optimize the model parameters.
The cross-entropy loss function is calculated as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{i,k}\,\log\left(p_{i,k}\right)$$

where $y_{i,k}$ equals 1 if the true label of the i-th sample is the k-th label value and 0 otherwise, $K$ is the total number of label values, $N$ is the number of samples, and $p_{i,k}$ is the probability that the i-th sample is predicted to be the k-th label value.
During training, the seBERT model is optimized using torch.nn.CrossEntropyLoss as the loss function: the gradients are cleared with optimizer.zero_grad(), the seBERT model output and the loss are computed, the gradients are computed with loss.backward(), and the seBERT model parameters are updated with optimizer.step().
At the end of each epoch, the model is evaluated on the test set to check its generalization ability on new data.
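A minimal fine-tuning loop consistent with the calls named above is sketched here; the optimizer choice, learning rate, batch size and DataLoader construction are illustrative assumptions, and `model` is the DefectReportClassifier sketched earlier.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def fine_tune(model, input_ids, attention_mask, labels, epochs: int = 3, lr: float = 2e-5):
    """Sketch of the fine-tuning loop; tensors hold tokenized reports and integer labels."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    loader = DataLoader(TensorDataset(input_ids, attention_mask, labels),
                        batch_size=16, shuffle=True)
    criterion = nn.CrossEntropyLoss()                      # cross-entropy loss from the description
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        for ids, mask, y in loader:
            ids, mask, y = ids.to(device), mask.to(device), y.to(device)
            optimizer.zero_grad()                          # clear gradients
            logits = model(ids, mask)                      # model output
            loss = criterion(logits, y)                    # compute loss
            loss.backward()                                # compute gradients
            optimizer.step()                               # update parameters
        # At the end of each epoch the model would be evaluated on a held-out set here.
```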
The application adopts a model pre-trained on a software engineering corpus as the training model of the classification task, in order to improve the model's ability to process and learn from the defect report text of deep learning software repositories and to improve its efficiency and classification accuracy.
Step 3, inputting the feature vector output by the fine-tuned pre-training model in step 2 into a fully connected layer, and using a linear layer to map the feature vector containing semantic information to the corresponding category according to its dimension.
And finally, taking the output with the highest probability as the final prediction category of the defect report, thereby completing the classification of each defect report.
The step 3 specifically comprises the following steps:
Normalizing the feature vector output by the pre-training model in step 2 with a Softmax activation function and then feeding it into the fully connected layer for classification; the Softmax activation function is calculated as follows:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

where $x_i$ is the output value of the i-th node in the neural network and the denominator $\sum_{j} e^{x_j}$ is the normalization term, which ensures that all output values of the function sum to 1 and that each value lies in the (0, 1) range, thus constituting a valid probability distribution.
And finally, taking the output with the highest probability as the final prediction category of the defect report, thereby completing the classification of the defect report.
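The final prediction step can be sketched as follows; the label names follow the categories defined in step 1.3, and the trained model is assumed from the earlier sketches.

```python
import torch

LABELS = ["Error", "Deployment", "Performance", "Tensors & Inputs", "Other"]

@torch.no_grad()
def predict_category(model, input_ids, attention_mask) -> str:
    """Return the defect category with the highest Softmax probability."""
    model.eval()
    logits = model(input_ids, attention_mask)          # output of the fully connected layer
    probs = torch.softmax(logits, dim=-1)              # normalize into a probability distribution
    return LABELS[int(probs.argmax(dim=-1)[0])]        # highest-probability category
```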
The foregoing description is, of course, merely illustrative of preferred embodiments of the present application, and it should be understood that the present application is not limited to the above-described embodiments, but is intended to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present application as defined by the appended claims.

Claims (10)

1. The deep learning software defect report classification method based on the seBERT pre-training model is characterized in that,
the method comprises the following steps:
step 1, collecting defect reports corresponding to software repositories developed based on deep learning frameworks, and adding a label category for each defect report according to the title, textual description and follow-up comment information in the report;
forming text information of the defect report and label types corresponding to the defect report into sample data;
step 2, merging the text data extracted from each sample and inputting the text data into a pre-training model, and fine-tuning the pre-training model;
the fine-tuned pre-training model uses the feature vector corresponding to the <cls> identifier at the start of the text data as the semantic feature of the input text, and the feature vector is then input into a Softmax layer for normalization;
step 3, inputting the feature vector output by the fine-tuned pre-training model in step 2 into a fully connected layer, and using a linear layer to map the feature vector containing semantic information to the corresponding category according to its dimension;
and finally, taking the output with the highest probability as the final prediction category of the defect report, thereby completing the classification of the defect report.
2. The deep learning software defect report classification method of claim 1 wherein,
in step 1, for a defect report containing a real defect, the label category of the defect report is further subdivided according to the text content of the defect report; defect reports that do not contain real defects are classified under the "other" label.
3. The deep learning software defect report classification method of claim 1 wherein,
in step 1, the title and body text of the collected closed defect reports are combined with the true classification label corresponding to each defect report to form sample data, which is used as the training data set for the subsequent pre-training model.
4. The deep learning software defect report classification method of claim 2 wherein,
in step 1, for defect reports containing real defects, the defect reports are classified into Error class, Deployment class, Performance class and Tensors & Inputs class according to the defect type;
Error tags represent defects arising from code-writing problems and API usage;
Deployment tags represent defects in software installation and hardware deployment;
Performance tags cover problems of low efficiency and poor performance in the software;
Tensors & Inputs tags represent problems caused by erroneous data types, data shapes or data formats.
5. The deep learning software defect report classification method of claim 1 wherein,
the step 1 specifically comprises the following steps:
step 1.1, screening out a mature software system which has high activity and is developed based on a deep learning framework;
step 1.2, data filtering and text type data preprocessing are carried out on the collected software system data;
step 1.3, manually adding a specific defect category corresponding to the report according to the title text and the follow-up comments of the defect report; if the defect report does not contain a corresponding real defect, the added label is of the "other" class.
6. The deep learning software defect report classification method of claim 5 wherein,
the step 1.2 specifically comprises the following steps:
step 1.2.1, data filtering: filtering out invalid defect reports submitted to the software repository, including reports with a blank title or body and defect reports that have not been closed;
step 1.2.2, text data preprocessing: performing word segmentation, stop-word removal, foreign-language word removal, and removal of pictures, links and code on the text data contained in the collected data.
7. The deep learning software defect report classification method of claim 5 wherein,
after step 1.3, the method further comprises a data augmentation step, namely:
step 1.4, randomly replacing words in the texts labeled with real defect types;
specifically, an original token is replaced with the '[MASK]' token, the pre-trained BERT model is used for prediction, tokens with higher predicted probability are selected to replace the original token, and the replaced text is added to the training data set.
8. The deep learning software defect report classification method of claim 1 wherein,
the step 2 specifically comprises the following steps:
step 2.1, selecting seBERT as the pre-training model;
step 2.2, for each closed defect report, merging the extracted text data, namely the title and body of the defect report, as the input of the seBERT model, and fine-tuning the seBERT model;
by updating the parameters of the original pre-training model, the fine-tuned seBERT model better fits the downstream label classification task, and it outputs the feature vector corresponding to <cls> for the subsequent classification task.
9. The deep learning software defect report classification method of claim 8 wherein,
in step 2.2, a cross-entropy loss function is adopted to optimize the model during fine-tuning of the seBERT model;
the cross-entropy loss function is calculated as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{i,k}\,\log\left(p_{i,k}\right)$$

where $y_{i,k}$ equals 1 if the true label of the i-th sample is the k-th label value and 0 otherwise, $K$ is the total number of label values, $N$ is the number of samples, and $p_{i,k}$ is the probability that the i-th sample is predicted to be the k-th label value.
10. The deep learning software defect report classification method of claim 1 wherein,
the step 3 specifically comprises the following steps:
normalizing the feature vector output by the pre-training model in step 2 with a Softmax activation function and then feeding it into the fully connected layer for classification; the Softmax activation function is calculated as follows:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

where $x_i$ is the output value of the i-th node in the neural network and the denominator $\sum_{j} e^{x_j}$ is the normalization term, which ensures that all output values of the function sum to 1 and that each value lies in the (0, 1) range, thus constituting a valid probability distribution;
and finally, taking the output with the highest probability as the final prediction category of the defect report, thereby completing the classification of the defect report.
CN202310711807.5A 2023-06-15 2023-06-15 Deep learning software defect report classification method based on seBERT pre-training model Pending CN116775871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310711807.5A CN116775871A (en) 2023-06-15 2023-06-15 Deep learning software defect report classification method based on seBERT pre-training model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310711807.5A CN116775871A (en) 2023-06-15 2023-06-15 Deep learning software defect report classification method based on seBERT pre-training model

Publications (1)

Publication Number Publication Date
CN116775871A true CN116775871A (en) 2023-09-19

Family

ID=87992401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310711807.5A Pending CN116775871A (en) 2023-06-15 2023-06-15 Deep learning software defect report classification method based on seBERT pre-training model

Country Status (1)

Country Link
CN (1) CN116775871A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180307904A1 (en) * 2017-04-19 2018-10-25 Tata Consultancy Services Limited Systems and methods for classification of software defect reports
CN108804558A (en) * 2018-05-22 2018-11-13 北京航空航天大学 A kind of defect report automatic classification method based on semantic model
CN109492106A (en) * 2018-11-13 2019-03-19 扬州大学 Text code combined automatic classification method for defect reasons
CN111507990A (en) * 2020-04-20 2020-08-07 南京航空航天大学 Tunnel surface defect segmentation method based on deep learning
CN112328469A (en) * 2020-10-22 2021-02-05 南京航空航天大学 Function level defect positioning method based on embedding technology
CN114782967A (en) * 2022-03-21 2022-07-22 南京航空航天大学 Software defect prediction method based on code visualization learning
CN114816497A (en) * 2022-04-18 2022-07-29 南京航空航天大学 Link generation method based on BERT pre-training model
CN115617990A (en) * 2022-09-28 2023-01-17 浙江大学 Electric power equipment defect short text classification method and system based on deep learning algorithm
CN116186506A (en) * 2023-03-13 2023-05-30 南京航空航天大学 Automatic identification method for accessibility problem report based on BERT pre-training model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Alexander Trautsch: "Predicting Issue Types with seBERT", 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE), pages 1-3 *
Eeshita Biswas et al.: "Achieving Reliable Sentiment Analysis in the Software Engineering Domain using BERT", 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 1-12 *
张楠: "深度学习自然语言处理实战" [Deep Learning Natural Language Processing in Practice], 机械工业出版社 (China Machine Press), pages 165-168 *
田园; 原野; 刘海斌; 满志博; 毛存礼: "基于BERT预训练语言模型的电网设备缺陷文本分类" [Text classification of power grid equipment defects based on the BERT pre-trained language model], 南京理工大学学报 (Journal of Nanjing University of Science and Technology), no. 04 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230919