CN108984613A - A cross-project defect report classification method based on transfer learning

Info

Publication number
CN108984613A
Authority
CN
China
Prior art keywords
data
defect report
training
target data
transfer learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810601343.1A
Other languages
Chinese (zh)
Inventor
郑征
杜晓婷
肖冠平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-06-12
Filing date
2018-06-12
Publication date
2018-12-11
Application filed by Beihang University
Priority to CN201810601343.1A
Publication of CN108984613A

Abstract

The invention discloses a cross-project defect report classification method based on transfer learning, comprising the following steps: (1) select the data, determine the source data and the target data, and preprocess the data; (2) train a defect report semantic model and compute a vector representation of each defect report from step (1); (3) divide the source data and target data from step (2) into training data and test data, and adjust the weights of the training data through transfer learning so that the classification error is minimized; (4) train a classifier on the data obtained by transfer in step (3), and classify the target data across projects with the machine learning classifier. By introducing transfer learning into cross-project defect report classification, the invention improves the accuracy of cross-project classification. By training a defect report semantic model, it introduces semantic information into automatic defect report classification and thereby improves the accuracy of automatic classification.

Description

A cross-project defect report classification method based on transfer learning
Technical field
The invention belongs to the technical field of automatic defect report classification, and in particular relates to a cross-project defect report classification method based on transfer learning.
Background
Conventional machine learning methods assume that training data and test data follow the same feature distribution. When the distributions of the training data and the test data differ, the predictions of conventional methods degrade. In some practical scenarios, however, obtaining training data that shares the feature space and data distribution of the test data is difficult or very costly, so the assumption often fails. Defect prediction faces exactly this problem: for a new project, or one with little historical data, enough defect reports often cannot be obtained, and labeling new data is expensive. How to make the most of data from existing projects to classify the data of a new project therefore becomes a key problem.
When classifying defect reports automatically, suppose that a large number of labeled defect reports from the Linux system are available. If both the training data and the target data to be classified come from the Linux software system, conventional machine learning methods achieve good prediction results. But if the training data comes from Linux while the target data consists of defect reports from MySQL, the reports come from different projects and the predictions of the machine learning method degrade.
To remedy this deficiency, the invention proposes a cross-project defect report classification method based on transfer learning. On the one hand, transfer learning breaks the assumption of conventional machine learning methods and can transfer information from related domains to improve learning in a given domain. On the other hand, the accuracy of cross-project defect report classification is improved.
Summary of the invention
The purpose of the invention is to improve the accuracy of cross-project defect report classification by means of transfer learning. The method breaks the assumption of conventional machine learning that training data and test data follow the same distribution, and can transfer information from related domains to improve learning in a given domain. To this end, a cross-project defect report classification method based on transfer learning is proposed.
The technical solution of the invention is a cross-project defect report classification method based on transfer learning, comprising the following steps:
Step 1): identify the target data to be classified, select source data with similar characteristics according to the characteristics of the target data, and perform text preprocessing on the target data and source data. The preprocessing comprises word segmentation, stop-word removal, and lemmatization, which exclude noise contained in the text;
Step 2): train a defect report semantic model. Train the model on a large number of unlabeled defect reports to obtain a vector representation of each word, and represent each defect report in the source data and target data of step 1) as a vector;
Step 3): divide the source data and target data obtained in step 2) into training data and test data, where the training data comprises all of the source data and 10% to 20% of the target data, and the test data comprises the remaining target data. Assign initial weights to the training data and iteratively adjust those weights through transfer learning so that the classification error on the target data is minimized;
Step 4): train a machine learning classifier on the training data obtained by transfer in step 3), and use it to classify the test data automatically, yielding the cross-project classification results for the defect reports.
Compared with existing methods, the advantage of the cross-project defect report classification method based on transfer learning is that it breaks the assumption of conventional machine learning that training data and test data follow the same distribution, can transfer information from related domains to improve learning in a given domain, and improves the accuracy of cross-project defect report classification.
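Steps 1) and 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stop-word list, the lemma lookup table, and the two-dimensional toy word vectors are all hypothetical stand-ins. A real pipeline would use a full lemmatizer and word vectors learned by the defect report semantic model. Averaging the word vectors of a report follows the detailed description of step 2.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on",
              "and", "or", "when", "it"}

# Hypothetical lemma lookup table standing in for a real lemmatizer.
LEMMAS = {"crashes": "crash", "crashed": "crash",
          "files": "file", "opening": "open"}


def preprocess(report_text):
    """Step 1: segment into words, drop stop words, and lemmatize."""
    tokens = re.findall(r"[a-z]+", report_text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


def report_vector(tokens, word_vectors, dim):
    """Step 2: represent a report as the mean of its known word vectors."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # Assumption: a report with no known words maps to the zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy 2-dimensional word vectors (made up; a trained model would supply these).
toy_vectors = {"server": [1.0, 0.0], "crash": [0.0, 1.0], "file": [1.0, 1.0]}

tokens = preprocess("The server crashes when opening large files.")
vector = report_vector(tokens, toy_vectors, dim=2)
print(tokens)
print(vector)
```

Words without a vector ("open" and "large" in the toy example) are simply skipped when averaging, which is one possible design choice; the patent text does not say how out-of-vocabulary words are handled.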
Brief description of the drawings
Fig. 1 is a flow diagram of the cross-project defect report classification method based on transfer learning.
Fig. 2 is a diagram of the transfer learning framework.
Detailed description of the embodiments
Before the detailed description, the definition of transfer learning used here is introduced.
Transfer learning: given a source domain $D_S$ and a target domain $D_T$, with corresponding source task $\mathcal{T}_S$ and target task $\mathcal{T}_T$, transfer learning is the process of using the relevant information in $D_S$ and $\mathcal{T}_S$ to improve the predictive ability of the target prediction function $f_T(\cdot)$, where $D_S \neq D_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. There may be a single source domain or several.
Homogeneous/heterogeneous transfer: given the definition of transfer learning, the source domain is $D_S = \{(x_{S1}, y_{S1}), \ldots, (x_{Sn}, y_{Sn})\}$, where $x_{Si} \in \chi_S$ is the $i$-th data instance in $D_S$ and $y_{Si} \in Y_S$ is the class label of $x_{Si}$. Likewise, the target domain is $D_T = \{(x_{T1}, y_{T1}), \ldots, (x_{Tm}, y_{Tm})\}$, where $x_{Ti} \in \chi_T$ is the $i$-th data instance in $D_T$ and $y_{Ti} \in Y_T$ is the class label of $x_{Ti}$. The condition $D_S \neq D_T$ means that $\chi_S \neq \chi_T$ and/or $P(X_S) \neq P(X_T)$. When $\chi_S \neq \chi_T$ the transfer learning is heterogeneous; when $\chi_S = \chi_T$ it is homogeneous.
In defect report classification, heterogeneous transfer learning means that the source software project and the target software project have different features, which here means that the source data and target data come from different projects. Homogeneous transfer learning means that the two projects have the same features, which here means that the source data and target data come from the same project.
The invention is further explained below with reference to the drawings. The technical solution is described in detail with reference to Fig. 1, and the specific implementation steps are as follows:
Step 1: select the data. Identify the target data to be classified and determine the source data to be used. First select data from a domain different from that of the project to be classified. This data is labeled, is called the source data, and is denoted $T_d$. The data in the project being predicted is called the target data. It consists of two parts: a small amount of labeled data, denoted $T_s$, and unlabeled data, denoted $S$. After the target data and source data have been determined, the data undergoes text preprocessing in three steps (word segmentation, stop-word removal, and lemmatization) to exclude noise contained in the text;
Step 2: train the defect report semantic model. Download a large number of unlabeled defect reports from a defect tracking system. After text preprocessing (word segmentation, stop-word removal, and lemmatization), train the defect report semantic model with the skip-gram model of the word2vec tool, so that each word in a defect report is represented as a vector. Then, by averaging the word vectors of all words in each defect report, represent every defect report in the source data ($T_d$) and target data ($T_s$ and $S$) of step 1 as a vector;
Step 3: use transfer learning to classify defect reports across projects. The idea behind the invention's cross-project prediction comes from the TrAdaBoost transfer learning framework: by automatically adjusting the weights of training samples, boosting filters out the training samples in the source data that differ greatly from the target data. When labeled data in the target data is scarce, data from the source data serves as additional training data to improve the results of cross-project classification. As shown in Fig. 2, the source data $T_d$ obtained in step 2 and the labeled target data $T_s$ are merged into the training data, denoted $T$, and the training data is assigned weights $w_t$ as input for training the classifier. The trained classifier classifies the labeled target data $T_s$; the classification results are compared with the labels of $T_s$, and the weights $w_t$ of the training data $T$ are updated according to the classification error;
Step 4: train a classifier on the data obtained by transfer from the source data $T_d$ in step 3, classify the test set $S$ in the target data with the machine learning classifier, and obtain the classification results. In the experiments, 80% of the target data is used for testing, and the remaining 20% together with the source data is used as training data, from which useful information is transferred out of the source data and used to train the classifier.
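The data split of step 4 and one round of the weight adjustment of step 3 can be sketched as below. The patent text gives no explicit update formula, so the rule here follows the commonly published TrAdaBoost scheme and should be read as an assumption: labels are binary (0 or 1), misclassified source samples are down-weighted by a fixed factor, and misclassified labeled target samples are up-weighted according to the weighted error on the target data. The function names, the error clamp, and the example predictions (which a base classifier would normally produce) are hypothetical.

```python
import math
import random


def split_target(target_items, labeled_fraction=0.2, seed=0):
    """Split target data into a small labeled part (T_s) and a test part (S)."""
    items = list(target_items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * labeled_fraction)
    return items[:cut], items[cut:]


def tradaboost_update(weights, predictions, labels, n_source, n_rounds):
    """One TrAdaBoost-style weight update for binary labels in {0, 1}.

    The first n_source entries belong to the source data T_d; the remaining
    entries belong to the labeled target data T_s.
    """
    miss = [abs(p - y) for p, y in zip(predictions, labels)]
    target_w = weights[n_source:]
    target_miss = miss[n_source:]
    # Weighted error measured on the labeled target data only.
    eps = sum(w * m for w, m in zip(target_w, target_miss)) / sum(target_w)
    # Assumption: clamp the error so the target factor stays finite and below 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    updated = []
    for i, (w, m) in enumerate(zip(weights, miss)):
        if i < n_source:
            updated.append(w * beta ** m)       # wrong source sample: weight shrinks
        else:
            updated.append(w * beta_t ** (-m))  # wrong target sample: weight grows
    return updated, eps


# 20% of ten target reports become T_s, the remaining 80% form the test set S.
t_s, s = split_target(range(10))

# Four source samples followed by four labeled target samples (toy values).
weights = [1.0] * 8
predictions = [1, 0, 1, 1, 1, 0, 0, 1]  # hypothetical base-classifier output
labels = [1, 1, 1, 0, 1, 1, 0, 1]
new_weights, eps = tradaboost_update(weights, predictions, labels,
                                     n_source=4, n_rounds=10)
print(len(t_s), len(s), eps)
print(new_weights)
```

In a full loop, the updated weights would be normalized, a new classifier trained on the reweighted data, and the update repeated for `n_rounds` iterations, so that source reports unlike the target project gradually lose influence.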
The above describes the cross-project defect report classification method based on transfer learning in detail, but the specific implementation of the invention is clearly not limited to it. Obvious changes made by those skilled in the art that do not depart from the spirit of the invention and the scope of the claims fall within the protection scope of the invention.

Claims (2)

1. A cross-project defect report classification method based on transfer learning, characterized in that the method comprises the following steps:
Step 1): identify the target data to be classified, select source data with similar characteristics according to the characteristics of the target data, and perform text preprocessing on the target data and source data;
Step 2): train a defect report semantic model. Train the model on a large number of unlabeled defect reports to obtain a vector representation of each word, and represent each defect report in the source data and target data of step 1) as a vector;
Step 3): divide the source data and target data obtained in step 2) into training data and test data, where the training data comprises all of the source data and 10% to 20% of the target data, and the test data comprises the remaining target data. Assign initial weights to the training data and iteratively adjust those weights through transfer learning so that the classification error on the target data is minimized;
Step 4): train a machine learning classifier on the training data obtained by transfer in step 3), and use it to classify the test data automatically, yielding the cross-project classification results for the defect reports.
2. The cross-project defect report classification method based on transfer learning according to claim 1, characterized in that the preprocessing in step 1) comprises word segmentation, stop-word removal, and lemmatization, which exclude noise contained in the text.
CN201810601343.1A 2018-06-12 2018-06-12 A cross-project defect report classification method based on transfer learning Pending CN108984613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810601343.1A CN108984613A (en) 2018-06-12 2018-06-12 A cross-project defect report classification method based on transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810601343.1A CN108984613A (en) 2018-06-12 2018-06-12 A cross-project defect report classification method based on transfer learning

Publications (1)

Publication Number Publication Date
CN108984613A (en) 2018-12-11

Family

ID=64541144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810601343.1A Pending CN108984613A (en) 2018-06-12 2018-06-12 A cross-project defect report classification method based on transfer learning

Country Status (1)

Country Link
CN (1) CN108984613A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025503A (en) * 2017-04-18 2017-08-08 武汉大学 Across company software failure prediction method based on transfer learning and defects count information
CN107644057A (en) * 2017-08-09 2018-01-30 天津大学 A kind of absolute uneven file classification method based on transfer learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
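陈琳: "Research on software defect prediction based on machine learning", Wanfang Data *
魏晓聪 et al.: "Text feature alignment algorithm for transfer learning", Computer Engineering *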

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614489A (en) * 2018-12-13 2019-04-12 大连海事大学 Bug report severity recognition method based on transfer learning and feature extraction
CN109614489B (en) * 2018-12-13 2022-11-18 大连海事大学 Bug report severity recognition method based on transfer learning and feature extraction
CN110598787A (en) * 2019-09-12 2019-12-20 北京理工大学 Software bug classification method based on self-defined step length learning
CN110598787B (en) * 2019-09-12 2021-06-08 北京理工大学 Software bug classification method based on self-defined step length learning
CN110751186A (en) * 2019-09-26 2020-02-04 北京航空航天大学 Cross-project software defect prediction method based on supervised expression learning
WO2021103909A1 (en) * 2019-11-27 2021-06-03 支付宝(杭州)信息技术有限公司 Risk prediction method and apparatus, risk prediction model training method and apparatus, and electronic device
CN111723010A (en) * 2020-06-12 2020-09-29 大连海事大学 Software BUG classification method based on sparse cost matrix
CN111723010B (en) * 2020-06-12 2024-02-23 大连海事大学 Software BUG classification method based on sparse cost matrix
CN111966586A (en) * 2020-08-05 2020-11-20 南通大学 Cross-project defect prediction method based on module selection and weight updating
WO2022058324A1 (en) 2020-09-21 2022-03-24 Roche Diagnostics Gmbh A method for detecting and reporting an operation error in an in-vitro diagnostic system and an in-vitro diagnostic system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181211