CN108984613A - A cross-project defect report classification method based on transfer learning - Google Patents
- Publication number
- CN108984613A (application number CN201810601343.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- defect report
- training
- target data
- transfer learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a cross-project defect report classification method based on transfer learning, comprising the following steps: (1) select the data, determine the source data and the target data, and preprocess the data; (2) train a defect report semantic model and use it to compute a vector representation of each defect report from step (1); (3) divide the source data and target data from step (2) into training data and test data, and adjust the weights of the training data through transfer learning so that the classification error is minimized; (4) train a classifier on the data obtained by transfer in step (3), and classify the target data across projects with the machine learning classifier. By introducing transfer learning into cross-project defect report classification, the invention improves the accuracy of cross-project classification; by training a defect report semantic model, it introduces semantic information into automatic defect report classification and improves the accuracy of that classification.
Description
Technical field
The invention belongs to the technical field of automatic defect report classification, and in particular relates to a cross-project defect report classification method based on transfer learning.
Background art
Traditional machine learning methods assume that training data and test data follow the same feature distribution. When the distributions of the training data and the test data differ, the predictions of traditional machine learning methods deteriorate. In some practical machine learning scenarios, however, obtaining training data that shares the feature space and data distribution of the test data is very difficult or very costly, so this assumption often cannot be satisfied. Defect prediction faces exactly this problem: for a new project, or a project with little historical data, enough defect reports often cannot be obtained, and labeling new data is also expensive. How to make maximum use of the data of existing projects to classify the data of a new project therefore becomes a key problem.
Consider the automatic classification of defect reports, and assume that a large number of labeled defect reports for the Linux system are already available. If both the training data and the target data to be classified come from the Linux software system, traditional machine learning methods can achieve good prediction results. But if the training data comes from Linux while the target data consists of MySQL defect reports, the defect reports come from different projects and the prediction results of the machine learning method deteriorate.
To remedy this deficiency, the invention proposes a cross-project defect report classification method based on transfer learning. On the one hand, transfer learning breaks the assumption of traditional machine learning methods and can transfer information from other related domains to improve learning in a given domain. On the other hand, the method improves the accuracy of cross-project defect report classification.
Summary of the invention
The purpose of the invention is to use transfer learning to improve the accuracy of cross-project defect report classification. The method breaks the assumption of traditional machine learning methods that training data and test data follow the same distribution, and transfers information from related domains to improve learning in a given domain. To this end, a cross-project defect report classification method based on transfer learning is proposed.
The technical solution of the invention is a cross-project defect report classification method based on transfer learning, comprising the following steps:
Step 1): identify the target data to be classified, select source data with similar characteristics according to the characteristics of the target data, and perform text preprocessing on both the target data and the source data; the preprocessing includes tokenization, stop-word removal, and lemmatization, and removes the noise contained in the text;
Step 2): train a defect report semantic model on a large number of unlabeled defect reports to obtain a vector representation of each word, and represent each defect report in the source data and target data of step 1) as a vector;
Step 3): divide the source data and target data obtained in step 2) into training data and test data, where the training data contains all of the source data and 10% to 20% of the target data, and the test data contains the remaining target data; assign initial weights to the training data and repeatedly adjust them through transfer learning so that the classification error on the target data is minimized;
Step 4): train a machine learning classifier on the training data obtained by transfer in step 3), and use it to classify the test data automatically, yielding the cross-project classification results for the defect reports.
Compared with existing methods, the advantage of the cross-project defect report classification method based on transfer learning of the invention is that it breaks the assumption of traditional machine learning methods that training data and test data follow the same distribution, can transfer information from related domains to improve learning in a given domain, and improves the accuracy of cross-project defect report classification.
Brief description of the drawings
Fig. 1 is a flow diagram of the cross-project defect report classification method based on transfer learning.
Fig. 2 is a diagram of the transfer learning framework.
Detailed description of the embodiments
Before the detailed description, the definition of transfer learning used here is introduced.
Transfer learning: given a source domain $D_S$ and a target domain $D_T$, with corresponding source task $\mathcal{T}_S$ and target task $\mathcal{T}_T$, transfer learning is the process of using the relevant information in $D_S$ and $\mathcal{T}_S$ to improve the predictive ability of the target prediction function $f_T(\cdot)$, where $D_S \neq D_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. There may be a single source domain or several.
Homogeneous and heterogeneous transfer: under the definition of transfer learning above, the source domain is $D_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_n}, y_{S_n})\}$, where $x_{S_i} \in \chi_S$ is the $i$-th data instance in $D_S$ and $y_{S_i} \in \mathcal{Y}_S$ is the class label of $x_{S_i}$. Likewise, the target domain is $D_T = \{(x_{T_1}, y_{T_1}), \ldots, (x_{T_m}, y_{T_m})\}$, where $x_{T_i} \in \chi_T$ is the $i$-th data instance in $D_T$ and $y_{T_i} \in \mathcal{Y}_T$ is the class label of $x_{T_i}$. The condition $D_S \neq D_T$ means that $\chi_S \neq \chi_T$ and/or $P(X_S) \neq P(X_T)$. When $\chi_S \neq \chi_T$ the transfer learning is heterogeneous; when $\chi_S = \chi_T$ it is homogeneous.
For defect report classification, heterogeneous transfer learning means that the source software project and the target software project have different features, which here means that the source data and the target data come from different projects. Homogeneous transfer learning means that the source software project and the target software project have the same features, which here means that the source data and the target data come from the same project.
The invention is further explained below with reference to the drawings. The technical solution of the invention is described in detail with reference to Fig. 1; the specific implementation steps are as follows:
Step 1: select the data. Identify the target data to be classified and determine the source data to be used. First, select data from a project in a different domain from the project to be classified; this data is labeled, is called the source data, and is denoted $T_d$. The data in the project being predicted is called the target data and consists of two parts: a small amount of labeled data, denoted $T_s$, and unlabeled data, denoted $S$. Once the target data and source data have been determined, the data must undergo text preprocessing, which comprises three steps (tokenization, stop-word removal, and lemmatization) and removes the noise contained in the text.
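The three preprocessing steps named in Step 1 can be sketched as follows. This is a minimal illustration only: the stop-word set `STOP_WORDS` and the lookup table `LEMMAS` are hypothetical stand-ins for a full stop-word list and a real lemmatizer, neither of which the source specifies.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not from the patent).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "when", "to", "of", "in", "on", "and", "it"}

# Hypothetical lookup table standing in for a real lemmatizer (assumption).
LEMMAS = {"crashes": "crash", "crashed": "crash", "files": "file",
          "opening": "open", "opened": "open"}


def preprocess(report_text):
    """Tokenize a defect report, drop stop words, and map tokens to lemmas."""
    # Tokenization: lowercase the text and keep alphanumeric runs.
    tokens = re.findall(r"[a-z0-9_]+", report_text.lower())
    # Stop-word removal.
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization via the lookup table; unknown tokens pass through unchanged.
    return [LEMMAS.get(t, t) for t in kept]


print(preprocess("The server crashes when opening files."))
```

A production system would presumably replace the lookup table with a dictionary-based or model-based lemmatizer; the structure of the pipeline stays the same.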
Step 2: train the defect report semantic model. Download a large number of unlabeled defect reports from defect tracking systems. After text preprocessing of these reports (tokenization, stop-word removal, and lemmatization), train the defect report semantic model with the skip-gram model of the word2vec tool, so that each word in the defect reports is represented as a vector. Then represent each defect report in the source data ($T_d$) and target data ($T_s$ and $S$) of step 1 as a vector by averaging the word vectors of all words in the report.
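The averaging at the end of Step 2 can be illustrated with a short sketch. The dictionary `toy_vectors` is a hypothetical two-dimensional embedding invented for the example; in the method described above the vectors would come from the trained skip-gram model. How words missing from the vocabulary are handled is not stated in the source, so skipping them (and returning a zero vector when nothing is known) is an assumption.

```python
def report_vector(tokens, word_vectors, dim):
    """Represent a defect report as the mean of its word vectors."""
    # Assumption: tokens without an embedding are ignored.
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # Assumption: a report with no known words maps to the zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {"server": [1.0, 0.0], "crash": [0.0, 1.0], "file": [1.0, 1.0]}

print(report_vector(["server", "crash", "unknown"], toy_vectors, 2))
```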
Step 3: classify defect reports across projects using transfer learning. The invention's approach to cross-project defect report prediction derives from the TrAdaBoost transfer learning framework, which automatically adjusts the weights of the training samples and uses boosting to filter out the source-data samples that differ most from the target data. When the target data contains very little labeled data, the source data is used as additional training data to improve the cross-project classification results. As shown in Fig. 2, the source data $T_d$ obtained in step 2 and the labeled target data $T_s$ are merged into the training data, denoted $T$, and the training data is assigned weights $w_t$ as the input for training the classifier. The trained classifier then classifies the labeled target data $T_s$, its predictions are compared with the labels of $T_s$, and the weights $w_t$ of the training data $T$ are updated according to the classification error.
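The weight adjustment in Step 3 is described only as deriving from the TrAdaBoost framework. As a sketch, one round of the weight update from the published TrAdaBoost algorithm might look like the following; the function name `tradaboost_update`, its parameters, and the clamping of the error are assumptions for illustration and are not taken from the source.

```python
import math


def tradaboost_update(weights, errors, n_source, n_rounds):
    """One TrAdaBoost-style weight update.

    weights : current weights, source samples first, then labeled target samples
    errors  : per-sample 0/1 loss of the current classifier (1 = misclassified)
    """
    target_w = weights[n_source:]
    target_e = errors[n_source:]
    # Weighted error measured on the labeled target samples only.
    eps = sum(w * e for w, e in zip(target_w, target_e)) / sum(target_w)
    # Assumption: clamp the error so the target factor stays finite and above 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Fixed shrink factor for source samples, depending on the number of rounds.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    updated = []
    for i, (w, e) in enumerate(zip(weights, errors)):
        if i < n_source:
            updated.append(w * beta ** e)       # shrink misclassified source samples
        else:
            updated.append(w * beta_t ** (-e))  # grow misclassified target samples
    total = sum(updated)
    return [w / total for w in updated]


# Two source samples followed by three labeled target samples.
new_w = tradaboost_update([0.2] * 5, [1, 0, 1, 0, 0], n_source=2, n_rounds=10)
print(new_w)
```

Misclassified source samples lose weight because they appear to disagree with the target distribution, while misclassified target samples gain weight so that later rounds focus on them.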
Step 4: train a classifier on the data transferred from the source data $T_d$ in step 3, classify the test set $S$ in the target data with the machine learning classifier, and obtain the classification results. In the experiments, 80% of the target data is used for testing, and the remaining 20% together with the source data forms the training data, from which useful information is transferred from the source data and which serves as the training data for the classifier.
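The 20%/80% division described for the experiments can be sketched as below. The function name `split_for_transfer`, the random shuffling, and the fixed seed are assumptions; the source does not say how the labeled 20% of the target data is chosen.

```python
import random


def split_for_transfer(source, target, labeled_fraction=0.2, seed=0):
    """Build training data (all source + a fraction of target) and test data."""
    rng = random.Random(seed)
    shuffled = list(target)
    # Assumption: the labeled target portion is drawn at random.
    rng.shuffle(shuffled)
    n_labeled = int(round(len(shuffled) * labeled_fraction))
    labeled_target = shuffled[:n_labeled]   # plays the role of T_s
    test = shuffled[n_labeled:]             # plays the role of S
    return list(source) + labeled_target, test


source_reports = ["src%d" % i for i in range(5)]
target_reports = ["tgt%d" % i for i in range(10)]
train, test = split_for_transfer(source_reports, target_reports)
print(len(train), len(test))
```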
The above describes the cross-project defect report classification method based on transfer learning of the invention in detail, but the specific implementations of the invention are clearly not limited to it. For those skilled in the art, any obvious changes made to it without departing from the spirit of the invention and the scope of the claims fall within the protection scope of the invention.
Claims (2)
1. A cross-project defect report classification method based on transfer learning, characterized in that the method comprises the following steps:
Step 1): identify the target data to be classified, select source data with similar characteristics according to the characteristics of the target data, and perform text preprocessing on both the target data and the source data;
Step 2): train a defect report semantic model on a large number of unlabeled defect reports to obtain a vector representation of each word, and represent each defect report in the source data and target data of step 1) as a vector;
Step 3): divide the source data and target data obtained in step 2) into training data and test data, where the training data contains all of the source data and 10% to 20% of the target data, and the test data contains the remaining target data; assign initial weights to the training data and repeatedly adjust them through transfer learning so that the classification error on the target data is minimized;
Step 4): train a machine learning classifier on the training data obtained by transfer in step 3), and use it to classify the test data automatically, yielding the cross-project classification results for the defect reports.
2. The cross-project defect report classification method based on transfer learning according to claim 1, characterized in that the preprocessing in step 1) includes tokenization, stop-word removal, and lemmatization, and removes the noise contained in the text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810601343.1A CN108984613A (en) | 2018-06-12 | 2018-06-12 | A cross-project defect report classification method based on transfer learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810601343.1A CN108984613A (en) | 2018-06-12 | 2018-06-12 | A cross-project defect report classification method based on transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108984613A true CN108984613A (en) | 2018-12-11 |
Family
ID=64541144
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810601343.1A Pending CN108984613A (en) | 2018-06-12 | 2018-06-12 | A cross-project defect report classification method based on transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108984613A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109614489A (en) * | 2018-12-13 | 2019-04-12 | 大连海事大学 | Bug report severity recognition method based on transfer learning and feature extraction |
CN110598787A (en) * | 2019-09-12 | 2019-12-20 | 北京理工大学 | Software bug classification method based on self-defined step length learning |
CN110751186A (en) * | 2019-09-26 | 2020-02-04 | 北京航空航天大学 | Cross-project software defect prediction method based on supervised expression learning |
CN111723010A (en) * | 2020-06-12 | 2020-09-29 | 大连海事大学 | Software BUG classification method based on sparse cost matrix |
CN111966586A (en) * | 2020-08-05 | 2020-11-20 | 南通大学 | Cross-project defect prediction method based on module selection and weight updating |
WO2021103909A1 (en) * | 2019-11-27 | 2021-06-03 | 支付宝(杭州)信息技术有限公司 | Risk prediction method and apparatus, risk prediction model training method and apparatus, and electronic device |
WO2022058324A1 (en) | 2020-09-21 | 2022-03-24 | Roche Diagnostics Gmbh | A method for detecting and reporting an operation error in an in-vitro diagnostic system and an in-vitro diagnostic system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107025503A (en) * | 2017-04-18 | 2017-08-08 | 武汉大学 | Cross-company software defect prediction method based on transfer learning and defect count information |
CN107644057A (en) * | 2017-08-09 | 2018-01-30 | 天津大学 | An absolutely imbalanced text classification method based on transfer learning |
- 2018-06-12: CN application CN201810601343.1A, published as CN108984613A, status: Pending
Non-Patent Citations (2)
Title |
---|
陈琳: "基于机器学习的软件缺陷预测研究", 《万方数据》 * |
魏晓聪等: "面向迁移学习的文本特征对齐算法", 《计算机工程》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109614489A (en) * | 2018-12-13 | 2019-04-12 | 大连海事大学 | Bug report severity recognition method based on transfer learning and feature extraction |
CN109614489B (en) * | 2018-12-13 | 2022-11-18 | 大连海事大学 | Bug report severity recognition method based on transfer learning and feature extraction |
CN110598787A (en) * | 2019-09-12 | 2019-12-20 | 北京理工大学 | Software bug classification method based on self-defined step length learning |
CN110598787B (en) * | 2019-09-12 | 2021-06-08 | 北京理工大学 | Software bug classification method based on self-defined step length learning |
CN110751186A (en) * | 2019-09-26 | 2020-02-04 | 北京航空航天大学 | Cross-project software defect prediction method based on supervised expression learning |
WO2021103909A1 (en) * | 2019-11-27 | 2021-06-03 | 支付宝(杭州)信息技术有限公司 | Risk prediction method and apparatus, risk prediction model training method and apparatus, and electronic device |
CN111723010A (en) * | 2020-06-12 | 2020-09-29 | 大连海事大学 | Software BUG classification method based on sparse cost matrix |
CN111723010B (en) * | 2020-06-12 | 2024-02-23 | 大连海事大学 | Software BUG classification method based on sparse cost matrix |
CN111966586A (en) * | 2020-08-05 | 2020-11-20 | 南通大学 | Cross-project defect prediction method based on module selection and weight updating |
WO2022058324A1 (en) | 2020-09-21 | 2022-03-24 | Roche Diagnostics Gmbh | A method for detecting and reporting an operation error in an in-vitro diagnostic system and an in-vitro diagnostic system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108984613A (en) | A cross-project defect report classification method based on transfer learning | |
Trojanowska et al. | A methodology of improvement of manufacturing productivity through increasing operational efficiency of the production process | |
Shahzad et al. | Data mining based job dispatching using hybrid simulation-optimization approach for shop scheduling problem | |
CN104615986B (en) | The method that pedestrian detection is carried out to the video image of scene changes using multi-detector | |
Zahraee et al. | Simulation of manufacturing production line based on Arena | |
Fernandez-Viagas et al. | Exploring the benefits of scheduling with advanced and real-time information integration in Industry 4.0: A computational study | |
US11699106B2 (en) | Categorical feature enhancement mechanism for gradient boosting decision tree | |
CN104615730B (en) | A kind of multi-tag sorting technique and device | |
CN112990298B (en) | Key point detection model training method, key point detection method and device | |
Yang et al. | Prediction-guided distillation for dense object detection | |
CN105868269A (en) | Precise image searching method based on region convolutional neural network | |
CN115131655B (en) | Training method and device of target detection model and target detection method | |
US20210311440A1 (en) | Systems, Methods, and Media for Manufacturing Processes | |
JP2017022593A (en) | Verification device, verification method and verification program | |
Wu | Applying grey model to prioritise technical measures in quality function deployment | |
US11847187B2 (en) | Device identification device, device identification method, and device identification program | |
JPWO2019180868A1 (en) | Image generator, image generator and image generator | |
CN111950652A (en) | Semi-supervised learning data classification algorithm based on similarity | |
Flotzinger et al. | Building inspection toolkit: Unified evaluation and strong baselines for damage recognition | |
US20230280731A1 (en) | Production management system, production management method, and production management program | |
US20210192779A1 (en) | Systems, Methods, and Media for Manufacturing Processes | |
Wahab et al. | Production improvement in an aircraft manufacturing company using value stream mapping approach | |
KR101609292B1 (en) | Apparatus and method for managing a research and development project | |
EP4118512A1 (en) | Systems, methods, and media for manufacturing processes | |
KR101649913B1 (en) | Apparatus and method for managing a research and development project |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211