CN110175726A - Cross-project defect prediction method based on migration analysis - Google Patents

Cross-project defect prediction method based on migration analysis

Info

Publication number
CN110175726A
Authority
CN
China
Prior art keywords
project
code
defect
dimension
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910524720.0A
Other languages
Chinese (zh)
Other versions
CN110175726B (en
Inventor
余跃
张迅晖
毛新军
曾雅蓉
王涛
李志星
范强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201910524720.0A
Publication of CN110175726A
Application granted
Publication of CN110175726B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Strategic Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a cross-project defect prediction method based on migration analysis. To meet the need to migrate cross-project defect prediction models in open-source communities, the method takes the correlation between projects into account and analyses the transferability of defect prediction models between projects using information of multiple dimensions available in the community, including the submitted code, the text associated with submissions and the history related to submissions. A defect prediction method is then formed from this analysis. The method is intended to assist code review and promote the rapid and healthy development of open-source projects.

Description

A cross-project defect prediction method based on migration analysis
Technical field
The present invention relates to a cross-project defect prediction method based on migration analysis. It mainly comprises an analysis of how inter-project relationship factors of different dimensions affect the effect of cross-project defect prediction, and a cross-project defect prediction method built on the analysis results and ensemble voting.
Background
Since the pull request mechanism was introduced in 2010, the GitHub community has accumulated more than 30 million pull requests in seven years, and the number is still growing rapidly. However, because the developers who submit pull requests may lack programming experience, follow an unsound development process or misunderstand the requirements, the pull requests they submit can introduce various software defects, which in turn affect software quality and the healthy development of the software. Since open-source communities adopted lightweight process management tools, the threshold for peripheral contributors to take part in a project has dropped considerably, but maintaining the work products of open-source projects has become harder. Checking entirely by hand whether newly added code introduces new defects is both time-consuming and laborious. Introducing automated defect prediction can therefore assist defect discovery, reduce reviewers' workload and accelerate the healthy and stable development of software. Traditional defect prediction methods based on supervised machine learning, however, depend on a sufficient number of available training samples from which to summarise empirical knowledge and make decisions about future data. This is unfriendly to emerging open-source projects that have not yet accumulated enough historical data. Migrating cross-project defect prediction models is therefore considered as a way to solve this problem and help emerging projects through the cold-start phase.
Analysing the migration of cross-project defect prediction models requires considering the correlation between projects. Information of multiple dimensions in the open-source community, including the submitted code, the text associated with submissions and the history related to submissions, is used to analyse the transferability of defect prediction models between projects, and a defect prediction method is then formed.
Summary of the invention
The object of the present invention is to address the need to migrate cross-project defect prediction models in open-source communities by proposing a cross-project defect prediction method based on migration analysis. The method comprehensively analyses the factors that influence model migration in this setting and performs cross-project defect prediction by ensemble voting, with relatively high accuracy.
To achieve the above technical object, the technical solution of the present invention is as follows:
A cross-project defect prediction method based on migration analysis, comprising the following steps:
S1. According to the number of pull requests contained in the open-source projects of an open-source community and their programming language category, select open-source projects in the community as the project set for migration analysis, and collect data;
The collected data include: project source code, historical commit information, commit change information, commit history, the files changed by each commit, the number of historical developers of the files changed by each commit, the average time interval of the files changed by each commit, and defect information. The defect information includes defect reports, the defect report flag, and the numbers of the corresponding defect reports, pull requests and README files;
S2. Extract the historical commit information and defect information of each open-source project in the migration analysis project set, and divide the data collected in S1 into commits that introduce defects and commits that do not introduce defects;
S3. For the defect-introducing commits obtained in S2, extract the within-project metrics of commits that may introduce defects under the diffusion dimension, the code dimension, the change purpose dimension, the text information dimension and the history dimension, obtaining the multidimensional within-project metrics of commits that may introduce defects;
S4. Using the defect-introducing commits obtained in S2 and the multidimensional within-project metrics extracted in S3, train a within-project defect prediction model for each project in the migration analysis project set with the random forest method;
S5. For the different projects in the migration analysis project set, compute the multidimensional inter-project relationship metrics that affect inter-project defect prediction under the project dimension, the technical dimension and the personnel dimension;
S6. Using the within-project defect prediction model of each project in the migration analysis project set obtained in S4, compute the accuracy with which each project's model predicts the defects of the other projects, obtaining the inter-project defect prediction accuracy;
S7. Using the multidimensional inter-project relationship metrics extracted in S5 and the inter-project defect prediction accuracy computed in S6, design a regression analysis model and obtain the influence of the project dimension, the technical dimension and the personnel dimension on the effect of cross-project defect prediction;
S8. According to the influence of the project dimension, the technical dimension and the personnel dimension on the effect of cross-project defect prediction obtained in S7, select cross-project prediction models for the project to be predicted, and obtain the defect prediction result of the project to be predicted by ensemble voting.
In the present invention, a defect report flag of 0 in the defect reports described in S1 indicates a defect, and a flag of 1 indicates non-defect information.
In the present invention, S2 includes:
S21. Find the commits that correspond to defect reports whose flag is 0;
S22. Traverse all the commits found in S21 and, using the historical commit information, filter out all commits that fixed a defect;
S23. Find the source files corresponding to all the defect-fixing commits from S22;
S24. Inspect the source files found in S23. The commit that most recently modified such a file is regarded as a commit that introduces a defect; all remaining commits are regarded as commits that do not introduce defects.
In the present invention, S3 includes:
S3.1. Under the diffusion dimension, the within-project metrics of a commit that may introduce defects include the number of source files changed by the commit and the code change distribution entropy;
S3.2. Under the code dimension, the within-project metrics of a commit that may introduce defects include the number of lines in the code files before the commit, the number of lines added and the number of lines deleted;
S3.3. Under the change purpose dimension, the within-project metrics of a commit that may introduce defects include whether the commit that introduced the current defect itself fixed a defect, and the number of defect reports associated with the commit that introduced the current defect;
S3.4. Under the text information dimension, the within-project metrics of a commit that may introduce defects include the length of the commit message and the likelihood of introducing a defect;
S3.5. Under the history dimension, the within-project metrics of a commit that may introduce defects include the number of historical contributions of the committer, the number of historical developers of the source files changed by the commit, the size of the intersection between the source files changed by this commit and those changed by the previous commit, and the average time interval of the source files changed by the commit.
Here, the code change distribution entropy in S3.1 represents how dispersed the changed lines are across the source files changed by a commit. The higher the dispersion, the more complex the change and the more likely it is to introduce a defect. H(p) denotes the code change distribution entropy and is calculated as in formula (1),
$$H(p) = -\sum_{k=1}^{n} p_k \log_2 p_k \tag{1}$$
where n denotes the number of source files changed by the commit, and p_k denotes the proportion of changed lines that fall in the k-th source file changed by the commit.
In S3.5, for the number of historical developers of the source files changed by a commit under the history dimension, if one commit changes several source files, the largest historical developer count among those files is taken as the final result;
The size of the intersection between the source files changed by this commit and those changed by the previous commit is denoted S(c) and calculated as in formula (2),
$$S(c) = |\mathit{File}_a \cap \mathit{File}_b| \tag{2}$$
where File_a denotes the set of source files changed by the previous commit and File_b denotes the set of source files changed by this commit;
The average time interval of the source files changed by this commit is denoted G(c) and calculated as in formula (3),
$$G(c) = \frac{1}{n}\sum_{i=1}^{n} G(F_i) \tag{3}$$
where n denotes the number of source files changed by the current commit, F_i denotes the i-th file in the set F of files changed by the commit, and G(F_i) denotes the time interval between this commit's change to file F_i and the previous commit that changed the same source file.
In the present invention, S5 includes:
S5.1. Under the project dimension, the inter-project relationship metrics that affect inter-project defect prediction include whether the project owner types are the same, whether the projects use the same open-source license, whether the projects use the same programming language, the similarity of the project descriptions, the similarity of the project README files and the similarity of the project code directory structures;
S5.2. Under the technical dimension, the inter-project relationship metrics that affect inter-project defect prediction include the difference in project code size, the difference in the number of libraries the projects depend on, whether a direct dependency exists between the projects, the number of libraries the projects depend on in common, and the number of defect report cross-references in the comments of the projects' community;
S5.3. Under the personnel dimension, the inter-project relationship metrics that affect inter-project defect prediction include the difference in the number of core developers, the difference in the number of peripheral contributors, the number of people who take part in the development of both projects, and the similarity of the country distribution of project participants;
Here, a core developer is a developer who has permission to merge pull requests, to close defect reports or to commit code directly, and a peripheral contributor is any other contributor apart from the core developers.
The project dependency libraries mentioned in S5.2 are the other code libraries on which the development of a project relies. The difference in the number of dependency libraries is computed and the projects are checked for a direct dependency; the number of common dependency libraries is calculated as in formula (4),
$$N(a, b) = |\mathit{dep}_a \cap \mathit{dep}_b| \tag{4}$$
where N(a, b) denotes the number of dependency libraries common to projects a and b, dep_a denotes the set of dependency libraries of project a, and dep_b denotes the set of dependency libraries of project b;
The number of defect report cross-references in the comments of the projects' community is the number of references to the other project that appear in the comments of a project.
In the present invention, the inter-project defect prediction accuracy in S6 is calculated as in formula (5),
$$\mathit{precision} = \frac{N_{right}}{N_{all}} \tag{5}$$
where precision denotes the inter-project defect prediction accuracy, N_right denotes the number of defects of other projects that the within-project defect prediction model of each project in the migration analysis project set predicts correctly, and N_all denotes the total number of within-project defects of each project in the migration analysis project set.
In the present invention, S8 includes:
S8.1. Model selection: pair each open-source project in the migration analysis project set with the project to be predicted to form cross-project pairs, apply the regression model obtained in S7 to each pair to obtain the predicted accuracy of each open-source project on the project to be predicted, and select the top K open-source projects in the migration analysis project set by accuracy;
S8.2. Model decision: combine the K open-source projects from S8.1 and, by ensemble voting under the majority rule, obtain the final prediction of the K open-source projects for each defect of the project to be predicted.
The present invention can achieve the following technical effects:
The method comprehensively considers the various factors present in an open-source community and identifies metrics that measure the relationships between projects. Through regression analysis it determines how strongly each metric influences the prediction effect in the cross-project defect prediction setting, and from this obtains the influence of the different dimensions on the migration of cross-project defect prediction models. A cross-project defect prediction model is finally formed by ensemble voting. The method assists code review and promotes the rapid and healthy development of open-source projects.
Brief description of the drawings
Fig. 1 is a flow chart of the cross-project defect prediction method based on migration analysis of the present invention;
Fig. 2 is a flow chart of extracting the multidimensional within-project metrics of defect introduction in the cross-project defect prediction method based on migration analysis of the present invention;
Fig. 3 is a flow chart of extracting the multidimensional inter-project relationship metrics for inter-project defect prediction in the cross-project defect prediction method based on migration analysis of the present invention.
Detailed description of the embodiments
Fig. 1 shows the flow chart of the cross-project defect prediction method based on migration analysis, which proceeds as follows:
S1. Obtain open-source projects from the open-source community. According to the number of pull requests contained in the open-source projects of the community and their programming language category, select open-source projects in the community (for example, some popular open-source projects) as the project set for migration analysis, and collect data.
The collected data include: project source code, historical commit information, commit change information, commit history, the files changed by each commit, the number of historical developers of the files changed by each commit, the average time interval of the files changed by each commit, and defect information. The defect information includes defect reports, the defect report flag, and the numbers of the corresponding defect reports, pull requests and README files.
Here, a defect report flag of 0 indicates a defect, and a flag of 1 indicates non-defect information.
S2. Classify the data to obtain the commits that introduce defects. Extract the historical commit information and defect information of each open-source project in the migration analysis project set, and divide the data collected in S1 into commits that introduce defects and commits that do not introduce defects. The specific steps are as follows:
S21. Find the commits that correspond to defect reports whose flag is 0;
S22. Traverse all the commits found in S21 and, using the historical commit information, filter out all commits that fixed a defect;
S23. Find the source files corresponding to all the defect-fixing commits from S22;
S24. Inspect the source files found in S23. The commit that most recently modified such a file is regarded as a commit that introduces a defect; all remaining commits are regarded as commits that do not introduce defects.
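Steps S21 to S24 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Commit` and `DefectReport` records, their field names and the `is_fix` flag are hypothetical, and "the most recent modification" is interpreted here as the latest commit that touched the file before the fixing commit.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    timestamp: int          # commit time, e.g. seconds since the epoch
    files: frozenset        # paths of the source files changed by the commit
    is_fix: bool = False    # hypothetical flag derived from the commit history


@dataclass
class DefectReport:
    number: int
    flag: int               # 0 = defect, 1 = non-defect information (as in S1)
    commit_shas: tuple = () # commits linked to this report


def label_commits(commits, reports):
    """Return {sha: True if the commit is labelled defect-introducing}."""
    by_sha = {c.sha: c for c in commits}
    # S21: commits linked to reports flagged as defects.
    linked = {s for r in reports if r.flag == 0 for s in r.commit_shas if s in by_sha}
    # S22: keep only the commits that fixed a defect.
    fixes = [by_sha[s] for s in linked if by_sha[s].is_fix]
    ordered = sorted(commits, key=lambda c: c.timestamp)
    introducing = set()
    for fix in fixes:
        # S23: source files touched by the fixing commit.
        for path in fix.files:
            # S24: the latest earlier commit that modified the same file.
            earlier = [c for c in ordered
                       if c.timestamp < fix.timestamp and path in c.files]
            if earlier:
                introducing.add(earlier[-1].sha)
    return {c.sha: c.sha in introducing for c in commits}
```

Every commit not selected in S24 is labelled as not introducing a defect, which matches the two-way split described above.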
S3. Extract the multidimensional within-project metrics of defect introduction. For the defect-introducing commits obtained in S2, extract the within-project metrics of commits that may introduce defects under the diffusion dimension, the code dimension, the change purpose dimension, the text information dimension and the history dimension, obtaining the multidimensional within-project metrics of commits that may introduce defects. The specific steps are shown in Fig. 2.
S3.1. Compute, for project A, the within-project metrics of possible defect introduction under the diffusion dimension. Under the diffusion dimension, the within-project metrics of a commit that may introduce defects include the number of source files changed by the commit and the code change distribution entropy. The code change distribution entropy represents how dispersed the changed lines are across the source files changed by the commit. The higher the dispersion, the more complex the change and the more likely it is to introduce a defect. H(p) denotes the code change distribution entropy and is calculated as in formula (1),
$$H(p) = -\sum_{k=1}^{n} p_k \log_2 p_k \tag{1}$$
where n denotes the number of source files changed by the commit, and p_k denotes the proportion of changed lines that fall in the k-th source file changed by the commit.
S3.2. Compute, for project A, the within-project metrics of possible defect introduction under the code dimension. Under the code dimension, the within-project metrics of a commit that may introduce defects include the number of lines in the code files before the commit, the number of lines added and the number of lines deleted.
S3.3. Compute, for project A, the within-project metrics of possible defect introduction under the change purpose dimension. Under the change purpose dimension, the within-project metrics of a commit that may introduce defects include whether the commit that introduced the current defect itself fixed a defect, and the number of defect reports associated with the commit that introduced the current defect.
S3.4. Compute, for project A, the within-project metrics of possible defect introduction under the text information dimension. Under the text information dimension, the within-project metrics of a commit that may introduce defects include the length of the commit message and the likelihood of introducing a defect.
S3.5. Compute, for project A, the within-project metrics of possible defect introduction under the history dimension. Under the history dimension, the within-project metrics of a commit that may introduce defects include the number of historical contributions of the committer, the number of historical developers of the source files changed by the commit, the size of the intersection between the source files changed by this commit and those changed by the previous commit, and the average time interval of the source files changed by the commit.
In S3.5, for the number of historical developers of the source files changed by a commit under the history dimension, if one commit changes several source files, the largest historical developer count among those files is taken as the final result;
The size of the intersection between the source files changed by this commit and those changed by the previous commit is denoted S(c) and calculated as in formula (2),
$$S(c) = |\mathit{File}_a \cap \mathit{File}_b| \tag{2}$$
where File_a denotes the set of source files changed by the previous commit and File_b denotes the set of source files changed by this commit;
The average time interval of the source files changed by this commit is denoted G(c) and calculated as in formula (3),
$$G(c) = \frac{1}{n}\sum_{i=1}^{n} G(F_i) \tag{3}$$
where n denotes the number of source files changed by the current commit, F_i denotes the i-th file in the set F of files changed by the commit, and G(F_i) denotes the time interval between this commit's change to file F_i and the previous commit that changed the same source file.
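Three of the S3 metrics lend themselves to a short sketch: the change distribution entropy of S3.1 and the quantities S(c) and G(c) of S3.5. The function and parameter names below are hypothetical, and the entropy is assumed to be the Shannon entropy with a base-2 logarithm, since the text defines only n and p_k.

```python
import math


def change_entropy(changed_lines_per_file):
    """Entropy of how the changed lines spread over the changed files."""
    total = sum(changed_lines_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for lines in changed_lines_per_file:
        if lines > 0:
            p_k = lines / total          # share of changed lines in file k
            entropy -= p_k * math.log2(p_k)
    return entropy


def file_overlap(previous_files, current_files):
    """S(c): number of files changed by both the previous and this commit."""
    return len(set(previous_files) & set(current_files))


def mean_interval(commit_time, last_change_time_by_file):
    """G(c): mean time since each changed file was last modified."""
    intervals = [commit_time - t for t in last_change_time_by_file.values()]
    return sum(intervals) / len(intervals) if intervals else 0.0
```

A commit that spreads its changes evenly over two files gets an entropy of 1 bit, while a commit confined to a single file gets 0, which agrees with the reading that more dispersed changes are more complex.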
S4. Train within-project defect prediction models with a classification method. Using the defect-introducing commits obtained in S2 and the multidimensional within-project metrics extracted in S3, train a within-project defect prediction model for each project in the migration analysis project set with the random forest method.
S5. Extract the multidimensional inter-project relationship metrics for inter-project defect prediction. For the different projects in the migration analysis project set, compute the multidimensional inter-project relationship metrics that affect inter-project defect prediction under the project dimension, the technical dimension and the personnel dimension. The specific steps are shown in Fig. 3.
S5.1. Compute the inter-project relationship metrics that affect inter-project defect prediction under the project dimension.
For the different projects in the migration analysis project set, the inter-project relationship metrics under the project dimension include whether the project owner types are the same, whether the projects use the same open-source license, whether the projects use the same programming language, the similarity of the project descriptions, the similarity of the project README files and the similarity of the project code directory structures.
S5.2. Compute the inter-project relationship metrics that affect inter-project defect prediction under the technical dimension.
For the different projects in the migration analysis project set, the inter-project relationship metrics under the technical dimension include the difference in project code size, the difference in the number of libraries the projects depend on, whether a direct dependency exists between the projects, the number of libraries the projects depend on in common, and the number of defect report cross-references in the comments of the projects' community. The project dependency libraries are the other code libraries on which the development of a project relies. The difference in the number of dependency libraries is computed and the projects are checked for a direct dependency; the number of common dependency libraries is calculated as in formula (4),
$$N(a, b) = |\mathit{dep}_a \cap \mathit{dep}_b| \tag{4}$$
where N(a, b) denotes the number of dependency libraries common to projects a and b, dep_a denotes the set of dependency libraries of project a, and dep_b denotes the set of dependency libraries of project b;
The number of defect report cross-references in the comments of the projects' community is the number of references to the other project that appear in the comments of a project.
S5.3. Compute the inter-project relationship metrics that affect inter-project defect prediction under the personnel dimension.
For the different projects in the migration analysis project set, the inter-project relationship metrics under the personnel dimension include the difference in the number of core developers, the difference in the number of peripheral contributors, the number of people who take part in the development of both projects, and the similarity of the country distribution of project participants;
Here, a core developer is a developer who has permission to merge pull requests, to close defect reports or to commit code directly, and a peripheral contributor is any other contributor apart from the core developers.
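Several of the S5 metrics reduce to comparisons and set operations on project metadata, which the following sketch illustrates. The dictionary keys (`name`, `language`, `license`, `loc`, `deps`, `developers`) are hypothetical, and taking the absolute value for each "difference" is an assumption; the text-similarity and country-distribution metrics are left out because the source does not say how they are measured.

```python
def relation_metrics(a, b):
    """Compute a subset of the inter-project relationship metrics for a pair."""
    deps_a, deps_b = set(a["deps"]), set(b["deps"])
    devs_a, devs_b = set(a["developers"]), set(b["developers"])
    return {
        # Project dimension (S5.1)
        "same_language": a["language"] == b["language"],
        "same_license": a["license"] == b["license"],
        # Technical dimension (S5.2)
        "loc_diff": abs(a["loc"] - b["loc"]),
        "dep_count_diff": abs(len(deps_a) - len(deps_b)),
        "direct_dependency": a["name"] in deps_b or b["name"] in deps_a,
        "common_deps": len(deps_a & deps_b),        # N(a, b) in formula (4)
        # Personnel dimension (S5.3)
        "common_developers": len(devs_a & devs_b),
    }
```

Each project pair yields one such record, and these records form the explanatory variables of the regression analysis in S7.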
S6. Compute the cross-project prediction effect of the defect prediction models. Using the within-project defect prediction model of each project in the migration analysis project set obtained in S4, compute the accuracy with which each project's model predicts the defects of the other projects, obtaining the inter-project defect prediction accuracy.
The inter-project defect prediction accuracy in S6 is calculated as in formula (5),
$$\mathit{precision} = \frac{N_{right}}{N_{all}} \tag{5}$$
where precision denotes the inter-project defect prediction accuracy, N_right denotes the number of defects of other projects that the within-project defect prediction model of each project in the migration analysis project set predicts correctly, and N_all denotes the total number of within-project defects of each project in the migration analysis project set.
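The pairwise evaluation in S6 can be sketched as below. The data layout is an assumption made for illustration: each model is any callable that returns True when it predicts a defect, and each dataset is a list of `(features, is_defect)` pairs. Following the definitions of N_right and N_all in the text, the ratio is taken over the defect samples of the target project.

```python
def cross_project_accuracy(models, datasets):
    """Return {(source, target): N_right / N_all} for every ordered pair."""
    result = {}
    for source, predict in models.items():
        for target, samples in datasets.items():
            if source == target:
                continue  # only cross-project pairs are of interest here
            defects = [features for features, is_defect in samples if is_defect]
            if not defects:
                continue  # ratio undefined without any defect in the target
            n_right = sum(1 for features in defects if predict(features))
            result[(source, target)] = n_right / len(defects)
    return result
```

The resulting table of ordered project pairs supplies the dependent variable for the regression analysis in S7.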
S7. Build a regression analysis model and analyze the influence of each metric on the transfer performance of cross-project defect models (107). Using the multidimensional inter-project association metrics extracted in S5 and the inter-project defect prediction accuracy computed in S6, design a regression analysis model to obtain the influence of the project dimension, the technology dimension and the personnel dimension on cross-project defect prediction performance.
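The text does not state which regression form is used in S7. As a deliberately simplified stand-in, the sketch below fits a separate one-variable least-squares line for each metric against the observed cross-project accuracy, so that the sign and size of each slope hint at that metric's influence. The metric names and the numbers are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("metric is constant across project pairs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# One row per cross-project pair (hypothetical values).
pair_metrics = [
    {"shared_deps": 0, "core_dev_diff": 3, "accuracy": 0.2},
    {"shared_deps": 1, "core_dev_diff": 2, "accuracy": 0.4},
    {"shared_deps": 2, "core_dev_diff": 1, "accuracy": 0.6},
    {"shared_deps": 3, "core_dev_diff": 0, "accuracy": 0.8},
]

accuracy = [row["accuracy"] for row in pair_metrics]
effects = {
    name: fit_simple_regression([row[name] for row in pair_metrics], accuracy)
    for name in ("shared_deps", "core_dev_diff")
}
```

A multivariate model fitted on all metrics at once would be closer to the wording of S7; the per-metric fit is used here only to keep the example free of external libraries.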
S8. Cross-project defect prediction (108). According to the influence of the project dimension, the technology dimension and the personnel dimension on cross-project defect prediction performance obtained in S7, select cross-project prediction models for the project to be predicted, and obtain the defect prediction result of the project to be predicted by ensemble voting.
The S8 includes:
S8.1. Model selection: form a cross-project pair from each open source project in the migration analysis project set and the project to be predicted, apply the regression model obtained in S7 to each pair to obtain the predicted accuracy of each open source project on the project to be predicted, and select the top K open source projects in the migration analysis project set by accuracy;
S8.2. Model decision: combine the K open source projects from S8.1 and, by ensemble voting under the majority rule, obtain the final prediction of the K open source projects for each defect of the project to be predicted.
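Steps S8.1 and S8.2 can be sketched as a top-K selection followed by a majority vote. The sketch assumes that the regression model has already produced a predicted accuracy per source project, that each source model is a callable returning 0 or 1, and that ties count as non-defect; the tie rule, the names and the data are assumptions made for the example.

```python
from collections import Counter


def select_top_k(predicted_accuracy, k):
    """S8.1: keep the K source projects with the highest predicted accuracy."""
    ranked = sorted(predicted_accuracy, key=predicted_accuracy.get, reverse=True)
    return ranked[:k]


def majority_vote(votes):
    """S8.2: 1 (defect) only if a strict majority of models vote 1."""
    counts = Counter(votes)
    return 1 if counts[1] > len(votes) / 2 else 0


def predict_defects(models, predicted_accuracy, commits, k):
    """Predict each commit of the target project with the K selected models."""
    chosen = select_top_k(predicted_accuracy, k)
    return [majority_vote([models[name](c) for name in chosen]) for c in commits]


# Hypothetical stand-in models, scores and commits, for illustration only.
models = {
    "A": lambda c: 1 if c["lines_added"] > 100 else 0,
    "B": lambda c: 1 if c["files"] > 5 else 0,
    "C": lambda c: 1,
    "D": lambda c: 0,
}
predicted_accuracy = {"A": 0.8, "B": 0.7, "C": 0.6, "D": 0.3}
commits = [
    {"lines_added": 200, "files": 2},
    {"lines_added": 10, "files": 1},
    {"lines_added": 20, "files": 9},
]

chosen = select_top_k(predicted_accuracy, 3)
results = predict_defects(models, predicted_accuracy, commits, 3)
```

Choosing an odd K avoids ties altogether, which makes the tie rule irrelevant in practice.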

Claims (10)

1. A cross-project defect prediction method based on migration analysis, characterized in that:
S1. According to the number of pull requests and the programming language category of the open source projects in an open source community, open source projects in the community are selected as the project set for migration analysis, and data collection is carried out;
wherein the collected data include: project source code, historical code commit information, code commit change information, code commit history, the files changed by each code commit, the number of historical developers of the files changed by each code commit, the average time interval of the files changed by each code commit, and defect information; the defect information includes defect reports, defect report flags, the numbers corresponding to the defect reports, pull requests, and ReadMe files;
S2. The historical code commit information and defect information of each open source project in the migration analysis project set are extracted, and the data collected in S1 are divided into code commits that introduce defects and code commits that do not introduce defects;
S3. For the defect-introducing code commits obtained in S2, the within-project metrics by which a code commit may introduce a defect are extracted under the diffusion dimension, the code dimension, the code change purpose dimension, the text information dimension and the history dimension, giving the multidimensional within-project metrics by which a code commit may introduce a defect;
S4. Using the defect-introducing code commits obtained in S2 and the multidimensional within-project metrics extracted in S3, a within-project defect prediction model is trained for each project in the migration analysis project set with the random forest method;
S5. For the different projects in the migration analysis project set, the multidimensional inter-project association metrics that affect cross-project defect prediction are counted under the project dimension, the technology dimension and the personnel dimension;
S6. Using the within-project defect prediction model of each project in the migration analysis project set obtained in S4, the accuracy with which each project's model predicts the defects of the other projects is computed, giving the inter-project defect prediction accuracy;
S7. Using the multidimensional inter-project association metrics extracted in S5 and the inter-project defect prediction accuracy computed in S6, a regression analysis model is designed to obtain the influence of the project dimension, the technology dimension and the personnel dimension on cross-project defect prediction performance;
S8. According to the influence of the project dimension, the technology dimension and the personnel dimension on cross-project defect prediction performance obtained in S7, cross-project prediction models are selected for the project to be predicted, and the defect prediction result of the project to be predicted is obtained by ensemble voting.
2. The cross-project defect prediction method based on migration analysis as described in claim 1, characterized in that:
When the defect flag of a defect report described in S1 is 0, the report indicates a defect; when the flag is 1, it indicates non-defect information.
3. The cross-project defect prediction method based on migration analysis as claimed in claim 2, characterized in that:
The S2 includes:
S21. Find the code commits corresponding to the defect reports whose defect flag is 0;
S22. Traverse all the code commits found in S21 and, using the historical code commit information, filter out all the code commits that fixed a defect;
S23. Find the source files corresponding to all the defect-fixing code commits from S22;
S24. Check the source files found in step S23; the code commit corresponding to the most recent modification of such a file is regarded as a defect-introducing code commit, and the remaining code commits are regarded as code commits that do not introduce defects.
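Steps S21 to S24 can be illustrated with the following sketch. It reads "the most recent modification" as the last commit that touched a file before the defect-fixing commit, which is an interpretation rather than something the claim spells out. The commit dictionaries, the `fixes_report` field and the sample history are hypothetical.

```python
def label_commits(commits, report_flags):
    """Label each commit as defect-introducing (True) or not (False).

    commits:      list of dicts in chronological order (oldest first), each with
                  'id', 'files' and optionally 'fixes_report' (hypothetical fields).
    report_flags: maps a defect report number to its flag (0 means defect).
    """
    introducing = set()
    last_modifier = {}  # file path -> id of the latest commit that changed it
    for commit in commits:
        report = commit.get("fixes_report")
        is_fix = report is not None and report_flags.get(report) == 0
        if is_fix:
            # S23/S24: blame the previous commit that touched each fixed file.
            for path in commit["files"]:
                if path in last_modifier:
                    introducing.add(last_modifier[path])
        for path in commit["files"]:
            last_modifier[path] = commit["id"]
    return {c["id"]: c["id"] in introducing for c in commits}


# Hypothetical history, for illustration only.
commits = [
    {"id": "c1", "files": {"a.py"}},
    {"id": "c2", "files": {"a.py", "b.py"}},
    {"id": "c3", "files": {"c.py"}},
    {"id": "c4", "files": {"a.py"}, "fixes_report": 7},
    {"id": "c5", "files": {"c.py"}, "fixes_report": 8},
]
report_flags = {7: 0, 8: 1}  # report 8 is flagged as non-defect

labels = label_commits(commits, report_flags)
```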
4. The cross-project defect prediction method based on migration analysis as described in claim 1, characterized in that:
The S3 includes:
S3.1. Under the diffusion dimension, the within-project metrics by which a code commit may introduce a defect include the number of source files changed by the code commit and the code change distribution entropy;
S3.2. Under the code dimension, the within-project metrics by which a code commit may introduce a defect include the number of lines of the code files before the commit, the number of lines of code added, and the number of lines of code deleted;
S3.3. Under the code change purpose dimension, the within-project metrics by which a code commit may introduce a defect include whether the code commit that introduced the current defect itself fixed a defect, and the number of defect reports associated with the code commit that introduced the current defect;
S3.4. Under the text information dimension, the within-project metrics by which a code commit may introduce a defect include the length of the code commit message and the likelihood of introducing a defect;
S3.5. Under the history dimension, the within-project metrics by which a code commit may introduce a defect include the number of historical code contributions of the committer, the number of historical developers of the source files changed by the code commit, the size of the intersection between the source files changed by this code commit and those changed by the previous code commit, and the average time interval of the source files changed by the code commit.
5. The cross-project defect prediction method based on migration analysis as described in claim 1, characterized in that:
The code change distribution entropy in S3.1 represents the degree of dispersion of the changed lines of code across the source files changed by a code commit; the higher the dispersion, the more complex the code and the more likely a defect is introduced. $H(p)$ denotes the code change distribution entropy and is calculated as in formula (1),
$H(p) = -\sum_{k=1}^{n} p_k \log p_k$ (1)
where n is the number of source files changed by the code commit and $p_k$ is the proportion of changed lines accounted for by the k-th changed source file.
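As an illustration of the code change distribution entropy, the sketch below computes a Shannon entropy over the share of changed lines per file. The logarithm base is assumed to be 2 here, and the function name and inputs are hypothetical.

```python
import math


def change_entropy(lines_changed_per_file):
    """Entropy of the distribution of changed lines over the changed files.

    Uses log base 2 (an assumption); files with zero changed lines contribute 0.
    """
    total = sum(lines_changed_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for lines in lines_changed_per_file:
        if lines > 0:
            p = lines / total
            entropy -= p * math.log2(p)
    return entropy


# A change spread evenly over four files versus one confined to a single file.
spread = change_entropy([10, 10, 10, 10])
focused = change_entropy([40])
```

Under this reading, a commit that touches many files in equal measure scores highest, and a commit confined to one file scores zero.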
6. The cross-project defect prediction method based on migration analysis as claimed in claim 4, characterized in that:
In the S3.5, under the history dimension, for the metric "number of historical developers of the source files changed by the code commit", if one code commit changes several source files, the maximum of the corresponding historical developer counts is taken as the final result;
The size of the intersection between the source files changed by this code commit and those changed by the previous code commit is denoted $S(c)$ and calculated as in formula (2),
$S(c) = |File_a \cap File_b|$ (2)
where $File_a$ is the set of source files changed by the previous code commit and $File_b$ is the set of source files changed by this code commit;
The average time interval of the source files changed by this code commit is denoted $G(c)$ and calculated as in formula (3),
$G(c) = \frac{1}{n} \sum_{i=1}^{n} G(F_i)$ (3)
where n is the number of source files changed by the current code commit, $F_i$ is the i-th file in the set F of files changed by the code commit, and $G(F_i)$ is the time interval between this code commit's change to file $F_i$ and the previous code commit that changed that file.
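The three history-dimension quantities of this claim can be sketched as below. Times are plain numbers (for example hours); files without an earlier change are skipped when averaging and counted as zero developers, which are assumptions, since the claim does not cover that case. All names and data are hypothetical.

```python
def max_historical_developers(changed_files, developers_per_file):
    """Largest historical developer count among the changed files."""
    return max((developers_per_file.get(f, 0) for f in changed_files), default=0)


def file_overlap(files_previous, files_current):
    """S(c): number of files changed by both the previous and the current commit."""
    return len(set(files_previous) & set(files_current))


def mean_time_interval(commit_time, changed_files, last_change_time):
    """G(c): mean time since each changed file was last changed.

    Files with no recorded earlier change are skipped (an assumption).
    """
    intervals = [
        commit_time - last_change_time[f]
        for f in changed_files
        if f in last_change_time
    ]
    if not intervals:
        return 0.0
    return sum(intervals) / len(intervals)


# Hypothetical commit data, for illustration only.
current_files = ["a.py", "b.py", "new.py"]
previous_files = ["a.py", "c.py"]
developers_per_file = {"a.py": 4, "b.py": 2}
last_change_time = {"a.py": 90.0, "b.py": 70.0}

ndev = max_historical_developers(current_files, developers_per_file)
overlap = file_overlap(previous_files, current_files)
gap = mean_time_interval(100.0, current_files, last_change_time)
```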
7. The cross-project defect prediction method based on migration analysis as described in claim 1, characterized in that:
The S5 includes:
S5.1. Under the project dimension, the inter-project association metrics that affect cross-project defect prediction include: whether the project owner types are the same, whether the projects use the same open source license, whether the projects use the same programming language, the similarity of the project descriptions, the similarity of the project ReadMe files, and the similarity of the project code directory structures;
S5.2. Under the technology dimension, the inter-project association metrics that affect cross-project defect prediction include: the difference in project code size, the difference in the number of project dependency libraries, whether a direct dependency relationship exists between the projects, the number of dependency libraries shared by the projects, and the number of defect report cross-references in the comments of the community hosting the projects;
S5.3. Under the personnel dimension, the inter-project association metrics that affect cross-project defect prediction include: the difference in the number of core developers, the difference in the number of peripheral contributors, the number of people who take part in the development of both projects, and the similarity of the country distribution of the project participants;
wherein a core developer is a developer who has merged a pull request, closed a defect report, or holds direct code commit permission, and a peripheral contributor is any contributor other than a core developer.
8. The cross-project defect prediction method based on migration analysis as claimed in claim 6, characterized in that:
The project dependency libraries described in S5.2 are the other code libraries that the development of a project depends on; the difference in the number of project dependency libraries is computed and the existence of a direct dependency relationship between projects is checked, and the number of shared dependency libraries is calculated as in formula (4),
$N(a, b) = |dep_a \cap dep_b|$ (4)
where $N(a, b)$ denotes the number of dependency libraries shared by projects a and b, $dep_a$ is the set of dependency libraries of project a, and $dep_b$ is the set of dependency libraries of project b;
The number of defect report cross-references in the comments of the community hosting the projects is the number of times a reference to the other project appears in the comments of one project.
9. The cross-project defect prediction method based on migration analysis as described in claim 1, characterized in that:
In the S6, the inter-project defect prediction accuracy is calculated as in formula (5),
$precision = \frac{N_{right}}{N_{all}}$ (5)
where precision denotes the inter-project defect prediction accuracy, $N_{right}$ denotes the number of defects in another project that the within-project defect prediction model of a project in the migration analysis project set predicts correctly, and $N_{all}$ denotes the total number of defects within the project concerned.
10. The cross-project defect prediction method based on migration analysis as described in claim 1, characterized in that:
The S8 includes:
S8.1. Model selection: a cross-project pair is formed from each open source project in the migration analysis project set and the project to be predicted, the regression model obtained in S7 is applied to each pair to obtain the predicted accuracy of each open source project on the project to be predicted, and the top K open source projects in the migration analysis project set are selected by accuracy;
S8.2. Model decision: the K open source projects from S8.1 are combined and, by ensemble voting under the majority rule, the final prediction of the K open source projects for each defect of the project to be predicted is obtained.
CN201910524720.0A 2019-06-18 2019-06-18 Cross-project defect prediction method based on migration analysis Active CN110175726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910524720.0A CN110175726B (en) 2019-06-18 2019-06-18 Cross-project defect prediction method based on migration analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910524720.0A CN110175726B (en) 2019-06-18 2019-06-18 Cross-project defect prediction method based on migration analysis

Publications (2)

Publication Number Publication Date
CN110175726A true CN110175726A (en) 2019-08-27
CN110175726B CN110175726B (en) 2021-03-26

Family

ID=67697436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910524720.0A Active CN110175726B (en) 2019-06-18 2019-06-18 Cross-project defect prediction method based on migration analysis

Country Status (1)

Country Link
CN (1) CN110175726B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021115A (en) * 2016-06-06 2016-10-12 重庆大学 Unsupervised defect prediction method based on probabilities
CN106569954A (en) * 2016-11-08 2017-04-19 南京航空航天大学 Method based on KL divergence for predicting multi-source software defects
WO2017131263A1 (en) * 2016-01-29 2017-08-03 한국과학기술원 Hybrid instance selection method using nearest neighboring point for cross project defect prediction
CN107025503A (en) * 2017-04-18 2017-08-08 武汉大学 Cross-company software defect prediction method based on transfer learning and defect count information
CN107133176A (en) * 2017-05-09 2017-09-05 武汉大学 Cross-project defect prediction method based on semi-supervised clustering data screening
CN108171485A (en) * 2018-02-01 2018-06-15 中国人民解放军国防科技大学 Cross-project reviewer recommendation method based on software association library
CN108446711A (en) * 2018-02-01 2018-08-24 南京邮电大学 Software defect prediction method based on transfer learning
CN106156633B (en) * 2016-06-23 2018-11-23 扬州大学 Software-modification-oriented risk analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
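余跃 et al.: "Transferring Well-Trained Models for Cross-Project Issue", 《INTERNETWARE 2018》 *
张洋洋 et al.: "Cross-project software defect prediction based on transfer learning", 《Computer Technology and Development》 *
毛发贵 et al.: "Cross-project software defect prediction based on instance transfer", 《Journal of Frontiers of Computer Science and Technology》 *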

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781086A (en) * 2019-10-23 2020-02-11 南京大学 Cross-project defect influence analysis method based on program dependency relationship and symbolic analysis
CN110781086B (en) * 2019-10-23 2022-02-08 南京大学 Cross-project defect influence analysis method
CN111158964A (en) * 2019-11-26 2020-05-15 北京邮电大学 Disk failure prediction method, system, device and storage medium
CN114328277A (en) * 2022-03-11 2022-04-12 广东省科技基础条件平台中心 Software defect prediction and quality analysis method, device, equipment and medium

Also Published As

Publication number Publication date
CN110175726B (en) 2021-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant