KR20100088399A

KR20100088399A - Apparatus and method for software faults prediction using metrics

Info

Publication number: KR20100088399A
Application number: KR1020090007596A
Authority: KR
Inventors: 채흥석; 김태연; 김윤규
Original assignee: 부산대학교 산학협력단
Priority date: 2009-01-30
Filing date: 2009-01-30
Publication date: 2010-08-09
Also published as: KR100987124B1

Abstract

PURPOSE: An apparatus and a method for software fault prediction using metrics are provided that transform metric values measured in another system so that an existing fault prediction model can be applied to that system. CONSTITUTION: A fault prediction module predicts the presence of faults for each class of the test target, based on a fault prediction model and the metric values normalized by a normalization module (20). A fault information collecting module (40) collects information about the actual faults of the test target, based on the fault information held in a bug tracking system. A predictive power evaluation module (50) evaluates the predictive power of the fault prediction model, based on the result values predicted by the fault prediction model and the information on actually faulty classes collected by the fault information collecting module.

Description

Apparatus and Method for Software Faults Prediction using Metrics

The present invention relates to a calculation apparatus and a calculation method that improve predictive power in order to predict software defects or to apply an existing defect prediction model universally.

Companies operating under intense competition must develop software products superior to those of their competitors, and do so faster, in order not to be driven out of the market. As a result of this development environment, software is frequently delivered to consumers with defects still in it. Mobile phone users, for example, sometimes find that their phone suddenly turns off. A customer who experiences this generally begins to doubt the reliability of the product and, in serious cases, comes to distrust the company.

It is therefore important to find software defects after a product has been developed and before it is released. Widely used methods for finding defects include static analysis, code inspection, and testing. Static analysis automatically analyzes the source code and reports the parts where defects are likely to occur; code inspection has someone other than the developer analyze the source code directly to find defective parts; and testing passes specific input values to the software under development and checks whether it outputs the expected results. However, these methods can only be carried out late in the development phase, that is, once the source code has been written, so even when a defect is found, fixing it is costly.

To solve this problem, studies over the past ten years have sought to predict software defects early in the development phase. These studies measured metrics on the deliverables of the design phase and found relationships between the metrics and defects. Because of the growth of software development using the object-oriented paradigm, most of the metrics used in these studies were object-oriented metrics, sometimes combined with software size and complexity. The researchers mainly used logistic regression analysis to build their prediction models. Logistic regression makes it possible to separate classes that are prone to defects from classes that are not.

In particular, over the last three years Gyimothy, Zhou, and Olague have used logistic regression models to predict defects in large-scale systems. Gyimothy developed a defect prediction model for Mozilla, a large open source software project. To raise the prediction rate, machine learning techniques were applied alongside logistic regression analysis as techniques for developing the defect prediction model. As a result, the prediction model developed by logistic regression had higher correctness than the models developed by machine learning. Zhou developed prediction models by defect severity for software developed by NASA. The prediction model built by logistic regression had a higher prediction rate for low-severity defects than for high-severity defects. Olague built defect prediction models using the CK and MOOD metric suites for systems developed with iterative or agile development processes; the model using the CK metric suite had a higher prediction rate than the model using the MOOD metric suite.

For such research results to be widely used in development practice, a defect prediction model must be general-purpose. However, existing studies built a defect prediction model from defect and metric data collected from a specific piece of software and then applied the model to that same software, so the generality of the model was never verified. Experiments are therefore needed to determine whether the defect prediction models from these studies are general enough to be widely used in development practice.

An object of the present invention is to apply an existing, already defined defect prediction model universally to other systems. Generality offers two main benefits. First, software development cost is reduced. Developing a new defect prediction model tailored to the system under development requires a separate management system for defect information, as well as the statistical procedures and tools needed to build the model; reusing the defect prediction model of another system avoids these unnecessary set-up costs. Second, when a new system with different software characteristics is developed, no prior defect information exists, so an existing defect prediction model can be applied universally to determine efficiently whether defects are present.

Accordingly, the present invention has been devised to solve the above problems, and its object is to provide a software defect prediction calculation apparatus and calculation method using metrics in which, through a conversion of the measured metric values, an existing defect prediction model can have adequate predictive power even on a large-scale system.

Another object of the present invention is to provide a software defect prediction calculation apparatus and calculation method using metrics that raise the predictive power of the defect prediction model, so that software development resources can be concentrated on the modules in which defects are expected to occur and development costs can be managed efficiently.

To achieve the above objects, a software defect prediction calculation apparatus using metrics according to the present invention comprises: a metric measurement value input module that parses the source files under test, removes unnecessary information, and then measures metrics to obtain refined metric measurement values; a normalization module that performs a normalization process or a standard normalization process on the metric measurement values obtained by the metric measurement value input module to produce normalized metric values; a defect prediction module that derives, based on an embedded defect prediction model and the metric values normalized by the normalization module, a result value predicting the presence or absence of defects for each class of the test target; a defect information collection module that collects information on the actual presence or absence of defects in the test target, using the names of the classes registered as defective in the source content, based on the defect information held in a bug tracking system; and a predictive power evaluation module that evaluates the predictive power of the defect prediction model based on the result values predicted by the defect prediction model in the defect prediction module and the information on actually defective classes collected by the defect information collection module.

Preferably, the predictive power evaluation module evaluates the predictive power of the defect prediction model using three criteria: precision, the proportion of all classes that the prediction model classified correctly; correctness, the proportion of the classes classified as defective by the prediction model that are actually defective; and completeness, the proportion of all actually existing defects that lie in the classes the prediction model classified as defective.

To achieve the above objects, a software defect prediction calculation method using metrics according to the present invention comprises: (a) collecting the source files under test and measuring metrics to obtain refined metric measurement values; (b) producing normalized metric values by performing either a normalization process that divides the obtained metric measurement values by a fixed size to bring them to a common scale and a normal distribution, or a standard normalization process that converts those normal distributions into a single standard normal distribution; (c) deriving a result value predicting the presence or absence of defects for each class of the test target, based on the normalized metric values and the threshold of an embedded defect prediction model; (d) independently of steps (a) to (c), collecting information on the actual presence or absence of defects in the test target; and (e) evaluating the predictive power of the defect prediction model based on the class information predicted in step (c) and the information on actually defective classes collected in step (d).

Preferably, step (c) comprises: calculating the defect probability $\pi$ of a class using the formula

$$\pi = \frac{e^{\mathrm{logit}}}{1 + e^{\mathrm{logit}}}$$

comparing the calculated defect probability with the threshold of the embedded defect prediction model; and determining, as a result of the comparison, that the class is defective if the defect probability $\pi$ is equal to or greater than the threshold, and that it is not defective if $\pi$ is less than the threshold. Here, the logit value is the model's linear combination of the metric values, $X_i$ is the measured value of an object-oriented metric serving as an independent variable, $C_i$ is the regression coefficient of $X_i$, and the value of $\pi$ ranges from 0 to 1.

Preferably, the logit value used in the formula is the logit value expressed in the logistic regression equation of the defect prediction model being used.

Preferably, step (d) comprises: collecting defect information from Bugzilla, provided on the Eclipse website, and from files provided by NASA and PROMISE; extracting, using the collected defect information, the names of the classes registered as defective in the source content; incrementing the defect count by one each time a class name is extracted; and removing a class name from the actual defect information when the extracted name belongs to a class that does not exist in the metric measurement values or is an invalid class name.

Preferably, the predictive power evaluation in step (e) uses precision, correctness, and completeness as criteria, and the correctness and completeness of the model are calculated with the threshold set at the trade-off point between correctness and completeness.

The software defect prediction calculation apparatus and calculation method using metrics according to the present invention, as described above, have the following effects.

First, by converting the metric values measured in another system, a defect prediction model can be applied universally to that system.

Second, the conversion of the measured metric values and the evaluation tool make it possible to raise the predictive power of an existing defect prediction model.

Third, as predictive power rises, software development resources can be concentrated more accurately on the modules in which defects are expected to occur, enabling efficient management of development costs.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

A preferred embodiment of the software defect prediction calculation apparatus and calculation method using metrics according to the present invention is described below with reference to the accompanying drawings. The present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; the embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art. The embodiments described in this specification and the configurations shown in the drawings are therefore merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications that could replace them may exist at the time of filing.

FIG. 1 is a block diagram showing the structure of a software defect prediction calculation apparatus using metrics according to an embodiment of the present invention.

As shown in FIG. 1, the software defect prediction calculation apparatus includes a metric measurement value input module (10), a normalization module (20), a defect prediction module (30), a defect information collection module (40), and a predictive power evaluation module (50).

The metric measurement value input module (10) parses the source files under test, removes unnecessary information, measures the CK metrics to obtain refined metric measurement values, and passes them to the next module, the normalization module (20). The metrics are measured with Together 2007, and the measured values are output as an XML file.

The normalization module (20) produces normalized CK metric values by performing either a normalization process that divides the CK metric measurement values obtained by the metric measurement value input module (10) by a fixed size to bring them to a common scale and a normal distribution, or a standard normalization process that converts those normal distributions into a single standard normal distribution.

The defect prediction module (30) derives a result value predicting the presence or absence of defects for each class of the test target, based on the embedded defect prediction model and the CK metric values normalized by the normalization module (20). The defect prediction models are the Olague, Yuming, and Gyimothy models, and each model has its own threshold for deciding whether a class is defective.

The defect information collection module (40) collects information on the actual presence or absence of defects in the test target, using the names of the classes registered as defective in the source content, based on the defect information held in a bug tracking system. The defect information is collected from Bugzilla, the bug tracking system provided on the Eclipse website, and from files provided by NASA and PROMISE.

The predictive power evaluation module (50) evaluates the predictive power of the defect prediction model based on the result values predicted by the defect prediction model in the defect prediction module (30) and the information on actually defective classes collected by the defect information collection module (40). The evaluation criteria are precision, the proportion of all classes that the prediction model classified correctly; correctness, the proportion of the classes classified as defective by the prediction model that are actually defective; and completeness, the proportion of all actually existing defects that lie in the classes the prediction model classified as defective.

The calculation method of the software defect prediction calculation apparatus using metrics according to the present invention, configured as above, is described in detail below with reference to the accompanying drawings. Reference numerals identical to those in FIG. 1 denote the same members performing the same functions.

FIG. 2 is a flowchart illustrating a software defect prediction calculation method using metrics according to an embodiment of the present invention.

Referring to FIG. 2, the metric measurement value input module (10) first collects the basic data from the Eclipse 3.3 source files under test and measures metrics to obtain refined metric measurement values (S10). The metrics were measured with Together 2007; other existing model studies, however, used metric measurement tools developed in-house or commercial tools. It should be noted that the measurement tool can vary across embodiments within the scope of the technical idea of the present invention. Eclipse, the open source software selected as the test subject, is a Java-based open source project supported by IBM, Nokia, and others, and is an IDE (Integrated Development Environment) for Java and C++ development.

Next, the normalization module (20) produces normalized metric values by performing either a normalization process that divides the obtained metric measurement values by a fixed size to bring them to a common scale and a normal distribution, or a standard normalization process that converts those normal distributions into a single standard normal distribution (S20).
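As an illustration of step S20, the following is a minimal sketch that assumes the "standard normalization" corresponds to the usual z-score transform (subtract the mean, divide by the standard deviation). The patent text does not give the exact formula, so this reading, the function name `standard_normalize`, and the sample WMC values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standard_normalize(values):
    """Return z-scores for one metric column (assumed reading of step S20)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All classes share the same value; map everything to 0.0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical WMC measurements for eight classes.
wmc = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standard_normalize(wmc)
```

Each metric column (for example WMC or CBO) would be normalized separately before being passed to the prediction model.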

The defect prediction module (30) then derives a result value predicting the presence or absence of defects for each class of the test target, based on the normalized metric values and the threshold of the embedded defect prediction model (S30). The prediction models used are the Olague, Yuming, and Gyimothy models, and each defect prediction model has its own threshold for deciding whether a class is defective.

Here, the presence or absence of a defect in a class under test is determined by first calculating the defect probability $\pi$ of the class with Equation 1.

$$\pi = \frac{e^{\mathrm{logit}}}{1 + e^{\mathrm{logit}}} \qquad \text{(Equation 1)}$$

That is, for each class the defect prediction model predicts a defect if the defect probability $\pi$ is equal to or greater than the threshold, and predicts no defect if $\pi$ is less than the threshold.

Here, the logit value is the model's linear combination of the metric values, and $X_i$ is the measured value of an object-oriented metric serving as an independent variable. The dependent variable $\pi$ is the probability that a defect will be found in the class; if the probability calculated from the metric measurement values is equal to or greater than the threshold, the class is regarded as defective. The value of $\pi$ ranges from 0 to 1. $C_i$ is the regression coefficient of $X_i$; the larger the value of $C_i$, the greater the influence of that independent variable on the probability of finding a defect in the class.
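The calculation in step S30 can be sketched as follows, assuming the standard logistic regression form in which the logit is an intercept plus the sum of each regression coefficient times its metric value. The intercept, the coefficient values, and the threshold used below are hypothetical placeholders, not the values of the Olague, Yuming, or Gyimothy models.

```python
import math


def defect_probability(metric_values, coefficients, intercept=0.0):
    """Logistic probability computed from a linear logit (assumed model form)."""
    logit = intercept + sum(c * x for c, x in zip(coefficients, metric_values))
    # Numerically stable evaluation of e^logit / (1 + e^logit).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def is_defect_prone(probability, threshold):
    """A class is predicted defective when its probability reaches the threshold."""
    return probability >= threshold


# Hypothetical normalized metric values and coefficients for one class.
p = defect_probability([1.2, -0.3], [0.8, 0.5], intercept=-0.2)
flagged = is_defect_prone(p, threshold=0.5)
```

Because the result always lies between 0 and 1, the same threshold comparison works regardless of which model supplies the logit.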

The logit value used in Equation 1 is defined separately for each prediction model. Equation 2 expresses the Olague model in logistic regression form; when the Olague model is used to decide whether a class is defective, the logit value of the prediction model in Equation 2 is substituted into Equation 1 to calculate the defect probability of the class.

Meanwhile, independently of steps S10 to S30, the defect information collection module (40) collects information on the actual presence or absence of defects in the test target. Defect information held in Bugzilla, the bug tracking system provided on the Eclipse website, is collected through a URL; the data is gathered from the HTML source served at that URL. Other defect information is obtained from files provided by NASA and PROMISE. The collected information is used to extract the names of the classes registered as defective in the source content, and the defect count is incremented by one each time a name appears. Names of classes that do not exist in the metric measurement values, or invalid class names, are removed from the actual defect information (S40).
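The counting and filtering part of step S40 can be sketched as below. The sketch assumes the class names have already been extracted from the bug reports into a flat list; how they are pulled from Bugzilla's HTML is not shown, and the function name `count_defects` and the sample class names are hypothetical.

```python
from collections import Counter


def count_defects(reported_class_names, measured_classes):
    """Count one defect per reported occurrence of a class name.

    Names that are absent from the metric measurements (unknown or invalid
    classes) are dropped from the actual defect information.
    """
    counts = Counter(reported_class_names)
    return {name: n for name, n in counts.items() if name in measured_classes}


# Hypothetical class names extracted from bug reports.
reported = ["ui.Editor", "core.Parser", "ui.Editor", "tmp.Missing"]
# Classes for which metric values were actually measured.
measured = {"ui.Editor", "core.Parser", "core.Lexer"}

defects = count_defects(reported, measured)
```

Classes that appear in `measured` but not in the result are treated as having no recorded defects.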

Next, the predictive power evaluation module (50) evaluates the predictive power of the defect prediction model based on the class information predicted by the defect prediction model in the defect prediction module (30) and the information on actually defective classes collected by the defect information collection module (40), and outputs to a file the predictive power evaluation, the metric values of the classes belonging to each region, and the presence or absence of defects (S50). Precision, correctness, and completeness are used as the evaluation criteria.

Precision is the proportion of all classes that the prediction model classified correctly, and can be expressed as Equation 2 below.

Precision = (A1 + A4) / (A1 + A2 + A3 + A4)

Correctness is the proportion of the classes classified as defective by the prediction model that are actually defective, and can be expressed as Equation 3 below.

Correctness = A4 / (A2 + A4)

Completeness is the proportion of all actually existing defects that lie in the classes the prediction model classified as defective, and can be expressed as Equation 4 below.

Completeness = B2 / (B1 + B2)

Here, A1 to A4 are the numbers of Java classes classified into each region, and B1 and B2 are the numbers of actual defects in the source files classified into the corresponding regions.
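The three measures can be computed as in the sketch below. The mapping of regions is an assumption inferred from the formulas rather than stated in the text: A1 is taken as classes predicted and actually non-defective, A2 as classes predicted defective but actually non-defective, A3 as classes actually defective but predicted non-defective, A4 as classes predicted and actually defective, B1 as the defects in A3 classes, and B2 as the defects in A4 classes. The function name and sample data are hypothetical.

```python
def evaluate(predicted, defect_counts):
    """Compute precision, correctness, and completeness.

    predicted:     dict of class name -> True if predicted defective
    defect_counts: dict of class name -> number of actual defects (missing = 0)
    """
    a1 = a2 = a3 = a4 = b1 = b2 = 0
    for name, flagged in predicted.items():
        n = defect_counts.get(name, 0)
        if flagged and n > 0:
            a4 += 1
            b2 += n
        elif flagged:
            a2 += 1
        elif n > 0:
            a3 += 1
            b1 += n
        else:
            a1 += 1
    total = a1 + a2 + a3 + a4
    return {
        "precision": (a1 + a4) / total if total else None,
        "correctness": a4 / (a2 + a4) if (a2 + a4) else None,
        "completeness": b2 / (b1 + b2) if (b1 + b2) else None,
    }


# Hypothetical predictions for five classes and their actual defect counts.
predicted = {"A": True, "B": True, "C": False, "D": False, "E": False}
actual = {"A": 3, "C": 1}
scores = evaluate(predicted, actual)
```

A measure is returned as `None` when its denominator is zero, for example when no class is predicted defective.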

For example, a precision of 60% for a defect prediction model means that 60 of 100 classes were correctly predicted as defective or non-defective. A correctness of 60% means that, of 100 classes predicted to be defective, 60 actually are defective. Finally, a completeness of 60% means that 60 of 100 defects were predicted. When evaluating a defect prediction model with precision, correctness, and completeness, note that there is a trade-off between correctness and completeness; as shown in FIG. 3, the trade-off point is therefore chosen as the threshold, and the model's correctness and completeness are calculated at that point.
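One way to locate such a trade-off point is sketched below. It assumes the trade-off point is the candidate threshold at which correctness and completeness are closest to each other; FIG. 3 is not reproduced here, so this interpretation, the function names, and the sample data are assumptions.

```python
def correctness_and_completeness(probabilities, defect_counts, threshold):
    """Return (correctness, completeness) at a threshold, or None if undefined."""
    flagged = [name for name, p in probabilities.items() if p >= threshold]
    total_defects = sum(defect_counts.values())
    if not flagged or total_defects == 0:
        return None
    truly_defective = sum(1 for n in flagged if defect_counts.get(n, 0) > 0)
    covered_defects = sum(defect_counts.get(n, 0) for n in flagged)
    return truly_defective / len(flagged), covered_defects / total_defects


def pick_threshold(probabilities, defect_counts, candidates):
    """Pick the candidate threshold where the two measures are closest."""
    best = None
    for t in candidates:
        result = correctness_and_completeness(probabilities, defect_counts, t)
        if result is None:
            continue
        gap = abs(result[0] - result[1])
        if best is None or gap < best[0]:
            best = (gap, t)
    return None if best is None else best[1]


# Hypothetical defect probabilities and actual defect counts.
probs = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2}
actual = {"A": 2, "C": 2}
chosen = pick_threshold(probs, actual, [0.3, 0.5, 0.8])
```

In this sample, lowering the threshold raises completeness at the cost of correctness, and raising it does the opposite, which is the trade-off the text describes.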

The experimental results below show that, with this calculation apparatus and calculation method, an existing defect prediction model can achieve adequate predictive power even on a large-scale system.

FIG. 4A is a graph showing the distribution of defect occurrence probabilities calculated from non-normalized metric values, and FIG. 4B is a graph showing the distribution of defect occurrence probabilities calculated by the Gyimothy model after standard normalization of the metric values.

Unlike the graph of FIG. 4A, which shows the distribution of defect occurrence probabilities calculated from non-normalized metric values, the graph of FIG. 4B shows probabilities distributed evenly to the left and right of the mean defect occurrence probability. The predictive power of the model can therefore be expected to vary smoothly as the threshold changes.
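The effect of standard normalization on the computed probabilities can be sketched in Python as follows. This is an illustrative sketch only: it assumes that standard normalization means subtracting the mean and dividing by the population standard deviation, and that the defect occurrence probability comes from a logistic regression model. The function names, the coefficient values and the sample metric values are hypothetical and are not those of the Olague, Yuming or Gyimothy models.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def standard_normalize(values: Sequence[float]) -> List[float]:
    """Z-score each value: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every value sits at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def defect_probability(metrics: Sequence[float], intercept: float,
                       coefficients: Sequence[float]) -> float:
    """Logistic-regression probability in [0, 1] for one class."""
    g = intercept + sum(c * x for c, x in zip(coefficients, metrics))
    # Numerically stable form that avoids overflow for large |g|.
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


# Hypothetical raw values of a single metric across eight classes.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
normalized = standard_normalize(raw)

# With a hypothetical coefficient of 1.0 and intercept 0.0, raw values push
# most probabilities toward 1, while normalized values spread around 0.5.
print([round(defect_probability([x], 0.0, [1.0]), 3) for x in raw])
print([round(defect_probability([z], 0.0, [1.0]), 3) for z in normalized])
```

When the raw metric values of another system lie far outside the range on which the coefficients were fitted, the exponent grows large and the probabilities saturate near 0 or 1, so almost any threshold classifies the classes the same way. Centering the values is one plausible explanation for why the distribution in FIG. 4B spreads around its mean.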

FIG. 5 is a graph showing the results of predicting defects with the Gyimothy model using the standard-normalized metric values of the NASA KCI system.

As shown in FIG. 5, in the experiment using non-normalized metric values the model failed to predict any defects, whereas in the prediction using normalized metric values the accuracy of defect prediction was 70.7% and the completeness was 70.5%. This is similar to the predictive power claimed for the defect prediction model that Yuming built from NASA KCI data.

Table 1 shows the results of applying the defect prediction models to the test target systems after normalizing the metric values. In six of the eight experiments, the output of the defect prediction model could be interpreted. The most noteworthy model is the Gyimothy model. In the experiments using non-normalized metric values it failed to predict defects in all three systems, whereas with normalized metric values it predicted defects in all three. With Yuming's model, too, the defect prediction results for Eclipse could be interpreted. Therefore, when a defect prediction model is applied to another system, one way to improve its predictive power is to first examine the distribution of defect occurrence probabilities calculated by the model and, if the distribution is uneven, to standard-normalize the metric values and reapply the model to the system.

Table 1

Fault prediction model | Test target system | Precision | Accuracy | Completeness | Trade-off value
Olague | Claimed predictive power | - | 82% | - |
Olague | Eclipse 3.3 | 76.8% | 21.7% | 21.7% | 0.105
Olague | NASA KCI | - | - | - | 0.044
Olague | jEdit 4.0 | - | - | - | 0.039
Yuming | Claimed predictive power | 69.7% | 61.4% | 74.6% |
Yuming | Eclipse 3.3 | 38.3% | 34.3% | 34.3% | 0.488
Yuming | NASA KCI | Not tested | | |
Yuming | jEdit 4.0 | 60.4% | 64.2% | 64.2% | 0.421
Gyimothy | Claimed predictive power | 69.6% | 72.6% | 65.2% |
Gyimothy | Eclipse 3.3 | 82.4% | 38.3% | 38.3% | 0.716
Gyimothy | NASA KCI | 71.7% | 70.7% | 70.5% | 0.579
Gyimothy | jEdit 4.0 | 59.9% | 63.7% | 63.7% | 0.332

Although the technical idea of the present invention has been described in detail with reference to preferred embodiments, it should be noted that these embodiments are given for purposes of explanation and not of limitation. Those of ordinary skill in the technical field of the present invention will also understand that various embodiments are possible within the scope of its technical idea. The true scope of technical protection of the present invention should therefore be determined by the technical idea of the appended claims.

FIG. 1 is a block diagram showing the structure of a software defect prediction calculation apparatus using metrics according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a software defect prediction calculation method using metrics according to an embodiment of the present invention.

FIG. 3 is a diagram showing the trade-off relationship between accuracy and completeness when evaluating the predictive power of a defect prediction model in software defect prediction calculation using metrics according to an embodiment of the present invention.

FIG. 4A is a graph showing the distribution of defect occurrence probabilities calculated from non-normalized metric values.

FIG. 4B is a graph showing the distribution of defect occurrence probabilities calculated by the Gyimothy model after standard normalization of the metric values.

FIG. 5 is a graph showing the results of predicting defects with the Gyimothy model using the standard-normalized metric values of the NASA KCI system.

* Description of reference numerals for the main parts of the drawings

10: metric measurement input module
20: normalization module

30: defect prediction module
40: defect information collection module

50: predictive power evaluation module

Claims

A software defect prediction calculation apparatus using metrics, comprising:

a metric measurement input module for parsing source files of the test target and measuring metrics to obtain refined metric measurement values;

a normalization module configured to calculate normalized metric values by applying normalization or standard normalization to the metric measurement values obtained by the metric measurement input module;

a defect prediction module for deriving a prediction of the presence or absence of defects for each class of the test target, based on the implicit defect prediction model and the metric values normalized by the normalization module;

a defect information collection module that, based on defect information in a bug tracking system, collects information on the actual defects of the test target by using the names of classes registered as defective in the source contents; and

a predictive power evaluation module for evaluating the predictive power of the defect prediction model based on the result predicted by the defect prediction model in the defect prediction module and the information on the actually defective classes collected by the defect information collection module.

The apparatus of claim 1,

wherein the defect prediction model is any one of the Olague, Yuming and Gyimothy models, and a threshold for determining whether a class is defective is defined for each model.

The apparatus of claim 1,

wherein the defect information is any one of information collected from Bugzilla provided by the Eclipse website and files provided by NASA and PROMISE.

The apparatus of claim 1, wherein the criteria by which the predictive power evaluation module evaluates the predictive power of the defect prediction model are:

precision, which represents the proportion of classes correctly classified by the prediction model among all classes;

accuracy, which represents the proportion of classes that are actually defective among all the classes classified as defective by the prediction model; and

completeness, which represents the proportion of all actually existing defects that are present in the classes classified as defective by the prediction model.

A software defect prediction calculation method using metrics, comprising:

(a) collecting source files of the test target and measuring metrics to obtain refined metric measurement values;

(b) calculating normalized metric values either by normalization, in which the obtained metric measurement values are divided by a predetermined size to unify their scale, or by standard normalization, in which they are transformed to follow a normal distribution;

(c) deriving a prediction of the presence or absence of defects for each class of the test target based on the calculated normalized metric values and the threshold value of the implicit defect prediction model;

(d) collecting information on the presence or absence of actual defects of the test target, independently of steps (a) to (c); and

(e) evaluating the predictive power of the defect prediction model based on the class information predicted in step (c) and the information on the actually defective classes collected in step (d).

The method of claim 5, wherein step (c) comprises:

calculating the probability $\pi$ of a defect occurring in a class using the equation

$$\pi(X_1, \dots, X_n) = \frac{e^{C_0 + C_1 X_1 + \cdots + C_n X_n}}{1 + e^{C_0 + C_1 X_1 + \cdots + C_n X_n}};$$

comparing the calculated defect occurrence probability with the threshold value of the implicit defect prediction model; and

determining, as a result of the comparison, that the class is defective if the defect occurrence probability $\pi$ is greater than or equal to the threshold, and that the class is free of defects if $\pi$ is less than the threshold,

wherein $X_i$ is the measured value of an object-oriented metric that is an independent variable, $C_i$ is the regression coefficient of $X_i$, and the value of $\pi$ ranges from 0 to 1.

The method of claim 6,

wherein the logarithmic value used in the equation is the logarithmic value expressed in the logistic regression model of the defect prediction model being used.

The method of claim 5, wherein step (d) comprises:

collecting defect information from Bugzilla provided by the Eclipse website and from files provided by NASA and PROMISE;

extracting, using the collected defect information, the names of classes registered as defective in the source contents;

increasing the defect count of a class by one each time the name of that class is extracted; and

when an extracted class name is not present in the metric measurements or refers to an incorrect class, removing that class name from the actual defect information.

The method of claim 5,

wherein the criteria for evaluating predictive power in step (e) are precision, accuracy and completeness, and the trade-off point between accuracy and completeness is determined as the threshold at which the accuracy and completeness of the model are calculated.