KR102488409B1

KR102488409B1 - Repair method and apparatus for automatic bug localization using deep learning algorithm

Info

Publication number: KR102488409B1
Application number: KR1020200184812A
Authority: KR
Inventors: 이병정; 양근석
Original assignee: 서울시립대학교 산학협력단
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2023-01-13
Also published as: KR20220093760A

Abstract

A bug correction device includes: an input unit that receives the source code of a program; a bug location identification unit that identifies, based on an autoencoder and a convolutional neural network (CNN), the location of a buggy line containing a bug from the source code and verification information about the program; and a bug correction unit that generates corrected code in which the bug is fixed by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network (Seq-GAN).

Description

Correction method and apparatus through automatic bug location identification using deep learning algorithm {REPAIR METHOD AND APPARATUS FOR AUTOMATIC BUG LOCALIZATION USING DEEP LEARNING ALGORITHM}

The present invention relates to a method and apparatus for correcting bugs through automatic bug localization using a deep learning algorithm.

When a program contains erroneous code, the defect that causes the program to fail or malfunction is called a bug. As software grows in size and complexity, bugs large and small inevitably occur.

In open source projects such as Eclipse and Mozilla, about 350 bug reports are submitted per day on average, and developers spend most of their software maintenance time on debugging.

Software maintenance therefore consumes a great deal of time and money on an ongoing basis. To address this, techniques for automating bug correction are being studied. One example is a method that automates bug correction based on genetic programming.

Conventional bug localization methods, moreover, have expanded the information in a bug report using the report's attachments and extended queries. These methods have a limitation, however: localization performance does not improve even when the training data is expanded.

Meanwhile, a generative adversarial network (GAN) improves through an adversarial process in which a generative model and an identification model (discriminator) compete with each other. GANs are mainly used to generate outputs that closely resemble real examples.

Korean Patent Laid-Open Publication No. 10-2019-0089615 (published July 31, 2019)

The present invention is intended to solve the problems of the prior art described above. It aims to provide a method that receives the faulty source code of a program, identifies a source code file based on an autoencoder and a CNN, identifies the location of a buggy line for the same input from verification information about the program, and generates corrected code in which the bug is fixed by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network.

It also aims to provide a method in which the accuracy of bug localization and the performance of bug correction gradually improve as the autoencoder and the correction model are trained.

However, the technical problems to be addressed by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means of achieving the above technical objects, one embodiment of the present invention provides a bug correction device that may include: an input unit that receives the source code of a program; a bug location identification unit that identifies, based on an autoencoder and a convolutional neural network (CNN), the location of a buggy line containing a bug from the source code and verification information about the program; and a bug correction unit that generates corrected code in which the bug is fixed by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network (Seq-GAN).

In one embodiment, the device may further include a conversion unit that converts the program based on the corrected code, and a test unit that performs a conformance test on the source code of the converted program.

In one embodiment, the verification information may include a bug report and a stack trace for the program.

In one embodiment, the bug location identification unit may extract one or more features from the source code and the verification information, input the extracted features to the encoder of the autoencoder, input the latent vector output by the encoder to the CNN, and identify the location of the buggy line from the output of the CNN.

In one embodiment, the correction model may include an RNN-based generative model that generates source code and a CNN-based identification model that identifies whether the source code generated by the generative model is normal.

In one embodiment, the device may further include a learning unit that trains the correction model. The learning unit may train the RNN-based generative model to generate normal or abnormal source code, train the CNN-based identification model to identify whether that source code is normal, and train the autoencoder so that its output equals its input.

In one embodiment, the bug correction unit may generate a common word dictionary and a user word dictionary based on the source code of the program.

In one embodiment, the conversion unit may apply the corrected code to the buggy line and convert the program using a variable name restoration technique.

In one embodiment, the test unit may perform the conformance test based on a plurality of test cases and may determine that the bug correction is acceptable when the source code of the converted program passes all of the test cases.

Another embodiment of the present invention provides a bug correction method that may include: receiving the source code of a program; identifying, based on an autoencoder and a convolutional neural network (CNN), the location of a buggy line containing a bug from the source code and verification information about the program; and generating corrected code in which the bug is fixed by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network (Seq-GAN).

The above means for solving the problems are merely illustrative and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description.

According to any one of the above means for solving the problems, it is possible to provide a method that receives the source code of a program, identifies the location of a buggy line from the source code and verification information about the program based on an autoencoder and a CNN, and generates corrected code in which the bug is fixed by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network.

In addition, as the training data is expanded, the accuracy of identifying the location of the buggy line using the autoencoder can improve.

In addition, by applying a reinforcement learning policy to the correction model based on a generative adversarial network, it is possible to provide a bug correction method whose correction performance gradually improves.

In addition, providing an automated bug correction method can reduce the manpower and cost required to debug a program.

FIG. 1 is a block diagram of a bug correction device according to an embodiment of the present invention.
FIG. 2 is a diagram comparing the results of a conventional bug localization method and a bug localization method according to an embodiment of the present invention.
FIG. 3 is a diagram comparing the results of a conventional bug correction method and a bug correction method according to an embodiment of the present invention.
FIG. 4 is a flowchart of a bug correction method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice the invention. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. When a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware. A "unit" is not limited to software or hardware: it may be configured to reside in an addressable storage medium or to run on one or more processors. As an example, a "unit" therefore includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units". Components and "units" may also be implemented to run on one or more CPUs in a device or a secure multimedia card.

The "network" referred to below means a connection structure that allows nodes such as terminals and servers to exchange information with one another, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a bug correction device according to an embodiment of the present invention. Referring to FIG. 1, the bug correction device 100 may include an input unit 110, a bug location identification unit 120, a bug correction unit 130, a learning unit 140, a conversion unit 150, and a test unit 160.

The bug correction device 100 may correct bugs present in the source code of a program. Correcting bugs with the bug correction device 100 can prevent the program from failing or malfunctioning.

For example, the bug correction device 100 may first identify the location of a buggy line containing a bug in the source code (bug localization) and then remove the bug from the buggy line (bug repair).

The input unit 110 may receive the source code of a program. The source code includes one or more lines, and each line may include one or more words (tokens).

The bug location identification unit 120 may identify the location of a buggy line containing a bug from the source code of the program and verification information about the program. The verification information may include, for example, a bug report and a stack trace for the program.

The bug location identification unit 120 may extract one or more features from the source code of the program and the verification information about the program.

For example, the bug location identification unit 120 may use the revised Vector Space Model (rVSM) technique to track down source code files similar to a bug report. The bug location identification unit 120 may apply natural language processing techniques and CamelCase splitting to the bug report and the source code files.
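The idea of ranking source files by textual similarity to a bug report can be sketched as follows. This is a minimal illustration using a plain TF-IDF vector space model with cosine similarity, not rVSM itself (rVSM additionally accounts for document length, which is not modeled here). The function names and the smoothed IDF formula are assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one TF-IDF vector (token -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def rank_files(report_tokens, files):
    """Return file names ordered from most to least similar to the report."""
    names = list(files)
    vectors = tfidf_vectors([files[n] for n in names] + [report_tokens])
    query = vectors[-1]
    scored = [(cosine(query, vec), name) for name, vec in zip(names, vectors)]
    return [name for _, name in sorted(scored, key=lambda pair: -pair[0])]
```

The tokens are assumed to be preprocessed already; files sharing no tokens with the report receive a similarity of zero and sink to the bottom of the ranking.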

The natural language processing techniques may include, for example, sentence tokenization, stop word removal, or special character removal.

Applying CamelCase splitting separates compound identifiers into their component words. For example, a function name such as "CreatedQuery" is split into "Created" and "Query".
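The preprocessing steps described above can be sketched with the standard `re` module. This is an illustrative sketch only: the stop-word list is a small hypothetical sample, and the choice to lowercase tokens is an assumption not stated in the text.

```python
import re

# Hypothetical stop-word sample; a real system would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and"}

def split_camel_case(identifier: str) -> list[str]:
    """Split a CamelCase identifier into its component words."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)

def preprocess(text: str) -> list[str]:
    """Tokenize, drop special characters, split CamelCase, remove stop words."""
    words = re.findall(r"[A-Za-z0-9]+", text)  # special characters are discarded here
    tokens = []
    for word in words:
        tokens.extend(part.lower() for part in split_camel_case(word))
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

For instance, `split_camel_case("CreatedQuery")` yields `["Created", "Query"]`, matching the example in the text.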

For example, the bug location identification unit 120 may analyze the correlation between the bug report and the stack trace using BRTracer.

The bug location identification unit 120 may identify the location of the buggy line based on an autoencoder and a convolutional neural network (CNN).

The bug location identification unit 120 may input the extracted features to the encoder of the autoencoder to obtain a latent vector, and may then input that latent vector to the CNN.
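The data flow from features to latent vector to CNN output can be sketched in plain Python. This is a toy forward pass under stated assumptions: a single dense encoder layer with tanh, one 1-D convolution with ReLU, and global max pooling. The layer shapes, activations, and all weights are hypothetical; the text does not specify the architecture.

```python
import math

def encode(features, weights, bias):
    """Encoder half of an autoencoder: one dense layer with tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, bias)]

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def rank_score(features, enc_weights, enc_bias, kernel):
    """Features -> latent vector -> convolution -> global max pooling."""
    latent = encode(features, enc_weights, enc_bias)
    feature_map = conv1d(latent, kernel)
    return max(feature_map)
```

In a trained system the weights would come from the learning unit; here they are simply passed in so the flow of data is visible.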

For example, the bug location identification unit 120 may use the CNN to output a rank score between the bug report and the source code. The bug location identification unit 120 may identify the location of the buggy line based on the rank score output by the CNN.

FIG. 2 is a diagram comparing the results of a conventional bug localization method and a bug localization method according to an embodiment of the present invention.

Parts (a) to (c) of FIG. 2 plot the accuracy of bug localization on a baseline comprising Tomcat, AspectJ, Birt, Eclipse, JDT, and SWT. Part (a) shows the average bug localization accuracy of each method, part (b) shows the MRR (mean reciprocal rank) performance of each method, and part (c) shows the MAP (mean average precision) performance of each method.
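For readers unfamiliar with the two ranking metrics named above, the following sketch computes them using their standard definitions. The function names are chosen for this example; `ranked` is an ordered list of candidate locations and `relevant` is the set of locations that actually contain the bug.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def average_precision(ranked, relevant):
    """Mean of the precision values measured at each relevant item's rank."""
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def mean(values):
    """Arithmetic mean; averaging the per-query scores gives MRR or MAP."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

MRR is the mean of `reciprocal_rank` over all bug reports, and MAP is the mean of `average_precision` over all bug reports.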

Referring to parts (a) to (c) of FIG. 2, the method according to the present invention improves the accuracy of bug localization compared with the conventional bug localization method.

The bug correction unit 130 may generate a common word dictionary (Common Token Dictionary) and a user word dictionary based on the source code of the program.

The bug correction unit 130 may generate the common word dictionary and the user word dictionary by classifying each word in the source code of the program as either a common word or a user word.

The common word dictionary may contain common words that are generally used in program source code. For example, it may contain words related to types, keywords, semicolons (;), and library functions.

The user word dictionary may contain user words, that is, words used in the source code of a specific program. In other words, the user word dictionary may contain the words in the program's source code that remain after the common words are excluded.

For example, the user word dictionary may store user words in the order in which they appear in the program source code.
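The split into the two dictionaries can be sketched as follows. The common-token set below is a small hypothetical sample, and treating operators, punctuation, and numeric literals as common tokens is an assumption made for this example; the text only lists types, keywords, semicolons, and library functions.

```python
import re

# Hypothetical sample of common tokens (types, keywords, library functions, ';').
COMMON_TOKENS = {"int", "char", "float", "return", "if", "else",
                 "for", "while", "printf", "scanf", ";"}

def tokenize(source: str) -> list[str]:
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)

def is_common(token: str) -> bool:
    """Assumption: any non-identifier token (symbol or literal) is common."""
    return token in COMMON_TOKENS or not re.match(r"[A-Za-z_]", token)

def build_dictionaries(source: str):
    """Return (common, user) token lists, each in order of first appearance."""
    common, user = [], []
    for token in tokenize(source):
        target = common if is_common(token) else user
        if token not in target:
            target.append(token)
    return common, user
```

Because user tokens are appended on first sight, their list index reflects the order in which they appear in the source, as described in the text.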

The bug correction unit 130 may generate corrected code in which the bug is fixed, based on a correction model built on a sequence generative adversarial network (Seq-GAN). The Seq-GAN-based correction model may include a generative model and an identification model.

The generative model may be, for example, an RNN-based model trained to generate source code. In the RNN-based generative model, a token of the program sequence is input and the hidden unit activation is updated.

The hidden unit activation h_t may be expressed, for example, as Equation 1, where x_t is the input to the RNN model and t is the time step.

The output z(h_t) of the RNN model may be expressed, for example, as Equation 2, where b is the bias vector of the RNN and W_vector is the weight matrix.
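Equations 1 and 2 themselves do not appear in this text. As a hedged reconstruction only, assuming the usual recurrent generator formulation, they would plausibly take the form below. The recurrent update function g and the softmax output layer are assumptions; only h_t, x_t, t, b, and W_vector are named in the description.

```latex
% Assumed forms based on the standard RNN generator formulation;
% g and softmax are not named in the source text.
\begin{align}
h_t &= g(h_{t-1},\, x_t) \tag{1} \\
z(h_t) &= \operatorname{softmax}\bigl(b + W_{\text{vector}}\, h_t\bigr) \tag{2}
\end{align}
```

Under this reading, Equation 1 carries the hidden state forward one token at a time, and Equation 2 turns the hidden state into a probability distribution over the next token.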

The identification model (discriminator) may be, for example, a CNN-based model that identifies whether the source code generated by the generative model is normal.

As shown in FIG. 1, the bug correction device 100 may further include a learning unit 140. The learning unit 140 may train the correction model. Based on a reinforcement learning policy, the learning unit 140 may train the generative model and the identification model included in the correction model.

The learning unit 140 may train the RNN-based generative model to generate normal or abnormal source code. The learning unit 140 may train the CNN-based identification model to identify whether the normal or abnormal source code generated by the generative model is normal.
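One common way to apply a reinforcement learning policy to a sequence generator is a REINFORCE-style loss in which the discriminator's score serves as the reward. The sketch below assumes a single sequence-level reward; Seq-GAN as usually described estimates per-token rewards with Monte Carlo rollouts, which this example does not model. The function name and inputs are hypothetical.

```python
import math

def policy_gradient_loss(token_probs, reward):
    """REINFORCE-style generator loss for one generated sequence.

    token_probs: probability the generator assigned to each token it emitted.
    reward: discriminator's estimate (0..1) that the sequence is normal code.
    Minimizing this loss raises the likelihood of highly rewarded sequences.
    """
    log_likelihood = sum(math.log(p) for p in token_probs)
    return -reward * log_likelihood
```

A sequence the discriminator judges normal (reward near 1) contributes a larger gradient signal than one it rejects (reward near 0).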

The learning unit 140 may train the autoencoder. For example, the learning unit 140 may train the autoencoder so that its output equals its input.
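Training an autoencoder so that its output matches its input is typically done by minimizing a reconstruction loss. The mean squared error below is one common choice and is an assumption here, since the text does not name the loss; E, D, x_i, and N are notation introduced for this example.

```latex
% Assumed notation: E = encoder, D = decoder,
% x_i = i-th input feature vector, N = number of training samples.
\mathcal{L}_{\mathrm{AE}} = \frac{1}{N} \sum_{i=1}^{N} \bigl\lVert x_i - D\bigl(E(x_i)\bigr) \bigr\rVert^{2}
```

Once trained, only the encoder E is needed at inference time to produce the latent vector that is passed to the CNN.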

For example, the bug correction unit 130 may generate the corrected code by inputting the code of the buggy line into the correction model.

The conversion unit 150 may convert the program based on the corrected code. For example, the conversion unit 150 may apply the corrected code to the buggy line and convert the program using a variable name restoration technique.

The conversion unit 150 may convert the program by referring, for example, to the common word dictionary and the user word dictionary.
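Variable name restoration can be sketched as follows, assuming the model works on abstracted tokens in which each user word is replaced by a placeholder carrying its index in the user word dictionary. The `<idN>` placeholder format and the function names are hypothetical; the text does not describe the encoding.

```python
import re

def abstract_tokens(tokens, user_dict):
    """Replace each user word with a placeholder such as <id0> (assumed format)."""
    return [f"<id{user_dict.index(t)}>" if t in user_dict else t for t in tokens]

def restore_tokens(tokens, user_dict):
    """Map placeholders back to the original names via the user dictionary."""
    restored = []
    for token in tokens:
        match = re.fullmatch(r"<id(\d+)>", token)
        restored.append(user_dict[int(match.group(1))] if match else token)
    return restored

def apply_fix(lines, buggy_line_no, fixed_tokens, user_dict):
    """Return a copy of the program with the buggy line (1-based) replaced."""
    patched = list(lines)
    patched[buggy_line_no - 1] = " ".join(restore_tokens(fixed_tokens, user_dict))
    return patched
```

The original list of lines is left untouched, so the unpatched program remains available if the fix later fails its tests.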

The test unit 160 may perform a conformance test on the source code of the converted program. The conformance test may be a test that determines whether the program operates normally after the corrected code has been applied to the buggy line.

For example, the test unit 160 may perform the conformance test based on a plurality of test cases. The test unit 160 may determine that the bug correction is acceptable when the source code of the converted program passes all of the test cases.
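The all-tests-must-pass rule can be sketched as follows. For illustration the converted program is modeled as a Python callable and each test case as an (input, expected output) pair; both are assumptions, as is counting a runtime exception as a failed test.

```python
def passes(program, test_input, expected):
    """Run one test case; a crash counts as a failure (assumption)."""
    try:
        return program(test_input) == expected
    except Exception:
        return False

def is_fix_acceptable(program, test_cases):
    """The fix is acceptable only if every test case passes."""
    return all(passes(program, inp, exp) for inp, exp in test_cases)
```

A single failing case is enough to reject the candidate fix.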

FIG. 3 is a diagram comparing the results of a conventional bug correction method and a bug correction method according to an embodiment of the present invention.

The graph in FIG. 3 shows the bug correction performance of the conventional method and of the method according to the present invention. Referring to FIG. 3, the method according to the present invention improves bug correction performance compared with DeepFix, a conventional bug correction method.

FIG. 4 is a flowchart of a bug correction method according to an embodiment of the present invention. The bug correction method 400 shown in FIG. 4 includes steps processed in time series by the bug correction device 100 according to the embodiment shown in FIG. 1. Accordingly, even where omitted below, the description given above of the bug correction device 100 of FIG. 1 also applies to the bug correction method.

In step S410, the bug correction device 100 may receive the source code of a program.

In step S420, the bug correction device 100 may identify, based on the autoencoder and the CNN, the location of a buggy line containing a bug from the source code and verification information about the program.

In step S430, the bug correction device 100 may generate corrected code in which the bug is fixed by inputting the code of the buggy line into the correction model based on a sequence generative adversarial network.
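Steps S410 to S430 can be tied together in a short orchestration sketch. The `localize` and `generate_fix` callables stand in for the trained localization and correction models; they, and the function name `repair`, are hypothetical placeholders rather than parts of the described device.

```python
def repair(source_lines, verification_info, localize, generate_fix):
    """Run the three steps in order and return (buggy line number, corrected code).

    source_lines: program source, one string per line (S410 input).
    localize: callable returning the 1-based buggy line number (S420).
    generate_fix: callable mapping buggy line code to corrected code (S430).
    """
    line_no = localize(source_lines, verification_info)    # S420
    corrected = generate_fix(source_lines[line_no - 1])    # S430
    return line_no, corrected
```

Passing the models in as callables keeps the flow testable with simple stubs before any trained model exists.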

상술한 설명에서, 단계 S410 내지 S430은 본 발명의 구현예에 따라서, 추가적인 단계들로 더 분할되거나, 더 적은 단계들로 조합될 수 있다. 또한, 일부 단계는 필요에 따라 생략될 수도 있고, 단계 간의 순서가 전환될 수도 있다.In the foregoing description, steps S410 to S430 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Also, some steps may be omitted as needed, and the order of steps may be switched.

도 1 내지 도 4를 통해 설명된 버그 정정 장치에서 버그를 정정하는 방법은 컴퓨터에 의해 실행되는 매체에 저장된 컴퓨터 프로그램 또는 컴퓨터에 의해 실행 가능한 명령어를 포함하는 기록 매체의 형태로도 구현될 수 있다.The method of correcting a bug in the bug correction apparatus described with reference to FIGS. 1 to 4 may be implemented in the form of a computer program stored in a medium executed by a computer or a recording medium including instructions executable by a computer.

컴퓨터 판독 가능 매체는 컴퓨터에 의해 액세스될 수 있는 임의의 가용 매체일 수 있고, 휘발성 및 비휘발성 매체, 분리형 및 비분리형 매체를 모두 포함한다. 또한, 컴퓨터 판독가능 매체는 컴퓨터 저장 매체를 포함할 수 있다. 컴퓨터 저장 매체는 컴퓨터 판독가능 명령어, 데이터 구조, 프로그램 모듈 또는 기타 데이터와 같은 정보의 저장을 위한 임의의 방법 또는 기술로 구현된 휘발성 및 비휘발성, 분리형 및 비분리형 매체를 모두 포함한다.Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. Also, computer readable media may include computer storage media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.

The above description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: bug correction device
110: input unit
120: bug location identification unit
130: bug correction unit
140: learning unit
150: conversion unit
160: test unit

Claims

A bug correction device comprising:
an input unit that receives the source code of a program;
a bug location identification unit that identifies, based on an autoencoder and a convolutional neural network (CNN), the location of a buggy line in which a bug exists from the source code and verification information about the program; and
a bug correction unit that generates correction code in which the bug is corrected by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network.

The bug correction device according to claim 1, further comprising:
a conversion unit that converts the program based on the correction code; and
a test unit that performs a conformance test on the source code of the converted program.

The bug correction device according to claim 1,
wherein the verification information includes a bug report and a stack trace for the program.

The bug correction device according to claim 1,
wherein the bug location identification unit extracts one or more features from the source code and the verification information, inputs the extracted features to an encoder of the autoencoder, inputs a latent vector output from the encoder to the CNN, and identifies the location of the buggy line from the output of the CNN.

The bug correction device according to claim 1,
wherein the correction model includes a recurrent neural network (RNN)-based generation model that generates source code and a CNN-based identification model that identifies whether the source code generated by the generation model is normal.

The bug correction device according to claim 5, further comprising a learning unit that trains the correction model,
wherein the learning unit trains the identification model such that the generation model generates normal or abnormal source code and the identification model identifies whether that source code is normal,
and trains the autoencoder such that an input value and an output value of the autoencoder are the same.

The bug correction device according to claim 1,
wherein the bug correction unit generates a common word dictionary and a user word dictionary based on the source code of the program.

The bug correction device according to claim 2,
wherein the conversion unit applies the correction code to the buggy line and converts the program using a variable name recovery technique.

The bug correction device according to claim 2,
wherein the test unit performs the conformance test based on a plurality of test cases and determines that the bug correction is suitable when the source code of the converted program passes all of the plurality of test cases.

A bug correction method comprising:
receiving the source code of a program;
identifying, based on an autoencoder and a convolutional neural network (CNN), the location of a buggy line in which a bug exists from the source code and verification information about the program; and
generating correction code in which the bug is corrected by inputting the code of the buggy line into a correction model based on a sequence generative adversarial network.