JP2018018197A - Source code evaluation program

Info

Publication number: JP2018018197A
Application number: JP2016146192A
Authority: JP
Inventors: Keiichi Tabata (田端 啓一)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-07-26
Filing date: 2016-07-26
Publication date: 2018-02-01

Abstract

PROBLEM TO BE SOLVED: To provide a program that can automatically estimate locations in source code that are highly likely to contain a bug.
SOLUTION: A source code evaluation program causes a computer, serving as a source code evaluation device 10, to function as: a difference detection unit 11 that detects a description that is a difference between a first source code and a second source code obtained by modifying the first source code, and determines, based on the content of the detected difference, whether the difference results from a bug fix; a generation unit 12 that divides the description constituting the difference into lexical tokens and generates a vector in which numerical values defined according to the type of each token are arranged in the order of the tokens in the description; a learning unit 13 that learns the relationship between the vector and the determination result of the difference detection unit; and a recognition unit 15 that generates the vector for any description in a third source code and evaluates, based on that vector and the learning result of the learning unit, the possibility that the description contains a bug.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a source code evaluation program.

In software development, bug detection is an important task for ensuring software quality.

Bugs are generally detected by running tests based on test specifications written by developers or others. This work places a heavy burden on the people who perform it and is inefficient.

For this reason, techniques for supporting bug detection have been studied (for example, Non-Patent Document 1 and Non-Patent Document 2).

Non-Patent Document 1: Y. Higo, K. Murao, S. Kusumoto, K. Inoue, "Predicting Fault-Prone Modules Based on Metrics Transitions", DEFECTS '08.
Non-Patent Document 2: S. Kim, T. Zimmermann, E. J. Whitehead, and A. Zeller, "Predicting faults from cached history", ICSE '07.

However, with the conventional techniques above, it has been difficult to automatically estimate locations likely to contain a bug at a fine granularity, such as per line or per statement.

The present invention has been made in view of the above, and its object is to make it possible to automatically estimate locations in source code that are highly likely to contain a bug.

To solve the above problem, a source code evaluation program causes a computer to function as: a difference detection unit that detects a description that is a difference between a first source code and a second source code obtained by changing the first source code, and determines, based on the content of the detected difference, whether the difference results from a bug fix; a generation unit that divides the description constituting the difference into tokens and generates a vector in which numerical values defined according to the type of each token are arranged in the order of the tokens in the description; a learning unit that learns the relationship between the vector and the determination result of the difference detection unit; and an evaluation unit that generates the vector for any description in a third source code and evaluates, based on that vector and the learning result of the learning unit, the possibility that the description contains a bug.

This makes it possible to automatically estimate locations in source code that are highly likely to contain a bug.

FIG. 1 is a diagram showing an example hardware configuration of the source code evaluation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example functional configuration of the source code evaluation device according to the embodiment of the present invention.
FIG. 3 is a flowchart illustrating an example of the learning procedure.
FIG. 4 is a diagram illustrating difference detection and the assignment of bug-fix labels.
FIG. 5 is a diagram illustrating the generation of a lexical vector.
FIG. 6 is a diagram illustrating learning of the relationship between lexical vectors and bug-fix labels.
FIG. 7 is a diagram showing an example of a neural network.
FIG. 8 is a flowchart illustrating an example of the source code evaluation procedure.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a diagram showing an example hardware configuration of the source code evaluation device according to the embodiment. The source code evaluation device 10 of FIG. 1 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, a display device 106, an input device 107, and the like, which are connected to one another by a bus B.

A program that implements the processing of the source code evaluation device 10 is provided on a recording medium 101 such as a CD-ROM. When the recording medium 101 storing the program is set in the drive device 100, the program is installed from the recording medium 101 to the auxiliary storage device 102 via the drive device 100. The program need not be installed from the recording medium 101, however, and may instead be downloaded from another computer over a network. The auxiliary storage device 102 stores the installed program as well as necessary files, data, and the like.

When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 implements the functions of the source code evaluation device 10 in accordance with the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network. The display device 106 displays a GUI (Graphical User Interface) and the like provided by the program. The input device 107 consists of a keyboard, a mouse, and the like, and is used to enter various operating instructions.

FIG. 2 is a diagram showing an example functional configuration of the source code evaluation device according to the embodiment. In FIG. 2, the source code evaluation device 10 includes a difference detection unit 11, a lexical vector generation unit 12, a learning unit 13, an evaluation target analysis unit 14, a recognition unit 15, and the like. Each of these units is realized by processing that one or more programs installed in the source code evaluation device 10 cause the CPU 104 to execute. The source code evaluation device 10 also uses a VCS repository 121, a difference storage unit 122, a label storage unit 123, a lexical DB 124, a lexical vector storage unit 125, and a learning information storage unit 126. These storage units can be realized using, for example, the memory device 103, the auxiliary storage device 102, or a storage device that can be connected to the source code evaluation device 10 via a network.

The VCS repository 121 is the repository of a version control system (VCS), which is not shown. For example, the VCS repository 121 stores source code files together with their change history, revision numbers, and the like.

Based on the information stored in the VCS repository 121, the difference detection unit 11 detects the difference between source code before and after a change (before and after a revision). Information about each detected difference is stored in the difference storage unit 122. For each difference, the difference detection unit 11 also determines, based on the content of the difference, whether the difference results from a bug fix, and stores a label indicating the determination result (hereinafter, "bug-fix label") in the label storage unit 123 in association with the difference.

For each difference stored in the difference storage unit 122, the lexical vector generation unit 12 divides the description constituting the difference (the description in the source code) into lexical tokens. The lexical vector generation unit 12 then generates a vector in which numerical values defined according to the type of each token are arranged in the order in which the tokens appear in the description. This vector is hereinafter called a "lexical vector". The generated lexical vector is stored in the lexical vector storage unit 125. The numerical values defined for the token types are stored in the lexical DB 124.

The learning unit 13 learns the relationship between each lexical vector and the bug-fix label associated with the difference from which that lexical vector was generated, and stores the learning result in the learning information storage unit 126.

The evaluation target analysis unit 14 generates a lexical vector for a given location (a line, a statement, an expression, or the like) in the source code to be evaluated for the presence of bugs.

The recognition unit 15 applies the lexical vector generated by the evaluation target analysis unit 14 to the learning result stored in the learning information storage unit 126 and evaluates the possibility that the location corresponding to the lexical vector contains a bug.

The processing procedures executed by the source code evaluation device 10 are described below. FIG. 3 is a flowchart illustrating an example of the learning procedure. The procedure of FIG. 3 starts, for example, when the user enters an instruction to perform learning.

In step S101, for all source code of a given program, the difference detection unit 11 detects (extracts) the difference between each pair of consecutive revisions stored in the VCS repository 121, and stores the detected differences in the difference storage unit 122. The differences may be generated, for example, in the output format of diff.
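
As an illustration of step S101, the following Python sketch computes the difference for every pair of consecutive revisions. It is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function name `revision_diffs` is hypothetical, the revisions are passed in as plain strings rather than read from the VCS repository 121, and the standard-library `difflib.ndiff` stands in for whatever diff tool is actually used.

```python
import difflib


def revision_diffs(revisions):
    """Return one difference per pair of consecutive revisions (N, N+1).

    `revisions` is a list of source texts ordered from oldest to newest
    (an assumption; the embodiment reads them from a VCS repository).
    Each difference is a list of lines prefixed with "- " (deleted
    description) or "+ " (added description).
    """
    diffs = []
    for before, after in zip(revisions, revisions[1:]):
        delta = difflib.ndiff(before.splitlines(), after.splitlines())
        # Keep only deleted and added lines; drop unchanged lines ("  ")
        # and ndiff's intraline hint lines ("? ").
        changed = [line for line in delta if line.startswith(("- ", "+ "))]
        diffs.append(changed)
    return diffs


if __name__ == "__main__":
    history = ["a = 1\nb = 2", "a = 1\nb = 3", "a = 1\nb = 3\nc = 4"]
    for number, diff in enumerate(revision_diffs(history), start=1):
        print(f"d{number}:", diff)
```

A history of R revisions yields R - 1 differences, matching the pairing of revision N with revision N+1 described for FIG. 4.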

Next, for each detected difference, the difference detection unit 11 determines, based on the content of the difference, whether the difference results from a bug fix, and assigns a bug-fix label indicating the determination result to the difference (S102). The assigned bug-fix label is stored in the label storage unit 123 in association with the difference. The association between a difference and its bug-fix label may be realized, for example, by associating the identification information of each difference stored in the difference storage unit 122 (hereinafter, "difference ID") with the bug-fix label.

FIG. 4 is a diagram illustrating difference detection and the assignment of bug-fix labels. FIG. 4 shows an example in which, for source code with a change history from revision 1 to revision 6, a difference is detected for each pair of consecutive revisions (that is, for each pair of revision N before a change and revision N+1 after the change). Specifically, difference d1 is detected between revision 1 and revision 2, difference d2 between revision 2 and revision 3, and difference d3 between revision 3 and revision 4.

The content of a difference consists of added descriptions and deleted descriptions. In FIG. 4, added descriptions are marked with "+" and deleted descriptions with "-". Differences need not be detected line by line, because a single statement or expression may span multiple lines. For example, the range up to the point where a line feed code is detected may be treated as one unit, and differences may be detected per unit.

FIG. 4 also shows the bug-fix labels L1 to L3 for the respective differences. The value of a bug-fix label is either "is a bug fix" or "is not a bug fix". In this embodiment, a difference in which both a removed description and an added description exist at mutually corresponding locations (that is, a location where one description was replaced by another) is judged "is a bug fix", and any other difference is judged "is not a bug fix". This is because, empirically, a location that was replaced with another description is likely to be a bug fix.
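
The labeling rule of step S102 can be sketched in a few lines of Python. This is a simplified illustration: the function name `label_bug_fix` is hypothetical, and the sketch assumes that the lines passed in already belong to one corresponding location (one hunk), with deleted lines prefixed by "-" and added lines by "+".

```python
def label_bug_fix(diff_lines):
    """Return True ("is a bug fix") when a description was replaced.

    A difference counts as a replacement when it contains both a deleted
    line ("-") and an added line ("+"). A pure addition or a pure
    deletion is labeled False ("is not a bug fix").
    """
    has_deleted = any(line.startswith("-") for line in diff_lines)
    has_added = any(line.startswith("+") for line in diff_lines)
    return has_deleted and has_added


if __name__ == "__main__":
    print(label_bug_fix(["- if (a > b)", "+ if (a >= b)"]))  # replacement
    print(label_bug_fix(["+ int c;"]))                       # addition only
```

The rule is a heuristic, so the resulting labels are noisy training data rather than ground truth about which changes fixed bugs.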

Next, for each difference stored in the difference storage unit 122, the lexical vector generation unit 12 performs lexical analysis on the content of the difference (the source code description) and divides the description into tokens (S103). A difference labeled "is a bug fix" contains both a deleted description and an added description. In this case, it is the deleted description that is divided into tokens.

Next, for each token, the lexical vector generation unit 12 obtains the numerical value stored in the lexical DB 124 in association with the type of the token (identifier, basic variable type, keyword representing a control structure, parenthesis, and so on) (S104). The lexical vector generation unit 12 then generates a lexical vector by arranging the obtained values in the order of the tokens, and stores the lexical vector in the lexical vector storage unit 125 (S105). Each lexical vector is stored in the lexical vector storage unit 125 in association with the difference ID of the difference from which it was generated.

FIG. 5 is a diagram illustrating the generation of a lexical vector. FIG. 5 shows an example in which the deleted description in difference d2 is divided into tokens and the array of numerical values corresponding to the type of each token (conditional statement, identifier) is generated as lexical vector v2. That is, the lexical vector is generated for the deleted description. This is because, when one description is replaced by another, the original description (that is, the deleted one) is likely to have contained a bug.

For a difference consisting of N tokens, an N-dimensional lexical vector is generated. The numerical values differ greatly between types, identifiers, and the like on the one hand and keywords representing control structures, parentheses, and the like on the other. That is, the values for the token types are defined in the lexical DB 124 so that types that are relatively closely related in source code have relatively close values, and types that are relatively weakly related have relatively distant values. This makes the differences between description patterns pronounced and lets the vector characterize the meaning of the source code.

Specifically, in FIG. 5 the value for "if", which represents a control structure, is 200, which differs greatly from the values of the other tokens. Because "(" and ")" correspond to each other, their values are 100 and 101, respectively, and the difference between them is small.
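
Steps S103 to S105 can be sketched as follows. Only the values 200 for "if", 100 for "(", and 101 for ")" come from the description of FIG. 5; every other value, the regular-expression tokenizer, and the names `TOKEN_VALUES` and `lexical_vector` are assumptions made for illustration and do not reproduce the contents of the lexical DB 124.

```python
import re

# Values quoted for FIG. 5: "if" = 200, "(" = 100, ")" = 101.
TOKEN_VALUES = {"if": 200, "(": 100, ")": 101}
# The remaining values are hypothetical placeholders.
IDENTIFIER_VALUE = 10
NUMBER_VALUE = 11
OTHER_VALUE = 50  # operators and other punctuation

# A deliberately simple tokenizer: identifiers/keywords, integer
# literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lexical_vector(description):
    """Map each token of `description` to a number, preserving order."""
    vector = []
    for token in TOKEN_RE.findall(description):
        if token in TOKEN_VALUES:
            vector.append(TOKEN_VALUES[token])
        elif token[0].isalpha() or token[0] == "_":
            vector.append(IDENTIFIER_VALUE)
        elif token.isdigit():
            vector.append(NUMBER_VALUE)
        else:
            vector.append(OTHER_VALUE)
    return vector


if __name__ == "__main__":
    print(lexical_vector("if (a > b)"))
```

A description with N tokens yields an N-dimensional vector, so `"if (a > b)"` produces six values. A real implementation would use a lexer for the target language so that multi-character operators such as ">=" become a single token.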

Next, for each difference, the learning unit 13 inputs the pair of lexical vector and bug-fix label to a learning algorithm and learns the relationship between lexical vectors and bug-fix labels (S106). In other words, the relationship between patterns of source code descriptions and the presence or absence of bugs is learned. The learning unit 13 stores the learning result in the learning information storage unit 126. Learning is performed using, for example, a supervised binary classifier.

FIG. 6 is a diagram illustrating learning of the relationship between lexical vectors and bug-fix labels. FIG. 6 shows an example in which the relationship between lexical vector v2, generated for difference d2, and bug-fix label L2, assigned to difference d2, is learned and the learning result is stored in the learning information storage unit 126.

For example, when a neural network is used as the supervised binary classifier, the parameters (coefficients) that define a neural network such as the one shown in FIG. 7 are stored in the learning information storage unit 126. For details on neural networks, see, for example, Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533-536.
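
To make step S106 concrete, the sketch below trains a supervised binary classifier on (lexical vector, bug-fix label) pairs. It deliberately swaps in logistic regression, a single-layer model written in plain Python, as a stand-in for the neural network of FIG. 7. The function names, the learning rate, the epoch count, and the input scaling factor are all assumptions; the returned parameter dictionary only plays the role of the learning result kept in the learning information storage unit 126.

```python
import math
import random


def train_classifier(vectors, labels, epochs=200, lr=0.1, scale=200.0, seed=0):
    """Fit logistic regression by stochastic gradient descent.

    vectors: equal-length lexical vectors (already padded or cropped).
    labels:  1 for "is a bug fix", 0 for "is not a bug fix".
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            xs = [v / scale for v in x]  # keep inputs roughly in [0, 1]
            z = bias + sum(w * v for w, v in zip(weights, xs))
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            weights = [w - lr * grad * v for w, v in zip(weights, xs)]
            bias -= lr * grad
    return {"weights": weights, "bias": bias, "scale": scale}


def predict_probability(params, vector):
    """Return the estimated probability of the "is a bug fix" class."""
    xs = [v / params["scale"] for v in vector]
    z = params["bias"] + sum(w * v for w, v in zip(params["weights"], xs))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    data = [[200, 100, 10, 101], [10, 50, 11, 50]]
    model = train_classifier(data, [1, 0])
    print([round(predict_probability(model, v), 2) for v in data])
```

A multi-layer network trained by backpropagation, as in the cited reference, would replace the single weighted sum with stacked layers, but the training data and the stored result have the same shape.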

When the number of dimensions of a lexical vector is smaller than the number of input dimensions of the binary classifier (64 tokens in FIG. 7), the learning unit 13 expands the lexical vector by centering it and filling the left and right with zeros. When the number of dimensions of a lexical vector is larger than the number of input dimensions of the binary classifier, the learning unit 13 shrinks the lexical vector by extracting as many values as there are input dimensions, extending forward and backward from the value at the center of the lexical vector. That is, although the lines of source code vary in length, the characteristics of the source code can be captured even from a part of it. In addition, centering places the most important information in the middle, so the main features are reflected and learning can be performed with high accuracy.
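
The centering rule can be sketched as a small helper. The name `fit_to_input` is hypothetical, and two details are assumptions because the text does not specify them: when an odd number of zeros has to be added, the extra zero goes on the right, and when an odd number of values has to be dropped, the extra value is dropped from the right.

```python
def fit_to_input(vector, size=64):
    """Pad or crop `vector` to exactly `size` values, keeping it centered."""
    n = len(vector)
    if n < size:
        # Center the vector and fill both sides with zeros.
        left = (size - n) // 2
        right = size - n - left
        return [0] * left + list(vector) + [0] * right
    # Keep the `size` values around the center of the vector.
    start = (n - size) // 2
    return list(vector[start:start + size])


if __name__ == "__main__":
    print(fit_to_input([200, 100, 10, 101], size=8))
    print(fit_to_input(list(range(10)), size=4))
```

With `size=8`, the four-value vector is surrounded by two zeros on each side. With `size=4`, the ten-value vector is reduced to its four middle values.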

Next, the processing that evaluates the possibility that given source code contains bugs, based on the learning result stored in the learning information storage unit 126, is described.

FIG. 8 is a flowchart illustrating an example of the source code evaluation procedure. The processing of FIG. 8 starts, for example, when the user specifies the source code to be evaluated and enters an instruction to start the evaluation.

In step S201, the evaluation target analysis unit 14 reads a description of a predetermined unit (hereinafter, "target description") from the source code to be evaluated. The predetermined unit is, for example, a unit delimited by line feed codes.

Next, the evaluation target analysis unit 14 generates a lexical vector for the target description (S202). The lexical vector is generated as described above.

Next, the recognition unit 15 applies the learning result in the learning information storage unit 126 to the generated lexical vector and evaluates the possibility that the target description contains a bug (the potential for a bug) (S203). For example, the lexical vector for the target description may be input to a neural network such as the one shown in FIG. 7. In this case, when the number of dimensions of the lexical vector is smaller than the number of input dimensions of the binary classifier (64 tokens in FIG. 7), the recognition unit 15 expands the lexical vector by centering it and filling the left and right with zeros. When the number of dimensions of the lexical vector is larger than the number of input dimensions of the binary classifier, the recognition unit 15 shrinks the lexical vector by taking out its central portion.
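
The loop over steps S201 to S203 can be sketched as below. The function `evaluate_source` and its parameters are hypothetical: `to_vector` stands for any function that turns a description into a fixed-length lexical vector, `score` stands for the trained classifier, and the `threshold` used to flag a description is an assumption that does not appear in the text.

```python
def evaluate_source(source, to_vector, score, threshold=0.5):
    """Score every newline-delimited description in `source`.

    Returns a list of (line_number, description, probability, flagged)
    tuples, one per non-blank line.
    """
    results = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        description = line.strip()
        if not description:
            continue  # a blank line has nothing to tokenize
        vector = to_vector(description)  # S202: build the lexical vector
        probability = score(vector)      # S203: apply the learning result
        results.append(
            (line_number, description, probability, probability >= threshold)
        )
    return results


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs on its own.
    def toy_vector(text):
        return [len(text.split())]

    def toy_score(vector):
        return 1.0 if vector[0] >= 4 else 0.0

    for row in evaluate_source("if (a > b)\n\nx = 1", toy_vector, toy_score):
        print(row)
```

Passing the vectorizer and the classifier in as arguments keeps the loop independent of how the lexical vectors and the learning result are actually produced.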

When the source code to be evaluated is new, the procedure of FIG. 8 may be executed on each description in the source code in order from the beginning. When existing source code has been modified, the processing of FIG. 8 may instead be executed only on the modified descriptions. The modified descriptions may be specified by the user.

As described above, according to this embodiment, locations in source code that are highly likely to contain a bug can be estimated automatically from the time implementation of a program begins. As a result, in the development of large-scale software, for example, the development period can be shortened and an improvement in productivity can be expected.

In this embodiment, the lexical vector generation unit 12 is an example of the generation unit. The recognition unit 15 is an example of the evaluation unit.

Although an embodiment of the present invention has been described in detail above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

DESCRIPTION OF SYMBOLS
10 Source code evaluation device
11 Difference detection unit
12 Lexical vector generation unit
13 Learning unit
14 Evaluation target analysis unit
15 Recognition unit
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 CPU
105 Interface device
121 VCS repository
122 Difference storage unit
123 Label storage unit
124 Lexical DB
125 Lexical vector storage unit
126 Learning information storage unit
B Bus

Claims

1. A source code evaluation program causing a computer to function as:
a difference detection unit that detects a description that is a difference between a first source code and a second source code obtained by changing the first source code, and determines, based on the content of the detected difference, whether the difference results from a bug fix;
a generation unit that divides the description constituting the difference into tokens and generates a vector in which numerical values defined according to the type of each token are arranged in the order of the tokens in the description;
a learning unit that learns the relationship between the vector and the determination result of the difference detection unit; and
an evaluation unit that generates the vector for any description in a third source code and evaluates, based on that vector and the learning result of the learning unit, the possibility that the description contains a bug.

2. The source code evaluation program according to claim 1, wherein the numerical values corresponding to the types are defined such that the difference between the values for types that are relatively closely related in source code is relatively small, and the difference between the values for types that are relatively weakly related in source code is relatively large.

3. The source code evaluation program according to claim 1 or 2, wherein the learning unit extracts a predetermined number of numerical values, extending forward and backward from the numerical value at the center of the vector, and learns the relationship between the vector of extracted values and the determination result of the difference detection unit.