JP4976328B2 - Author determination apparatus, control method thereof, and program

Info

Publication number: JP4976328B2
Application number: JP2008096072A
Authority: JP
Inventors: 大亮西川
Original assignee: NS Solutions Corp
Current assignee: NS Solutions Corp
Priority date: 2008-04-02
Filing date: 2008-04-02
Publication date: 2012-07-18
Anticipated expiration: 2028-04-02
Also published as: JP2009251727A

Abstract

PROBLEM TO BE SOLVED: To identify the author of data from the recorded authors and the document data itself. SOLUTION: An author ID assigning unit 103 assigns author identification information to each predetermined unit of document data described in the data. An author ID ratio calculation unit 104 calculates the ratio of the author identification information assigned to the data. A file author determination unit 105 determines the author of the data based on the calculated ratio of the identification information. COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for determining the author of data in which document data is described.

In recent years, the development of computer systems has involved large numbers of people writing large amounts of source code. For large-scale development of this kind, it is common practice to take on programmers from staffing agencies and the like.

However, although the required skills are specified when programmers are requested from a staffing agency, the dispatched programmers do not always have the desired skills. Attempts have been made to evaluate programmers as a means of managing program quality, but to evaluate the author of document data such as a program, the author of the document data under evaluation must first be identified.

For example, no problem arises when the document data in a file has a single author. When the document in one file is written by many people, as with source code, it can be difficult to identify the author of the file.

For example, when several members edit the same file and the last person to write document data (the final committer) is treated as the author, the result may not reflect the members' actual contributions; that is, the person who wrote the most may not be regarded as the author.

Likewise, when a file is moved to reorganize the folders of a repository, the worker who moved the file becomes its author despite not having written it.

It is also conceivable to record an author for each line of source code and to evaluate each author by aggregating per-line evaluations, but in practice it is difficult to evaluate source code line by line.

The following technique has been disclosed for keeping a change history when source code is changed. Patent Document 1 discloses automatically creating a revision history of a source program, in which editing operations, corrected locations, correction contents, correction dates and times, and the like are recorded.

Patent Document 1: JP-A-9-128227

However, although the invention disclosed in Patent Document 1 records the contents of corrections, it gives no consideration to authorship, such as who wrote or who modified the code.

There is therefore a need for a method of identifying the author of each file from the authors recorded in the repository and the document data itself.

An object of the present invention is to identify the author of data from the authors and the document data itself.

The author determination apparatus of the present invention comprises: assigning means for assigning author identification information to each predetermined unit of document data described in data; calculating means for calculating the ratio of the author identification information assigned to the data; and determining means for determining the author of the data based on the ratio of the identification information calculated by the calculating means.
The control method of the present invention is a method of controlling an author determination apparatus having assigning means, calculating means, and determining means, the method comprising: an assigning step in which the assigning means assigns author identification information to each predetermined unit of document data described in data; a calculating step in which the calculating means calculates the ratio of the author identification information assigned to the data; and a determining step in which the determining means determines the author of the data based on the ratio of the identification information calculated in the calculating step.
The program of the present invention causes a computer to function as: assigning means for assigning author identification information to each predetermined unit of document data described in data; calculating means for calculating the ratio of the author identification information assigned to the data; and determining means for determining the author of the data based on the ratio of the identification information calculated by the calculating means.

In the present invention, author identification information is assigned to each predetermined unit of document data described in data, and the author of the data is determined based on the ratio of the assigned author identification information. The present invention therefore makes it possible to identify the author of data from the authors and the document data itself.

Preferred embodiments to which the present invention is applied are described in detail below with reference to the accompanying drawings.

FIG. 1 is a diagram showing the functional configuration of a file author determination system according to an embodiment of the present invention. As shown in FIG. 1, the file author determination system includes a file author determination apparatus 100 and a file management apparatus (repository) 300, which are connected to each other via a communication line such as a LAN.

As its functional configuration, the file author determination apparatus 100 includes a parent file determination unit 101, an update part determination unit 102, an author ID assigning unit 103, an author ID ratio calculation unit 104, a file author determination unit 105, and an evaluation unit 106.

When a file is updated or newly registered, the parent file determination unit 101 determines the parent file from which that file was generated.

The update part determination unit 102 determines the differences (updated parts) between the source code described in the parent file and the source code described in the file generated from that parent file (the child file).

When it is determined that the parent file and the child file differ, the author ID assigning unit 103 assigns an author ID to the source code in the parts of the child file that differ from the parent file, based on the information managed in the file management apparatus 300 described later. The source code in the parts of the child file that are identical to the parent file is assigned the same author ID as the one assigned to the corresponding source code in the parent file. For a child file that has no parent file, author IDs are assigned to all of its source code based on the information managed in the file management apparatus 300.

The author ID ratio calculation unit 104 calculates, for each file, the ratio of each assigned author ID. The file author determination unit 105 determines the author whose ID has the highest ratio among those calculated by the author ID ratio calculation unit 104 as the author of the file. Alternatively, the importance of each part of the child file that differs from the parent file may be judged, a weight corresponding to that importance applied to the author ID, and the author determined by totaling the weights attached to each author ID. For example, parts that do not affect the execution of the processing described in the source code, such as blank lines used for formatting and comments, may be excluded or given a lighter weight.
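The ratio calculation and the optional weighting described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the `light_weight` parameter, and the rule that a line is "non-executable" when it is blank or starts with `#` are all assumptions made for the example.

```python
from collections import defaultdict


def is_non_executable(line: str) -> bool:
    # Assumed heuristic: blank lines and '#' comments do not affect execution.
    stripped = line.strip()
    return stripped == "" or stripped.startswith("#")


def decide_file_author(lines, author_ids, light_weight=0.0):
    """Return (author, ratios) for one file.

    lines      -- the tokens of the file (one line per token)
    author_ids -- the author ID assigned to each token, same length as lines
    light_weight -- weight given to non-executable lines (0.0 excludes them)
    """
    if len(lines) != len(author_ids):
        raise ValueError("each line needs exactly one author ID")

    totals = defaultdict(float)
    for line, author in zip(lines, author_ids):
        totals[author] += light_weight if is_non_executable(line) else 1.0

    grand_total = sum(totals.values())
    if grand_total == 0:
        return None, {}

    ratios = {author: weight / grand_total for author, weight in totals.items()}
    # On a tie, max() keeps the author encountered first.
    best = max(ratios, key=ratios.get)
    return best, ratios
```

Setting `light_weight=1.0` reproduces the unweighted case in which every line counts equally.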

The evaluation unit 106 evaluates each file according to predetermined evaluation criteria and treats the resulting evaluation value as the evaluation value of the author of that file, as determined by the file author determination unit 105.

The file management apparatus 300 stores a plurality of files 302 in which source code (document data) has been written by authors, together with a management table 301 for managing the date and time at which each file was newly registered (added) or updated and the ID of the author who did so (author ID).

FIG. 2 is a block diagram showing the hardware configuration of the file author determination apparatus 100. The CPU 201 comprehensively controls the devices and controllers connected to the system bus. The ROM 203 or the HD 207 stores the BIOS (Basic Input/Output System), which is the control program of the CPU 201, an operating system program, and the programs executed by the file author determination apparatus 100, such as those for the processing shown in FIGS. 3 and 4.

In the example of FIG. 2, the hard disk (HD) 207 is located inside the file author determination apparatus 100, but in another embodiment a component corresponding to the HD 207 may be located outside the file author determination apparatus 100. The programs for performing the processing according to the present embodiment, such as that shown in FIGS. 3 and 4, may be recorded on a computer-readable recording medium such as a flexible disk (FD) 206 or a CD-ROM and supplied from that medium, or may be supplied via a communication medium such as the Internet.

The RAM 202 functions as the main memory, work area, and the like of the CPU 201. The CPU 201 implements various operations by loading the programs needed for processing into the RAM 202 and executing them.

The disk controller 205 controls access to external memories such as the HD 207 and the FD 206. The communication IF controller 204 connects to the Internet or a LAN and controls communication with external devices using, for example, TCP/IP.

The display controller 208 controls image display on the display 209.

The KB (keyboard) controller 210 accepts operation input from the keyboard (KB) 211 and sends it to the CPU 201. Although not shown, a pointing device such as a mouse can be used in addition to the keyboard 211 as a means of user operation for the file author determination apparatus 100 according to the present embodiment.

The parent file determination unit 101, update part determination unit 102, author ID assigning unit 103, author ID ratio calculation unit 104, file author determination unit 105, and evaluation unit 106 in FIG. 1 correspond, for example, to programs that are stored in the HD 207 and loaded into the RAM 202 as needed, together with the CPU 201 that executes them.

In the present embodiment, as shown in FIG. 1, the file management apparatus 300 is installed outside the file author determination apparatus 100, but it may instead be located inside the file author determination apparatus 100. In that case, the management table 301 and the file group 302 are stored in the HD 207.

FIG. 3 is a flowchart showing the process, executed by the file author determination apparatus 100, of determining the parent file from which a target file was generated. This process starts when a target file is updated or newly registered in the file management apparatus 300 and the file author determination apparatus 100 receives a notification to that effect from the file management apparatus 300.

First, on receiving the notification from the file management apparatus 300, the parent file determination unit 101 detects that the target file indicated in the notification has been updated or newly registered, and acquires the target file from the file management apparatus 300 (step S301).

Next, the parent file determination unit 101 selects parent file candidates, such as files whose size is close to that of the target file or files with a similar file name, and acquires them from the file management apparatus 300 (step S302). In the present embodiment, parent file candidates are selected from all files managed by the file management apparatus 300. In other embodiments, candidates may be selected only from files updated within a certain period before the update date and time of the target file, only from files whose update date and time is closest to that of the target file, or only from files with the same extension as the target file. Restricting the pool of parent file candidates in this way reduces the processing load on the parent file determination unit 101.

Next, the parent file determination unit 101 decomposes each parent file candidate into tokens and also decomposes the target file into tokens (step S303). A token here is the minimum unit of comparison for the source code in a file; in the present embodiment, one line is one token.

Next, the parent file determination unit 101 compares the target file with each parent file candidate, corresponding token by corresponding token, and determines whether they match (step S304). In other words, it determines line by line whether the source code of the target file matches that of each parent file candidate.

Next, the parent file determination unit 101 determines as the parent file the candidate that has the most matching tokens among the parent file candidates and whose number of matching tokens is equal to or greater than a predetermined threshold (step S305). The association between the target file and the parent file determined here (the parent-child relationship) is managed inside the file author determination apparatus 100 (for example, in the HD 207).
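Steps S303 to S305 can be sketched roughly as below. The sketch assumes that "corresponding tokens" means lines at the same position in each file, which is one literal reading of the text; the function names and the dictionary of candidates are hypothetical and exist only for this example.

```python
def count_matching_tokens(candidate_lines, target_lines):
    # S304: compare tokens (lines) at the same position and count the matches.
    return sum(1 for a, b in zip(candidate_lines, target_lines) if a == b)


def choose_parent(target_text, candidates, threshold):
    """Pick the parent file for a target file.

    target_text -- full text of the target file
    candidates  -- mapping of candidate file name to its full text
    threshold   -- minimum number of matching tokens required (S305)
    Returns the chosen candidate name, or None if no candidate qualifies.
    """
    target_lines = target_text.splitlines()  # S303: one line is one token
    best_name, best_count = None, -1
    for name, text in candidates.items():
        count = count_matching_tokens(text.splitlines(), target_lines)
        if count > best_count:
            best_name, best_count = name, count
    # S305: the best candidate must also reach the threshold.
    return best_name if best_count >= threshold else None
```

Returning `None` corresponds to the case in which no parent-child association is recorded, so the target file is later treated as having no history.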

FIG. 4 is a flowchart showing the process, executed by the file author determination apparatus 100, of determining the author of the target file. This process is also triggered when a target file is updated or newly registered in the file management apparatus 300 and a notification to that effect is received from the file management apparatus 300. Because it uses the association between the target file and the parent file determined by the process of FIG. 3, strictly speaking it starts after the process shown in FIG. 3 has finished.

First, on receiving the notification from the file management apparatus 300, the update part determination unit 102 detects that the target file indicated in the notification has been updated or newly registered, and acquires the target file and its history files from the file management apparatus 300 (step S401). The history files are all files used directly or indirectly in generating the target file: the target file itself, its parent file, the parent of that parent file, and so on. The history files are acquired based on the associations, managed inside the file author determination apparatus 100, that indicate the parent-child relationships between files.

Next, the update part determination unit 102 determines whether the history files of the target file could be acquired (step S402). If no association indicating a parent-child relationship exists, the history files of the target file cannot be acquired; if such an association exists, they can. Step S402 is therefore equivalent to determining whether an association indicating a parent-child relationship exists.

If the history files could be acquired, the update part determination unit 102 selects the oldest file among the history files that have not yet been selected (step S403). This can be done by referring to the file update/addition date and time in the header information of each history file.

If, on the other hand, no history file could be acquired, the only file acquired in step S401 is the target file, so the author ID assigning unit 103 decomposes the target file into tokens (step S409) and assigns an author ID to each token (step S410). The author ID assigning unit 103 assigns the author IDs by referring to the management table 301; because the management table 301 holds only one author ID for the target file, that single author ID is assigned to all tokens. In other words, a target file with no history files was created without being based on any other file, so only one author ID is assigned to it.

Next, the update part determination unit 102 determines whether the file selected in step S403 is the oldest of the history files (step S404).

If the selected file is the oldest of the history files, the author ID assigning unit 103 performs steps S409 and S410 described above on that file in the same way. This is likewise because the file has no history file of its own and was created without being based on any other file.

If, on the other hand, the selected file is not the oldest of the history files, that is, if the file has a parent file, the update part determination unit 102 decomposes both the parent file and the file into tokens (step S405).

Next, the update part determination unit 102 compares the parent file and the file token by token and determines whether they match (step S406).

The author ID assigning unit 103 refers to the management table 301 and assigns an author ID to the tokens of the file that do not match the parent file (step S407). Specifically, the author ID assigning unit 103 looks up the record in the management table 301 that corresponds to the file name of the file and the update/addition date and time in its header information, and in step S407 assigns the author ID in that record to the tokens determined not to match. Tokens determined to match are assigned the same author ID as the corresponding tokens of the parent file.

Next, the author ID assigning unit 103 determines whether the file selected in step S403 is the target file (step S408). That is, step S408 determines whether author IDs have been assigned to the target file, which is the last of the files to which author IDs are to be assigned.

If the selected file is not the target file, the process returns to step S403. Thus, in the process of FIG. 4, author IDs are assigned to every file from the oldest file, which has no history file, through to the target file.
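The loop over steps S403 to S410 can be illustrated with the sketch below. It is not the patent's matching procedure: to decide which lines of a child file match its parent, the sketch swaps in Python's standard `difflib.SequenceMatcher`, and it takes the history as a plain list of `(author_id, text)` pairs ordered from oldest to newest, standing in for the history files and the management table 301. Both choices, and the function name, are assumptions made for the example.

```python
from difflib import SequenceMatcher


def annotate_history(history):
    """Return the author ID of every line of the newest file in `history`.

    history -- list of (author_id, text) pairs, oldest file first; the last
               entry is the target file.
    """
    if not history:
        return []

    # Oldest file: no parent exists, so one author ID covers all tokens (S409, S410).
    first_author, first_text = history[0]
    prev_lines = first_text.splitlines()
    prev_authors = [first_author] * len(prev_lines)

    for author, text in history[1:]:
        lines = text.splitlines()          # S405: one line is one token
        authors = [author] * len(lines)    # S407: default to the registering author
        matcher = SequenceMatcher(None, prev_lines, lines, autojunk=False)
        # S406: lines inside matching blocks are unchanged from the parent,
        # so they inherit the author ID recorded for the parent's line.
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                authors[block.b + offset] = prev_authors[block.a + offset]
        prev_lines, prev_authors = lines, authors

    return prev_authors
```

Because each pass carries the inherited IDs forward, a line written in the oldest file and never changed keeps its original author all the way to the target file, even if the last committer is someone else.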

Next, the author ID ratio calculation unit 104 obtains the ratio of each author ID assigned to the tokens in the target file (step S411).

Next, the file author determination unit 105 determines the author whose author ID has the highest of the ratios obtained in step S411 as the author of the target file (step S412). In the process described above, all history files of the target file are acquired, author IDs are assigned sequentially starting from the oldest file, and finally author IDs are assigned to the target file. In another embodiment, all author IDs assigned to the related files up to the one immediately preceding the target file may be registered on the file author determination apparatus 100 side. When determining the author of the target file, the file author determination apparatus 100 then acquires only the immediate parent file of the target file from the file management apparatus 300, compares the parent file and the target file token by token, assigns the registered author IDs to the tokens they have in common, and assigns author IDs to the differing tokens based on the management table 301. This reduces the computational load of the author ID assignment and author determination processes.

FIG. 5 is a diagram for specifically explaining a method of evaluating the source code of a file whose author has been determined as described above.

The readability index of a file is used when evaluating the source code of the file. FIG. 5 shows a file with 21 lines, 6 branches, and a nesting depth of 4 levels, so for the file shown in FIG. 5 the readability index is calculated according to Equation 1.

The readability of the source code written in a file decreases as the number of lines, the number of branches, and the nesting depth increase. In the present embodiment, the readability index of each file is calculated with these indicators taken into account. In the above equation, the readability index takes a higher value as the number of lines, the number of branches, and the nesting depth increase, so a file with a lower readability index is more readable and receives a higher evaluation.

In the present embodiment, obtaining the readability index for each file in this way makes it possible to evaluate the author of each file. The present embodiment uses the number of lines, the number of branches, and the nesting depth all as evaluation indicators, but any one or two of them may be used instead. In addition, bug-prone code may be detected with a tool or the like, and the number of detected code fragments may be used as an evaluation indicator.

FIG. 6 is a diagram for explaining a method of evaluating the design of a folder that contains at least one file. The design evaluation of folders is a process executed by an evaluation unit that is not shown.

FIG. 6 shows folders A to E. Folder A in particular is shown containing files A to D, but folders B to E also contain several files in the same way. Between the folders there are also folder call relationships, indicated by arrows. Specifically, folder A is referenced from outside by folder E and makes an outward reference to folder B. Folders B, C, and D are in a circular reference relationship.

The folder design problems caused by reference relationships between folders are now described. First, taking folder A as the reference: if references from outside (here, from folder E) and references to outside (here, to folder B) are eliminated to an extreme degree, the number of files in folder A increases (the folder becomes bloated) and the files in folder A duplicate source code found in files in other folders (problem 1).

On the other hand, if folder A has an extremely large number of references to outside (here, to folder B), folder A becomes susceptible to changes made in folder B (problem 2). If it has an extremely large number of references from outside (here, from folder E), folder A becomes hard to change (problem 3).

Turning to the circular reference relationship among folders B, C, and D, a circular reference relationship between folders lowers the reusability of each folder involved (problem 4).

Problems 1 to 4 lead to two evaluation criteria: a folder readability index, which reflects problems 1 to 3, and the number of folders in a circular reference relationship, which reflects problem 4. The folder readability index is obtained from a parameter that combines two quantities: the sum of the number of folders that reference the folder and the number of folders that it references, and the number of files in the folder (folder A in the example above). The index takes a high value when the number of incoming and outgoing folder references is either too small or too large, and also when the folder contains many files. Accordingly, a lower folder readability index is treated as a better evaluation. Likewise, because a large number of folders in a circular reference relationship lowers the reusability of the folder, a lower count is treated as a better evaluation. For folder A, the number of folders in a circular reference relationship is 0.
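The reference counts that feed these criteria can be illustrated with a minimal sketch. The edge list below is a hypothetical model of FIG. 6 (the direction of the cycle among folders B, C, and D is assumed), and the function names are illustrative only. The text does not give a formula for the folder readability index, so the sketch only counts incoming references, outgoing references, and folders in a circular reference relationship, where the last is assumed to mean the other folders that a folder can reach and that can reach it back.

```python
from collections import defaultdict

# Hypothetical reference graph modelled on FIG. 6: (X, Y) means "folder X
# references folder Y". The direction of the B/C/D cycle is an assumption.
EDGES = [("E", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]


def build_graph(edges):
    """Map every folder to the set of folders it references."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph[dst]  # touch the key so that every folder appears in the graph
    return graph


def reachable(graph, start):
    """Return every folder reachable from `start` by following references."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def folder_metrics(edges):
    """Count incoming, outgoing, and circular references for each folder."""
    graph = build_graph(edges)
    reach = {folder: reachable(graph, folder) for folder in graph}
    metrics = {}
    for folder in graph:
        incoming = sum(1 for other in graph if folder in graph[other])
        outgoing = len(graph[folder])
        # Assumed definition: other folders that lie on a cycle with `folder`.
        circular = sum(
            1 for other in reach[folder]
            if other != folder and folder in reach[other]
        )
        metrics[folder] = {
            "incoming": incoming,
            "outgoing": outgoing,
            "circular": circular,
        }
    return metrics
```

Under these assumptions, `folder_metrics(EDGES)["A"]` reports one incoming reference, one outgoing reference, and a circular count of 0, which agrees with the value stated for folder A.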

Obtaining the folder readability index and the number of folders in a circular reference relationship for each folder in this way makes it possible to evaluate the design of each folder. In this embodiment, however, the ultimate goal is to evaluate the author of each folder, so the author of each folder must be determined. The methods for doing so are described below.

In the first method, the processing up to step S407 in FIGS. 3 and 4 associates an author with each token (here, one line) of every file in the folder. The authors associated with the tokens of all files are then tallied for the folder as a whole, and the author with the highest count is determined to be the author of the folder.

In the second method, the procedure described with reference to FIGS. 3 and 4 associates an author with each token (here, one line) of every file in the folder and then determines the author of each file. The authors determined for the individual files are then tallied for the folder, and the author with the highest count is determined to be the author of the folder.
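The two methods can give different results for the same folder, as the following minimal sketch suggests. The folder contents, author names, and function names are hypothetical, and the per-line author labels are assumed to be already available from the processing of FIGS. 3 and 4. The text does not say how ties are resolved; here they simply follow the ordering of `collections.Counter`.

```python
from collections import Counter

# Hypothetical per-line author labels for the files of one folder.
FOLDER = {
    "file_a": ["alice"] * 10,
    "file_b": ["bob", "bob", "alice"],
    "file_c": ["bob", "bob", "alice"],
}


def majority(labels):
    """Return the most frequent label (tie handling is an assumption)."""
    return Counter(labels).most_common(1)[0][0]


def folder_author_by_lines(folder):
    """First method: pool the labels of every line in the folder."""
    return majority(a for lines in folder.values() for a in lines)


def folder_author_by_files(folder):
    """Second method: decide the author of each file, then tally the files."""
    return majority(majority(lines) for lines in folder.values())
```

With this data, the first method selects `alice` (12 of 16 lines), while the second selects `bob` (the author of two of the three files), so the choice of method matters when one author wrote a few large files.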

The author of the folder is determined by the first or second method described above, and the design evaluation value of that folder is taken as the evaluation value of that author.

FIG. 7 summarizes, in table form, the evaluation values obtained as described above for the authors of the files and folders. The evaluation table shown in FIG. 7 is displayed on the display 209 as the final output.

In the example shown in FIG. 7, the following items are shown for each of member 5 (menber_c_5), member 7 (menber_c_7), member 7 (menber_c_7), member 9 (menber_c_9), member 10 (menber_c_10), member 11 (menber_c_11), member 12 (menber_c_12), member 17 (menber_c_17), member 18 (menber_c_18), member 20 (menber_c_20), and member 27 (menber_c_27): period, total file line count (A), copied line count (B), (A) - (B), file readability index > 300, number of circular folders, and folder readability index.

The period restricts the update/addition dates of the files and folders to be evaluated: only files and folders whose update/addition date falls within the period shown are evaluated. The update/addition date of a folder is given by the attribute information of that folder.

The total file line count (A) is the total number of lines across the files that fall within the period. The copied line count (B) is the number of lines, out of the total file line count (A), that were updated by copying. The copy source file can be detected by a conventional clone detection algorithm, and comparing the detected copy source file with the target file yields the number of lines updated by copying.

(A) - (B) is the total file line count (A) minus the copied line count (B). In this embodiment, the lines counted by this value are the ones for which the file readability index is calculated.
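The period filter and the (A) - (B) column can be sketched as follows. The record type, field names, and sample data are hypothetical; the copied line count is assumed to have been produced by a separate clone detection step, and the period is assumed to include both end dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    author: str
    updated: date        # update/addition date of the file
    total_lines: int
    copied_lines: int    # assumed output of a separate clone detection step


# Hypothetical records for two authors.
RECORDS = [
    FileRecord("author_x", date(2008, 1, 10), 200, 50),
    FileRecord("author_x", date(2008, 2, 5), 100, 0),
    FileRecord("author_x", date(2007, 12, 1), 999, 0),   # outside the period
    FileRecord("author_y", date(2008, 1, 20), 400, 400),
]


def summarize(records, author, start, end):
    """Return (A, B, A - B) for one author's files updated within [start, end]."""
    selected = [
        r for r in records
        if r.author == author and start <= r.updated <= end
    ]
    total = sum(r.total_lines for r in selected)
    copied = sum(r.copied_lines for r in selected)
    return total, copied, total - copied
```

For example, `summarize(RECORDS, "author_x", date(2008, 1, 1), date(2008, 3, 31))` returns `(300, 50, 250)`: the December file is excluded by the period, and the 50 copied lines are excluded from the lines to be evaluated.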

The file readability index is the readability index of the files of the author in question. The number of circular folders is the number of folders in a circular reference relationship with the folders of that author. The folder readability index is the readability index of the folders of that author.

The three rightmost items in FIG. 7 (file readability index, number of circular folders, and folder readability index) are the evaluation values of each author. The lower each of these values, the more highly the source code and folder design of that author are rated.

As described above, in this embodiment an author ID is assigned to each predetermined unit of document data (source code) described in a file, the author of the file is determined based on the ratio of the assigned author IDs, and the file whose author has been determined is then evaluated. According to this embodiment, it is therefore possible to identify the author of a file from the authors and the document data itself, and to treat the evaluation of the document data described in the file as the evaluation of that author. Instead of determining an author for each single file, an author may be determined for each set of multiple files. A file may also contain image data in addition to document data; the data that serves as the unit of author determination is not particularly limited.
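The ratio-based determination summarized here can be sketched briefly. The function names and author IDs are hypothetical, one line is taken as the predetermined unit, and the author with the largest share is assumed to be selected; the text states only that the decision is based on the ratio.

```python
from collections import Counter


def author_ratios(line_authors):
    """Return the share of labelled units (lines) held by each author ID."""
    counts = Counter(line_authors)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()}


def file_author(line_authors):
    """Select the author ID with the largest share.

    Assumes at least one labelled line; tie handling is not specified in
    the text and here follows the iteration order of the dictionary.
    """
    ratios = author_ratios(line_authors)
    return max(ratios, key=ratios.get)
```

For a file whose lines are labelled `["id_1", "id_1", "id_2", "id_1"]`, `author_ratios` gives 0.75 for `id_1` and 0.25 for `id_2`, and `file_author` selects `id_1`.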

Furthermore, in this embodiment the author of a folder is determined and the design of that folder is evaluated, so the design evaluation of the folder can be treated as the evaluation of its author.

FIG. 1 is a diagram showing the functional configuration of a file author determination system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the file author determination apparatus.
FIG. 3 is a flowchart showing the process, executed by the file author determination apparatus, of determining the parent file from which a target file was generated.
FIG. 4 is a flowchart showing the process, executed by the file author determination apparatus, of determining the author of a target file.
FIG. 5 is a diagram for specifically explaining the method of evaluating the source code of a file whose author has been determined.
FIG. 6 is a diagram for explaining the method of evaluating the design of a folder containing at least one file.
FIG. 7 is a diagram summarizing the evaluation values of the authors of files and folders in table form.

Explanation of symbols

100: File author determination apparatus
101: Parent file determination unit
102: Updated portion determination unit
103: Author ID assignment unit
104: Author ID ratio calculation unit
105: File author determination unit
106: Evaluation unit
300: File management apparatus
301: Management table
302: File group

Claims

1. An author determination apparatus comprising:
granting means for assigning author identification information to each predetermined unit of document data described in data;
calculation means for calculating a ratio of the author identification information assigned to the data; and
determination means for determining an author of the data based on the ratio of the identification information calculated by the calculation means.

2. The author determination apparatus according to claim 1, further comprising evaluation means for evaluating the document data described in the data whose author is determined by the determination means.

3. The author determination apparatus according to claim 2, wherein the evaluation means evaluates the data based on at least one of the number of lines, the number of branches, and the number of nestings of source code that is the document data described in the data.

4. The author determination apparatus according to any one of claims 1 to 3, wherein the calculation means calculates, for each folder, the ratio of the author identification information assigned by the granting means to each item of data included in the folder, and the determination means determines the author of the folder based on the ratio of the identification information calculated by the calculation means.

The calculation means calculates a ratio of author identification information given by the grant means for each data included in the folder in units of folders, and the determination means calculates a ratio of identification information calculated by the calculation means. 4. The author of each of the data is determined on the basis of the data, and the author who has the largest number among the authors determined for each of the data is determined as the author of the folder. Author determination device described in.

6. The author determination apparatus according to claim 4, further comprising second evaluation means for performing design evaluation of the folder whose author is determined by the determination means.

7. The author determination apparatus according to claim 6, wherein the second evaluation means evaluates the design of the folder based on at least one of a reference relationship from another folder to the folder, a reference relationship from the folder to another folder, a circular reference relationship between the folder and another folder, and the number of items of data in the folder.

8. A method for controlling an author determination apparatus having granting means, calculation means, and determination means, the method comprising:
a granting step in which the granting means assigns author identification information to each predetermined unit of document data described in data;
a calculation step in which the calculation means calculates a ratio of the author identification information assigned to the data; and
a determination step in which the determination means determines an author of the data based on the ratio of the identification information calculated in the calculation step.

9. A program for causing a computer to function as:
granting means for assigning author identification information to each predetermined unit of document data described in data;
calculation means for calculating a ratio of the author identification information assigned to the data; and
determination means for determining an author of the data based on the ratio of the identification information calculated by the calculation means.