WO2024069741A1 - Software technological field extraction device and software technological field extraction method - Google Patents


Info

Publication number
WO2024069741A1
Authority
WO
WIPO (PCT)
Application number
PCT/JP2022/035898
Other languages
French (fr)
Japanese (ja)
Inventor
啓太 森
陽一郎 古賀
俊直 石井
Original Assignee
三菱電機株式会社 (Mitsubishi Electric Corporation)
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to PCT/JP2022/035898
Publication of WO2024069741A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Definitions

  • FIG. 1 shows the overall configuration of a software technical field extraction device 101.
  • A set of software development deliverables is input to the software technical field extraction device 101 in response to an instruction entered from the input device 40.
  • The deliverables include design documents 30 and source code 31.
  • The input device 40 is, for example, a terminal such as a personal computer.
  • A user enters instructions into the software technical field extraction device 101 by operating a screen on the terminal.
  • The software technical field extraction device 101 creates a classification model that automates the extraction of the technical fields contained in the deliverables, and uses that model to extract the technical fields of the deliverables.
  • The extraction results are presented to the user by the output device 41.
  • The output device 41 is, for example, a display device.
  • The software technical field extraction device 101 comprises a preprocessing unit 11, a classification model construction unit 18, a technical field extraction unit 23, a skill map creation unit 26, and an output control unit 32.
  • The preprocessing unit 11 includes a design document preprocessing unit 12, a source code preprocessing unit 13, an information combination unit 14, and a memory unit 51.
  • The memory unit 51 stores the source code keyword acquisition rules 15, the combination rules 16, and a post-preprocessing database 17.
  • The preprocessing unit 11 starts processing in response to a command entered from the input device 40.
  • The design documents 30 of a project are input to the design document preprocessing unit 12, and the source files containing the source code 31 are input to the source code preprocessing unit 13.
  • The software technical field extraction device 101 may instead acquire the design documents 30 and the source code 31 through communication.
  • The design document preprocessing unit 12 extracts keywords by performing morphological analysis on a design document 30, converting the design document 30 into a set of keywords.
  • The source code preprocessing unit 13 extracts keywords by analyzing the source code 31 according to the source code keyword acquisition rules 15, converting the source code into a set of keywords.
  • FIG. 2 shows an example of the source code keyword acquisition rules 15.
  • For each extension of a source file containing source code 31, the rules in FIG. 2 define extended regular expressions for acquiring keywords.
  • The first column specifies the extension.
  • The second column specifies the rule name.
  • The third column specifies the extended regular expression.
  • The fourth column specifies whether lexical analysis is performed on the acquired character string.
  • The fifth column specifies whether the rule is enabled or disabled.
  • The first row defines a rule that, for the extensions .c and .h, acquires the library name following #include as a keyword.
  • The second row defines a rule that, for the extensions .c and .h, acquires the contents of comments and performs morphological analysis on them to obtain keywords.
  • The third row defines a rule that, for the extension .c, acquires function names as keywords.
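As a rough illustration of how a rule table like the one in FIG. 2 can drive keyword extraction, the Python sketch below applies every enabled rule whose extensions match the file, and either keeps the captured string as-is or splits it into words. It is a sketch under stated assumptions, not the device's implementation: the `KeywordRule` fields mirror the five columns described above, but the regular expressions are hypothetical (FIG. 2's actual expressions are not reproduced in the text), Python `re` syntax stands in for the extended regular expression dialect, and the hypothetical `tokenize` helper is only a crude stand-in for morphological analysis.

```python
import re
from dataclasses import dataclass


@dataclass
class KeywordRule:
    """One row of a hypothetical source code keyword acquisition rule table."""
    extensions: tuple[str, ...]  # column 1: file extensions the rule applies to
    name: str                    # column 2: rule name
    pattern: str                 # column 3: regular expression; group 1 is the captured text
    lexical_analysis: bool       # column 4: split the captured text into words?
    enabled: bool                # column 5: rule enabled or disabled


# Hypothetical rules mirroring the three rows described for FIG. 2.
RULES = [
    KeywordRule((".c", ".h"), "include", r'#include\s*[<"]([^>"]+)[>"]', False, True),
    KeywordRule((".c", ".h"), "comment", r"/\*(.*?)\*/", True, True),
    KeywordRule((".c",), "function",
                r"^\s*[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*\)\s*\{", False, True),
]


def tokenize(text: str) -> list[str]:
    """Stand-in for morphological analysis: lower-cased alphanumeric words."""
    return [w.lower() for w in re.findall(r"[A-Za-z][A-Za-z0-9_]*", text)]


def extract_keywords(filename: str, source: str, rules: list[KeywordRule] = RULES) -> list[str]:
    """Apply each enabled rule that matches the file extension, in table order."""
    keywords: list[str] = []
    for rule in rules:
        if not rule.enabled or not filename.endswith(rule.extensions):
            continue
        # MULTILINE lets ^ anchor at each line; DOTALL lets comments span lines.
        for match in re.finditer(rule.pattern, source, re.MULTILINE | re.DOTALL):
            captured = match.group(1).strip()
            keywords.extend(tokenize(captured) if rule.lexical_analysis else [captured])
    return keywords
```

For a `.c` file that includes `<stdio.h>`, contains a comment and defines `send_data`, this yields the header name, the comment's words, and `send_data`; given the same text under a `.h` name, the function-name rule is skipped because it is limited to `.c`.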
  • The data created by the design document preprocessing unit 12 and the source code preprocessing unit 13 is input to the information combination unit 14.
  • The information combination unit 14 associates each design document 30 with source code 31 according to the combination rules 16.
  • FIG. 3 shows an example of a combination rule 16.
  • The combination rule 16 in FIG. 3 describes, in a programming language, the procedure for associating a design document 30 with source code 31. For each input design document, the rule searches the input source code 31 and judges source code 31 with matching keywords to be related to that design document.
  • When the information combination unit 14 has created data associating the design document 30 with the source code 31, it stores this data as preprocessed data in the post-preprocessing database (DB) 17.
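The keyword-matching idea behind the combination rule can be sketched as follows. This is an assumption-laden illustration: the text only says that source code with matching keywords is judged related, so the `min_shared` threshold, the hypothetical `PreprocessedRecord` shape (loosely modeled on table 401 of FIG. 4), and the choice to merge the related files' keywords into the record are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreprocessedRecord:
    """Hypothetical preprocessed-data row, loosely modeled on table 401 of FIG. 4."""
    design_document: str
    related_source_files: list[str]
    keywords: set[str] = field(default_factory=set)


def combine(design_docs: dict[str, set[str]],
            source_files: dict[str, set[str]],
            min_shared: int = 1) -> list[PreprocessedRecord]:
    """Link each design document to the source files sharing at least `min_shared` keywords."""
    records = []
    for doc_name, doc_keywords in design_docs.items():
        related = sorted(
            path for path, src_keywords in source_files.items()
            if len(doc_keywords & src_keywords) >= min_shared
        )
        # Assumption: the record keeps the union of the document's and related files' keywords.
        merged = set(doc_keywords)
        for path in related:
            merged |= source_files[path]
        records.append(PreprocessedRecord(doc_name, related, merged))
    return records


docs = {"comm_spec.docx": {"mqtt", "sensor", "tls"}, "ui_spec.docx": {"html", "button"}}
srcs = {"net.c": {"mqtt.h", "mqtt", "send_data"}, "ui.js": {"button", "render"}, "crc.c": {"crc"}}
for rec in combine(docs, srcs):
    print(rec.design_document, rec.related_source_files)
```

Raising `min_shared` trades recall for precision: with a threshold of 2, the single shared keyword `mqtt` would no longer link `comm_spec.docx` to `net.c`.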
  • FIG. 4 shows an example of the post-preprocessing DB 17.
  • The post-preprocessing DB 17 in FIG. 4 is a relational DB.
  • It consists of tables 401, 402, 403, and 404.
  • Table 401 has fields such as data ID, project ID, data owner ID, design document name, related source file name, and keywords.
  • Table 402 has fields such as project ID and project name.
  • Table 403 has fields such as data owner ID, name, and organization ID.
  • Table 404 has fields such as organization ID and organization name.
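A minimal SQLite rendering of tables 401 to 404 shows how the owner and organization of a deliverable can be resolved from its data ID, which is the lookup later needed in step S303. The table and column names are hypothetical English renderings of the fields listed above, and storing file names and keywords as delimited text is a simplification.

```python
import sqlite3

# Hypothetical schema for tables 401-404 of FIG. 4.
SCHEMA = """
CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);                -- table 402
CREATE TABLE organization (organization_id INTEGER PRIMARY KEY, organization_name TEXT); -- table 404
CREATE TABLE data_owner (                                                                -- table 403
    data_owner_id   INTEGER PRIMARY KEY,
    name            TEXT,
    organization_id INTEGER REFERENCES organization(organization_id)
);
CREATE TABLE preprocessed_data (                                                         -- table 401
    data_id              INTEGER PRIMARY KEY,
    project_id           INTEGER REFERENCES project(project_id),
    data_owner_id        INTEGER REFERENCES data_owner(data_owner_id),
    design_document_name TEXT,
    related_source_files TEXT,  -- delimited file names (simplification)
    keywords             TEXT   -- delimited keywords (simplification)
);
"""


def owner_and_organization(conn: sqlite3.Connection, data_id: int):
    """Return (owner name, organization name) for one preprocessed record, or None."""
    return conn.execute(
        """SELECT o.name, g.organization_name
             FROM preprocessed_data AS d
             JOIN data_owner   AS o ON o.data_owner_id = d.data_owner_id
             JOIN organization AS g ON g.organization_id = o.organization_id
            WHERE d.data_id = ?""",
        (data_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

After inserting an organization, an owner and a record, `owner_and_organization(conn, data_id)` returns a tuple such as `("A", "Section A")`, or `None` when the data ID is unknown.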
  • FIG. 5 is a flowchart showing the operation of the classification model construction unit 18.
  • The classification model construction unit 18 includes a model construction unit 19, a technical field naming unit 20, and a memory unit 52.
  • The memory unit 52 stores a technical field classification model 21 and a technical field name database 22.
  • The classification model construction unit 18 performs its processing using the post-preprocessing DB 17.
  • The classification model construction unit 18 starts the processing of the model construction unit 19 in response to a command entered from the input device 40 or upon completion of processing by the preprocessing unit 11.
  • In step S101, the model construction unit 19 uses the preprocessed data in the post-preprocessing DB 17 to construct N technical field classification models, having 1 to N topics respectively, with a topic analysis algorithm such as PLSA (probabilistic latent semantic analysis) or LDA (latent Dirichlet allocation).
  • In step S102, the model construction unit 19 evaluates topic model performance indicators (such as perplexity and coherence) for the N technical field classification models, and stores the most highly rated model in the memory unit 52 as the technical field classification model 21 used by the technical field extraction unit 23.
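Step S102 does not say how the indicators are computed or combined, so the sketch below assumes a single indicator is used at a time. It takes hypothetical summaries of the N trained models, computes perplexity with the usual definition exp(-(total log-likelihood) / (number of tokens)) over held-out keywords (lower is better), and keeps the best candidate; with `metric="coherence"` it keeps the candidate with the highest precomputed coherence score instead. Training the PLSA or LDA models themselves is out of scope here, and `CandidateModel` is a hypothetical structure.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateModel:
    """Hypothetical summary of one trained topic model (one of the N candidates)."""
    num_topics: int
    held_out_log_likelihood: float  # sum of log p(keyword) over held-out keyword tokens
    held_out_tokens: int            # number of held-out keyword tokens
    coherence: float                # precomputed topic coherence score (higher is better)


def perplexity(model: CandidateModel) -> float:
    """exp(-log-likelihood per token); lower means held-out keywords are predicted better."""
    return math.exp(-model.held_out_log_likelihood / model.held_out_tokens)


def select_model(candidates: list[CandidateModel], metric: str = "perplexity") -> CandidateModel:
    """Return the best-rated candidate according to one indicator."""
    if metric == "perplexity":
        return min(candidates, key=perplexity)
    if metric == "coherence":
        return max(candidates, key=lambda m: m.coherence)
    raise ValueError(f"unsupported metric: {metric}")


candidates = [
    CandidateModel(1, -1000.0, 200, 0.30),
    CandidateModel(2, -800.0, 200, 0.45),
    CandidateModel(3, -820.0, 200, 0.50),
]
print(select_model(candidates).num_topics, select_model(candidates, "coherence").num_topics)
```

The two indicators can disagree, as in these made-up numbers, which is one reason a real implementation would need an explicit policy for combining them.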
  • For each topic of the technical field classification model 21, the technical field naming unit 20 takes the most frequently occurring keyword among the keywords constituting that topic as its technical field name, that is, a phrase representing the technical field. It is assumed here that a keyword serving as the technical field name of one topic does not appear in any other topic.
  • The technical field naming unit 20 records the technical field name of each topic of the technical field classification model 21 in the technical field name DB 22 of the memory unit 52.
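The naming rule of the technical field naming unit 20 can be sketched with `collections.Counter`, assuming each topic is summarized as keyword frequencies (a hypothetical input format; keyword weights from an LDA model would work the same way). Because the description only assumes that a name keyword never appears in another topic, the sketch checks that assumption and raises `ValueError` rather than inventing a tie-breaking policy.

```python
from collections import Counter


def name_topics(topic_keywords: dict[int, Counter]) -> dict[int, str]:
    """Use each topic's most frequent keyword as its technical field name."""
    names: dict[int, str] = {}
    for topic_id, counts in topic_keywords.items():
        name, _ = counts.most_common(1)[0]  # assumes every topic has at least one keyword
        # The description assumes a name keyword never appears in another topic; verify it.
        clashes = [other for other, c in topic_keywords.items() if other != topic_id and name in c]
        if clashes:
            raise ValueError(f"{name!r} (topic {topic_id}) also appears in topics {clashes}")
        names[topic_id] = name
    return names


topics = {
    1: Counter({"embedded": 12, "rtos": 5, "driver": 3}),
    2: Counter({"web": 9, "html": 4, "http": 4}),
}
print(name_topics(topics))  # {1: 'embedded', 2: 'web'}
```

The resulting mapping corresponds to the technical field ID and technical field name columns of the technical field name DB 22.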
  • FIG. 6 shows an example of the technical field name DB 22.
  • The technical field name DB 22 is a relational DB.
  • It has fields for a technical field ID and a technical field name.
  • In step S104, the output control unit 32 causes the output device 41 to display a technical field confirmation screen that shows, on a two-dimensional map, each topic in the technical field classification model 21, the technical field name corresponding to each topic, and the keywords that make up each topic.
  • FIG. 7 shows an example of the technical field confirmation screen.
  • The technical field confirmation screen displays a two-dimensional map 702 of the various keywords that make up the topics.
  • Keywords that appear more frequently are displayed in a larger font, and keywords that appear less frequently in a smaller font.
  • The technical field confirmation screen also displays the technical field name 703 of the topic.
  • By operating the technical field confirmation screen with the input device 40, the user can enter a correction of a technical field name into the software technical field extraction device 101.
  • When such a correction is entered, the technical field naming unit 20 corrects the technical field name in the technical field name DB 22 in step S106.
  • FIG. 8 is a flowchart showing the operation of the technical field extraction unit 23.
  • The technical field extraction unit 23 comprises an estimation unit 24 and a memory unit 53.
  • The memory unit 53 stores the estimation result DB 25.
  • In step S201, the estimation unit 24 inputs the preprocessed data in the post-preprocessing DB 17 specified by the user or by the skill map creation unit 26 into the technical field classification model 21, and calculates the probability that the deliverable includes each technical field.
  • In step S202, the estimation unit 24 estimates that each technical field whose probability exceeds a threshold is a technical field of the deliverable. The threshold is set by the user.
  • In step S203, the estimation unit 24 registers the technical field estimation results in the estimation result DB 25 of the memory unit 53.
  • FIG. 9 shows an example of the estimation result DB 25.
  • The estimation result DB 25 in FIG. 9 is a relational DB.
  • It has fields such as data ID, classification result, and technical field estimation result.
  • The data IDs in the estimation result DB 25 correspond to the data IDs in the post-preprocessing DB 17.
  • The classification result stores, for each technical field ID, the probability that the given preprocessed data includes that technical field.
  • The IDs of the technical fields whose probability exceeds the threshold are stored as the technical field estimation result.
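Steps S201 to S203 amount to thresholding a per-deliverable distribution over technical fields. In the sketch below, the probabilities are assumed to arrive from the technical field classification model 21 as a mapping from technical field ID to probability, and `EstimationResult` is a hypothetical record mirroring the three fields of the estimation result DB 25. "Exceeds" is read as strictly greater than the user-set threshold.

```python
from dataclasses import dataclass


@dataclass
class EstimationResult:
    """Hypothetical row of the estimation result DB 25 (FIG. 9)."""
    data_id: int
    classification_result: dict[int, float]  # technical field ID -> probability
    estimated_field_ids: list[int]           # IDs whose probability exceeds the threshold


def estimate(data_id: int, probabilities: dict[int, float], threshold: float) -> EstimationResult:
    """Keep the technical fields whose probability strictly exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a probability in [0, 1]")
    selected = sorted(fid for fid, p in probabilities.items() if p > threshold)
    return EstimationResult(data_id, dict(probabilities), selected)


# Hypothetical topic distribution for one deliverable.
print(estimate(100, {1: 0.62, 2: 0.30, 3: 0.08}, threshold=0.25).estimated_field_ids)  # [1, 2]
```

A lower threshold attaches more technical fields to each deliverable; a higher one keeps only the dominant field.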
  • FIG. 10 is a flowchart showing the operation of the skill map creation unit 26.
  • The skill map creation unit 26 comprises a creation processing unit 27 and a memory unit 54.
  • The memory unit 54 stores an individual skill map DB 28 and an organization skill map DB 29.
  • The individual skill map DB 28 is a database of skill map data for each individual.
  • The organization skill map DB 29 is a database of skill map data for each organization.
  • Triggered by a command entered from the input device 40 or by a scheduled event, such as an update of the post-preprocessing DB 17 or of the technical field classification model 21, the creation processing unit 27 instructs the technical field extraction unit 23 to extract technical fields from the preprocessed data in the post-preprocessing DB 17 (step S301).
  • In step S302, the creation processing unit 27 selects one piece of estimation result data from the estimation result DB 25.
  • In step S303, the creation processing unit 27 uses the data ID of the selected estimation result data as a key to identify, from the post-preprocessing DB 17, the owner of the data and the owner's organization.
  • The creation processing unit 27 then counts the extracted technical field IDs for each owner and each organization, and updates the individual skill map DB 28 and the organization skill map DB 29.
  • FIG. 11 shows an example of the individual skill map DB 28.
  • The individual skill map DB 28 in FIG. 11 is a relational DB.
  • It has technical field ID and count as fields.
  • There is one table per individual. For each individual, the number of extractions from the deliverables that individual owns is counted for each technical field ID.
  • FIG. 12 shows an example of the organization skill map DB 29.
  • The organization skill map DB 29 in FIG. 12 is a relational DB.
  • It has technical field ID and count as fields.
  • There is one table per organization. For each organization, the number of extractions from the deliverables the organization owns is counted for each technical field ID.
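The counting in steps S302 to S304 can be sketched as one pass over the estimation results that tallies technical field IDs per owner and per organization. The input mappings are hypothetical stand-ins for the estimation result DB 25 and for the owner lookup against the post-preprocessing DB 17, and in-memory `Counter` objects replace the per-individual and per-organization tables of FIGS. 11 and 12.

```python
from collections import Counter, defaultdict


def build_skill_maps(estimations: dict[int, list[int]],
                     owner_of: dict[int, tuple[str, str]]) -> tuple[dict[str, Counter], dict[str, Counter]]:
    """Tally estimated technical field IDs per individual and per organization.

    estimations: data ID -> technical field IDs estimated for that deliverable
    owner_of:    data ID -> (owner name, organization name), as resolved in step S303
    """
    individual = defaultdict(Counter)
    organization = defaultdict(Counter)
    for data_id, field_ids in estimations.items():  # visit every estimation result once
        owner, org = owner_of[data_id]
        individual[owner].update(field_ids)          # one count per extracted field ID
        organization[org].update(field_ids)
    return dict(individual), dict(organization)


est = {100: [1], 101: [1, 2], 102: [2]}
own = {100: ("A", "Section A"), 101: ("A", "Section A"), 102: ("B", "Section A")}
ind, org = build_skill_maps(est, own)
print(ind["A"], org["Section A"])
```

Each `Counter` corresponds to one table in the skill map DBs, with the technical field ID as key and the count as value.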
  • In step S304, the creation processing unit 27 determines whether any estimation result data in the estimation result DB 25 remains unselected. If so, the process returns to step S302. Once all estimation result data in the estimation result DB 25 has been processed, in step S305 the output control unit 32 causes the output device 41 to display the individual skill map data and the organization skill map data.
  • FIG. 13 shows an example of an individual skill map display screen on which the output device 41 displays individual skill map data.
  • The individual skill map display screen displays an individual selection tab 1301, the technical fields held by the individual 1302, the number of data items owned by the individual 1303, the individual's related projects 1304, and a pie chart 1305 representing the individual's technical fields.
  • The user can select the individual whose skill map he or she wants to check from the individual selection tab 1301.
  • The technical fields held by the individual 1302 are shown together with their percentages, which are calculated from the counts in the individual skill map DB 28.
  • In the example shown, the skill map of individual "A" lists the technical fields embedded (60%) and WEB (40%), 100 data items, and related projects A and B.
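The percentages on the display screens can be derived from the skill map counts as each field's share of the total. The sketch below assumes the counts and the ID-to-name mapping (as held in the technical field name DB 22) are already loaded; the counts of 60 and 40 in the usage line are hypothetical, chosen only to reproduce the 60%/40% split of the example, since the underlying counts are not given.

```python
def field_percentages(counts: dict[int, int], names: dict[int, str]) -> dict[str, float]:
    """Convert per-field extraction counts into percentages of the total, largest first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {names[field_id]: round(100.0 * count / total, 1) for field_id, count in ordered}


# Hypothetical counts reproducing the 60% / 40% split shown for individual "A".
print(field_percentages({1: 60, 2: 40}, {1: "embedded", 2: "WEB"}))
# {'embedded': 60.0, 'WEB': 40.0}
```

The same function applies unchanged to the organization skill map DB 29, since both DBs share the technical field ID and count fields.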
  • FIG. 14 shows an example of an organization skill map display screen on which the output device 41 displays organization skill map data.
  • The organization skill map display screen displays an organization selection tab 1401, the technical fields held by the organization 1402, the number of data items owned by the organization 1403, the organization's related projects 1404, and a pie chart 1405 representing the organization's technical fields.
  • The user can select the organization whose skill map he or she wants to check from the organization selection tab 1401.
  • The technical fields held by the organization 1402 are shown together with their percentages, which are calculated from the counts in the organization skill map DB 29.
  • In the example shown, the skill map of "Section A" lists the technical fields embedded (60%) and WEB (40%), 1000 data items, and related projects A, B, and C.
  • FIG. 15 shows an example of a technical field inquiry screen that the software technical field extraction device 101 displays on the output device 41.
  • The technical field inquiry screen includes a design document selection button 1501 and an inquiry button 1503.
  • The user selects a design document registered in the post-preprocessing DB 17 by pressing the design document selection button 1501.
  • The selected design document and the source code associated with it are displayed to the right of the design document selection button 1501.
  • The output control unit 32 retrieves the technical field extraction results for the selected design document from the estimation result DB 25 and displays them to the right of the inquiry button 1503.
  • The preprocessing unit 11, the classification model construction unit 18, the technical field extraction unit 23, the skill map creation unit 26, and the output control unit 32 of the software technical field extraction device 101 described above are realized by the processing circuit 81 shown in FIG. 16. That is, the processing circuit 81 includes the preprocessing unit 11, the classification model construction unit 18, the technical field extraction unit 23, the skill map creation unit 26, and the output control unit 32 (hereinafter, "the preprocessing unit 11, etc.").
  • The processing circuit 81 may be dedicated hardware or a processor that executes a program stored in a memory.
  • The processor may be, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, or a DSP (Digital Signal Processor).
  • When the processing circuit 81 is dedicated hardware, it corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these.
  • The functions of the units such as the preprocessing unit 11 may each be realized by a separate processing circuit 81, or they may be combined and realized by a single processing circuit.
  • When the processing circuit 81 is a processor, the functions of the preprocessing unit 11, etc., are realized by software, etc. (software, firmware, or software and firmware).
  • The software, etc., is written as a program and stored in a memory.
  • The processor 82 serving as the processing circuit 81 realizes the functions of each unit by reading and executing the program stored in the memory 83.
  • The software technical field extraction device 101 thus includes a memory 83 storing a program which, when executed by the processing circuit 81, results in execution of the following steps: the preprocessing unit 11 preprocessing software development deliverables to create preprocessed data; the classification model construction unit 18 creating a classification model that automates the extraction of technical fields from the preprocessed data; the technical field extraction unit 23 extracting technical fields from the preprocessed data using the classification model; and the skill map creation unit 26 creating a skill map of an individual or organization by aggregating, for each individual or organization, the technical fields extracted by the technical field extraction unit 23.
  • This program can also be said to cause a computer to execute the procedures or methods of the preprocessing unit 11, etc.
  • The memory 83 may be, for example, a non-volatile or volatile semiconductor memory such as RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive), magnetic disk, flexible disk, optical disk, compact disc, MiniDisc, or DVD (Digital Versatile Disc) together with its drive device; or any storage medium that may be used in the future.
  • The above describes configurations in which the functions of the preprocessing unit 11, etc., are realized either by hardware or by software, etc. The configuration is not limited to these, however: part of the preprocessing unit 11, etc., may be realized by dedicated hardware and another part by software, etc.
  • For example, the functions of the preprocessing unit 11 may be realized by a processing circuit serving as dedicated hardware, while the remaining functions are realized by the processing circuit 81, acting as the processor 82, reading and executing the program stored in the memory 83.
  • In this way, the processing circuit can realize each of the functions described above by hardware, software, etc., or a combination of these.
  • The memory units 51, 52, 53, and 54 are implemented by the memory 83; they may share a single memory 83 or each use a separate memory.
  • The software technical field extraction device 101 may be configured on a user terminal (a terminal used by a user) or on an administrator terminal (a terminal managed by an administrator).
  • The software technical field extraction device 101 may also be configured as a system combining a user terminal or an administrator terminal with a server. In that case, the functions and components of the device described above may be distributed across the devices that make up the system, or concentrated in one of them.
  • Reference signs: 11 preprocessing unit, 12 design document preprocessing unit, 13 source code preprocessing unit, 14 information combination unit, 15 source code keyword acquisition rules, 16 combination rules, 17 post-preprocessing database, 18 classification model construction unit, 19 model construction unit, 20 technical field naming unit, 21 technical field classification model, 22 technical field name database, 23 technical field extraction unit, 24 estimation unit, 25 estimation result database, 26 skill map creation unit, 27 creation processing unit, 28 individual skill map database, 29 organization skill map database, 30 design document, 31 source code, 32 output control unit, 40 input device, 41 output device, 51, 52, 53, 54 memory unit, 81 processing circuit, 82 processor, 83 memory, 101 software technical field extraction device.

Abstract

The purpose of the present disclosure is to suitably extract a technological field from a software development deliverable and provide the technological field to a user. This software technological field extraction device (101) comprises: a preprocessing unit (11) that preprocesses a software development deliverable, thereby creating preprocessed data; a classification model construction unit (18) that creates a classification model for automating extraction of the technological field from the preprocessed data; a technological field extraction unit (23) that extracts the technological field of the deliverable from the preprocessed data by using the classification model; a skill map creation unit (26) that aggregates the technological fields extracted by the technological field extraction unit (23) for each individual or organization, thereby creating a skill map that represents the proportions of technological fields to which the individuals or organizations are related; and an output control unit (32) that causes an output device to output the skill map.

Description

ソフトウェア技術分野抽出装置およびソフトウェア技術分野抽出方法Apparatus and method for extracting software technical field
 本開示は、ソフトウェア開発の成果物からソフトウェアの技術分野を抽出する技術に関する。 This disclosure relates to a technique for extracting software technical fields from the results of software development.
 自然言語処理の分野において、機械学習モデルを用いたクラスタ分類の活用例がある。スパムフィルタはその一例である。 In the field of natural language processing, there are examples of cluster classification using machine learning models. Spam filters are one example.
 クラスタ分類の技術の一つとして、トピック分析がある。トピック分析とは、文章を、文章を表現する任意の数のトピックに分類する技術である。質問サイトの自動タグ付けは、トピック分析の一例である。 Topic analysis is one of the cluster classification techniques. Topic analysis is a technique for classifying texts into any number of topics that describe the text. Automatic tagging of Q&A sites is an example of topic analysis.
 ソフトウェア開発の分野において、個人またはチームが所有するスキルを、過去の成果物から抽出する技術がある。引用文献1には、ソースコード管理リポジトリに保存されているソースコードからエンジニアのプログラミングスキル偏差値を算出する仕組みについて記載されている。 In the field of software development, there is technology that extracts the skills possessed by individuals or teams from past deliverables. Cited Reference 1 describes a mechanism for calculating an engineer's programming skill standard score from source code stored in a source code management repository.
 引用文献2には、保有しているファイルを解析し、従業員のスキルを抽出する仕組みについて記載されている。 Reference 2 describes a system for analyzing files held and extracting employee skills.
特開2020-035077号公報JP 2020-035077 A 特開2005-202812号公報JP 2005-202812 A 国際公開第2021/019942号International Publication No. 2021/019942 特開2012-221316号公報JP 2012-221316 A 特表2015-511733号公報JP 2015-511733 A
 近年、ソフトウェア技術の分野は多種多様に広がってきている。そういった背景の中、組織において、ソフトウェアを生産するチームまたは開発者がどのような技術分野のスキルを持っているかの抽出が難しくなってきた。 In recent years, the field of software technology has become increasingly diverse. In this context, it has become difficult for organizations to extract the technical skills of the teams or developers who produce software.
 引用文献1,2には、設計文書またはソースコードなどの成果物を解析し、プログラミング言語能力の判定を支援する、または個人の持つスキルを分析する技術について記載されている。しかし、これらの技術は、あらかじめ決まったプログラミング言語に関して、開発者のスキルを評価するものである。あるいは、簡単な文字解析で単語を拾い、その単語をスキルとするものである。あるチームまたは開発者が保有する技術分野は、複数のキーワードの組み合わせによって特定できるものであるが、既存技術では、複数キーワードからの技術分野の推定が出来ない。また、既存技術のように、成果物から得られた単語一つ一つをスキルとして抽出してしまうと、抽出数が膨大になってしまう。そのため、グルーピングが困難となり、ソフトウェア技術に関して十分な知識を持たない者には取り扱いが難しくなる。 Cited documents 1 and 2 describe technologies that analyze deliverables such as design documents or source code to help determine programming language proficiency or analyze the skills of individuals. However, these technologies evaluate a developer's skills for a predetermined programming language. Or they use simple character analysis to pick up words and use those words as skills. The technical field of a team or developer can be identified by a combination of multiple keywords, but existing technologies cannot estimate the technical field from multiple keywords. In addition, if each word obtained from a deliverable is extracted as a skill, as with existing technologies, the number of extractions becomes enormous. This makes grouping difficult, making it difficult for people without sufficient knowledge of software technology to handle.
 本開示は、上記の問題点を解決するためになされたものであり、ソフトウェア開発の成果物から技術分野を適切に抽出してユーザに提示することを目的とする。 This disclosure has been made to solve the above problems, and aims to appropriately extract technical fields from the results of software development and present them to users.
 本開示のソフトウェア技術分野抽出装置は、ソフトウェア開発の成果物に前処理を行うことにより前処理後データを作成する前処理部と、前処理後データから技術分野の抽出を自動化する分類モデルを作成する分類モデル構築部と、分類モデルにより前処理後データから成果物の技術分野を抽出する技術分野抽出部と、技術分野抽出部が抽出した技術分野を個人または組織ごとに集計することにより、個人または組織が関係する技術分野の割合を表すスキルマップを作成するスキルマップ作成部と、出力装置にスキルマップを出力させる出力制御部と、を備える。 The software technical field extraction device disclosed herein includes a preprocessing unit that creates preprocessed data by preprocessing the software development deliverables, a classification model construction unit that creates a classification model that automates the extraction of technical fields from the preprocessed data, a technical field extraction unit that extracts the technical fields of the deliverables from the preprocessed data using the classification model, a skill map creation unit that creates a skill map that represents the proportion of technical fields to which an individual or organization is related by aggregating the technical fields extracted by the technical field extraction unit for each individual or organization, and an output control unit that causes an output device to output the skill map.
 The software technical field extraction device of the present disclosure can appropriately extract technical fields from software development deliverables. The objects, features, aspects, and advantages of the present disclosure will become more apparent from the following detailed description and the accompanying drawings.
FIG. 1 is a block diagram showing the configuration of the software technical field extraction device.
FIG. 2 is a diagram illustrating an example of source code keyword acquisition rules.
FIG. 3 is a diagram illustrating an example of a combination rule.
FIG. 4 is a diagram illustrating an example of a post-preprocessing DB.
FIG. 5 is a flowchart showing the operation of the classification model construction unit.
FIG. 6 is a diagram illustrating an example of a technical field name DB.
FIG. 7 is a diagram illustrating an example of a technical field confirmation screen.
FIG. 8 is a flowchart showing the operation of the technical field extraction unit.
FIG. 9 is a diagram illustrating an example of an estimation result DB.
FIG. 10 is a flowchart showing the operation of the skill map creation unit.
FIG. 11 is a diagram illustrating an example of an individual skill map DB.
FIG. 12 is a diagram illustrating an example of an organization skill map DB.
FIG. 13 is a diagram illustrating an example of an individual skill map display screen.
FIG. 14 is a diagram illustrating an example of an organization skill map display screen.
FIG. 15 is a diagram illustrating an example of a technical field inquiry screen.
FIG. 16 is a diagram illustrating a hardware configuration of the software technical field extraction device.
FIG. 17 is a diagram illustrating a hardware configuration of the software technical field extraction device.
<A. First Embodiment>
FIG. 1 is a diagram showing the overall configuration of a software technical field extraction device 101.
 A set of software development deliverables is input to the software technical field extraction device 101 in response to an instruction entered from the input device 40. Here, the deliverables include design documents 30 and source code 31. The input device 40 is, for example, a terminal such as a personal computer, and the user enters instructions into the software technical field extraction device 101 by operating the terminal's screen. The software technical field extraction device 101 creates a classification model that automates the extraction of the technical fields contained in the deliverables, and uses the classification model to extract the technical fields of the deliverables. The extraction results are presented to the user by the output device 41, which is, for example, a display device.
 The software technical field extraction device 101 includes a preprocessing unit 11, a classification model construction unit 18, a technical field extraction unit 23, a skill map creation unit 26, and an output control unit 32.
 The preprocessing unit 11 includes a design document preprocessing unit 12, a source code preprocessing unit 13, an information combining unit 14, and a storage unit 51. The storage unit 51 stores source code keyword acquisition rules 15, combination rules 16, and a post-preprocessing database 17. The preprocessing unit 11 starts processing in response to an instruction entered from the input device 40. The design documents 30 of a project are input to the design document preprocessing unit 12, and the source files containing the source code 31 are input to the source code preprocessing unit 13. The software technical field extraction device 101 may also acquire the design documents 30 and the source code 31 via communication.
 The design document preprocessing unit 12 extracts keywords by performing morphological analysis on the design document 30, converting the design document 30 into a set of keywords. The source code preprocessing unit 13 extracts keywords by analyzing the source code 31 according to the source code keyword acquisition rules 15, converting the source code 31 into a set of keywords.
 FIG. 2 shows an example of the source code keyword acquisition rules 15. The rules 15 in FIG. 2 define, for each extension of the source files containing the source code 31, extended regular expressions for acquiring keywords. The first column specifies the file extension, the second the rule name, the third the extended regular expression, the fourth whether lexical analysis is performed on the acquired string, and the fifth whether the rule is enabled or disabled. The first row defines a rule that, for files with the extension .c or .h, acquires the library name following #include as a keyword. The second row defines a rule that, for .c or .h files, acquires the contents of comments and applies morphological analysis to them to obtain keywords. The third row defines a rule that, for .c files, acquires function names as keywords.
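 To make the rule table concrete, the following is a minimal Python sketch of how rules like those in FIG. 2 might be applied. The rule tuples, the regular expressions, and the names RULES, split_words, and extract_keywords are hypothetical illustrations, not the patterns actually shown in FIG. 2. In particular, split_words is only a stand-in for morphological analysis (a real system handling Japanese comments would need a morphological analyzer); here it simply keeps identifier-like words.

```python
import re
from pathlib import PurePath

# Hypothetical rule table modelled on the FIG. 2 description:
# (extensions, rule name, compiled regex, split match into words?, enabled?)
RULES = [
    ({".c", ".h"}, "include",
     re.compile(r'^\s*#include\s*[<"]([^>"]+)[>"]', re.MULTILINE), False, True),
    ({".c", ".h"}, "comment",
     re.compile(r"/\*(.*?)\*/|//([^\n]*)", re.DOTALL), True, True),
    # Crude function-definition pattern: an unindented line with a name,
    # a parameter list, and an opening brace (possibly on the next line).
    ({".c"}, "function",
     re.compile(r"^[A-Za-z_][\w \t*]*?\b([A-Za-z_]\w*)\s*\([^;{)]*\)\s*\{",
                re.MULTILINE), False, True),
]


def split_words(text):
    """Stand-in for morphological analysis: keep identifier-like words."""
    return re.findall(r"[A-Za-z_]\w+", text)


def extract_keywords(filename, source):
    """Apply every enabled rule whose extension matches the file."""
    ext = PurePath(filename).suffix
    keywords = []
    for exts, _name, pattern, analyse, enabled in RULES:
        if not enabled or ext not in exts:
            continue
        for match in pattern.finditer(source):
            # Use whichever capture group took part in this match.
            text = next(g for g in match.groups() if g is not None)
            keywords.extend(split_words(text) if analyse else [text.strip()])
    return keywords
```

 Keeping the rules as data rather than code mirrors the table in FIG. 2: enabling or disabling a row only flips a flag, and new extensions can be supported by adding rows.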
 The data created by the design document preprocessing unit 12 and the source code preprocessing unit 13 is input to the information combining unit 14, which associates the design documents 30 with the source code 31 according to the combination rules 16.
 FIG. 3 shows an example of the combination rules 16. The combination rule 16 in FIG. 3 is written in a programming language and describes how design documents 30 are associated with source code 31: for each input design document, the input source code 31 is searched, and any source code 31 that has a keyword matching the design document is determined to be related source code.
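 A combination rule of this kind could look like the following Python sketch. It assumes that "matching keywords" means at least one shared keyword; the actual rule in FIG. 3 may use a stricter criterion. The function name associate and the dictionary shapes are hypothetical. The output concatenates the design document's keywords followed by those of each related source file, matching the structure described for data ID = 1 in FIG. 4.

```python
def associate(design_docs, source_files):
    """Link each design document to the source files that share at least
    one keyword with it, then concatenate the keywords (document first,
    followed by each related source file in input order).

    design_docs:  {document name: [keyword, ...]}
    source_files: {source file name: [keyword, ...]}
    Returns {document name: (related file names, combined keywords)}.
    """
    combined = {}
    for doc_name, doc_keywords in design_docs.items():
        doc_set = set(doc_keywords)
        related = [name for name, kws in source_files.items()
                   if doc_set & set(kws)]
        keywords = list(doc_keywords)
        for name in related:
            keywords.extend(source_files[name])
        combined[doc_name] = (related, keywords)
    return combined
```

 For example, a design document with keywords ["socket", "http"] would be linked to XXX.c (["socket", "bind"]) and YYY.c (["http", "parse"]) but not to a file whose only keyword is "gpio".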
 After creating the data that associates the design documents 30 with the source code 31, the information combining unit 14 stores this data as preprocessed data in the post-preprocessing database (DB) 17.
 FIG. 4 shows an example of the post-preprocessing DB 17, which in FIG. 4 is a relational DB consisting of tables 401, 402, 403, and 404. Table 401 has the fields data ID, project ID, data owner ID, design document name, related source file names, and keywords. The preprocessed data with data ID = 1 consists of the keywords of design document A followed by the keywords of XXX.c and those of YYY.c. Table 402 has the fields project ID and project name. Table 403 has the fields data owner ID, name, and organization ID. Table 404 has the fields organization ID and organization name.
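 The four-table layout of FIG. 4 can be sketched with Python's built-in sqlite3 module, as below. The table and column names, and the choice to store file names as a comma-separated string and keywords as a space-separated string, are assumptions for illustration only. The owner_and_org query shows the lookup that the skill map creation unit later performs in step S303 (data ID to owner to organization).

```python
import sqlite3

# Hypothetical schema mirroring tables 401-404 in FIG. 4.
SCHEMA = """
CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE organization (org_id INTEGER PRIMARY KEY, org_name TEXT);
CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT,
                    org_id INTEGER REFERENCES organization(org_id));
CREATE TABLE preprocessed (
    data_id      INTEGER PRIMARY KEY,
    project_id   INTEGER REFERENCES project(project_id),
    owner_id     INTEGER REFERENCES owner(owner_id),
    design_doc   TEXT,
    source_files TEXT,  -- comma-separated, e.g. 'XXX.c,YYY.c'
    keywords     TEXT   -- space-separated keyword list
);
"""


def build_demo_db():
    """Create an in-memory DB holding one illustrative row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO project VALUES (1, 'Project A')")
    conn.execute("INSERT INTO organization VALUES (10, 'Section A')")
    conn.execute("INSERT INTO owner VALUES (100, 'A', 10)")
    conn.execute(
        "INSERT INTO preprocessed VALUES "
        "(1, 1, 100, 'Design A', 'XXX.c,YYY.c', 'socket http bind parse')")
    return conn


def owner_and_org(conn, data_id):
    """Return (owner name, organization name) for one preprocessed item."""
    return conn.execute(
        """SELECT o.name, g.org_name
           FROM preprocessed p
           JOIN owner o ON p.owner_id = o.owner_id
           JOIN organization g ON o.org_id = g.org_id
           WHERE p.data_id = ?""", (data_id,)).fetchone()
```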
 FIG. 5 is a flowchart showing the operation of the classification model construction unit 18. The classification model construction unit 18 includes a model construction unit 19, a technical field naming unit 20, and a storage unit 52. The storage unit 52 stores a technical field classification model 21 and a technical field name database 22. The classification model construction unit 18 operates on the post-preprocessing database 17, and starts the processing of the model construction unit 19 in response to an instruction entered from the input device 40 or to the completion of processing by the preprocessing unit 11.
 In step S101, the model construction unit 19 uses the preprocessed data in the post-preprocessing database 17 and a topic model algorithm such as PLSA or LDA to construct N technical field classification models, with the number of topics ranging from 1 to N.
 Next, in step S102, the model construction unit 19 evaluates the N technical field classification models using topic model performance metrics (such as perplexity or coherence), and stores the highest-rated model in the storage unit 52 as the technical field classification model 21 used by the technical field extraction unit 23.
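 Step S102 (choosing among the N candidate models) might be sketched as follows. This sketch assumes the candidates have already been trained by some PLSA or LDA implementation and that each is summarized by its top keywords per topic, ordered by weight. It uses UMass coherence (with +1 smoothing, computed from document co-occurrence counts) as the metric; the patent names perplexity and coherence only as examples, so the specific metric and the names umass_coherence and select_topic_count are assumptions.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words: the topic's keywords ordered by descending weight.
    docs: list of keyword sets, one per preprocessed item.
    Higher scores mean the top words co-occur more often.
    """
    def doc_freq(*words):
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denom = doc_freq(top_words[j])
            if denom:  # skip words that never occur in the corpus
                score += math.log(
                    (doc_freq(top_words[i], top_words[j]) + 1) / denom)
    return score


def select_topic_count(candidates, docs):
    """candidates: {topic count: [top-word list for each topic]}.
    Returns the topic count whose model has the highest mean coherence."""
    def mean_coherence(topics):
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))
```

 If perplexity were used instead, the selection would take the minimum rather than the maximum, since lower perplexity indicates a better fit.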
 Then, in step S103, for each topic of the technical field classification model 21, the technical field naming unit 20 takes the most frequent keyword among the keywords that make up the topic as its technical field name, that is, the phrase representing that technical field. A keyword used as the technical field name of one topic is assumed not to appear in any other topic. The technical field naming unit 20 records the technical field name of each topic of the technical field classification model 21 in the technical field name DB 22 of the storage unit 52.
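 One way to realize the naming step together with the "not in any other topic" condition is sketched below. This interprets the condition as a filter: a topic is named after its highest-weight keyword that does not occur in any other topic. That interpretation, the None fallback when no such keyword exists, and the name name_topics are assumptions. A topic left unnamed would be a natural candidate for the user correction in steps S105 and S106.

```python
def name_topics(topics):
    """topics: list of {keyword: weight} dictionaries, one per topic.

    Returns one name per topic: the highest-weight keyword that does not
    appear in any other topic, or None if every keyword is shared.
    """
    names = []
    for i, topic in enumerate(topics):
        others = set()
        for j, other in enumerate(topics):
            if j != i:
                others.update(other)  # adds the other topic's keywords
        unique = [w for w in topic if w not in others]
        names.append(max(unique, key=topic.get) if unique else None)
    return names
```

 For instance, if "socket" is the top keyword of two topics, neither topic is named "socket"; each falls back to its strongest keyword of its own.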
 FIG. 6 shows an example of the technical field name DB 22, which in FIG. 6 is a relational DB with the fields technical field ID and technical field name.
 After step S103, in step S104, the output control unit 32 causes the output device 41 to display a technical field confirmation screen that presents, as a two-dimensional map, each topic of the technical field classification model 21, the technical field name corresponding to each topic, and the keywords that make up each topic.
 FIG. 7 shows an example of the technical field confirmation screen. In addition to topics such as technical field 1 through technical field N, the screen displays a two-dimensional map 702 of the keywords that make up a topic. In the map 702, more frequent keywords are shown in a larger font and less frequent keywords in a smaller font. The screen also displays the technical field name 703 of the topic.
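 The rule that more frequent keywords get larger fonts can be implemented in many ways; a simple choice is linear interpolation between a minimum and a maximum point size, sketched below. The linear mapping, the default sizes of 10 pt and 40 pt, and the name font_sizes are assumptions, since the document does not specify how sizes are computed.

```python
def font_sizes(freqs, min_pt=10, max_pt=40):
    """Map keyword frequencies to font sizes by linear interpolation.

    freqs: {keyword: occurrence count}. The least frequent keyword gets
    min_pt and the most frequent gets max_pt; if all counts are equal,
    every keyword gets min_pt.
    """
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo
    return {
        word: min_pt if span == 0
        else min_pt + (count - lo) * (max_pt - min_pt) / span
        for word, count in freqs.items()
    }
```

 A logarithmic scale is a common alternative when a few keywords dominate the counts.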
 By operating the technical field confirmation screen with the input device 40, the user can enter corrections to the technical field names into the software technical field extraction device 101. When the user enters a correction to a technical field name in step S105, the technical field naming unit 20 corrects that technical field name in the technical field name DB 22 in step S106.
 FIG. 8 is a flowchart showing the operation of the technical field extraction unit 23. The technical field extraction unit 23 includes an estimation unit 24 and a storage unit 53, and the storage unit 53 stores the estimation result database 25.
 In step S201, the estimation unit 24 inputs the preprocessed data in the post-preprocessing DB 17 specified by the user or by the skill map creation unit 26 into the technical field classification model 21, and calculates the probability that the deliverable contains each technical field.
 Next, in step S202, the estimation unit 24 estimates that every technical field whose probability exceeds a certain threshold is a technical field of the deliverable. This threshold is set by the user.
 Finally, in step S203, the estimation unit 24 registers the technical field estimation results in the estimation result DB 25 of the storage unit 53.
 FIG. 9 shows an example of the estimation result DB 25, which in FIG. 9 is a relational DB with the fields data ID, classification result, and technical field estimation result. The data IDs in the estimation result DB 25 correspond to the data IDs in the post-preprocessing DB 17. The classification result stores, for a given item of preprocessed data, the probability that it contains the technical field of each technical field ID. The technical field estimation result stores the IDs of the technical fields whose probability is above the threshold.
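 Steps S202 and S203 reduce to a threshold filter over the per-topic probabilities that the model returns in step S201. The sketch below assumes those probabilities are already available as a dictionary keyed by technical field ID, and builds a record with the three fields of FIG. 9. The function name estimate_fields and the record keys are hypothetical.

```python
def estimate_fields(data_id, topic_probs, threshold):
    """Build one estimation-result record (cf. FIG. 9).

    topic_probs: {technical field ID: probability} produced by the
    classification model for one item of preprocessed data.
    threshold: user-set probability; fields strictly above it are kept.
    """
    estimated = [fid for fid, p in topic_probs.items() if p > threshold]
    return {
        "data_id": data_id,
        "classification": dict(topic_probs),  # all probabilities
        "estimated": estimated,               # fields above threshold
    }
```

 Storing the full probability vector alongside the filtered IDs, as FIG. 9 does, allows the threshold to be changed later without rerunning the model.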
 FIG. 10 is a flowchart showing the operation of the skill map creation unit 26. The skill map creation unit 26 includes a creation processing unit 27 and a storage unit 54. The storage unit 54 stores an individual skill map DB 28, a database of skill map data for each individual, and an organization skill map DB 29, a database of skill map data for each organization.
 Triggered by an instruction entered from the input device 40, or at a scheduled timing such as when the post-preprocessing DB 17 or the technical field classification model 21 is updated, the creation processing unit 27 instructs the technical field extraction unit 23 to extract technical fields using the preprocessed data in the post-preprocessing DB 17 (step S301).
 Next, in step S302, the creation processing unit 27 selects one item of estimation result data from the estimation result DB 25.
 Then, in step S303, the creation processing unit 27 uses the data ID of the selected estimation result data as a key to look up the owner of the data and the owner's organization in the post-preprocessing DB 17. The creation processing unit 27 then counts the extracted technical field IDs for each owner and each organization, and updates the individual skill map DB 28 and the organization skill map DB 29.
 FIG. 11 shows an example of the individual skill map DB 28, which in FIG. 11 is a relational DB with the fields technical field ID and count. The individual skill map DB 28 has one table per individual, in which the number of times each technical field ID was extracted from that individual's deliverables is counted.
 FIG. 12 shows an example of the organization skill map DB 29, which in FIG. 12 is a relational DB with the fields technical field ID and count. The organization skill map DB 29 has one table per organization, in which the number of times each technical field ID was extracted from that organization's deliverables is counted.
 Next, in step S304, the creation processing unit 27 determines whether any unselected estimation result data remains in the estimation result DB 25. If so, processing returns to step S302. When all estimation result data in the estimation result DB 25 has been processed, in step S305 the output control unit 32 causes the output device 41 to display the individual skill map data and the organization skill map data.
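 The loop of steps S302 to S304 amounts to iterating over the estimation results and incrementing per-owner and per-organization counters. The sketch below uses collections.Counter in place of the per-individual and per-organization tables of FIGS. 11 and 12, and plain dictionaries in place of the DB lookups of step S303. The function name build_skill_maps and the argument shapes are assumptions.

```python
from collections import Counter, defaultdict


def build_skill_maps(estimates, owner_of, org_of):
    """Aggregate estimated technical fields per individual and organization.

    estimates: {data ID: [estimated technical field ID, ...]}
    owner_of:  {data ID: owner}          (lookup into the preprocessed DB)
    org_of:    {owner: organization}     (lookup into the owner table)
    Returns (personal, organizational), each {key: Counter(field ID -> count)}.
    """
    personal = defaultdict(Counter)
    organizational = defaultdict(Counter)
    for data_id, field_ids in estimates.items():  # S302/S304 loop
        owner = owner_of[data_id]                 # S303: owner lookup
        org = org_of[owner]                       # S303: organization lookup
        personal[owner].update(field_ids)
        organizational[org].update(field_ids)
    return personal, organizational
```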
 FIG. 13 shows an example of the individual skill map display screen, on which the output device 41 displays the individual skill map data. The screen displays an individual selection tab 1301, the technical fields of the individual 1302, the number of data items owned by the individual 1303, the individual's related projects 1304, and a pie chart 1305 of the individual's technical fields. From the individual selection tab 1301, the user can select the individual whose skill map they want to check. The technical fields of the individual 1302 are shown together with their percentages, which are calculated from the counts in the individual skill map DB 28. In the example of FIG. 13, the skill map of "A" shows the technical fields embedded (60%) and WEB (40%), 100 data items, and related projects A and B.
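 The percentages shown in the skill map can be derived from the counts by normalizing over the total count, as sketched below. This assumes each field's share is its count divided by the sum of all counts for that individual or organization; because one deliverable may contribute several technical fields, that sum is not necessarily equal to the number of data items shown on the screen. The name field_shares is hypothetical.

```python
def field_shares(counts):
    """Convert {technical field: extraction count} into percentages
    of the total count. Returns an empty dict when there are no counts."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {field: 100.0 * count / total for field, count in counts.items()}
```

 With counts of 6 for embedded and 4 for WEB, this yields 60.0 and 40.0, matching the proportions shown for "A" in FIG. 13.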
 FIG. 14 shows an example of the organization skill map display screen, on which the output device 41 displays the organization skill map data. The screen displays an organization selection tab 1401, the technical fields of the organization 1402, the number of data items owned by the organization 1403, the organization's related projects 1404, and a pie chart 1405 of the organization's technical fields. From the organization selection tab 1401, the user can select the organization whose skill map they want to check. The technical fields of the organization 1402 are shown together with their percentages, which are calculated from the counts in the organization skill map DB 29. In the example of FIG. 14, the skill map of "Section A" shows the technical fields embedded (60%) and WEB (40%), 1000 data items, and related projects A, B, and C.
 FIG. 15 shows an example of a technical field inquiry screen that the software technical field extraction device 101 displays on the output device 41. The screen includes a design document selection button 1501 and an inquiry button 1503. By pressing the design document selection button 1501, the user selects a design document registered in the post-preprocessing DB 17. The selected design document and its related source code are displayed to the right of the design document selection button 1501. When the user presses the inquiry button 1503, the output control unit 32 retrieves the technical field extraction results for the selected design document from the estimation result DB 25 and displays them to the right of the inquiry button 1503.
<B. Hardware Configuration>
The preprocessing unit 11, the classification model construction unit 18, the technical field extraction unit 23, the skill map creation unit 26, and the output control unit 32 of the software technical field extraction device 101 described above are implemented by the processing circuit 81 shown in FIG. 16. That is, the processing circuit 81 includes the preprocessing unit 11, the classification model construction unit 18, the technical field extraction unit 23, the skill map creation unit 26, and the output control unit 32 (hereinafter referred to as "the preprocessing unit 11 and the like"). The processing circuit 81 may be dedicated hardware, or a processor that executes a program stored in memory. The processor is, for example, a central processing unit, a processing device, an arithmetic device, a microprocessor, a microcomputer, or a DSP (Digital Signal Processor).
 When the processing circuit 81 is dedicated hardware, it may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these. The functions of the preprocessing unit 11 and the other units may each be implemented by separate processing circuits 81, or may be implemented together by a single processing circuit.
 When the processing circuit 81 is a processor, the functions of the preprocessing unit 11 and the like are realized by software, firmware, or a combination of software and firmware. The software or firmware is written as a program and stored in memory. As shown in FIG. 17, the processor 82 serving as the processing circuit 81 realizes the function of each unit by reading and executing the program stored in the memory 83.
That is, the software technical field extraction device 101 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in the execution of: a step in which the preprocessing unit 11 creates preprocessed data by preprocessing software development deliverables; a step in which the classification model construction unit 18 creates a classification model that automates the extraction of technical fields from the preprocessed data; a step in which the technical field extraction unit 23 extracts technical fields from the preprocessed data using the classification model; and a step in which the skill map creation unit 26 creates a skill map of an individual or organization by aggregating, for each individual or organization, the technical fields extracted by the technical field extraction unit 23. In other words, this program causes a computer to execute the procedures or methods of the preprocessing unit 11 and the like. The memory 83 may be, for example, a nonvolatile or volatile semiconductor memory such as RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive), magnetic disk, flexible disk, optical disk, compact disk, mini disk, or DVD (Digital Versatile Disk) and its drive; or any storage medium to be used in the future.
 The above describes configurations in which each function of the preprocessing unit 11 and the like is realized either by hardware or by software or the like. However, the configuration is not limited to this: part of the preprocessing unit 11 and the like may be realized by dedicated hardware and another part by software or the like. For example, the function of the preprocessing unit 11 may be realized by a processing circuit serving as dedicated hardware, while the remaining functions are realized by the processing circuit 81 serving as the processor 82 reading and executing a program stored in the memory 83.
 As described above, the processing circuit can realize each of the above functions by hardware, software or the like, or a combination thereof. The storage units 51, 52, 53, and 54 are implemented by the memory 83; they may share a single memory 83, or each may be implemented by a separate memory.
 The software technical field extraction device 101 may be implemented on a user terminal, that is, a terminal used by a user, or on an administrator terminal, that is, a terminal managed by an administrator. The software technical field extraction device 101 may also be implemented as a system combining a user terminal or administrator terminal with a server. In that case, the functions or components of the software technical field extraction device 101 described above may be distributed across the devices that make up the system, or concentrated in one of them.
 The embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate. The above description is illustrative in all respects, and it is understood that countless variations not illustrated here can be envisioned.
 11 Preprocessing unit, 12 Design document preprocessing unit, 13 Source code preprocessing unit, 14 Information combining unit, 15 Source code keyword acquisition rules, 16 Combination rules, 17 Post-preprocessing database, 18 Classification model construction unit, 19 Model construction unit, 20 Technical field naming unit, 21 Technical field classification model, 22 Technical field name database, 23 Technical field extraction unit, 24 Estimation unit, 25 Estimation result database, 26 Skill map creation unit, 27 Creation processing unit, 28 Individual skill map database, 29 Organization skill map database, 30 Design document, 31 Source code, 32 Output control unit, 40 Input device, 41 Output device, 51, 52, 53, 54 Storage unit, 81 Processing circuit, 82 Processor, 83 Memory, 101 Software technical field extraction device.

Claims (7)

  1.  A software technology field extraction device comprising:
    a preprocessing unit that performs preprocessing on a software development deliverable to generate preprocessed data;
    a classification model construction unit that creates a classification model that automates the extraction of technical fields from the preprocessed data;
    a technical field extraction unit that extracts a technical field of the deliverable from the preprocessed data using the classification model;
    a skill map creation unit that creates a skill map representing the proportion of technical fields to which an individual or organization is related, by aggregating, for each individual or organization, the technical fields extracted by the technical field extraction unit; and
    an output control unit that causes an output device to output the skill map.
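Claim 1 leaves the preprocessing open, and the "source code keyword acquisition rules" (reference sign 15) are not spelled out in this section. As an illustration only, the Python sketch below assumes that preprocessing source code means splitting identifiers on snake_case and camelCase boundaries, lowercasing the parts, and dropping short words and a stop-word list. The function name, the stop words and the minimum length are hypothetical choices, not the patent's rules.

```python
import re

# Hypothetical stop words; the patent's keyword acquisition rules are not specified here.
STOP_WORDS = {"self", "return", "def", "int", "str", "none", "true", "false", "get", "set"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits "HTTPServer" -> HTTP, Server and "sendTcpPacket" -> send, Tcp, Packet.
CAMEL_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def source_keywords(code: str, min_len: int = 3) -> set[str]:
    """Extract lowercase keywords from the identifiers in a piece of source code."""
    words = set()
    for ident in IDENTIFIER.findall(code):
        for chunk in ident.split("_"):  # snake_case boundaries
            for part in CAMEL_PART.findall(chunk):  # camelCase / acronym boundaries
                w = part.lower()
                if len(w) >= min_len and w not in STOP_WORDS and not w.isdigit():
                    words.add(w)
    return words
```

For example, `source_keywords("def sendTcpPacket(self, packet_buffer):\n    return HTTPServer")` yields the set `{"send", "tcp", "packet", "buffer", "http", "server"}`. Splitting identifiers in this way is one common heuristic for treating code as text. A real rule set would likely also draw on comments, string literals, and imported package names.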
  2.  The software technology field extraction device according to claim 1, wherein
    the deliverable includes a design document and source code, and
    the preprocessing unit includes:
    a design document preprocessing unit that extracts keywords from the design document;
    a source code preprocessing unit that extracts keywords from the source code; and
    an information combining unit that associates the design document with the source code based on the keywords extracted from the design document and the keywords extracted from the source code, and creates the preprocessed data by combining the keywords of the design document and the source code that correspond to each other.
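Claim 2 does not say how a design document and a source file are judged to correspond, and the "combination rules" (reference sign 16) are not defined in this section. The following minimal Python sketch makes two assumptions. First, a document and a file correspond when the Jaccard similarity of their keyword sets reaches a hypothetical threshold. Second, "combining" means taking the union of the keywords of corresponding pairs. Both the threshold value and the function names are illustrative, not taken from the patent.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combine_keywords(
    design_docs: dict[str, set[str]],
    source_files: dict[str, set[str]],
    threshold: float = 0.2,  # hypothetical cut-off, not from the patent
) -> dict[str, set[str]]:
    """Associate each design document with sufficiently similar source files
    and merge their keywords into one combined keyword set per document."""
    combined: dict[str, set[str]] = {}
    for doc_id, doc_kw in design_docs.items():
        merged = set(doc_kw)
        for src_kw in source_files.values():
            if jaccard(doc_kw, src_kw) >= threshold:
                merged |= src_kw  # corresponding pair: combine keywords
        combined[doc_id] = merged
    return combined
```

With `docs = {"spec_login": {"login", "password", "session"}}` and `src = {"auth.py": {"login", "session", "token"}, "plot.py": {"matplotlib", "axis"}}`, only `auth.py` overlaps (Jaccard 0.5). The combined entry for `spec_login` is therefore `{"login", "password", "session", "token"}`. A plain set-overlap measure keeps the sketch short. Weighted or embedding-based similarity would be equally plausible readings of the claim.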
  3.  The software technology field extraction device according to claim 1, wherein the classification model construction unit includes:
    a model construction unit that creates, from the preprocessed data, a plurality of classification model candidates that differ from one another in the number of topics they extract, and determines, as the classification model, the candidate having the optimal number of topics among the plurality of classification model candidates; and
    a technical field naming unit that determines the technical field name of each topic extracted by the classification model from keywords not included in any other topic, based on the frequency of appearance of the keywords that constitute that topic.
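Claim 3 does not define "optimal" or say exactly how names are formed from the exclusive keywords. The Python sketch below is hedged on both points. It assumes that each candidate model has already been trained (for example, as topic models with different topic counts) and scored elsewhere with a higher-is-better metric such as topic coherence. It also assumes that a topic is named after its most frequent keywords that no other topic contains. The function names, the two-word name length, and the placeholder name are hypothetical.

```python
def select_topic_count(scores: dict[int, float]) -> int:
    """Pick the candidate topic count whose model scored best.

    `scores` maps each candidate topic count to a quality score of the model
    trained with that count. Assumption: higher is better (e.g. coherence);
    for a lower-is-better metric such as perplexity, use min() instead.
    """
    return max(scores, key=scores.get)


def name_topics(topic_keywords: list[dict[str, float]], n_words: int = 2) -> list[str]:
    """Name each topic after its most frequent keywords that no other topic contains.

    topic_keywords[i] maps each keyword of topic i to its frequency (or weight).
    Falls back to a placeholder name when a topic has no exclusive keyword.
    """
    names = []
    for i, keywords in enumerate(topic_keywords):
        # Collect every keyword that appears in some *other* topic.
        others = set().union(*(t.keys() for j, t in enumerate(topic_keywords) if j != i))
        ranked = sorted(keywords, key=keywords.get, reverse=True)
        exclusive = [w for w in ranked if w not in others]
        names.append(" / ".join(exclusive[:n_words]) if exclusive else f"topic {i}")
    return names
```

For instance, `select_topic_count({5: 0.41, 10: 0.47, 15: 0.44})` returns `10`. For topics `[{"socket": 12, "tcp": 9, "data": 7}, {"query": 10, "sql": 8, "data": 6}]`, `name_topics` returns `["socket / tcp", "query / sql"]`, because the shared keyword `data` is excluded from both names. The resulting names are only suggestions. Claim 4 provides for a user to correct them.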
  4.  The software technology field extraction device according to claim 3, wherein
    the output control unit causes the output device to output the technical field name of each topic, and
    the technical field naming unit modifies the technical field name of the topic based on information input by a user.
  5.  The software technology field extraction device according to claim 1, wherein the technical field extraction unit includes
    an estimation unit that calculates the probability that the deliverable belongs to each topic by inputting the preprocessed data into the classification model, and estimates each topic whose probability exceeds a predetermined value to be a technical field of the deliverable.
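The thresholding step in claim 5 can be sketched as follows. The sketch assumes that the classification model has already produced one probability per topic for the deliverable, and that topic names come from the naming step. The default threshold of 0.3 is a hypothetical stand-in for the patent's "predetermined value". The comparison is strict because the claim speaks of a probability that *exceeds* the value.

```python
def estimate_fields(topic_probs: list[float], names: list[str], threshold: float = 0.3) -> list[str]:
    """Return the names of topics whose probability for one deliverable exceeds `threshold`.

    topic_probs[i] is the model's probability that the deliverable belongs to topic i.
    The default threshold is hypothetical; the patent only says "predetermined value".
    """
    if len(topic_probs) != len(names):
        raise ValueError("exactly one name per topic is required")
    # Strict comparison: a probability equal to the threshold does not qualify.
    return [name for name, p in zip(names, topic_probs) if p > threshold]
```

For example, `estimate_fields([0.55, 0.05, 0.40], ["network", "database", "GUI"])` returns `["network", "GUI"]`. A single deliverable can therefore be assigned several technical fields, or none if no topic clears the threshold.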
  6.  The software technology field extraction device according to claim 5, wherein the skill map creation unit includes
    a creation processing unit that creates a skill map by aggregating the technical field estimation results of the estimation unit for each set of deliverables owned by an individual or organization.
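Claims 1 and 6 describe the skill map as the proportion of technical fields, aggregated over the deliverables an individual or organization owns. The minimal Python sketch below assumes three things. First, each owner's deliverables are given directly. Second, each estimated field of a deliverable counts once. Third, "proportion" means a field's share of all field counts for that owner. These normalization choices and the function name are hypothetical.

```python
from collections import Counter


def build_skill_maps(
    ownership: dict[str, list[str]],
    fields_per_deliverable: dict[str, list[str]],
) -> dict[str, dict[str, float]]:
    """Aggregate estimated technical fields per owner into proportions.

    ownership maps an individual or organization to the deliverables it owns;
    fields_per_deliverable maps a deliverable to its estimated technical fields.
    Returns, per owner, each field's share of all field counts (empty if none).
    """
    maps: dict[str, dict[str, float]] = {}
    for owner, deliverables in ownership.items():
        counts = Counter()
        for d in deliverables:
            counts.update(fields_per_deliverable.get(d, []))  # one count per field hit
        total = sum(counts.values())
        maps[owner] = {field: c / total for field, c in counts.items()} if total else {}
    return maps
```

With `ownership = {"alice": ["d1", "d2"], "team_a": ["d1", "d2", "d3"]}` and `fields = {"d1": ["network"], "d2": ["network", "database"], "d3": ["GUI"]}`, the map for `team_a` is `{"network": 0.5, "database": 0.25, "GUI": 0.25}`. Alice's map is network 2/3 and database 1/3. An organization map could equally be derived by pooling its members' deliverables. The sketch leaves that choice to the caller.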
  7.  A software technology field extraction method comprising:
    creating, by a preprocessing unit, preprocessed data by performing preprocessing on a software development deliverable;
    creating, by a classification model construction unit, a classification model that automates the extraction of technical fields from the preprocessed data;
    extracting, by a technical field extraction unit, technical fields from the preprocessed data using the classification model; and
    creating, by a skill map creation unit, a skill map of each individual or organization by aggregating, for each individual or organization, the technical fields extracted by the technical field extraction unit.

Priority Applications (1)

Application number: PCT/JP2022/035898 (published as WO2024069741A1)
Priority date: 2022-09-27
Filing date: 2022-09-27
Title: Software technological field extraction device and software technological field extraction method

Publications (1)

WO2024069741A1, published 2024-04-04

Family ID: 90476696

Citations (3)

* Cited by examiner, † Cited by third party

CN102902700A *, priority date 2012-04-05, published 2013-01-30, National University of Defense Technology (Chinese People's Liberation Army): Online-increment evolution topic model based automatic software classifying method
US20210192421A1 *, priority date 2019-12-23, published 2021-06-24, Microsoft Technology Licensing, LLC: Skill determination framework for individuals and groups
WO2021199442A1 *, priority date 2020-04-03, published 2021-10-07, Mitsubishi Electric Corporation: Information processing device, information processing method, and information processing program

Non-Patent Citations (1)

YUTA YAMADA: "A Topic-based Visualization Tool for Archives of Developer Activity", COMPUTER SOFTWARE, vol. 31, no. 2, 1 May 2014 (2014-05-01), pages 144 - 150, XP093153718 *
