CN103744788B - Feature location method based on multi-source software data analysis - Google Patents

Info

Publication number
CN103744788B
Authority
CN
China
Prior art keywords
information
characteristic
feature location
technology
analyzed based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410031303.XA
Other languages
Chinese (zh)
Other versions
CN103744788A (en)
Inventor
孙小兵
吴鹏
李云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou Gezhi Photoelectric Technology Co ltd
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University
Priority to CN201410031303.XA
Publication of CN103744788A
Application granted
Publication of CN103744788B
Active (current legal status)
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a feature location method based on multi-source software data analysis, in the technical field of software engineering, which aims to solve the technical problem that feature location results in the prior art are inaccurate and incomplete. The invention combines information retrieval, data mining, and dynamic analysis to perform feature location on the current software system, the evolution history repository, and execution traces, respectively, and intersects the results of the three techniques to obtain the final feature location result. This achieves feature location based on multi-source software data analysis, with higher accuracy, completeness, and efficiency. The three techniques are mature, which makes the invention easy to implement. The invention can be used for feature location at the class level or the method level; the granularity level can be chosen in light of practical considerations such as cost analysis, which provides a flexible selection framework for multi-granularity feature location in practice.

Description

Feature location method based on multi-source software data analysis
Technical field
The present invention relates to a feature location method, in particular to a feature location method based on multi-source software data analysis, and belongs to the technical field of software engineering.
Background art
As the information society grows ever more dependent on software, users place more and higher demands on existing software systems, so these systems must be continually upgraded and maintained. The change requests behind such upgrades and maintenance are commonly referred to as features. In a software system, a feature represents a function, defined according to the requirements of developers and users and the degree to which it is accepted. Software maintenance and evolution may include various change activities, such as adding new functions, improving existing functions, and fixing bugs. Determining where a known, specific function is located in the source code is called feature location. The feature location process is: determine the start node; select the next node to visit; visit that node; judge whether the node is relevant to the feature under investigation; and check whether all relevant nodes have been obtained. To implement a change request in the current software, one must first accurately find the start node of the change request; if this feature location cannot be found, the whole change process cannot be completed.
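The five-step process described above can be read as a worklist traversal over a program dependency graph. The following is only a minimal sketch under stated assumptions: the graph shape, the `is_relevant` predicate, and the function name `locate_feature` are hypothetical and do not come from the patent.

```python
from collections import deque


def locate_feature(graph, start, is_relevant):
    """Traverse `graph` from `start`, collecting nodes judged relevant.

    graph: dict mapping a node to the nodes it depends on (assumed shape).
    is_relevant: caller-supplied predicate standing in for the human or
    automated relevance judgment.
    """
    relevant = []
    visited = {start}
    worklist = deque([start])          # step 1: determine the start node
    while worklist:                    # step 5: stop when nothing is left to inspect
        node = worklist.popleft()      # steps 2-3: select and visit the next node
        if not is_relevant(node):      # step 4: judge relevance to the feature
            continue                   # irrelevant nodes are not expanded
        relevant.append(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                worklist.append(neighbor)
    return relevant
```

If the start node itself is wrong, the traversal returns little or nothing useful, which is why the text stresses finding the start node accurately.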
Current feature location research mainly comprises methods based on the static structure of a program and methods based on dynamic analysis of a program. These methods locate features only through static analysis of the target program or dynamic analysis of the target code, and cannot capture some important feature information, such as the change history of the target program, which reduces the accuracy and completeness of feature location. Moreover, software data include not only static and dynamic information but also information about the software evolution process; using only one of these types of information can make the feature location result inaccurate and incomplete.
The prior art includes the Java Platform Debugger Architecture (JPDA). JPDA is a complete set of tools and interfaces that the Java virtual machine provides for debugging. Using the interfaces and protocols provided by JPDA, debugger developers can build customized Java debugging applications to meet the needs of particular developers. JPDA consists of three parts: 1. the Java Virtual Machine Tool Interface (JVMTI), which defines the services a virtual machine (VM) must provide for debugging, including debugging information (such as stack information), debugging actions (such as a client setting a breakpoint), and notifications (such as notifying the client when a breakpoint is reached); 2. the Java Debug Wire Protocol (JDWP), which defines the format of the information and requests transferred between the process being debugged and the debugger front end; 3. the Java Debug Interface (JDI), which defines the debugging interface available to debuggers, to facilitate interaction with remote debugging services.
The prior art also includes the Test and Performance Tools Platform (TPTP). TPTP is a top-level project of the Eclipse Foundation that provides a full-featured set of open-source testing and performance tools. It covers the whole test and performance life cycle, from early testing to monitoring of production applications, including test authoring and execution, monitoring, tracing and profiling, and log analysis.
Summary of the invention
The object of the invention is to provide a feature location method based on multi-source software data analysis, solving the technical problem in the prior art that only a single type of feature information is analyzed and mined, which makes feature location results inaccurate and incomplete.
This object is achieved as follows: the feature location method based on multi-source software data analysis comprises the following steps:
Step 1: retrieve the current software system using information retrieval: query the source code of the current software system for program code relevant to the current change request, and record that program code as feature information a;
Step 2: mine historical evolution information using data mining: query the evolution history repository for historical change requests relevant to the current change request, take the union of the changed elements in the relevant historical change requests, and record the resulting program code as feature information b;
Step 3: analyze execution traces using dynamic analysis. The execution traces include marked execution information and complete execution information. Subtract the marked execution information from the complete execution information and record the result as pending execution information; then perform static analysis on the pending execution information to obtain the information set within it that is relevant to the current change request, take the union of that information set and the marked execution information, and record the resulting program code as feature information c;
Step 4: intersect the three kinds of feature information a, b, and c, and output the feature location result m.
The retrieval steps of the information retrieval technique are as follows:
A) Build the corpus: define the file granularity and build a corpus at that granularity level;
B) Natural language processing: preprocess the corpus using natural language processing techniques. The preprocessing includes deleting source code operators and programming language keywords, splitting identifiers and compound terms, and stemming words to their roots;
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of the current change request.
The marked execution information is collected using JPDA, and the complete execution information is collected using TPTP.
Compared with the prior art, the invention has the following beneficial effects: 1. It combines information retrieval, data mining, and dynamic analysis to perform feature location on the current software system, the evolution history repository, and execution traces, respectively, achieving feature location based on multi-source software data analysis. Compared with prior-art methods that locate features from a single type of information, the feature location result of the invention has higher accuracy, completeness, and efficiency. 2. The three techniques used, information retrieval, data mining, and dynamic analysis, are mature, which makes the invention easy to implement. 3. The invention can be used for feature location at the class level or the method level; the granularity level can be chosen in light of practical considerations such as cost analysis, which provides a flexible selection framework for multi-granularity feature location in practice.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a block diagram of the working principle of the information retrieval technique of the present invention.
Fig. 3 is a block diagram of the working principle of the data mining technique of the present invention.
Fig. 4 is a block diagram of the working principle of the dynamic analysis technique of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the feature location method based on multi-source software data analysis comprises the following steps:
Build a feature location model based on multi-source software data analysis: define a four-tuple (m, a, b, c), where m is the final feature location result and a, b, and c are the feature information extracted from three different data sources.
Feature information a is located as follows: retrieve the current software system using information retrieval, query the source code of the current software system for program code relevant to the current change request, and record that program code as feature information a.
Fig. 2 shows the working principle of the information retrieval technique of the present invention. The specific retrieval steps are as follows:
A) Build the corpus: define the file granularity and build a corpus at that granularity level. The defined granularity may be package, class, or method;
B) Natural language processing: preprocess the corpus using natural language processing techniques. The preprocessing includes deleting source code operators and programming language keywords; splitting identifiers and compound terms, for example splitting "impactAnalysis" into "impact" and "Analysis"; and stemming words to their roots, for example reducing "impacted" to "impact";
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of the current change request. If this source code is e1, e2, e4, e6, e8, e10, then feature information a = {e1, e2, e4, e6, e8, e10}.
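Steps A) to C) can be sketched in a few lines of Python. This is an illustration under assumptions rather than the patent's implementation: the keyword list is a small hypothetical subset, `stem` is a deliberately crude suffix stripper standing in for a real stemmer, and the function names and the "share at least one term" matching rule are invented for the example.

```python
import re

# Small illustrative subset; a real tool would use the full keyword list
# of the target programming language.
KEYWORDS = {"public", "private", "void", "int", "return", "class", "new", "if", "else"}


def split_identifier(identifier):
    """Split camelCase and snake_case identifiers into lower-case words."""
    words = []
    for part in identifier.split("_"):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def stem(word):
    """Crude suffix stripper (assumption: stands in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Drop operators and keywords, split identifiers, and stem each word."""
    # Matching identifier tokens only means operators and punctuation are discarded.
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    terms = set()
    for ident in identifiers:
        if ident in KEYWORDS:
            continue
        terms.update(stem(w) for w in split_identifier(ident))
    return terms


def retrieve(corpus, change_request):
    """Return the corpus elements that share at least one term with the request."""
    query = preprocess(change_request)
    return {name for name, source in corpus.items() if preprocess(source) & query}
```

Here `corpus` maps an element name (a package, class, or method, depending on the chosen granularity) to its source text, so the same function works at any of the three granularity levels.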
Feature information b is located as follows: mine historical evolution information using data mining. Fig. 3 shows the working principle of the data mining technique of the present invention. First, extract the historical change requests from the evolution history repository and integrate their information. Suppose that, after integration, the historical change requests and their corresponding changed elements are as shown in Table 1:
Then, query the evolution history repository for historical change requests relevant to the current change request; that is, compute a similarity matrix between the historical change requests and the current change request. Finally, take the union of the changed elements in the relevant historical change requests and record the resulting program code as feature information b. If historical change requests c2 and c5 are relevant to the current change request, then feature information b = c2 ∪ c5 = {e2, e3, e5, e7, e4, e12}.
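The mining step can be sketched as a similarity filter followed by a set union. The patent does not say which similarity measure or cut-off is used, so the Jaccard measure over words and the `threshold` value below are assumptions. Table 1 is not reproduced in this text, so the request descriptions and per-request element sets are invented placeholders, chosen only so that the union matches the b stated above.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    return len(a & b) / len(a | b) if a | b else 0.0


def mine_history(history, current_request, threshold=0.3):
    """Union the changed elements of every past request similar to the current one.

    history: dict mapping request id -> (description, set of changed elements).
    threshold: hypothetical cut-off; the patent does not specify one.
    """
    query = set(current_request.lower().split())
    feature_b = set()
    for description, changed in history.values():
        if jaccard(query, set(description.lower().split())) >= threshold:
            feature_b |= changed
    return feature_b


# Placeholder data, NOT the patent's Table 1.
history = {
    "c1": ("update build script", {"e9"}),
    "c2": ("fix login timeout error", {"e2", "e3", "e5"}),
    "c5": ("login error on session timeout", {"e4", "e7", "e12"}),
}
b = mine_history(history, "login timeout error")
```

With this placeholder data only c2 and c5 pass the threshold, so `b` is the union of their changed elements.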
Feature information c is located as follows: analyze execution traces using dynamic analysis. Fig. 4 shows the working principle of the dynamic analysis technique of the present invention. First, collect the marked execution information using JPDA and the complete execution information using TPTP, subtract the marked execution information from the complete execution information, and record the result as pending execution information. Then perform static analysis on the pending execution information to obtain the information set within it that is relevant to the current change request, take the union of that information set and the marked execution information, and record the resulting program code as feature information c. Suppose the marked execution information is g1 = {e1, e3, e4} and the complete execution information is g2 = {e1, e2, e3, e4, e5, e6, e7, e8}; then the pending execution information is g3 = g2 - g1 = {e2, e5, e6, e7, e8}. Static analysis of g3 yields the information set g4 that is relevant to the current change request; if g4 = {e2, e5}, taking the union of g1 and g4 gives feature information c = g1 ∪ g4 = {e1, e2, e3, e4, e5}.
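The set arithmetic in this worked example maps directly onto Python sets. In the sketch below the trace collection (JPDA, TPTP) is not modeled at all, and the static analysis is replaced by a hypothetical `is_relevant` predicate that simply encodes the g4 assumed in the text; the function name is also an assumption.

```python
def dynamic_feature(marked, complete, is_relevant):
    """Compute feature information c from two execution traces."""
    pending = complete - marked                        # g3 = g2 - g1
    related = {e for e in pending if is_relevant(e)}   # g4, the static analysis result
    return marked | related                            # c = g1 ∪ g4


g1 = {"e1", "e3", "e4"}                                       # marked execution information
g2 = {"e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"}         # complete execution information
# Stand-in for static analysis: the text assumes it yields g4 = {e2, e5}.
c = dynamic_feature(g1, g2, lambda e: e in {"e2", "e5"})
```

The subtraction means static analysis only has to examine elements that were executed but not already marked, which keeps the more expensive analysis confined to the pending set.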
The feature location result m is computed as follows: intersect the three kinds of feature information a, b, and c, and output the feature location result m, that is, m = a ∩ b ∩ c = {e1, e2, e4, e6, e8, e10} ∩ {e2, e3, e5, e7, e4, e12} ∩ {e1, e2, e3, e4, e5} = {e2, e4}.
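The final step can be checked mechanically. The snippet below reuses the three example sets from the text; the helper name `locate` is an assumption made for illustration.

```python
def locate(*sources):
    """Intersect the feature information from every data source."""
    return set.intersection(*sources)


a = {"e1", "e2", "e4", "e6", "e8", "e10"}   # from information retrieval
b = {"e2", "e3", "e5", "e7", "e4", "e12"}   # from data mining
c = {"e1", "e2", "e3", "e4", "e5"}          # from dynamic analysis

m = locate(a, b, c)
print(sorted(m))  # ['e2', 'e4']
```

Because the result is an intersection, an element is reported only when all three sources agree on it; an element missed by any one source is dropped from m.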
The invention is not limited to the above embodiment. For example, the location steps for feature information a, b, and c have no required order and may be performed in any order. On the basis of the technical solution disclosed herein, a person skilled in the art can, from the disclosed technical content and without creative effort, make substitutions and variations to some of the technical features; such substitutions and variations all fall within the scope of protection of the invention.

Claims (3)

1. A feature location method based on multi-source software data analysis, characterized in that it comprises the following steps:
Step 1: retrieve the current software system using information retrieval: query the source code of the current software system for program code relevant to the current change request, and record said program code as feature information a;
Step 2: mine historical evolution information using data mining: query the evolution history repository for historical change requests relevant to the current change request, take the union of the changed elements in the relevant historical change requests, and record the resulting program code as feature information b;
Step 3: analyze execution traces using dynamic analysis, said execution traces including marked execution information and complete execution information; subtract the marked execution information from the complete execution information and record the result as pending execution information; then perform static analysis on the pending execution information to obtain the information set within it that is relevant to said current change request, take the union of said information set and the marked execution information, and record the resulting program code as feature information c;
Step 4: intersect the three kinds of feature information a, b, and c, and output the feature location result m.
2. The feature location method based on multi-source software data analysis according to claim 1, characterized in that the retrieval steps of said information retrieval technique are as follows:
A) Build the corpus: define the file granularity and build a corpus at said granularity level; the defined granularity is package, class, or method;
B) Natural language processing: preprocess said corpus using natural language processing techniques, said preprocessing including: deleting source code operators, deleting programming language keywords, splitting identifiers, splitting compound terms, and stemming words to their roots;
C) Index the corpus: retrieve from the corpus the source code that contains the keywords of said current change request.
3. The feature location method based on multi-source software data analysis according to claim 1, characterized in that said marked execution information is collected using JPDA, and said complete execution information is collected using TPTP.
CN201410031303.XA 2014-01-22 2014-01-22 Feature location method based on multi-source software data analysis Active CN103744788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410031303.XA CN103744788B (en) 2014-01-22 2014-01-22 Feature location method based on multi-source software data analysis

Publications (2)

Publication Number Publication Date
CN103744788A CN103744788A (en) 2014-04-23
CN103744788B true CN103744788B (en) 2016-08-31

Family

ID=50501807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410031303.XA Active CN103744788B (en) 2014-01-22 2014-01-22 Feature location method based on multi-source software data analysis

Country Status (1)

Country Link
CN (1) CN103744788B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106663003A (en) * 2014-06-13 2017-05-10 查尔斯斯塔克德拉珀实验室公司 Systems and methods for software analysis
CN105930162B (en) * 2016-04-24 2019-05-03 复旦大学 A kind of characteristic positioning method based on subgraph search
CN107491299B (en) * 2017-07-04 2021-09-10 扬州大学 Multi-source software development data fusion-oriented developer portrait modeling method
CN107832781B (en) * 2017-10-18 2021-09-14 扬州大学 Multi-source data-oriented software defect representation learning method
CN112699253B (en) * 2019-10-23 2024-10-01 广州彩熠灯光股份有限公司 Source code positioning method, system, medium and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102508767A (en) * 2011-09-30 2012-06-20 东南大学 Software maintenance method based on formal concept analysis
CN103136332A (en) * 2013-01-28 2013-06-05 福州新锐同创电子科技有限公司 Method for achieving making, management and retrieval of knowledge points

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070260628A1 (en) * 2006-05-02 2007-11-08 Tele Atlas North America, Inc. System and method for providing a virtual database environment and generating digital map information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
软件演化过程中运行实例的在线可信演化;杨帅等;《计算机应用研究》;20131130;全文 *

Also Published As

Publication number Publication date
CN103744788A (en) 2014-04-23

Similar Documents

Publication Publication Date Title
CN109739755B (en) Fuzzy test system based on program tracking and mixed execution
CN103744788B (en) Feature location method based on multi-source software data analysis
CN103092761B (en) Method and device of recognizing and checking modifying code blocks based on difference information file
Saha et al. Evaluating code clone genealogies at release level: An empirical study
CN107016018B (en) Database index creation method and device
CN110147235B (en) Semantic comparison method and device between source code and binary code
CN102789413A (en) System and method for debugging parallel program
CN106843840A (en) A kind of version evolving annotation multiplexing method of source code based on similarity analysis
CN111538731A (en) Industrial data automatic generation report system
CN113609008B (en) Test result analysis method and device and electronic equipment
CN112068981B (en) Knowledge base-based fault scanning recovery method and system in Linux operating system
CN104572474A (en) Dynamic slicing based lightweight error locating implementation method
KR102099069B1 (en) Hybrid ERD Management System, and method thereof
CN106294136B (en) The online test method and system of performance change between the concurrent program runtime
CN102298552A (en) Method for performing source code instrumentation on the basis of code inquiry
CN115905031A (en) Test case recommendation method based on accurate quality assurance system
CN103176905A (en) Defect association method and device
US7844601B2 (en) Quality of service feedback for technology-neutral data reporting
CN111913874B (en) Software defect tracing method based on syntactic structure change analysis
Hattori et al. Mining software repositories for software change impact analysis: a case study
Kauhanen et al. Regression test selection tool for python in continuous integration process
Agrawal et al. Ruffle: Extracting co-change information from software project repositories
CN106873956B (en) Code completion method and device based on continuous keywords
CN115640438A (en) Visual custom crawler execution method and system
CN104199649A (en) Path profiling method for interaction information between parent process and child process

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210528

Address after: Yangzhou xiaonaxiong robot Co., Ltd., 3 / F, building 9, 20 Hongyang Road, Yangzhou Economic Development Zone, Jiangsu Province, 225000

Patentee after: Yangzhou xiaonaxiong robot Co.,Ltd.

Address before: 225009 No. 88, South University Road, Jiangsu, Yangzhou

Patentee before: YANGZHOU University

TR01 Transfer of patent right

Effective date of registration: 20210709

Address after: Yangzhou Gezhi Optoelectronic Technology Co., Ltd., 2601, 217 Kaifa West Road, high tech Industrial Development Zone, Yangzhou City, Jiangsu Province, 225000

Patentee after: Yangzhou Gezhi Photoelectric Technology Co.,Ltd.

Address before: Yangzhou xiaonaxiong robot Co., Ltd., 3 / F, building 9, 20 Hongyang Road, Yangzhou Economic Development Zone, Jiangsu Province, 225000

Patentee before: Yangzhou xiaonaxiong robot Co.,Ltd.