CN103559025A - Software refactoring method through clustering - Google Patents
- Publication number: CN103559025A
- Authority: CN (China)
- Prior art keywords: source code, entity, program entity, correlation coefficient, software
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Stored Programmes (AREA)
Abstract
The invention discloses a software refactoring method through clustering and belongs to the technical field of software engineering. The method comprises the following steps: inputting source code information into a source code parser; parsing the source code information and extracting program entities and their related attributes; calling a filter to screen out redundant information in the source code information, and using a domain rule base to establish a system fact base; determining correlation coefficients among the entities through similarity calculation; automatically decomposing out function-based core business concern modules through a directed graph cluster analysis method; and verifying the correctness of the refactored system and adjusting the domain rule base according to the verification results. With this method, a large, complex software system is automatically decomposed into smaller and more manageable subsystems, making the system easier to comprehend and maintain. In addition, by modifying the correlation coefficients of the attributes in the domain rule base, the method can be applied to different application fields and therefore has good generality.
Description
Technical field
The present invention relates to a method for software refactoring by means of clustering, and belongs to the technical field of software engineering.
Background art
Refactoring is a process of program transformation that improves how the software's requirements are implemented while keeping the program's behavior unchanged. Software inevitably changes over its life cycle; the changes may stem from changing user requirements or from the need to correct defects in the software itself. To reduce maintenance cost and extend the service life of software, maintainers frequently face the problem of refactoring. However, as the scale and complexity of software systems grow, the interactions among a system's functional modules become more complex. How to refactor legacy systems, particularly those lacking documentation, is a pressing problem in software maintenance today.
To address this problem, Chinese invention patent No. 200810163396.6, "Code-level component assembly method based on syntax refactoring", treats a component as a functionally relatively independent, reusable software module and discloses a code-level component assembly method based on syntax refactoring. The method separates the abstract syntax and the concrete syntax from the syntax specification of a programming language, builds code-level components conforming to the new syntax specification, and then assembles the components, so that software reuse is language-independent. However, this refactoring operates at the code level and does not improve program structure. In particular, a system contains many function-based core business concerns whose logical relationships vary in strength, and these concerns may be scattered across different modules. For example, suppose the inventory control function of a manufacturing program must be modified, and that function is spread over several modules such as material inventory management, finished goods inventory management, and material supply. Under the above patented method, the code change touches all of these modules, which increases both the cost of the change and the probability of error. Because core business concerns are likely to span multiple modules and must nevertheless be easy to modify, maintain, and upgrade, maintainers performing refactoring should analyze the dependencies and coupling among a program's core business concerns. Refactoring code-level components on the basis of syntax alone cannot substantially improve the refactored system.
Summary of the invention
The present invention is proposed in view of the above problems. Its object is to provide a method for software refactoring by means of clustering, which automatically decomposes a large, complex software system into smaller, more manageable subsystems.
To achieve the above object, the technical solution of the present invention is as follows:
(1) Input the source code files and parse the source code information, fully representing the program's semantic information according to the syntax rules of the programming language;
(2) Construct a filter to screen out redundant information; determine the system's program entities and their related attributes from the program's semantic information; and, according to the dependencies and coupling among program entity attributes, assign a correlation coefficient to each related attribute in the domain rule base and generate the fact base;
(3) Similarity calculation: program entities share multiple related attributes, and the correlation coefficient between program entities is determined by similarity calculation;
(4) Clustering: using a directed graph cluster analysis method, group program entities that are similar or highly correlated into clusters, each cluster forming a new module;
(5) Result visualization: present the results of the cluster analysis to system maintainers in a form that is easy to understand and use, so that they can complete the software refactoring.
Compared with the prior art, the present invention has the following beneficial effects:
(1) Software refactoring is performed by clustering and targets function-based core business concerns; after the system's modules are rebuilt, the code has good reusability;
(2) After cluster analysis, the program entities within each module are highly similar or correlated, which resolves scattered and tangled code and makes the system easier to understand and maintain;
(3) In different application domains, the same attribute of a program entity has different correlation characteristics. The correlation coefficients of attributes in the domain rule base can be modified, and the resulting clusters will differ accordingly, which gives the invention good generality.
Brief description of the drawings
Fig. 1 is a schematic diagram of the process of software refactoring by clustering.
Fig. 2 is a schematic diagram of the structure of the source code parser.
Fig. 3 is schematic A of an example of the directed graph cluster analysis method: the directed graph.
Fig. 4 is schematic B of an example of the directed graph cluster analysis method: the cluster diagram.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and specific embodiments. The scope of protection of the present invention is not limited by the specific embodiments and is defined by the claims. Furthermore, any change or modification that a person of ordinary skill in the art can readily make to the invention without departing from its scheme falls within the scope of the claims.
Referring to Fig. 1, the present invention comprises the following steps:
First step: referring to Fig. 2, call the source code parser, parse and filter the source code, and build the fact base. The detailed process of this step is as follows:
(1) Referring to Fig. 2, scan the source code files and input the source code information into the source code parser;
(2) Referring to Fig. 2, parse the source code information and extract the program entities and their related attributes. The detailed process is: perform syntax parsing on the source code; extract the syntax tree of the code information; perform semantic parsing on the syntax tree; obtain the program entities and their related attributes. Program entities include classes, functions, and business processes. Entity attributes include packages, files, functions, databases, test cases, and so on;
(3) Referring to Fig. 2, call the filter to screen out redundant information in the source code information and, using the correlation coefficient that the domain rule base assigns to each attribute, build the fact base.
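The parsing sub-steps above can be sketched with Python's standard `ast` module. This is a minimal illustration under stated assumptions, not the patent's parser: the choice of Python as the analyzed language, the `Entity` record, and the function name `extract_entities` are all hypothetical, and only two entity kinds (classes and functions) and two attributes (defining file, called names) are collected.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A program entity with a few related attributes (hypothetical record)."""
    name: str
    kind: str                                  # "class" or "function"
    file: str                                  # attribute: defining file
    calls: set = field(default_factory=set)    # attribute: names it calls


def extract_entities(source: str, filename: str) -> list:
    """Parse source into a syntax tree and collect top-level entities."""
    tree = ast.parse(source, filename=filename)    # syntax parsing
    entities = []
    for node in tree.body:                         # top-level definitions only
        if isinstance(node, ast.ClassDef):
            kind = "class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        else:
            continue                               # skip imports, statements
        entity = Entity(node.name, kind, filename)
        for sub in ast.walk(node):                 # collect directly named calls
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                entity.calls.add(sub.func.id)
        entities.append(entity)
    return entities


SAMPLE = '''
import os

def load_stock():
    return read_db("stock")

class Inventory:
    def total(self):
        return sum(load_stock())
'''

found = extract_entities(SAMPLE, "inventory.py")
```

Here `found` holds one function entity and one class entity; the import statement is skipped because it is not a program entity in this sketch.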
Building the syntax rule base. The syntax rule base provides syntax rules to the syntax analyzer, which can translate the context-free grammar of a particular programming language into the syntax tree of that language.
Building the domain rule base. In different application domains, the same attribute of a program entity has different correlation characteristics. With reference to domain factors, and according to the dependencies and coupling among the attributes of the system's program entities, a correlation coefficient is assigned to each related attribute of the system. Attributes associated with core business concerns receive higher coefficient values, so that core business concerns achieve higher cohesion. Domain knowledge is a set of descriptions of the domain's functions. The description of each function includes: program entity number, domain, version number, functional description, business object, backup, related attributes, and correlation coefficients.
Building the fact base. The fact base is obtained by filtering the source code information under the domain rules. It is the set of descriptions of the system's core business concerns. The description of each program entity includes: program entity number, interface name, core business concern, input parameters, output parameters, return value, program entity provider, version number, keywords, related attributes, and correlation coefficients.
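One way to picture the relationship between the domain rule base and the fact base is as a lookup table plus a filter. The sketch below is an assumption for illustration only: the `Fact` record, the `DOMAIN_RULES` mapping, the function `build_fact_base`, and every coefficient value are hypothetical and carry far fewer fields than the descriptions listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One fact-base row: an entity attribute with its domain coefficient."""
    entity_id: int
    attribute: str      # e.g. "package", "file", "database"
    value: str
    coefficient: float  # taken from the domain rule base


# Hypothetical domain rule base: attribute kind -> correlation coefficient.
# Attributes tied to core business concerns get the higher values.
DOMAIN_RULES = {"database": 0.9, "function": 0.7, "package": 0.4, "file": 0.2}


def build_fact_base(raw, rules):
    """Keep only attributes the rule base knows and attach their coefficients."""
    facts = []
    for entity_id, attribute, value in raw:
        if attribute not in rules:      # filter: drop redundant information
            continue
        facts.append(Fact(entity_id, attribute, value, rules[attribute]))
    return facts


raw_info = [
    (1, "database", "stock"),
    (1, "comment", "TODO tidy up"),     # not in the rule base, filtered out
    (2, "database", "stock"),
    (2, "file", "supply.py"),
]
fact_base = build_fact_base(raw_info, DOMAIN_RULES)
```

Changing a value in `DOMAIN_RULES` changes the coefficients stored in the fact base, which is how a different application domain would lead to a different clustering.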
Second step: similarity calculation. Program entities share multiple related attributes; similarity is calculated according to formula 1 to determine the correlation coefficient between program entities;
In formula 1:
x, y denote program entities;
d denotes the number of attributes;
s(x, y) denotes the correlation coefficient between program entities x and y;
s(x_k, y_k) is the correlation coefficient of the k-th attribute of program entities x and y;
w(x_k, y_k) takes the value 0 or 1 and indicates whether the k-th attribute of program entities x and y is relevant.
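A form consistent with the symbols defined above is a weighted average over the d attributes: the sum of w(x_k, y_k) · s(x_k, y_k) divided by the sum of w(x_k, y_k). That normalized form is an assumption made for this sketch, as are the function name `entity_similarity` and the sample values.

```python
def entity_similarity(attr_sims, relevant):
    """Combine per-attribute coefficients into one entity-level coefficient.

    attr_sims[k] plays the role of s(x_k, y_k); relevant[k] plays the role
    of w(x_k, y_k) and must be 0 or 1. The normalized sum is an assumption.
    """
    if len(attr_sims) != len(relevant):
        raise ValueError("need one weight per attribute")
    if any(w not in (0, 1) for w in relevant):
        raise ValueError("weights must be 0 or 1")
    total_weight = sum(relevant)
    if total_weight == 0:
        return 0.0                      # no relevant attribute in common
    weighted = sum(w * s for w, s in zip(relevant, attr_sims))
    return weighted / total_weight


# d = 4 attributes; the third one is not relevant to this pair of entities.
score = entity_similarity([0.9, 0.7, 0.1, 0.4], [1, 1, 0, 1])
```

With these sample values only three attributes contribute, so `score` is the mean of 0.9, 0.7, and 0.4.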
Third step: cluster analysis. Build a directed graph from the program entity dependencies in the fact base, then perform cluster analysis according to the entity similarity results. The detailed process of this step is as follows:
(1) Build the directed graph. Taking Fig. 3 as an example, suppose the fact base contains 10 program entities (numbered 1 to 10), from which a directed graph is built. A solid line with an arrow in the figure indicates that two entities have a dependency. As shown in Fig. 3, entity 2 depends on entity 1.
(2) Directed graph cluster analysis. Taking Fig. 4 as an example, cluster analysis according to entity similarity yields 2 clusters, {1, 2, 3} and {4, 6, 10}. In the figure, a solid line indicates a high degree of similarity or correlation (high correlation coefficient), and a dashed line indicates a low degree of similarity or correlation (low correlation coefficient). The similarity or correlation between entity 4 and entity 10 is high because both reference entity 7. The similarity or correlation between entity 5 and entity 6 is low, possibly because their common child node 8 is not a core business concern.
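The two sub-steps can be sketched as an adjacency structure for the dependencies and a grouping of entities joined by strong links. This is a sketch under assumptions: the edge list contains only the dependencies named in the text (the direction of the edges into entity 8 is assumed), the coefficient values and the 0.6 threshold are hypothetical and chosen only to mirror the Fig. 4 description, and treating strongly linked entities as connected components is one simple reading of "nodes connected by solid lines".

```python
def build_digraph(nodes, dependencies):
    """Return adjacency sets: graph[a] holds every entity that a depends on."""
    graph = {n: set() for n in nodes}
    for source, target in dependencies:
        if source not in graph or target not in graph:
            raise KeyError(f"unknown entity in edge {source}->{target}")
        graph[source].add(target)
    return graph


def cluster_by_strong_links(nodes, similarity, threshold):
    """Group nodes joined by pairs whose coefficient reaches the threshold."""
    neighbours = {n: set() for n in nodes}
    for (a, b), score in similarity.items():
        if score >= threshold:              # a "solid line": strong link
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen or not neighbours[start]:
            continue                        # visited, or has no strong link
        stack, component = [start], set()
        while stack:                        # depth-first walk of one cluster
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


entities = list(range(1, 11))
# Only the dependencies named in the text; Fig. 3 contains more edges.
depends_on = [(2, 1), (4, 7), (10, 7), (5, 8), (6, 8)]
graph = build_digraph(entities, depends_on)

# Hypothetical coefficients between entity pairs.
sims = {(1, 2): 0.8, (2, 3): 0.9, (4, 10): 0.85, (4, 6): 0.7, (5, 6): 0.2}
clusters = cluster_by_strong_links(entities, sims, threshold=0.6)
```

Entities without any strong link, such as entity 5 in this data, are left out of every cluster, which matches the example where only six of the ten entities are grouped.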
Fourth step: rebuild the modules. Program entities that are similar or highly correlated are grouped into clusters, and each cluster forms a new module. These modules are presented to system maintainers in a form that is easy to understand and use, so that they can complete the software refactoring;
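A minimal sketch of turning clusters into a reviewable module proposal follows. The function names, the `module_N` naming scheme, and the entity names (loosely echoing the inventory example from the background section) are all hypothetical.

```python
def propose_modules(clusters, entity_names, prefix="module"):
    """Map each cluster to a proposed module name and its member names."""
    return {
        f"{prefix}_{index}": [entity_names[e] for e in cluster]
        for index, cluster in enumerate(clusters, start=1)
    }


def format_report(modules):
    """Render the proposal as plain text for maintainers to review."""
    lines = []
    for module, members in modules.items():
        lines.append(f"{module}: {', '.join(members)}")
    return "\n".join(lines)


# Hypothetical entity names keyed by program entity number.
names = {1: "MaterialStock", 2: "FinishedStock", 3: "StockAudit",
         4: "SupplyOrder", 6: "SupplierList", 10: "SupplyPlan"}
modules = propose_modules([[1, 2, 3], [4, 6, 10]], names)
report = format_report(modules)
```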
Fifth step: correctness verification. Submit the refactored system to users or domain experts, collect their opinions, and perform completeness, consistency, and non-redundancy checks.
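The text does not define the three checks, so the sketch below adopts one plausible reading as an assumption: completeness means every entity is assigned to a module, consistency means no entity belongs to more than one module, and non-redundancy means no module is empty. The function name `check_partition` and the message wording are hypothetical.

```python
def check_partition(all_entities, modules):
    """Return a list of problems found in a proposed module assignment."""
    problems = []
    assigned = [e for members in modules.values() for e in members]
    missing = sorted(set(all_entities) - set(assigned))
    if missing:
        problems.append(f"completeness: unassigned entities {missing}")
    duplicated = sorted({e for e in assigned if assigned.count(e) > 1})
    if duplicated:
        problems.append(f"consistency: entities in several modules {duplicated}")
    empty = sorted(name for name, members in modules.items() if not members)
    if empty:
        problems.append(f"non-redundancy: empty modules {empty}")
    return problems


flawed = {"m1": [1, 2, 3], "m2": [3, 4], "m3": []}
issues = check_partition(range(1, 6), flawed)
clean = check_partition(range(1, 5), {"m1": [1, 2], "m2": [3, 4]})
```

An automated check of this kind complements, and does not replace, the review by users or domain experts described in this step.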
Sixth step: adjust the domain rule base. According to the collected check results or suggestions, adjust the correlation coefficients of the entities' related attributes in the domain rule base, then perform the system refactoring again.
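The adjustment can be pictured as nudging individual coefficients before the pipeline is rerun. In this sketch the feedback encoding (+1 to raise, -1 to lower), the step size, the assumed coefficient range of 0 to 1, and the function name `adjust_rules` are all assumptions.

```python
def adjust_rules(rules, feedback, step=0.1):
    """Return a new rule base with coefficients nudged by reviewer feedback.

    feedback maps an attribute name to +1 (raise) or -1 (lower).
    Coefficients are clamped to the assumed range [0.0, 1.0].
    """
    adjusted = dict(rules)                  # leave the original untouched
    for attribute, direction in feedback.items():
        if attribute not in adjusted:
            raise KeyError(attribute)
        new_value = adjusted[attribute] + step * direction
        adjusted[attribute] = round(min(1.0, max(0.0, new_value)), 6)
    return adjusted


rules_v1 = {"database": 0.9, "file": 0.2}
rules_v2 = adjust_rules(rules_v1, {"database": +1, "file": -1})
```

After the adjustment, the fact base, the similarity calculation, and the clustering would be recomputed with `rules_v2`, and the result reviewed again.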
Once the above steps are completed, the software refactoring is achieved: a large, complex software system is automatically decomposed into modules of function-based core business concerns, and the system structure becomes easier to manage.
Claims (4)
1. A method for software refactoring by means of clustering, characterized in that the method comprises the following steps in sequence:
Step 1: call the source code parser, parse and filter the source code, and build the fact base;
The detailed process of this step is as follows:
(1) Scan the source code files and input the source code information into the source code parser;
(2) Parse the source code information and extract the program entities and their related attributes;
The detailed process is: perform syntax parsing on the source code; extract the syntax tree of the code information; perform semantic parsing on the syntax tree; obtain the program entities and their related attributes;
Program entities include classes, functions, and business processes; entity attributes include packages, files, functions, databases, test cases, and so on;
(3) Call the filter to screen out redundant information in the source code information and, using the correlation coefficient that the domain rule base assigns to each attribute, build the fact base;
Step 2: similarity calculation;
Perform similarity calculation on the related attributes shared by program entities to determine the correlation coefficient between program entities;
Step 3: cluster analysis;
Build a directed graph from the program entity dependencies in the fact base, then perform cluster analysis according to the entity similarity results;
The detailed process of this step is as follows:
(1) Build the directed graph;
A solid line with an arrow in the graph indicates that two entities have a dependency;
(2) Directed graph cluster analysis;
Perform cluster analysis according to entity similarity; in the graph, a solid line indicates a high degree of similarity or correlation (high correlation coefficient), and a dashed line indicates a low degree of similarity or correlation (low correlation coefficient);
Nodes connected by solid lines are grouped into a cluster;
Step 4: rebuild the modules;
Program entities that are similar or highly correlated are grouped into clusters, and each cluster forms a new module;
These modules are presented to system maintainers in a form that is easy to understand and use, so that they can complete the software refactoring;
Step 5: correctness verification;
Submit the refactored system to users or domain experts, collect their opinions, and perform completeness, consistency, and non-redundancy checks;
Step 6: adjust the domain rule base;
According to the collected check results or suggestions, adjust the correlation coefficients of the entities' related attributes in the domain rule base;
Then perform the system refactoring again.
2. The method for software refactoring by means of clustering according to claim 1, characterized in that: a syntax rule base is built from syntax knowledge and provides syntax rules to the syntax analyzer, which can translate the context-free grammar of a particular programming language into the syntax tree of that language.
3. The method for software refactoring by means of clustering according to claim 1, characterized in that: a domain rule base is built from domain knowledge; in different application domains, the same attribute of a program entity has different correlation characteristics;
With reference to domain factors, and according to the dependencies and coupling among the attributes of the system's program entities, a correlation coefficient is assigned to each related attribute of the system;
Domain knowledge is a set of descriptions of the domain's functions, and the description of each function includes: program entity number, domain, version number, functional description, business object, backup, related attributes, and correlation coefficients.
4. The method for software refactoring by means of clustering according to claim 1, characterized in that: the fact base is obtained by filtering the source code information under the domain rules; it is the set of descriptions of the system's core business concerns, and the description of each program entity includes: program entity number, interface name, core business concern, input parameters, output parameters, return value, program entity provider, version number, keywords, related attributes, and correlation coefficients.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310495785.XA CN103559025B (en) | 2013-10-21 | 2013-10-21 | Software refactoring method through clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310495785.XA CN103559025B (en) | 2013-10-21 | 2013-10-21 | Software refactoring method through clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103559025A | 2014-02-05 |
CN103559025B | 2017-01-25 |
Family
ID=50013281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310495785.XA Expired - Fee Related CN103559025B (en) | 2013-10-21 | 2013-10-21 | Software refactoring method through clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103559025B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090276757A1 (en) * | 2008-04-30 | 2009-11-05 | Fraunhofer Usa, Inc. | Systems and methods for inference and management of software code architectures |
CN103235877A (en) * | 2013-04-12 | 2013-08-07 | 北京工业大学 | Robot control software module partitioning method |
Non-Patent Citations (1)
Title |
---|
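Fang Chen et al.: "Application of principal component analysis and cluster analysis in software refactoring", Computer Engineering and Design * |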
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593182A (en) * | 2013-10-27 | 2014-02-19 | 沈阳建筑大学 | Method for reconfiguring software by using clustering mode |
CN104391964A (en) * | 2014-12-01 | 2015-03-04 | 南京大学 | Method for storing source codes into graph database |
CN107678968A (en) * | 2017-10-18 | 2018-02-09 | 北京奇虎科技有限公司 | Sample extraction method, apparatus, computing device and the storage medium of source code function |
CN109165155A (en) * | 2018-06-20 | 2019-01-08 | 扬州大学 | A kind of software defect recovery template extracting method based on clustering |
CN109165155B (en) * | 2018-06-20 | 2021-06-22 | 扬州大学 | Software defect repairing template extraction method based on cluster analysis |
CN110659063A (en) * | 2019-08-08 | 2020-01-07 | 平安科技(深圳)有限公司 | Software project reconstruction method and device, computer device and storage medium |
CN111475158A (en) * | 2020-03-16 | 2020-07-31 | 咪咕文化科技有限公司 | Sub-domain dividing method and device, electronic equipment and computer readable storage medium |
US11269625B1 (en) | 2020-10-20 | 2022-03-08 | International Business Machines Corporation | Method and system to identify and prioritize re-factoring to improve micro-service identification |
CN113190269A (en) * | 2021-04-16 | 2021-07-30 | 南京航空航天大学 | Code reconstruction method based on programming context information |
CN113238796A (en) * | 2021-05-17 | 2021-08-10 | 北京京东振世信息技术有限公司 | Code reconstruction method, device, equipment and storage medium |
CN113504972A (en) * | 2021-07-26 | 2021-10-15 | 京东科技控股股份有限公司 | Service deployment method and device, electronic equipment and storage medium |
CN114237774A (en) * | 2022-02-14 | 2022-03-25 | 北京安盟信息技术股份有限公司 | Internal calling method for removing dependence of functional module |
Also Published As
Publication number | Publication date |
---|---|
CN103559025B (en) | 2017-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103559025A (en) | Software refactoring method through clustering | |
CN111712809A (en) | Learning ETL rules by example | |
CN105912594B (en) | SQL statement processing method and system | |
CN105373469A (en) | Interface based software automation test method | |
CN105912595A (en) | Data origin collection method of relational databases | |
US20110314060A1 (en) | Markup language based query and file generation | |
CN106919612A (en) | A kind of processing method and processing device of SQL script of reaching the standard grade | |
CN103902269B (en) | System and method for generating MIB files through XML files | |
CN107291450A (en) | A kind of quick code automatic generation method for programming friendly | |
CN109446221A (en) | A kind of interactive data method for surveying based on semantic analysis | |
CN107491476B (en) | Data model conversion and query analysis method suitable for various big data management systems | |
CN102591777A (en) | Unit test code generation method and device | |
CN109902117A (en) | Operation system analysis method and device | |
CN109992271B (en) | Layered architecture recognition method based on code vocabulary and structure dependence | |
CN103593182A (en) | Method for reconfiguring software by using clustering mode | |
CN102902818A (en) | Method and device for upgrading database | |
US9652478B2 (en) | Method and apparatus for generating an electronic document schema from a relational model | |
CN103020318A (en) | Method for maintenance of database tables in database | |
CN108256820A (en) | A kind of PBOM methods of adjustment under three-dimensional assembled view based on MBD | |
Sanchez et al. | Bigraphical modelling of architectural patterns | |
CN111984826B (en) | XML-based data automatic warehousing method, system, device and storage medium | |
CN112130849B (en) | Code automatic generation method and device | |
CN103678349A (en) | Method and device for filtering useless data | |
Lu et al. | Zen-CC: An automated and incremental conformance checking solution to support interactive product configuration | |
CN111008011A (en) | System builder for power platform application development |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | C14 | Grant of patent or utility model | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170125; Termination date: 20171021 |