CN112698866A - Code line life cycle tracing method based on Git and electronic device - Google Patents

Publication number
CN112698866A
Authority
CN
China
Prior art keywords
commit
row
file
line
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110013631.7A
Other languages
Chinese (zh)
Other versions
CN112698866B (en)
Inventor
朱家鑫
陈伟
吴国全
窦文生
魏峻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202110013631.7A
Publication of CN112698866A
Application granted
Publication of CN112698866B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/70: Software maintenance or management
    • G06F8/71: Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Git-based code line life cycle tracing method and an electronic device. The method extracts the information of each commit in a Git repository; builds a directed acyclic graph of the commits from the ID of each commit and the ID of the corresponding parent commit; traverses the directed acyclic graph in breadth-first order; and tracks, extracts and records the change history information of code file lines according to the code change content of each commit. For a line of code to be queried in a file of a commit, the recorded change history information yields the commits in which that line was generated, displaced or extinguished. The invention traces line-granularity code life cycle data, including the generation point, displacement points and extinction point of a code line; it can trace the complete change history of a code line across multiple branches; and it can directly return the full life cycle data of any line of any file in any commit snapshot.

Description

Code line life cycle tracing method based on Git and electronic device
Technical Field
The invention relates to the field of tracing the change history of software code, and in particular to a Git-based code line life cycle tracing method and an electronic device.
Background
Version control is an important practice in software development. A software project uses a version control system to record the history of its code changes, including the developer, time and content of each change.
Git is currently the most popular version control system. It records each code change as a commit, and each commit has a unique ID. Commits are linked by parent-child relations that express their logical order, and these relations connect the commits of a repository into a directed acyclic graph.
Git presents the content of a code change in the same unified format as the diff tool on Linux systems.
When a developer or researcher wants to study the life cycle of a code line, that is, the process from the appearance to the disappearance of a line of code in a code base, the relevant data cannot be obtained directly; in other words, the life cycle data of code lines in a Git repository is not readily available.
Disclosure of Invention
To solve the above problem, the invention provides a Git-based code line life cycle tracing method and an electronic device.
The technical content of the invention comprises:
A Git-based code line life cycle tracing method, comprising the following steps:
1) extracting the information of each commit in a Git repository, wherein the information of a commit comprises: the ID of the commit, the ID of the corresponding parent commit, the author of the commit, the time of the commit, the description of the commit, and the code change content of the commit;
2) building a directed acyclic graph of the commits from the ID of each commit and the ID of the corresponding parent commit, traversing the directed acyclic graph in breadth-first order, and tracking, extracting and recording the change history information of code file lines according to the code change content of each commit, wherein the change history information of code file lines comprises:
a) files, each file containing one or more commits in which the file was created;
b) commits in which a file was created, each containing zero or more lines;
c) lines, each line containing one or more commits in which the line was displaced;
d) commits in which a line was displaced, each containing the ID of the commit in which the line was previously displaced and the line number after the displacement;
3) for a line of code to be queried in a file of a commit, obtaining, from the recorded change history information of code file lines, the information of the commits in which the line was generated, displaced or extinguished, wherein that commit information comprises: the commit ID, the commit author, the commit time and the commit description.
Further, the ID of each commit, the ID of the corresponding parent commit, the author of the commit, the time of the commit and the description of the commit are obtained using the Git command: git log --pretty=format:"%H;%P;%an;%ae;%at;%s;%b".
Further, the code change content of a commit is obtained using the Git command: git diff <parent commit ID> <commit ID>.
Further, the data structure for recording the directed acyclic graph of the commits comprises: a doubly linked list.
Further, the data structure for recording the change history information of code file lines comprises: a dictionary structure.
Further, the code change content of a commit comprises: the path of each changed file and the changed content of the file.
Further, a file takes its file path as its unique identifier.
Further, a commit in which a file was created takes the ID of the commit as its unique identifier.
Further, a line takes its sequence number, assigned in the order in which lines are recorded, as its unique identifier.
Further, a commit in which a line was displaced takes the ID of the commit as its unique identifier.
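The identifier scheme above suggests a four-level nested mapping: file path, then creation commit ID, then line sequence number, then displacement commit ID. The following Python sketch shows one possible layout under that assumption; it is not taken from the patent, and the names new_history, record_displacement, previous_commit and line_no are hypothetical.

```python
from collections import defaultdict


def new_history():
    """Create an empty store: path -> creation commit -> line seq -> displacement commit."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def record_displacement(history, path, creation_commit, line_seq,
                        commit_id, previous_commit, line_no):
    """Record that `commit_id` moved the line to `line_no` (0 means deleted)."""
    history[path][creation_commit][line_seq][commit_id] = {
        "previous_commit": previous_commit,  # commit of the previous displacement
        "line_no": line_no,                  # line number after this displacement
    }


# Hypothetical commit IDs "c1" and "c4", used only for illustration.
history = new_history()
record_displacement(history, "abc.c", "c1", 1, "c1", None, 8)
record_displacement(history, "abc.c", "c1", 1, "c4", "c1", 10)
```

Because each level is keyed by its unique identifier, looking up the displacement records of one line is a sequence of dictionary accesses.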
Further, the strategy for recording the line number after a displacement comprises:
1) if the line is an added line, directly recording its line number after the change as given in the code change content;
2) if the line is a deleted line, directly recording the line number as 0;
3) if the line is a retained line, reading from the code change content the number of lines added before it and the number of lines deleted before it, then taking the line number recorded at the line's previous displacement, adding the number of lines added before it and subtracting the number of lines deleted before it.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above method when run.
An electronic device comprising a memory and a processor, the memory having a computer program stored therein and the processor being arranged to run the computer program to perform the above method.
Compared with the prior art, the invention has the following advantages:
1) it traces line-granularity code life cycle data, including the generation point, displacement points and extinction point of a code line;
2) it can trace the complete change history of a code line across multiple branches;
3) for any line of any file in any commit snapshot, it can directly return the full life cycle data of that line.
Drawings
FIG. 1 is a flowchart of the steps of an embodiment of the Git-based code line life cycle tracing method of the invention.
Detailed Description
To make the above objects, features and advantages of the invention easier to understand, embodiments are described in further detail below with reference to the figure.
As shown in FIG. 1, an embodiment of the Git-based code line life cycle tracing method of the invention may comprise the following steps:
Step 1, extracting the information of each commit in the Git repository, including the ID of the commit, the ID of the parent commit, the author, time and description of the commit, and the code change content of the commit;
Example of code change content:
[Figure BDA0002886106930000031: example of code change content for the file abc.c in unified diff format; image not reproduced]
In the example, the file abc.c is changed. As the header "@@ -6,5 +6,6 @@" indicates, the content shown covers 5 consecutive lines starting at line 6 before the change and 6 consecutive lines starting at line 6 after the change. A line beginning with "+" is an added line, and a line beginning with "-" is a deleted line.
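The range header of the example can be read mechanically. The Python sketch below parses a unified diff header such as "@@ -6,5 +6,6 @@" into its four numbers; the function name parse_hunk_header is hypothetical. In the unified format a count may be omitted, in which case it defaults to 1, and the sketch accounts for that.

```python
import re

# "@@ -<old start>[,<old count>] +<new start>[,<new count>] @@"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for one header line."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError("not a unified diff range header: " + repr(line))
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # omitted count means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


example = parse_hunk_header("@@ -6,5 +6,6 @@")
```

For the header in the example, the result is 5 lines starting at line 6 before the change and 6 lines starting at line 6 after it.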
Preferably, the commit ID, parent commit ID, commit author, time and description are obtained using the Git command: git log --pretty=format:"%H;%P;%an;%ae;%at;%s;%b";
preferably, the code change content is obtained using the Git command: git diff <parent commit ID> <commit ID>.
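As an illustration of how one record produced by the git log command above might be consumed, the following Python sketch splits a single record on the ";" separators of the format string %H;%P;%an;%ae;%at;%s;%b. The class CommitInfo and the function parse_log_record are hypothetical names, and the sample records are invented. The sketch assumes that the author name, e-mail address and subject contain no ";" and that records have already been separated from one another; a multi-line body (%b) would need a dedicated record separator, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass
class CommitInfo:
    commit_id: str      # %H
    parent_ids: list    # %P, space-separated in git's output
    author_name: str    # %an
    author_email: str   # %ae
    timestamp: int      # %at, seconds since the Unix epoch
    subject: str        # %s
    body: str           # %b


def parse_log_record(record):
    """Split one "%H;%P;%an;%ae;%at;%s;%b" record into its seven fields."""
    # maxsplit=6 keeps any ";" inside the body (the last field) intact.
    commit_id, parents, name, email, at, subject, body = record.split(";", 6)
    return CommitInfo(commit_id, parents.split(), name, email, int(at), subject, body)


# Invented sample records: a merge commit with two parents and a root commit.
merge = parse_log_record(
    "c3d4;a1b2 ffee;Alice;alice@example.com;1609891200;Fix loop;Longer text; with a semicolon"
)
root = parse_log_record("a1b2;;Bob;bob@example.com;1609800000;Initial commit;")
```

A root commit has an empty %P field, so its parent list is empty; a merge commit lists every parent.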
Step 2, building a directed acyclic graph of the commits from the commit IDs and corresponding parent commit IDs extracted in step 1, traversing the directed acyclic graph in breadth-first order, and tracking, extracting and recording the following change history information of code file lines:
File: identified uniquely by its file path. A file contains one or more commits in which it was created; when a file is deleted and later created again, it has multiple creation commits;
Commit in which a file was created: identified uniquely by the ID of the commit. Such a commit contains zero or more lines; when the created file is empty, it contains zero lines;
Line: identified uniquely by a sequence number assigned in the order in which lines are recorded. A line contains one or more commits in which it was displaced;
Commit in which a line was displaced: identified uniquely by the ID of the commit. It contains the ID of the commit in which the line was previously displaced and the line number after the displacement. Adding a line and deleting a line are both treated as displacements:
(1) if the line is an added line, its line number after the change is recorded directly from the code change content. For the example in step 1:
"int b = 1;" is line 8,
"return a;" is line 9;
(2) if the line is a deleted line, 0 is recorded directly. For the example in step 1:
"return 0;" is line 0;
(3) if the line is a retained line, the number of lines added before it and the number of lines deleted before it are read from the code change content; the new line number is the line number recorded at the line's previous displacement, plus the number of lines added before it, minus the number of lines deleted before it. For the example in step 1:
"int main() {" is line 6,
"int a = 0;" is line 7;
a line previously at line 9 becomes line 9 + 2 - 1 = 10;
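The three cases can be sketched in Python as a single pass over the lines of one hunk. This is an illustrative sketch rather than the patented implementation; the function name shifted_line_numbers is hypothetical. The hunk body below is reconstructed from the line numbers stated in the text, and its last two retained lines are assumed placeholders, since the original figure is not reproduced. The offset term covers net additions from earlier hunks of the same file and is zero in this example.

```python
def shifted_line_numbers(old_start, new_start, hunk_body):
    """Return (text, line number after the change) for each line of one hunk."""
    offset = new_start - old_start  # net lines added by earlier hunks of the file
    added = deleted = 0             # lines added / deleted so far in this hunk
    old_no = old_start              # line number before the change
    new_no = new_start              # line number after the change
    records = []
    for raw in hunk_body:
        tag, text = raw[:1], raw[1:]
        if tag == "+":
            # Case (1): added line, record its line number after the change.
            records.append((text, new_no))
            added += 1
            new_no += 1
        elif tag == "-":
            # Case (2): deleted line, record 0.
            records.append((text, 0))
            deleted += 1
            old_no += 1
        else:
            # Case (3): retained line, previous number + added - deleted.
            records.append((text, old_no + offset + added - deleted))
            old_no += 1
            new_no += 1
    return records


# Body consistent with "@@ -6,5 +6,6 @@"; the last two lines are assumed filler.
body = [
    " int main() {",
    " int a = 0;",
    "-return 0;",
    "+int b = 1;",
    "+return a;",
    " }",
    " // trailing context (assumed)",
]
result = shifted_line_numbers(6, 6, body)
```

With this body, the retained line that was at line 9 before the change is recorded at 9 + 2 - 1 = 10, matching the arithmetic in case (3).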
Preferably, to facilitate fast queries, a doubly linked list is used to store the directed acyclic graph of the commits, and a dictionary is used to store the change history information of code file lines.
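The graph construction and traversal of step 2 can be sketched as follows. This Python sketch stores the graph in plain dictionaries and a queue rather than the doubly linked list named above, and it makes one assumption beyond "breadth-first order": a commit is visited only after all of its parents have been visited (a Kahn-style breadth-first traversal), so that the line numbers recorded for its parents are available when it is processed. The function name breadth_first_order and the commit IDs are hypothetical.

```python
from collections import deque


def breadth_first_order(parents_of):
    """Order commits breadth-first from the roots, parents before children.

    parents_of maps each commit ID to the list of its parent commit IDs.
    """
    children_of = {commit: [] for commit in parents_of}
    pending = {}  # number of not-yet-visited parents per commit
    for commit, parents in parents_of.items():
        known = [p for p in parents if p in parents_of]
        pending[commit] = len(known)
        for parent in known:
            children_of[parent].append(commit)

    queue = deque(commit for commit, count in pending.items() if count == 0)
    order = []
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for child in children_of[commit]:
            pending[child] -= 1
            if pending[child] == 0:  # all parents visited, child is ready
                queue.append(child)
    return order


# Hypothetical history: "b" and "c" branch from "a" and are merged by "d".
order = breadth_first_order({"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]})
```

In this history the merge commit "d" is visited last, after both of its parents on the two branches.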
Step 3, given a line of code in a file of a commit, querying the records formed in step 2 and returning the information of the commits in which the line was generated, displaced and extinguished, the commit information including the ID of the commit, the ID of the parent commit, the author, time and description of the commit, and the code change content of the commit. The commit in which the line was generated is "the commit in which the line number changes from 0 to non-zero", and the commit in which the line was extinguished is "the commit in which the line number changes from non-zero to 0".
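The two definitions in step 3 translate directly into a scan over the displacement records of one line. The Python sketch below assumes those records are available as a list of (commit ID, line number) pairs in the order they were recorded, with an implicit line number of 0 before the first record; the function name classify_lifecycle and the commit IDs are hypothetical.

```python
def classify_lifecycle(displacements):
    """Split one line's displacement records into generation, moves and extinction.

    displacements: list of (commit_id, line_no) pairs in recorded order.
    """
    generated_in = None
    extinguished_in = None
    moved_in = []
    previous_no = 0  # the line does not exist before its first record
    for commit_id, line_no in displacements:
        if previous_no == 0 and line_no != 0:
            generated_in = commit_id      # line number changes from 0 to non-zero
        elif previous_no != 0 and line_no == 0:
            extinguished_in = commit_id   # line number changes from non-zero to 0
        elif line_no != previous_no:
            moved_in.append(commit_id)    # line number changes between non-zero values
        previous_no = line_no
    return generated_in, moved_in, extinguished_in


# Hypothetical records: added at line 8, shifted to line 10, then deleted.
lifecycle = classify_lifecycle([("c1", 8), ("c4", 10), ("c7", 0)])
```

The remaining commit information (author, time, description) would then be looked up by commit ID in the data extracted in step 1.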
The experimental steps were as follows:
1) 10 software project Git repositories were randomly selected from the 1000 Git repositories with the most stars on GitHub;
2) the line-granularity code change history was extracted from each Git repository using the method;
3) 10 commits were randomly selected from each of the 10 Git repositories, one line of one file was randomly selected from each commit, and the life cycle data of these lines was queried using the method;
4) 10 of the results from step 3) were randomly selected and manually checked for accuracy.
The experimental results were as follows:
The experimental results show that life cycle data could be retrieved for 100% of the queried lines and that the obtained data was 100% accurate.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. A person skilled in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the invention, and the scope of protection of the invention shall be determined by the claims.

Claims (10)

1. A Git-based code line life cycle tracing method, comprising the following steps:
1) extracting the information of each commit in a Git repository, wherein the information of a commit comprises: the ID of the commit, the ID of the corresponding parent commit, the author of the commit, the time of the commit, the description of the commit, and the code change content of the commit;
2) building a directed acyclic graph of the commits from the ID of each commit and the ID of the corresponding parent commit, traversing the directed acyclic graph in breadth-first order, and tracking, extracting and recording the change history information of code file lines according to the code change content of each commit, wherein the change history information of code file lines comprises:
a) files, each file containing one or more commits in which the file was created;
b) commits in which a file was created, each containing zero or more lines;
c) lines, each line containing one or more commits in which the line was displaced;
d) commits in which a line was displaced, each containing the ID of the commit in which the line was previously displaced and the line number after the displacement;
3) for a line of code to be queried in a file of a commit, obtaining, from the recorded change history information of code file lines, the information of the commits in which the line was generated, displaced or extinguished, wherein that commit information comprises: the commit ID, the commit author, the commit time and the commit description.
2. The method of claim 1, wherein the ID of each commit, the ID of the corresponding parent commit, the author of the commit, the time of the commit and the description of the commit are obtained using the Git command: git log --pretty=format:"%H;%P;%an;%ae;%at;%s;%b".
3. The method of claim 1, wherein the code change content of a commit is obtained using the Git command: git diff <parent commit ID> <commit ID>.
4. The method of claim 1, wherein the data structure for recording the directed acyclic graph of the commits comprises: a doubly linked list.
5. The method of claim 1, wherein the data structure for recording the change history information of code file lines comprises: a dictionary structure.
6. The method of claim 1, wherein the code change content of a commit comprises: the path of each changed file and the changed content of the file.
7. The method of claim 6, wherein a file takes its file path as its unique identifier; a commit in which a file was created takes the ID of the commit as its unique identifier; a line takes its sequence number, assigned in the order in which lines are recorded, as its unique identifier; and a commit in which a line was displaced takes the ID of the commit as its unique identifier.
8. The method of claim 1, wherein the strategy for recording the line number after a displacement comprises:
1) if the line is an added line, directly recording its line number after the change as given in the code change content;
2) if the line is a deleted line, directly recording the line number as 0;
3) if the line is a retained line, reading from the code change content the number of lines added before it and the number of lines deleted before it, then taking the line number recorded at the line's previous displacement, adding the number of lines added before it and subtracting the number of lines deleted before it.
9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when run, perform the method of any of claims 1-8.
10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110013631.7A CN112698866B (en) 2021-01-06 2021-01-06 Code line life cycle tracing method based on Git and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110013631.7A CN112698866B (en) 2021-01-06 2021-01-06 Code line life cycle tracing method based on Git and electronic device

Publications (2)

Publication Number Publication Date
CN112698866A (en) 2021-04-23
CN112698866B (en) 2022-06-17

Family

ID=75514890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110013631.7A Active CN112698866B (en) 2021-01-06 2021-01-06 Code line life cycle tracing method based on Git and electronic device

Country Status (1)

Country Link
CN (1) CN112698866B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112011A (en) * 2023-08-16 2023-11-24 北京冠群信息技术股份有限公司 Version management method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140289280A1 (en) * 2013-03-15 2014-09-25 Perforce Software, Inc. System and Method for Bi-directional Conversion of Directed Acyclic Graphs and Inter-File Branching
CN105956087A (en) * 2016-04-29 2016-09-21 清华大学 Data and code version management system and method
CN109800018A (en) * 2019-01-10 2019-05-24 郑州云海信息技术有限公司 A kind of code administration method and system based on Gerrit
CN110286880A (en) * 2019-06-17 2019-09-27 中国科学院软件研究所 A kind of complete continuous integrating method of data capture towards GitHub Yu Travis CI
CN111290777A (en) * 2020-01-23 2020-06-16 复旦大学 Evolution history slicing method oriented to software code unit and code measurement

Also Published As

Publication number Publication date
CN112698866B (en) 2022-06-17

Similar Documents

Publication Publication Date Title
CN110321344B (en) Information query method and device for associated data, computer equipment and storage medium
CN105630863B (en) Transaction control block for multi-version concurrent commit status
EP2784665B1 (en) Program and version control method
CN105373541A (en) Processing method and system for data operation request of database
CN111241389A (en) Sensitive word filtering method and device based on matrix, electronic equipment and storage medium
CN111061742B (en) Method and device for marking data and service system thereof
WO2020056977A1 (en) Knowledge point pushing method and device, and computer readable storage medium
CN106126486A (en) Temporal information coded method, encoded radio search method, coding/decoding method and device
CN103678342A (en) Starting item recognition method and device
CN108846069B (en) Document execution method and device based on markup language
CN113961794A (en) Book recommendation method and device, computer equipment and storage medium
CN114138784A (en) Information tracing method and device based on storage library, electronic equipment and medium
CN109597707A (en) Clone volume data copying method, device and computer readable storage medium
CN112698866B (en) Code line life cycle tracing method based on Git and electronic device
CN108073595B (en) Method and device for realizing data updating and snapshot in OLAP database
CN111858581A (en) Page query method and device, storage medium and electronic equipment
CN103177026A (en) Data management method and data management system
US20080033949A1 (en) Electronic apparatus and method therefor
CN114816247A (en) Logic data acquisition method and device
CN114296978A (en) Software toolkit address positioning method and device
CN107656868B (en) Debugging method and system for acquiring thread name by using thread private data
CN111259003A (en) Database establishing method and device
CN113657076B (en) Page operation record table generation method and device, electronic equipment and storage medium
CN115098459A (en) Data sharing method and device, terminal equipment and storage medium
CN112402955B (en) Game log recording method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant