CN113778501B - Code task processing method and device - Google Patents


Info

Publication number
CN113778501B
Authority
CN
China
Prior art keywords
code
task
data
processing
code data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010575040.4A
Other languages
Chinese (zh)
Other versions
CN113778501A (en)
Inventor
王梦津
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010575040.4A
Publication of CN113778501A
Application granted
Publication of CN113778501B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a code task processing method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: acquiring a code task, wherein the code task indicates code data and corresponding code data operation information thereof; performing data mining processing on the code task based on a data mining algorithm to obtain a processing result; and determining a code data attribute according to the processing result, wherein the code data attribute indicates the maintenance value and development efficiency of the code data. According to the embodiment, the historical behavior of the system can be accurately and reasonably interpreted according to the processing result; meanwhile, the maintenance value and the development efficiency corresponding to the code data can be determined, so that the subsequent system development and maintenance efficiency is improved, and the development and maintenance cost is reduced.

Description

Code task processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a code task.
Background
A project usually requires several developers, each of whom writes the service code they are responsible for in a personal code library. When the code is written, each developer merges that code data into a main branch (such as a master branch or a production branch) through operations such as commits and merge requests (pull requests), and in this way the development or maintenance update of the project is eventually completed.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
1. Existing code processing methods cannot accurately and reasonably explain the historical behavior of the system to which the code belongs;
2. The maintenance value and development efficiency of the code data cannot be identified, which makes subsequent system development and maintenance inefficient and costly.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a method and a device for processing code tasks, which can accurately and reasonably explain the historical behavior of a system according to processing results; meanwhile, the maintenance value and the development efficiency corresponding to the code data can be determined, so that the subsequent system development and maintenance efficiency is improved, and the development and maintenance cost is reduced.
To achieve the above object, according to a first aspect of an embodiment of the present invention, there is provided a code task processing method, including:
acquiring a code task, wherein the code task indicates code data and the corresponding code data operation information;
performing data mining processing on the code task based on a data mining algorithm to obtain a processing result; and
determining a code data attribute according to the processing result, wherein the code data attribute indicates the maintenance value and development efficiency of the code data.
Further, the code data operation information indicates an operation mode and an operation frequency corresponding to the code data, and the operation mode includes at least one of the following modes: commit, modify, and merge.
Further, before the step of performing data mining processing on the code task based on the data mining algorithm, the code task processing method further includes: setting a first task cleaning rule, and cleaning data of the code task according to the first task cleaning rule.
Further, before the step of performing data cleaning on the code task according to the first task cleaning rule, the code task processing method further includes: merging the code tasks that share the same actual name according to the code task alias table corresponding to the code library.
Further, the code tasks include a code development task and a verification task, the verification task being obtained from code version information, and the code task processing method further includes: adjusting the weight coefficients of the code development task and the verification task in the data mining process according to the processing results corresponding to the code development task and the verification task.
Further, the code task processing method further comprises the following steps: setting a second task cleaning rule, and cleaning data of the processing result according to the second task cleaning rule.
Further, before the step of determining the code data attribute according to the processing result, the code task processing method further includes repeating the following steps until a processing count threshold is reached:
using the processing result as the code task for data mining processing; and
performing data mining processing on the code task based on the data mining algorithm to obtain a processing result.
According to a second aspect of an embodiment of the present invention, there is provided a code task processing device including:
a code task acquisition module, configured to acquire a code task, wherein the code task indicates code data and the corresponding code data operation information;
a processing module, configured to perform data mining processing on the code task based on a data mining algorithm to obtain a processing result; and
a code data attribute determining module, configured to determine a code data attribute according to the processing result, wherein the code data attribute indicates the maintenance value and development efficiency of the code data.
According to a third aspect of an embodiment of the present invention, there is provided an electronic apparatus including:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any of the code task processing methods described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements any of the code task processing methods described above.
One embodiment of the above invention has the following advantages or benefits. The technical means adopted are: acquiring a code task, wherein the code task indicates code data and the corresponding code data operation information; performing data mining processing on the code task based on a data mining algorithm to obtain a processing result; and determining a code data attribute according to the processing result, wherein the code data attribute indicates the maintenance value and development efficiency of the code data. These means overcome the technical problems of existing code processing methods, namely that the historical behavior of a system cannot be accurately and reasonably interpreted, and that the maintenance value and development efficiency of code data cannot be identified, which makes subsequent system development and maintenance inefficient and costly. As a result, the historical behavior of the system can be accurately and reasonably interpreted according to the processing result, and the maintenance value and development efficiency of the code data can be determined, which improves the efficiency of subsequent system development and maintenance and reduces its cost.
Further effects of the optional implementations described above are explained below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a code task processing method provided according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of a code task processing method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the main modules of a code task processing device provided in accordance with an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 5 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of a code task processing method provided according to a first embodiment of the present invention. As shown in FIG. 1, the code task processing method provided by the embodiment of the present invention mainly includes:
Step S101, a code task is obtained, wherein the code task indicates code data and corresponding code data operation information thereof.
According to an embodiment of the present invention, the code data operation information indicates an operation mode and an operation frequency corresponding to the code data, where the operation mode includes at least one of the following modes: commit, modify, and merge.
Specifically, according to an embodiment of the present invention, the code tasks are obtained from a code library and mainly include the code data that a developer submits through a commit or pull request (including code files and events that indicate a specific operation in the development process, such as the corresponding methods) and the developer's commit log (change record). With this arrangement, all of a developer's code tasks during development are acquired, which helps interpret the historical behavior of the system accurately and reasonably. During the development of an application system, several developers usually each set up a personal code library, take responsibility for part of the project, and then merge the code data they are responsible for into a main branch (such as a master branch or a production branch) through operations such as commits and merge requests (pull requests). The code tasks can therefore be obtained from multiple personal code libraries by calling the application program interface of the code version management system.
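As an illustration only, a code task and its operation information could be modeled as in the following Python sketch. The patent does not prescribe a schema; the `CodeTask` record, its field names, and the sample file paths are hypothetical assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one code task. The field names are illustrative
# assumptions, not a schema defined by the patent.
@dataclass(frozen=True)
class CodeTask:
    developer: str
    file_path: str   # the code data that was touched
    operation: str   # operation mode: "commit", "modify", or "merge"
    message: str     # commit log (change record) text


def operation_frequency(tasks):
    """Count how often each (file, operation mode) pair occurs."""
    return Counter((t.file_path, t.operation) for t in tasks)


tasks = [
    CodeTask("alice", "cart/Promotion.java", "commit", "add discount tiers"),
    CodeTask("bob", "cart/Promotion.java", "modify", "fix tier boundary"),
    CodeTask("alice", "cart/Promotion.java", "modify", "rework tier lookup"),
    CodeTask("bob", "gift/GiftCard.java", "merge", "merge gift card branch"),
]
freq = operation_frequency(tasks)
```

The resulting counter is one possible representation of the "operation frequency" that the later steps compare against thresholds.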
Step S102, data mining processing is carried out on the code task based on a data mining algorithm, and a processing result is obtained.
Specifically, according to an embodiment of the present invention, the step of performing data mining processing on the code task based on the data mining algorithm includes: performing data mining processing on the code task with the data mining algorithm, according to the code data modification frequency threshold and/or the code data submission frequency indicated by the operation frequency of the code data in the code task.
According to an embodiment of the present invention, the data mining algorithm includes text similarity algorithms, clustering algorithms, and the like (these are examples only; other types of data mining algorithms also fall within the scope of the present invention). With this arrangement, the data mining algorithm, the code data modification frequency threshold, and the code data submission frequency can be used to determine where developers differ in their understanding of the system model to be developed, and to identify hot-spot code data (code data that has been modified repeatedly, which suggests that different developers understand the function the code should implement very differently, or that errors occur easily while the code is developed).
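The combination of a modification frequency threshold with a text similarity measure can be sketched as follows. This is a minimal sketch under stated assumptions: the threshold value, the function names, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all choices made for illustration; the patent does not name a specific similarity algorithm.

```python
from collections import Counter
from difflib import SequenceMatcher

MODIFY_THRESHOLD = 3  # assumed value; the patent leaves the threshold open


def hot_spots(modified_files, threshold=MODIFY_THRESHOLD):
    """Return files modified at least `threshold` times, sorted by name."""
    counts = Counter(modified_files)
    return sorted(path for path, n in counts.items() if n >= threshold)


def message_similarity(a, b):
    """Similarity in [0, 1] between two change-record texts.

    SequenceMatcher stands in here for whatever text similarity
    algorithm an implementation would actually use.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()
```

For example, `hot_spots(["a.java", "a.java", "a.java", "b.java"])` flags only `a.java`, and a low `message_similarity` between change records for the same hot-spot file would hint that developers describe, and possibly understand, the same code differently.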
Preferably, according to an embodiment of the present invention, before the step of performing data mining processing on a code task based on the data mining algorithm, the code task processing method further includes: setting a first task cleaning rule, and cleaning data of the code task according to the first task cleaning rule.
Specifically, according to a specific implementation of the embodiment of the present invention, the first task cleaning rule defines the criterion for a non-valuable code task: a code task is judged non-valuable and is cleaned if the code data it involves falls under any one of the following cases: it changes only whitespace characters; it contains only formatting changes and alters no code logic; or it is temporary stress-test code (that is, code committed only to a temporary branch and never merged into a main branch).
With this arrangement, cleaning the code tasks can significantly improve the accuracy of the subsequent data mining processing, and therefore the accuracy of the code data attribute determined from the processing result. This in turn makes the maintenance value and development efficiency indicated by the code data attribute more accurate, improves the efficiency of subsequent system development and maintenance, and reduces development and maintenance costs.
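The first task cleaning rule can be sketched as a simple filter. The dictionary keys (`before`, `after`, `branch`, `merged_to_main`) and the temporary-branch prefixes below are hypothetical assumptions; a real implementation would derive them from the version control system in use.

```python
# Assumed naming convention for temporary branches (illustrative only).
TEMP_BRANCH_PREFIXES = ("tmp/", "stress-test/")


def is_whitespace_only(before, after):
    """True when two versions of the code differ only in whitespace.

    Simplification: whitespace inside string literals is ignored too,
    so this check can over-match in rare cases.
    """
    return "".join(before.split()) == "".join(after.split())


def is_non_valuable(task):
    """Apply the non-valuable criterion to one task (a plain dict)."""
    if is_whitespace_only(task["before"], task["after"]):
        return True
    on_temp_branch = task["branch"].startswith(TEMP_BRANCH_PREFIXES)
    return on_temp_branch and not task["merged_to_main"]


def clean(tasks):
    """Keep only the tasks that are not judged non-valuable."""
    return [t for t in tasks if not is_non_valuable(t)]
```

Tasks that the criterion cannot classify are kept, which matches the idea of filtering out only what is clearly without value.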
Further, according to an embodiment of the present invention, before the step of performing data cleansing on the code task according to the first task cleansing rule, the code task processing method further includes: and merging the code tasks with the same name according to the code task alias table corresponding to the code library.
Specifically, the code task alias table corresponding to a code library may be a synonym dictionary established for that code library. For example, a gift card may appear as giftCard (the English name) or as liPinKa (the pinyin of the Chinese name), but both names have the same meaning. With this arrangement, code tasks whose actual names are the same are merged, so that they can be compared with one another during processing and the code data attribute can be determined more reliably.
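A minimal sketch of alias-based merging follows, using the giftCard / liPinKa example from the text. The table layout (lowercased alias mapped to a canonical name) and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical alias table for one code library: lowercased alias -> canonical name.
ALIASES = {"giftcard": "giftCard", "lipinka": "giftCard"}


def canonical_name(name, aliases=ALIASES):
    """Map a task name to its canonical form; unknown names pass through."""
    return aliases.get(name.lower(), name)


def merge_by_alias(tasks, aliases=ALIASES):
    """Group (name, payload) pairs under their canonical task name."""
    merged = defaultdict(list)
    for name, payload in tasks:
        merged[canonical_name(name, aliases)].append(payload)
    return dict(merged)
```

With this table, `merge_by_alias([("giftCard", "c1"), ("liPinKa", "c2"), ("coupon", "c3")])` groups `c1` and `c2` under `giftCard` and leaves `coupon` on its own.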
Preferably, according to an embodiment of the present invention, the code tasks include a code development task and a verification task, the verification task being obtained from code version information, and the code task processing method further includes: adjusting the weight coefficients of the code development task and the verification task according to the processing results corresponding to the code development task and the verification task.
Software is ultimately implemented in code, but the knowledge that the code abstracts depends on personal understanding, and different developers understand the mapping between the development domain and the business logic differently, so the code data in code tasks often reflects the individual style of the developer. With the above arrangement, the acquired code tasks include not only the original code development tasks (that is, the code data of a developer's commit or pull request, including code files and events that indicate a specific operation in the development process, such as the corresponding methods, together with the developer's commit log) but also verification tasks. A verification task may be a code task summarized from the code modification records entered by developers and from code version information (such as code version announcements). Because the quality of the modification records that developers enter is hard to control, verification tasks obtained through this summarization help ensure the quality of the code tasks. The verification tasks supplement the original development tasks in the data mining process, which offsets the effect that developers' individual change records have on the accuracy of the processing result. Adjusting the weight coefficients of the code development task and the verification task in the data mining process according to the processing result further reduces the influence of developer individuality on the subsequent analysis of the code data attribute, improves the efficiency of subsequent system development and maintenance, and reduces development and maintenance costs.
Further, according to an embodiment of the present invention, the above-mentioned code task processing method further includes: setting a second task cleaning rule, and cleaning data of the processing result according to the second task cleaning rule.
Specifically, according to a specific implementation of the embodiment of the present invention, the second task cleaning rule specifies a feature extraction algorithm. The data mining result is optimized according to the code features of the specific system (such as an e-commerce system or a logistics system), so that the result matches the functions of that system more closely and the code data attribute determined from the code task processing result becomes more accurate. According to an embodiment of the present invention, the feature extraction algorithm may be a random forest; other existing feature extraction algorithms may also be used.
Step S103, determining a code data attribute according to the processing result, wherein the code data attribute indicates the maintenance value and the development efficiency of the code data.
The maintenance value of the code data characterizes the maintainability of the code data. The knowledge abstracted by the code may be understood differently within a development team, which leads to inconsistent handling of the corresponding real-world problems. A higher maintenance value therefore indicates that the team members understand the abstract problem behind the code data more similarly, and that the code solves the specific real-world problem better. The maintenance value can thus be regarded as a measure of the health of the code data, and the healthier the code data, the higher the development efficiency it indicates. Determining the code data attribute helps improve the efficiency of subsequent system development and maintenance and reduce their cost.
Further, according to an embodiment of the present invention, before the step of determining the code data attribute according to the processing result, the code task processing method further includes repeating the following steps until a processing count threshold is reached:
using the processing result as the code task for data mining processing; and
performing data mining processing on the code task based on the data mining algorithm to obtain a processing result.
According to a specific implementation of the embodiment of the present invention, the processing results of the code development task and the verification task, obtained after the weight coefficients have been adjusted, may be cleaned according to the second task cleaning rule. The cleaned processing results are then used as the object (that is, the code data) of the next round of data mining processing, and this is repeated until the processing count threshold is reached. Within a limited number of rounds, the code task processing result becomes increasingly accurate as the number of rounds increases. In practice, the processing count threshold can be set according to actual processing conditions.
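The loop described above can be sketched as follows. The `mine` and `clean` callables are placeholders: the toy versions below (keep the higher-scoring half, drop negative scores) are assumptions chosen only so the control flow can be run, not the mining or cleaning logic of the patent.

```python
def run_rounds(tasks, mine, clean, max_rounds):
    """Feed each cleaned processing result back in as the next round's input."""
    result = tasks
    for _ in range(max_rounds):  # max_rounds plays the role of the processing count threshold
        result = clean(mine(result))
    return result


# Toy stand-ins for the real steps (illustrative assumptions only).
def toy_mine(scores):
    """Keep the higher-scoring half of the input, at least one item."""
    return sorted(scores, reverse=True)[: max(1, len(scores) // 2)]


def toy_clean(scores):
    """Drop results with a negative score."""
    return [s for s in scores if s >= 0]


final = run_rounds([5, -1, 8, 3, 9, 2, 7, 4], toy_mine, toy_clean, max_rounds=2)
```

Each round narrows the result set, which mirrors the idea that repeated processing refines the outcome until the configured threshold stops the loop.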
According to the technical solution of this embodiment of the invention, the technical means adopted are: acquiring a code task, wherein the code task indicates code data and the corresponding code data operation information; performing data mining processing on the code task based on a data mining algorithm to obtain a processing result; and determining a code data attribute according to the processing result, wherein the code data attribute indicates the maintenance value and development efficiency of the code data. These means overcome the technical problems of existing code processing methods, namely that the historical behavior of a system cannot be accurately and reasonably interpreted, and that the maintenance value and development efficiency of code data cannot be identified, which makes subsequent system development and maintenance inefficient and costly. As a result, the historical behavior of the system can be accurately and reasonably interpreted according to the processing result, and the maintenance value and development efficiency of the code data can be determined, which improves the efficiency of subsequent system development and maintenance and reduces its cost.
FIG. 2 is a schematic diagram of the main flow of a code task processing method according to a second embodiment of the present invention. The application scenario of this embodiment is the processing of the code data of an e-commerce system. As shown in FIG. 2, the code task processing method provided by the embodiment of the present invention mainly includes:
step S201, a code task is obtained, wherein the code task indicates code data and corresponding code data operation information thereof.
Specifically, according to the embodiment of the present invention, a code management tool may be used to obtain the corresponding code tasks from a code library, for example a centralized code management tool such as SVN (Subversion, an open-source version control system) or CVS (Concurrent Versions System, a widely used client/server version control system), or a distributed version management tool such as Git (an open-source distributed version control system). Taking GitHub, a Git hosting platform, as an example, the code tasks submitted to the code repository netty (a Java open-source framework originally provided by JBoss and now an independent project on GitHub) can be obtained with the curl command (a command for fetching resources).
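The patent does not state which URL the curl command targets. One plausible choice, assumed here, is the "list commits" endpoint of the GitHub REST API, which a command such as `curl https://api.github.com/repos/netty/netty/commits` would query. The Python sketch below builds that URL and extracts a few fields from the JSON response; the helper names are hypothetical, and the network call itself is left out so the snippet runs offline.

```python
import json
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def commits_url(owner, repo, per_page=30, page=1):
    """Build the URL of the GitHub REST 'list commits' endpoint."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{GITHUB_API}/repos/{owner}/{repo}/commits?{query}"


def parse_commits(payload):
    """Extract (sha, author name, message) from the endpoint's JSON text."""
    return [
        (c["sha"], c["commit"]["author"]["name"], c["commit"]["message"])
        for c in json.loads(payload)
    ]
```

The response body could be fetched with `urllib.request.urlopen(commits_url("netty", "netty"))` or with curl and then passed to `parse_commits`; unauthenticated requests to this API are rate limited.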
In actual production, once the code data in the code tasks has been merged into the main branch, developers pay attention only to the code tasks of the main branch and ignore the original code development tasks. Events therefore cannot be traced, and the behavior of the e-commerce system under development cannot be explained reasonably and accurately. For example, suppose the e-commerce system sets up a spend-and-save promotion. After adding the desired goods to the shopping cart, most users adjust some items so that the order total reaches the desired discount tier. One may want to know how differently the tiers "20 yuan off for orders of 200 yuan or more" and "30 yuan off for orders of 300 yuan or more" attract users, or whether users whose cart total falls in the 200 to 300 yuan range tend to aim for the 30-yuan-off tier. These questions can be answered if every operation the user performs during the purchase is available.
Step S202, merging the code tasks with the same name according to the code task alias table corresponding to the code library.
According to the embodiment of the invention, a gift card in the e-commerce system may be expressed in two forms, giftCard and liPinKa, which have the same meaning. This is also a result of developer individuality. According to a specific implementation of the embodiment of the present invention, an independent synonym dictionary may be established for each code library, listing all synonyms of the code task names that correspond to the common functions (coupons, gift cards, and so on) of the system to be developed (for example, the e-commerce system of this embodiment). After the code tasks are acquired, the code tasks with the same actual name are merged as indicated by the synonym dictionary.
Step S203, a first task cleaning rule is set, and data cleaning is performed on the code task according to the first task cleaning rule.
Specifically, the first task cleaning rule defines the criterion for a non-valuable code task: a code task is judged non-valuable and is cleaned if the code data it involves falls under any one of the following cases: it changes only whitespace characters; it contains only formatting changes and alters no code logic; or it is temporary stress-test code (that is, code committed only to a temporary branch and never merged into a main branch). A criterion for non-valuable code tasks is defined, rather than a criterion for valuable ones, because a non-valuable criterion is simple to set at the preliminary judgment stage. Many code tasks cannot be assessed by such a simple criterion; these are retained for the subsequent data mining process so that a more comprehensive processing result can be obtained later.
With this arrangement, cleaning the code tasks can significantly improve the accuracy of the data mining processing performed on them with the data mining algorithm.
And step S204, performing data mining processing on the code task based on a data mining algorithm to obtain a processing result.
According to an embodiment of the present invention, the data mining processing may also proceed as follows: a processing mode is defined according to the code data modification frequency threshold and/or the code data submission frequency indicated by the operation frequency of the code data in the code task (for example, a hot-spot code processing mode set according to the code data modification frequency threshold); data mining processing is then performed under that processing mode with a data mining algorithm (such as a text similarity algorithm or a clustering algorithm), and the processing result is output.
With this arrangement, the data mining algorithm, the code data modification frequency threshold, and the code data submission frequency can be used to determine where developers differ in their understanding of the system model to be developed, and to identify hot-spot code data (code data that has been modified repeatedly, which suggests that different developers understand the function the code should implement very differently, or that errors occur easily while the code is developed).
Specifically, according to the embodiment of the present invention, an output condition for the processing result may also be set during the data mining processing. For example, the ten code tasks whose code data has been modified most often may be taken as the output processing result.
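Such an output condition amounts to a top-N selection by modification count. A minimal sketch, assuming the modification history is available as a flat list of task names (the function name and input shape are illustrative):

```python
from collections import Counter


def top_modified(modifications, n=10):
    """Return the n most frequently modified tasks as (name, count) pairs."""
    return Counter(modifications).most_common(n)
```

For instance, `top_modified(["a", "b", "a", "c", "a", "b"], n=2)` yields `[("a", 3), ("b", 2)]`.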
Step S205, a second task cleaning rule is set, and data cleaning is performed on the processing result according to the second task cleaning rule.
Specifically, according to a specific implementation manner of the embodiment of the present invention, the above-mentioned second task cleaning rule indicates a feature extraction algorithm, which means that the data mining processing result is optimized according to the code feature corresponding to the specific system (such as an e-commerce system, a logistics system, etc.), so that the data mining processing result is closer to the function of the specific system, so as to improve the accuracy of the code data attribute determined according to the code task processing result. According to an embodiment of the present invention, the feature extraction algorithm may be a Random Forest (Random Forest), and other existing feature extraction algorithms may be used. In addition, human intervention can be carried out on the processing result to judge whether the processing result has guiding value for the maintenance value and development efficiency of the code data to be judged later, so that the data of the processing result can be cleaned.
According to a specific implementation of the embodiment of the present invention, some task data may be determined to be highly related through text similarity, yet manual review may find that the highly related task data merely import a large number of the same open source components. In that case, processing results that depend only on such general-purpose open source components may be cleaned out.
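As a non-limiting illustration, the cleaning of results that depend only on general-purpose open source components may be sketched in Python as follows. The component names, the tuple layout of a related pair, and the function `clean_related_pairs` are assumptions made for this sketch; the embodiment does not prescribe them, and in practice the list of general-purpose components would come from manual review.

```python
def clean_related_pairs(pairs, generic_components):
    """Keep only related pairs whose overlap is not explained by generic components alone.

    Each pair is (name_a, imports_a, name_b, imports_b), where the imports
    are sets of component identifiers (hypothetical layout).
    """
    kept = []
    for name_a, imports_a, name_b, imports_b in pairs:
        shared = set(imports_a) & set(imports_b)
        # Drop the pair when everything it shares is a generic component
        if shared - set(generic_components):
            kept.append((name_a, name_b))
    return kept


GENERIC = {"org.slf4j", "com.google.gson"}  # assumed result of manual review
pairs = [
    ("taskA", {"org.slf4j", "com.google.gson", "com.shop.Order"},
     "taskB", {"org.slf4j", "com.shop.Order"}),
    ("taskC", {"org.slf4j", "com.google.gson"},
     "taskD", {"org.slf4j", "com.google.gson", "com.shop.Cart"}),
]
cleaned = clean_related_pairs(pairs, GENERIC)
```

Here the pair taskC/taskD is removed because the two tasks share nothing beyond the general-purpose components.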
Step S206, the code tasks include a code development task and a verification task, where the verification task is obtained from code version information; the weight coefficients corresponding to the code development task and the verification task in the data mining processing are adjusted according to the processing results corresponding to the code development task and the verification task.
Specifically, according to an embodiment of the present invention, the verification task may be generated automatically by ChangeScribe (a tool that automatically generates commit messages; this is only an example, and other task generation tools may be used) from the code modification records and the code version information (such as code version announcements) entered by the developer. It serves as supplementary information to the developer's commitlog and is used to verify the processing result. Adjusting the weight coefficients of the code development task and the verification task in the data mining processing according to the processing results further reduces the influence of developers' individual styles on the subsequent analysis of the code data attribute, which improves the efficiency of subsequent system development and maintenance and reduces development and maintenance costs.
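The embodiment does not specify how the weight coefficients are adjusted. Purely as an assumed illustration, one possible update rule is sketched in Python below: each weight is moved toward the share of a quality score that its task type obtained in the last round, and the weights are then renormalized. The function `adjust_weights`, the `scores` input, and the `rate` parameter are all hypothetical.

```python
def adjust_weights(weights, scores, rate=0.5):
    """Move each weight toward its share of the total score, then renormalize.

    This update rule is an assumption for illustration only.
    """
    total_score = sum(scores.values())
    if total_score == 0:
        return dict(weights)  # nothing to learn from this round
    updated = {}
    for name, weight in weights.items():
        target = scores[name] / total_score
        updated[name] = (1 - rate) * weight + rate * target
    norm = sum(updated.values())
    return {name: value / norm for name, value in updated.items()}


weights = {"development": 0.5, "verification": 0.5}
scores = {"development": 1.0, "verification": 3.0}  # hypothetical result quality
new_weights = adjust_weights(weights, scores)
```

With these sample values the verification task gains weight because its processing results scored higher in the round.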
Step S207, it is determined whether the processing count threshold has been reached. If so, step S209 is executed; if not, the process proceeds to step S208.
Within a limited number of rounds, the code task processing results become more accurate as the number of processing rounds increases. In practical application, the processing count threshold can be set according to actual processing conditions.
Step S208, the processing result is used as the code data input for constructing the code data processing model, that is, as the input of the next round of data mining processing.
According to a specific implementation of the embodiment of the present invention, the processing results corresponding to the code development task and the verification task, after the weight coefficients have been adjusted, may be cleaned according to the second task cleaning rule. The cleaned processing results are then used as the object (i.e., the code data) of the data mining processing, and data mining processing is performed on them to obtain new processing results, until the processing count threshold is reached.
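As a non-limiting illustration, the loop formed by steps S207 and S208 may be sketched in Python as follows. The function `iterate_mining` and the two toy callables are hypothetical stand-ins; the actual mining and cleaning steps are those described in the embodiment, and the weight adjustment is omitted for brevity.

```python
def iterate_mining(code_data, mine, clean, count_threshold):
    """Repeat mining and cleaning until the processing count threshold is reached."""
    if count_threshold < 1:
        raise ValueError("count_threshold must be at least 1")
    rounds = 0
    while True:
        result = clean(mine(code_data))
        rounds += 1
        if rounds >= count_threshold:  # step S207: threshold reached
            return result, rounds
        code_data = result             # step S208: result becomes the next input


def toy_mine(counts):
    # Stand-in for data mining: keep entries modified at least twice
    return {name: n for name, n in counts.items() if n >= 2}


def toy_clean(result):
    # Stand-in for the second cleaning rule: drop temporary entries
    return {name: n for name, n in result.items() if not name.startswith("tmp_")}


final, rounds = iterate_mining({"order": 3, "tmp_load": 5, "cart": 1},
                               toy_mine, toy_clean, count_threshold=3)
```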
Step S209, determining a code data attribute according to the processing result, wherein the code data attribute indicates a maintenance value and development efficiency of the code data.
The maintenance value of the code data refers to the maintainability of the code data. The knowledge abstracted by the code may be understood differently within a development team, so that the corresponding real-world problems are handled inconsistently. The higher the maintenance value, the more similar the team members' understanding of the abstract problem corresponding to the code data, and the better the code solves the specific real-world problem. The maintenance value can be characterized by the health degree of the code data. Determining the code data attribute helps to improve the efficiency of subsequent system development and maintenance and to reduce development and maintenance costs.
It should be noted that the order of the steps in the embodiments does not limit the present invention; appropriately adjusting the steps in practical application also falls within the scope of protection of the embodiments of the present invention.
According to the technical solution of the embodiments of the present invention, a code task is acquired, where the code task indicates code data and its corresponding code data operation information; data mining processing is performed on the code task based on a data mining algorithm to obtain a processing result; and a code data attribute is determined according to the processing result, where the code data attribute indicates the maintenance value and development efficiency of the code data. These technical means overcome the technical problems of existing code processing methods, which cannot interpret the historical behavior of a system accurately and reasonably and cannot identify the maintenance value and development efficiency corresponding to the code data, so that subsequent system development and maintenance are inefficient and costly. As a result, the historical behavior of the code system can be interpreted accurately and reasonably according to the processing result, and the maintenance value and development efficiency corresponding to the code data can be determined, which improves the efficiency of subsequent system development and maintenance and reduces development and maintenance costs.
FIG. 3 is a schematic diagram of the main modules of a code task processing device provided in accordance with an embodiment of the present invention; as shown in fig. 3, the code task processing device 300 provided in the embodiment of the present invention mainly includes:
A code task acquisition module 301, configured to acquire a code task, where the code task indicates code data and its corresponding code data operation information.
According to an embodiment of the present invention, the code data operation information indicates an operation mode and an operation frequency corresponding to the code data, where the operation mode includes at least one of the following modes: commit, modify, and merge.
Specifically, according to an embodiment of the present invention, the code tasks are obtained from a code library and mainly include the code data of developer commits and pull requests (including code files and events that indicate specific operations during development, such as the corresponding methods) and the corresponding commitlog (change record) of each developer. In the development of an application system, several developers usually each build a personal code library and take charge of part of the project, and then merge the code data they are responsible for into a main branch (such as a master branch or a production branch) through operations such as a commit or a merge request (pull request). The code tasks can therefore be obtained from multiple personal code libraries by calling the application program interface of a code version management system.
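As a non-limiting illustration, a code task and its operation information may be represented in Python as follows. The class `CodeTask`, its field names, and the sample records are hypothetical; the embodiment only requires that a code task indicate the code data, the operation mode, and the operation frequency.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeTask:
    """One code task: code data plus its operation information (hypothetical layout)."""
    code_file: str    # path of the code file that was touched
    operation: str    # operation mode: "commit", "modify", or "merge"
    commit_log: str   # change record written by the developer
    branch: str       # branch the operation was applied to


def operation_frequency(tasks):
    """Count how often each (file, operation) pair occurs."""
    return Counter((task.code_file, task.operation) for task in tasks)


tasks = [
    CodeTask("GiftCard.java", "commit", "add gift card entity", "feature/gift"),
    CodeTask("GiftCard.java", "modify", "fix balance rounding", "feature/gift"),
    CodeTask("GiftCard.java", "modify", "rename balance field", "feature/gift"),
    CodeTask("GiftCard.java", "merge", "merge gift card feature", "master"),
]
frequency = operation_frequency(tasks)
```

The resulting counts correspond to the operation frequency that the later data mining processing compares against the modification frequency threshold.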
A processing module 302, configured to perform data mining processing on the code task based on a data mining algorithm to obtain a processing result.
Specifically, according to an embodiment of the present invention, the processing module 302 is further configured to perform data mining processing on the code task using the data mining algorithm in combination with the code data modification frequency threshold and/or the code data submission frequency indicated by the operation frequency corresponding to the code data in the code task.
According to an embodiment of the present invention, the data mining algorithm includes a text similarity algorithm, a clustering algorithm, and the like (these are only examples; other types of data mining algorithms also fall within the scope of the present invention). With this arrangement, the places where developers' understanding of the system model under development diverges can be determined from the data mining algorithm, the code data modification frequency threshold, and the code data submission frequency. Hot-spot modified code data can be identified at the same time, that is, code data that is modified repeatedly, which suggests that different developers understand the function the code is meant to implement very differently, or that errors occur easily during development of that code.
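As a non-limiting illustration, a simple text similarity measure and a threshold-based hot-spot check may be sketched in Python as follows. Jaccard similarity over word tokens is used here only as one assumed example of a text similarity algorithm; the function names and sample data are hypothetical.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity of the lower-cased word sets of two change records."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty records are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hot_spots(modification_counts, frequency_threshold):
    """Return the code data whose modification count reaches the threshold."""
    return sorted(name for name, count in modification_counts.items()
                  if count >= frequency_threshold)


similarity = jaccard_similarity("fix gift card price", "fix gift card refund")
hot = hot_spots({"Order.java": 7, "Cart.java": 2}, frequency_threshold=5)
```

Change records with high similarity can be grouped for comparison, while the threshold check flags code data that is modified repeatedly.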
Preferably, according to an embodiment of the present invention, the code task processing device 300 further includes a first task cleaning rule setting module, which is configured to set a first task cleaning rule and to perform data cleaning on the code task according to the first task cleaning rule.
Specifically, according to a specific implementation of the embodiment of the present invention, the first task cleaning rule indicates a standard for non-valuable code tasks. A code task is determined to be non-valuable if the code data it involves falls under any one of the following cases: it changes only whitespace characters, it contains only formatting changes, it does not modify any code logic, or it is temporary code related to stress testing (i.e., committed only to a temporary branch and not merged into a main branch). Such code tasks need to be cleaned out.
With this arrangement, cleaning the code tasks can significantly improve the accuracy of the data mining processing performed on them, and therefore the accuracy of the code data attribute determined from the processing result. This in turn improves the accuracy of the maintenance value and development efficiency indicated by the code data attribute, improves the efficiency of subsequent system development and maintenance, and reduces development and maintenance costs.
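As a non-limiting illustration, the first task cleaning rule may be sketched in Python as follows. The dictionary layout of a task, the branch names, and the whitespace comparison are assumptions of this sketch; comparing text with all whitespace removed is a crude approximation of "formatting only" and would need a language-aware check in practice.

```python
MAIN_BRANCHES = {"master", "production"}  # assumed names of the main branches


def strip_whitespace(text):
    """Remove every whitespace character so that only non-blank content is compared."""
    return "".join(text.split())


def is_non_valuable(task):
    """Apply the non-valuable code task standard to one task (hypothetical layout)."""
    # Only whitespace or formatting differs, so no code logic was modified
    if strip_whitespace(task["before"]) == strip_whitespace(task["after"]):
        return True
    # Temporary code: committed to temporary branches only, never merged into a main branch
    return MAIN_BRANCHES.isdisjoint(task["branches"])


def clean_tasks(tasks):
    return [task for task in tasks if not is_non_valuable(task)]


tasks = [
    {"name": "reformat", "before": "int a=1;", "after": "int a = 1;", "branches": ["master"]},
    {"name": "stress", "before": "x", "after": "loadTest()", "branches": ["tmp/stress"]},
    {"name": "fix", "before": "return a;", "after": "return a + 1;", "branches": ["master"]},
]
kept = [task["name"] for task in clean_tasks(tasks)]
```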
Further, according to an embodiment of the present invention, the code task processing device 300 further includes a merging module. Before the step of performing data cleaning on the code task according to the first task cleaning rule, the merging module is configured to merge code tasks that have the same name according to the code task alias table corresponding to the code library.
Specifically, the code task alias table corresponding to the code library may be a synonym dictionary established for each code library. For example, a gift card may be written as giftCard or as liPinKa, but both have the same meaning. With this arrangement, code tasks whose names are actually the same are merged, so that code tasks with the same name can be compared when they are processed and the code data attribute can be determined more reliably.
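As a non-limiting illustration, merging by an alias table may be sketched in Python as follows, using the giftCard and liPinKa example above. The function `merge_by_alias` and the modification counts are hypothetical.

```python
def merge_by_alias(task_counts, alias_table):
    """Merge counts of code tasks whose names map to the same canonical name."""
    merged = {}
    for name, count in task_counts.items():
        canonical = alias_table.get(name, name)  # names without an alias are kept as they are
        merged[canonical] = merged.get(canonical, 0) + count
    return merged


alias_table = {"liPinKa": "giftCard"}  # synonym dictionary of one code library
task_counts = {"giftCard": 4, "liPinKa": 3, "order": 2}
merged = merge_by_alias(task_counts, alias_table)
```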
Preferably, according to an embodiment of the present invention, the code task further includes a code development task and a verification task, where the verification task is obtained from code version information. The code task processing device 300 further includes a weight coefficient adjustment module, which is configured to adjust the weight coefficients corresponding to the code development task and the verification task according to the processing results corresponding to the code development task and the verification task.
Software development essentially expresses abstract knowledge as code. Abstract knowledge involves personal understanding, and different developers understand the mapping between the development domain and the business logic differently, so the code data in code tasks often reflects the individual style of each developer. With the above arrangement, the acquired code tasks include not only the original code development task (that is, the code data of developer commits and pull requests, including code files and events that indicate specific operations during development, such as the corresponding methods, together with the developer's commitlog), but also the verification task. The verification task may be a code task summarized from the code modification records and the code version information (such as code version announcements) entered by the developer. Because the quality of the modification records entered by developers is difficult to control, a verification task obtained through such summarization helps to ensure the quality of the code task. The verification task supplements the original development task when the data mining processing is performed, which reduces the influence of developers' individual change records on the accuracy of the processing result. Adjusting the weight coefficients of the code development task and the verification task in the data mining processing according to the processing results further reduces the influence of developers' individual styles on the subsequent analysis of the code data attribute, which improves the efficiency of subsequent system development and maintenance and reduces development and maintenance costs.
Further, according to an embodiment of the present invention, the code task processing device 300 further includes a second task cleaning rule setting module, which is configured to set a second task cleaning rule and to perform data cleaning on the processing result according to the second task cleaning rule.
Specifically, according to a specific implementation of the embodiment of the present invention, the second task cleaning rule indicates a feature extraction algorithm. That is, the data mining processing result is optimized according to the code features of the specific system (such as an e-commerce system or a logistics system), so that the result reflects the functions of that system more closely and the code data attribute determined from the code task processing result is more accurate. According to an embodiment of the present invention, the feature extraction algorithm may be a random forest, and other existing feature extraction algorithms may also be used.
A code data attribute determining module 303, configured to determine a code data attribute according to the processing result, where the code data attribute indicates a maintenance value and development efficiency of the code data.
The maintenance value of the code data refers to the maintainability of the code data. The knowledge abstracted by the code may be understood differently within a development team, so that the corresponding real-world problems are handled inconsistently. The higher the maintenance value, the more similar the team members' understanding of the abstract problem corresponding to the code data, and the better the code solves the specific real-world problem. The maintenance value can be characterized by the health degree of the code data. Determining the code data attribute helps to improve the efficiency of subsequent system development and maintenance and to reduce development and maintenance costs.
Further, according to an embodiment of the present invention, the code task processing device 300 further includes a loop module. Before the step of determining the code data attribute according to the processing result, the loop module is configured to execute the following steps cyclically until the processing count threshold is reached:
using the processing result as the code task for data mining processing;
performing data mining processing on the code task based on the data mining algorithm to obtain a processing result.
According to a specific implementation of the embodiment of the present invention, the processing results corresponding to the code development task and the verification task, after the weight coefficients have been adjusted, may be cleaned according to the second task cleaning rule. The cleaned processing results are then used as the object (i.e., the code data) of the data mining processing, and data mining processing is performed on them to obtain new processing results, until the processing count threshold is reached. Within a limited number of rounds, the code task processing results become more accurate as the number of processing rounds increases. In practical application, the processing count threshold can be set according to actual processing conditions.
According to the technical solution of the embodiments of the present invention, a code task is acquired, where the code task indicates code data and its corresponding code data operation information; data mining processing is performed on the code task based on a data mining algorithm to obtain a processing result; and a code data attribute is determined according to the processing result, where the code data attribute indicates the maintenance value and development efficiency of the code data. These technical means overcome the technical problems of existing code processing methods, which cannot interpret the historical behavior of a system accurately and reasonably and cannot identify the maintenance value and development efficiency corresponding to the code data, so that subsequent system development and maintenance are inefficient and costly. As a result, the historical behavior of the code system can be interpreted accurately and reasonably according to the processing result, and the maintenance value and development efficiency corresponding to the code data can be determined, which improves the efficiency of subsequent system development and maintenance and reduces development and maintenance costs.
Fig. 4 illustrates an exemplary system architecture 400 to which the code task processing method or code task processing device of embodiments of the present invention may be applied.
As shown in fig. 4, a system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components of a particular architecture may be adjusted according to the specific application). The network 404 serves as the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may interact with the server 405 via the network 404 using the terminal devices 401, 402, 403 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 401, 402, 403.
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (by way of example only) that supports shopping websites browsed by users with the terminal devices 401, 402, 403. The background management server may analyze and process received data such as code tasks, and feed the result (e.g., the processing result or the code data attribute, by way of example only) back to the terminal device.
It should be noted that, the code task processing method provided in the embodiment of the present invention is generally executed by the server 405, and accordingly, the code task processing device is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 501.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a code task acquisition module, a processing module, and a code data attribute determination module. In some cases the names of these modules do not limit the modules themselves; for example, the code task acquisition module may also be described as "a module for acquiring a code task, the code task indicating code data and its corresponding code data operation information".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments, or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring a code task, where the code task indicates code data and its corresponding code data operation information; performing data mining processing on the code task based on a data mining algorithm to obtain a processing result; and determining a code data attribute according to the processing result, where the code data attribute indicates the maintenance value and development efficiency of the code data.
According to the technical solution of the embodiments of the present invention, a code task is acquired, where the code task indicates code data and its corresponding code data operation information; data mining processing is performed on the code task based on a data mining algorithm to obtain a processing result; and a code data attribute is determined according to the processing result, where the code data attribute indicates the maintenance value and development efficiency of the code data. These technical means overcome the technical problems of existing code processing methods, which cannot interpret the historical behavior of a system accurately and reasonably and cannot identify the maintenance value and development efficiency corresponding to the code data, so that subsequent system development and maintenance are inefficient and costly. As a result, the historical behavior of the code system can be interpreted accurately and reasonably according to the processing result, and the maintenance value and development efficiency corresponding to the code data can be determined, which improves the efficiency of subsequent system development and maintenance and reduces development and maintenance costs.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A code task processing method, comprising:
acquiring a code task, wherein the code task indicates code data and corresponding code data operation information thereof; the code data operation information indicates an operation mode and an operation frequency corresponding to the code data, wherein the operation mode comprises at least one of the following modes: submitting, modifying and merging;
performing data mining processing on the code task based on a data mining algorithm to obtain a processing result;
determining a code data attribute according to the processing result, wherein the code data attribute indicates maintenance value and development efficiency of the code data; the maintenance value of the code data is used to characterize the maintainability of the code data.
2. The code task processing method according to claim 1, characterized in that before the step of performing data mining processing on the code task based on the data mining algorithm, the code task processing method further comprises: setting a first task cleaning rule, and performing data cleaning on the code task according to the first task cleaning rule.
3. The code task processing method according to claim 2, characterized in that before the step of performing data cleaning on the code task according to the first task cleaning rule, the code task processing method further comprises: merging code tasks with the same name according to the code task alias table corresponding to the code library.
4. The code task processing method according to claim 1, wherein the code task includes a code development task and a verification task, the verification task being obtained from code version information, the code task processing method further comprising: adjusting the weight coefficients corresponding to the code development task and the verification task in the data mining processing according to the processing results corresponding to the code development task and the verification task.
5. The code task processing method according to claim 1, characterized in that the code task processing method further comprises: setting a second task cleaning rule, and performing data cleaning on the processing result according to the second task cleaning rule.
6. The code task processing method according to claim 1, characterized in that before the step of determining a code data attribute according to the processing result, the code task processing method further comprises: cyclically executing the following steps until the processing count threshold is reached:
using the processing result as the code task for data mining processing;
performing data mining processing on the code task based on the data mining algorithm to obtain a processing result.
7. A code task processing device, comprising:
a code task acquisition module, configured to acquire a code task, wherein the code task indicates code data and the corresponding code data operation information; the code data operation information indicates an operation mode and an operation frequency corresponding to the code data, wherein the operation mode comprises at least one of the following: submitting, modifying, and merging;
a processing module, configured to perform data mining processing on the code task based on a data mining algorithm to obtain a processing result; and
a code data attribute determining module, configured to determine a code data attribute according to the processing result, wherein the code data attribute indicates a maintenance value and a development efficiency of the code data, and the maintenance value characterizes the maintainability of the code data.
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
9. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any of claims 1-6.
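For illustration only, and not as part of the claims, the preprocessing and iteration steps recited in claims 2, 3 and 6 could be sketched in Python as follows. The claims do not name a specific data mining algorithm or cleaning rule, so everything concrete here is an assumption: the task representation (a name plus operation counts), the alias table contents, the `min_operations` cleaning threshold, and the `mine` function, which is only a placeholder that keeps tasks at or above the mean operation count.

```python
from collections import Counter


def merge_aliases(tasks, alias_table):
    """Claim 3: merge tasks whose names map to the same canonical name."""
    merged = {}
    for name, operations in tasks:
        canonical = alias_table.get(name, name)
        # Counter.update adds counts, so operation frequencies accumulate.
        merged.setdefault(canonical, Counter()).update(operations)
    return merged


def clean(tasks, min_operations=2):
    """Claim 2 (assumed rule): drop tasks with fewer than min_operations operations."""
    return {name: ops for name, ops in tasks.items()
            if sum(ops.values()) >= min_operations}


def mine(tasks):
    """Placeholder mining step: keep tasks at or above the mean operation count."""
    if not tasks:
        return {}
    totals = {name: sum(ops.values()) for name, ops in tasks.items()}
    mean = sum(totals.values()) / len(totals)
    return {name: ops for name, ops in tasks.items() if totals[name] >= mean}


def process(tasks, alias_table, extra_rounds=1):
    """Merge, clean, mine once, then feed the result back in (claim 6)."""
    result = mine(clean(merge_aliases(tasks, alias_table)))
    for _ in range(extra_rounds):  # hypothetical processing-count threshold
        result = mine(result)
    return result


# Hypothetical sample data: (task name, {operation mode: frequency}).
tasks = [
    ("login", {"submit": 3, "modify": 2}),
    ("Login", {"merge": 1}),
    ("search", {"submit": 1}),
    ("cart", {"submit": 4, "modify": 4}),
]
print(sorted(process(tasks, {"Login": "login"})))
```

With this sample data, "Login" is merged into "login" (6 operations in total), "search" is removed by the cleaning rule, and only "cart" survives both mining passes. The order of the steps follows the claims: alias merging precedes cleaning, and cleaning precedes the first mining pass.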
CN202010575040.4A 2020-06-22 2020-06-22 Code task processing method and device Active CN113778501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575040.4A CN113778501B (en) 2020-06-22 2020-06-22 Code task processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010575040.4A CN113778501B (en) 2020-06-22 2020-06-22 Code task processing method and device

Publications (2)

Publication Number Publication Date
CN113778501A CN113778501A (en) 2021-12-10
CN113778501B CN113778501B (en) 2024-05-17

Family

ID=78835178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575040.4A Active CN113778501B (en) 2020-06-22 2020-06-22 Code task processing method and device

Country Status (1)

Country Link
CN (1) CN113778501B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095408A (en) * 2016-05-31 2016-11-09 浙江网新恒天软件有限公司 A kind of system and method for data monitoring and Code automatic build and deployment
CN109583476A (en) * 2018-11-02 2019-04-05 中国科学院上海高等研究院 Software metrics method, system and electronic equipment based on software development process
US10324822B1 (en) * 2015-06-30 2019-06-18 EMC IP Holding Company LLC Data analytics in a software development cycle
CN109947462A (en) * 2019-03-15 2019-06-28 武汉大学 A kind of decision support method and device that the change of software-oriented code is integrated
CN110928930A (en) * 2020-02-10 2020-03-27 北京东方通科技股份有限公司 Software development behavior monitoring system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009089294A2 (en) * 2008-01-08 2009-07-16 Teamstudio, Inc. Methods and systems for generating software quality index

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10324822B1 (en) * 2015-06-30 2019-06-18 EMC IP Holding Company LLC Data analytics in a software development cycle
CN106095408A (en) * 2016-05-31 2016-11-09 浙江网新恒天软件有限公司 A kind of system and method for data monitoring and Code automatic build and deployment
CN109583476A (en) * 2018-11-02 2019-04-05 中国科学院上海高等研究院 Software metrics method, system and electronic equipment based on software development process
CN109947462A (en) * 2019-03-15 2019-06-28 武汉大学 A kind of decision support method and device that the change of software-oriented code is integrated
CN110928930A (en) * 2020-02-10 2020-03-27 北京东方通科技股份有限公司 Software development behavior monitoring system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
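Research on software quality improvement methods based on source code mining; 楚燕婷; 王丽琼; 电脑知识与技术; 2009-12-05 (34); full text *
Development and testing techniques for software engineering data mining; 王洋; 信息系统工程; 2017-02-20 (02); full text *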

Also Published As

Publication number Publication date
CN113778501A (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant