WO2024078000A1 - Code management method and related device

Code management method and related device

Publication number
WO2024078000A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
subtask
code
descriptions
function
Prior art date
Application number
PCT/CN2023/101370
Other languages
English (en)
French (fr)
Inventor
申博
梁广泰
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024078000A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/72 Code refactoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present application relates to the field of artificial intelligence (AI) technology, and in particular to a code management method, system, computing device cluster, computer-readable storage medium, and computer program product.
  • AI: artificial intelligence
  • CLM: causal language model
  • PLM: pre-trained language model
  • AI-based code generation tools are highly dependent on the input natural language description.
  • the quality, level and details of the natural language description have a huge impact on the generation effect, resulting in the overall low accuracy of the code generated by the AI-based code generation tools, which is difficult to meet business needs.
  • the present application provides a code management method that decomposes a task description, so that a task is broken down into multiple general tasks or atomic tasks (tasks that do not support further decomposition). This improves the accuracy of code generation on complex multi-step tasks, yields a good code generation effect, and can meet business needs.
  • the present application also provides a code management system, a computing device cluster, a computer-readable storage medium, and a computer program product corresponding to the method.
  • the present application provides a code management method.
  • the method can be executed by a code management system.
  • the code management system can be a software system, and the software system can be deployed in a computing device cluster.
  • the computing device cluster executes the code management method of the present application by executing the program code of the software system.
  • the code management system can also be a hardware system with a code management function, and when the hardware system is running, the code management method of the present application is executed.
  • the code management system can be a computing device cluster with a code management function.
  • the code management system can receive a task description input by a user, decompose the task description into multiple subtask descriptions, and then generate multiple subtask codes according to the multiple subtask descriptions, wherein the multiple subtask codes correspond one-to-one to the multiple subtask descriptions.
  • the code management system may decompose the task description into reference descriptions of multiple subtasks through a task description decomposition model, and obtain multiple subtask descriptions based on user feedback on the reference descriptions of the multiple subtasks.
  • This method automatically decomposes the task description through the task description decomposition model, generating finer-grained and more imperative subtask descriptions, thereby further improving the accuracy of code generation.
  • the user's feedback on the reference descriptions of multiple subtasks may include: confirmation, modification or supplement.
  • This method introduces user feedback, and the user confirms, modifies or supplements the decomposition result, so that the decomposition of the subtask description is more accurate, and the accuracy of the generated code is further improved.
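  • The three kinds of feedback named above (confirmation, modification, supplement) can be sketched as follows. This is only an illustration: the `Feedback` record, its field names, and the `apply_feedback` helper are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical record of one user action on the reference descriptions."""
    action: str      # "confirm", "modify" or "supplement"
    index: int = 0   # position the action applies to
    text: str = ""   # new wording (modify) or new description (supplement)


def apply_feedback(reference: list[str], feedback: list[Feedback]) -> list[str]:
    """Turn reference descriptions into final subtask descriptions."""
    result = list(reference)  # copy, so the model output is left untouched
    for item in feedback:
        if item.action == "confirm":
            continue  # the user accepts the descriptions as they are
        if item.action == "modify":
            result[item.index] = item.text  # replace one description
        elif item.action == "supplement":
            result.insert(item.index, item.text)  # add a missing step
        else:
            raise ValueError(f"unknown feedback action: {item.action}")
    return result
```

  • Feedback items are applied in order, so an index in a later item refers to the list as already changed by earlier items.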
  • the code management system can extract task description examples and subtask description examples from the programming language corpus, and train the task description decomposition model through a generative pre-training method based on the task description examples and subtask description examples.
  • the task description decomposition model takes the task description as input and the subtask description as output.
  • this method extracts samples from the programming language corpus to train the task description decomposition model, so as to obtain better training results and thus improve the accuracy of task description decomposition.
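  • One possible way to extract such example pairs from a Python corpus is sketched below. It assumes, purely for illustration, that a function docstring plays the role of the task description example and that the `#` comments inside the function body play the role of subtask description examples; the application does not specify the extraction rule, and `extract_examples` is a hypothetical name.

```python
import ast
import io
import tokenize


def extract_examples(source: str) -> list[tuple[str, list[str]]]:
    """Return (task description, subtask descriptions) pairs, one per documented function."""
    tree = ast.parse(source)
    # Collect every '#' comment together with the line it appears on.
    comments = [
        (tok.start[0], tok.string.lstrip("#").strip())
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    ]
    pairs = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        task = ast.get_docstring(node)
        if not task:
            continue  # no function-level description, so no training pair
        # Comments located between the first and last line of the function.
        subtasks = [text for line, text in comments
                    if node.lineno <= line <= node.end_lineno]
        if subtasks:
            pairs.append((task, subtasks))
    return pairs


sample = '''
def default_branch(url):
    """Get the default branch of a repo on GitHub"""
    # clone the repo
    path = clone(url)
    # run git command
    name = run_git(path)
    # print branch name
    print(name)
'''
pairs = extract_examples(sample)
```

  • Comments of nested functions are also attributed to the enclosing function in this sketch; a real extractor would likely need to filter them.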
  • the code management system may decompose the task description into multiple first subtask descriptions and present the multiple first subtask descriptions to the user.
  • the code management system may decompose the target subtask description in the multiple first subtask descriptions into multiple second subtask descriptions.
  • this method introduces human-computer interaction, allowing the user to determine whether to continue decomposing the subtask description, thereby achieving more accurate task description decomposition.
  • the code management system may also present comments on the subtask code to the user.
  • the comments on the subtask code may include a subtask description. This method allows the user to intuitively obtain the subtask description corresponding to the subtask code, thereby facilitating the user to provide feedback on the subtask code.
  • the code management system can generate one or more of code snippets, calls to library functions, or calls to custom functions based on multiple subtask descriptions. This method is based on reverse generation of code and uses a code snippet completion algorithm to automatically select the generated code form based on the granularity of the subtask description.
  • the code management system can also generate a declaration and implementation code of the custom function based on the call to the custom function and the context.
  • This method uses a function reverse generation algorithm to reversely generate declarations and definitions for functions that are called but do not yet exist, thereby making the code generation process more in line with the best practices of software development and refactoring.
  • the code management system can generate a declaration of a custom function based on the call and context of the custom function.
  • the declaration of the custom function can include one or more of the annotation, parameter list, parameter type, and return value type of the custom function.
  • the code management system can generate the implementation code of the custom function based on the declaration of the custom function.
  • the method can automatically generate the declaration of the custom function from the function call statement and its context, thereby generating the implementation code of the custom function to complete code generation.
  • the code management system may also receive user feedback on the declaration of the custom function, and update the declaration of the custom function according to the user feedback on the declaration of the custom function. This method introduces user feedback so that the user can confirm, modify or supplement the generated declaration of the custom function, thereby ensuring the accuracy of the declaration of the custom function.
  • the code management system can decompose the declaration of a custom function and generate implementation code for the custom function based on the decomposition result. This method can trigger code generation again using the declaration of the custom function as a subtask, thereby gradually making the code more complete.
  • the code management system may store the codes of the multiple subtasks in an output path specified by the user, thereby facilitating the user to manage the generated codes.
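  • Storing the generated code in a user-specified output path could look like the sketch below. The helper name `store_subtask_code` and the choice to join the subtask codes into one file are assumptions made for illustration.

```python
from pathlib import Path


def store_subtask_code(snippets: list[str], output_path: str) -> Path:
    """Write the subtask codes, in order, to the path chosen by the user."""
    target = Path(output_path)
    # Create missing parent directories so any user-chosen path works.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n\n".join(snippets) + "\n", encoding="utf-8")
    return target
```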
  • the code management system may be an integrated development environment (IDE).
  • the IDE may include a local IDE or a cloud IDE.
  • the IDE has a code generation capability or plug-in based on task description decomposition. When the above capability or plug-in of the IDE is triggered, the IDE executes the steps of receiving the task description input by the user, decomposing the task description into multiple subtask descriptions, and generating the code of the subtasks according to the multiple subtask descriptions. In this way, it is convenient for developers to carry out software development and improve development efficiency.
  • the code management system may be a cloud service that has a code generation interface.
  • the cloud service may perform the steps of receiving a task description input by a user, decomposing the task description into multiple subtask descriptions, and generating subtask codes according to the multiple subtask descriptions.
  • the cloud service can provide code generation services for a large number of developers to meet business needs.
  • the present application provides a code management system.
  • the system comprises:
  • An interaction module used for receiving a task description input by a user
  • a decomposition module used for decomposing the task description into multiple subtask descriptions
  • a generating module used for generating codes of multiple subtasks according to the multiple subtask descriptions, wherein the codes of the multiple subtasks correspond one-to-one to the multiple subtask descriptions.
  • the decomposition module is specifically used to:
  • the multiple subtask descriptions are obtained according to the user's feedback on the reference descriptions of the multiple subtasks.
  • the user's feedback on the reference descriptions of the multiple subtasks includes: confirmation, modification, or supplementation.
  • the task description decomposition model is trained in the following manner:
  • the task description decomposition model is trained according to the task description examples and the subtask description examples through a generative pre-training method, wherein the task description decomposition model takes the task description as input and takes the subtask description as output.
  • the decomposition module is specifically used to:
  • the target subtask description in the plurality of first subtask descriptions is decomposed into a plurality of second subtask descriptions.
  • the interaction module is further configured to:
  • An annotation of the code of the subtask is presented to the user, the annotation of the code of the subtask including the subtask description.
  • the generating module is specifically used to:
  • One or more of a code snippet, a call to a library function, or a call to a user-defined function is generated according to the multiple subtask descriptions.
  • the generating module is further configured to:
  • a declaration and an implementation code of the user-defined function are generated.
  • the generating module is specifically used to:
  • the declaration of the custom function includes one or more of a comment, a parameter list, a parameter type, and a return value type of the custom function;
  • an implementation code of the user-defined function is generated.
  • the interaction module is further configured to:
  • the declaration of the user-defined function is updated according to the user's feedback on the declaration of the user-defined function.
  • the generating module is specifically used to:
  • the implementation code of the user-defined function is generated.
  • the present application provides a computing device cluster.
  • the computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory.
  • the at least one processor and the at least one memory communicate with each other.
  • the at least one processor is used to execute instructions stored in the at least one memory, so that the computing device or the computing device cluster executes the code management method as described in the first aspect or any implementation of the first aspect.
  • the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores instructions, wherein the instructions instruct a computing device or a computing device cluster to execute the code management method described in the first aspect or any one of the implementations of the first aspect.
  • the present application provides a computer program product comprising instructions, which, when executed on a computing device or a computing device cluster, enables the computing device or the computing device cluster to execute the code management method described in the first aspect or any one of the implementations of the first aspect.
  • FIG1 is a schematic diagram of the architecture of a code management system provided in an embodiment of the present application.
  • FIG2 is a flow chart of a code management method provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of an application scenario of a code management method provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a front-end interface provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a human-computer interaction interface provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of a modified or filled human-computer interaction interface provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of another front-end interface provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of a code generation result provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of triggering reverse generation of an undefined function provided by an embodiment of the present application.
  • FIG10 is a schematic diagram of generating the function definition of an undefined function through an import statement or a re-implementation, provided by an embodiment of the present application.
  • FIG11 is a schematic diagram of an application scenario of another code management method provided in an embodiment of the present application.
  • FIG12 is a schematic diagram of the structure of a computing device provided in an embodiment of the present application.
  • FIG13 is a schematic diagram of the structure of a computing device cluster provided in an embodiment of the present application.
  • FIG14 is a schematic diagram of the structure of a computing device cluster provided in an embodiment of the present application.
  • FIG15 is a schematic diagram of the structure of a computing device cluster provided in an embodiment of the present application.
  • The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features.
  • Code generation refers to the technology that uses artificial intelligence (AI) to assist developers in developing code.
  • Code generation can be divided into the following categories: code generation from code (also known as code2code) and code generation from text (also known as text2code).
  • text2code specifically, generates code in a specific programming language from a natural language description, thereby realizing the needs expressed by the user in a natural language description.
  • the working process of the code generator corresponding to text2code is similar to the developer first writing a code comment, and then the code generator generates a code snippet corresponding to the function described in the comment, which is presented in the form of a recommendation. The developer decides to accept or reject the recommendation, or accept it and make further modifications.
  • the present application provides a code management method.
  • the method can be executed by a code management system.
  • the code management system can be a software system, which can be deployed in a computing device cluster, and the computing device cluster executes the program code of the software system, thereby executing the code management method of the present application.
  • the code management system is used for scenarios where developers write code in a code editor or an integrated development environment (IDE). Based on this, the code management system can directly serve users in the form of an IDE plug-in.
  • the code management system can also be provided to other tools in the form of cloud services or capabilities, and called in the form of an application programming interface (API).
  • the code management system can also be a hardware system, and the code management method of the present application is executed when the hardware system is running.
  • the following description takes a code management system implemented as a software system as an example.
  • the code management system receives a task description input by a user, decomposes the task description into multiple subtask descriptions, and then generates multiple subtask codes according to the multiple subtask descriptions, wherein the multiple subtask codes correspond one-to-one to the multiple subtask descriptions.
  • This method introduces the decomposition of task descriptions, thereby decomposing tasks into multiple general tasks or atomic tasks (tasks that do not support further decomposition), improving the accuracy of code generation on complex multi-step tasks, having good code generation effects, and being able to meet business needs. Furthermore, this method can also introduce user feedback, allowing users to confirm, modify or supplement the decomposition results, so that the decomposition of subtask descriptions is more accurate, and the accuracy of the generated code can be further improved.
  • the code management system 100 includes an interaction module 102, a decomposition module 104, and a generation module 106. The functions of each module are introduced below.
  • the interaction module 102 is used to receive a task description input by a user, which can be a natural language description. Since the task description is originally input by the user, it is also called an original description. For example, the original description can be "Get the default branch of a repo on GitHub".
  • the interaction module 102 can receive the task description input by the user through a code editing interface in the interaction interface.
  • the interaction interface can be a graphical user interface (GUI) or a command user interface (CUI).
  • the decomposition module 104 is used to decompose the task description into multiple subtask descriptions.
  • the subtask description is obtained by decomposing the task description, so the subtask description can also be called a decomposed description.
  • the task description can usually be used as a file, class or function level comment, and the subtask description can usually be used as a code block or line level comment.
  • the subtask description can be "clone the repo", "run git command" and "print branch name”.
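  • The relationship between the two comment levels can be made concrete with a small sketch that renders the original description as a function-level comment and each subtask description as a line-level comment. The helper `render_skeleton` and the function name passed to it are hypothetical; the description strings are the examples given above.

```python
def render_skeleton(task: str, subtasks: list[str], func_name: str = "main") -> str:
    """Render a task description and its subtask descriptions as a commented stub."""
    lines = [f"def {func_name}():", f'    """{task}"""']
    for step in subtasks:
        lines.append(f"    # {step}")  # subtask description as a line-level comment
        lines.append("    ...")        # placeholder where the subtask code would go
    return "\n".join(lines) + "\n"


skeleton = render_skeleton(
    "Get the default branch of a repo on GitHub",
    ["clone the repo", "run git command", "print branch name"],
    func_name="get_default_branch",
)
```

  • The sketch does not escape quotation marks in the task description, so a description containing `"""` would need extra handling.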
  • the decomposition module 104 may decompose the task description into multiple subtask descriptions through the task description decomposition model.
  • the decomposition module 104 may decompose the task description into multiple reference descriptions of sub-tasks through the task description decomposition model, and the interaction module 102 is also used to present the reference descriptions of multiple sub-tasks to the user, receive the user's feedback on the reference descriptions of multiple sub-tasks, and the decomposition module 104 obtains the descriptions of the multiple sub-tasks based on the user's feedback on the reference descriptions of the multiple sub-tasks.
  • the user's feedback on the reference descriptions of the multiple sub-tasks may include confirmation, modification, or supplementation.
  • the generation module 106 is used to generate the codes of the multiple subtasks according to the multiple subtask descriptions.
  • the codes of the multiple subtasks correspond to the multiple subtask descriptions one by one.
  • the interaction module 102 is also used to present the codes of the subtasks generated by the generation module 106 to the user, receive the user's feedback on the codes of the subtasks, and the feedback may include confirmation, modification or supplementation, and then the generation module 106 may update the codes of the multiple subtasks according to the user's feedback on the codes of the multiple subtasks.
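  • How the three modules cooperate can be sketched as a small pipeline. This is a minimal sketch, not the implementation of the system 100: the class name, the callables `decompose`, `review` and `generate`, and their signatures are all assumptions. The two models and the user are represented by plain functions supplied by the caller.

```python
from typing import Callable, Optional


class CodeManagementSystem:
    """Hypothetical wiring of the interaction, decomposition and generation roles."""

    def __init__(
        self,
        decompose: Callable[[str], list[str]],   # stands in for the decomposition model
        generate: Callable[[str], str],          # stands in for the code generation model
        review: Optional[Callable[[list[str]], list[str]]] = None,  # stands in for user feedback
    ) -> None:
        self.decompose = decompose
        self.generate = generate
        # Without a reviewer, the reference descriptions are used as they are.
        self.review = review if review is not None else (lambda items: items)

    def run(self, task_description: str) -> list[tuple[str, str]]:
        reference = self.decompose(task_description)  # decomposition module
        subtasks = self.review(reference)             # interaction module
        # Generation module: exactly one code per subtask description.
        return [(desc, self.generate(desc)) for desc in subtasks]
```

  • Returning a list of pairs rather than a dictionary keeps the one-to-one correspondence even when two subtask descriptions happen to be identical.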
  • the embodiment of the present application further provides a code management method, which will be described in detail below in conjunction with the accompanying drawings.
  • the method includes:
  • S202 The code management system 100 receives a task description input by a user.
  • the task description input by the user may be a natural language description, which may be a higher-level task description, such as a file-, class-, or function-level comment, used to generate a file, class, or function.
  • the task description uses natural language to describe the function or specific implementation of the task, and thus may serve as a comment for the code.
  • the task description may include a keyword that represents the comment, such as "#".
  • the code management system 100 can present a code editing interface to the user, which can be a GUI or a CUI.
  • the user can enter a natural language description starting with the keyword representing the annotation through the GUI or CUI, and the code management system 100 can receive the above natural language description.
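  • Recognizing a task description by its comment keyword can be sketched as below. The function name `parse_task_description` is hypothetical, and `#` is used as the default keyword only because it is the example given above.

```python
from typing import Optional


def parse_task_description(line: str, keyword: str = "#") -> Optional[str]:
    """Return the natural language description if the line starts with the comment keyword."""
    stripped = line.strip()
    if not stripped.startswith(keyword):
        return None  # ordinary code, not a task description
    text = stripped[len(keyword):].strip()
    return text or None  # an empty comment carries no description
```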
  • S204 The code management system 100 decomposes the task description into a plurality of first subtask descriptions.
  • the code management system 100 can decompose the task description through the task description decomposition model.
  • the task description decomposition model automatically decomposes the task description (i.e., the original description or original comment) to generate finer-grained and more imperative subtask descriptions.
  • the code management system 100 can directly use the description automatically decomposed by the task description decomposition model as the first subtask description, or can use the description automatically decomposed by the task description decomposition model as a reference description, introduce human-computer interaction, and allow the user to determine the first subtask description based on the reference description.
  • S206 The code management system 100 presents the multiple first subtask descriptions to the user. When the user triggers the decomposition operation, S208 is executed; otherwise, S210 is executed.
  • when the code management system 100 directly uses the description automatically decomposed by the task description decomposition model as the first subtask description, the code management system 100 presents the description automatically decomposed by the task description decomposition model to the user.
  • when the code management system 100 introduces human-computer interaction, the code management system 100 presents the description automatically decomposed by the task description decomposition model to the user and receives user feedback on the description, such as confirmation, modification or supplement. The code management system 100 can then obtain multiple first subtask descriptions based on the user's feedback and present the multiple first subtask descriptions to the user.
  • the user can judge whether any of the multiple first subtask descriptions can be further decomposed.
  • if so, the user can trigger the decomposition operation, and the code management system 100 can execute S208 in response to the decomposition operation.
  • otherwise, the code management system 100 can directly execute S210.
  • S208 The code management system 100 decomposes the target subtask description in the plurality of first subtask descriptions into a plurality of second subtask descriptions.
  • the target subtask description can be further decomposed, that is, the target subtask description is a high-level task description, and the code management system 100 can input the target subtask description into the task description decomposition model to obtain a more fine-grained subtask description. Similar to decomposing the task description into multiple first subtask descriptions, the code management system 100 can directly use the description automatically decomposed by the task description decomposition model as the second subtask description, or introduce human-computer interaction, and the user can provide feedback on the description automatically decomposed to obtain the second subtask description.
  • when none of the second subtask descriptions supports further decomposition, the code management system 100 can execute S210; otherwise, the code management system 100 can continue to decompose the descriptions that support decomposition among the second subtask descriptions.
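  • The repeated decomposition in S204 to S208 can be sketched as a recursion. In this sketch `decompose` stands in for the task description decomposition model and `wants_split` stands in for the user triggering the decomposition operation on a target description; both names, and the `max_depth` safety limit, are assumptions that do not appear in the application.

```python
from typing import Callable


def decompose_recursively(
    description: str,
    decompose: Callable[[str], list[str]],
    wants_split: Callable[[str], bool],
    max_depth: int = 3,
) -> list[str]:
    """Decompose a description, splitting further wherever the user asks for it."""
    result = []
    for sub in decompose(description):
        if max_depth > 1 and wants_split(sub):
            # Target subtask description: decompose it one level deeper.
            result.extend(
                decompose_recursively(sub, decompose, wants_split, max_depth - 1)
            )
        else:
            result.append(sub)  # kept as a final subtask description
    return result
```

  • The returned list mixes the first subtask descriptions that were not split with the second (or deeper) subtask descriptions that replaced the split ones, in their original order.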
  • S210 The code management system 100 generates codes for multiple subtasks according to the multiple subtask descriptions.
  • when the user does not trigger the decomposition operation, the multiple subtask descriptions can be the multiple first subtask descriptions.
  • when the user triggers the decomposition operation, the multiple subtask descriptions can be the first subtask descriptions other than the target subtask description, together with the second subtask descriptions decomposed from the target subtask description.
  • the code management system 100 can determine the implementation form of each subtask and then generate the code of the subtask in the corresponding form. For simple steps, the code management system 100 directly generates code snippets. The code snippets may include simple statements or code blocks, and the code blocks include but are not limited to variable declarations, assignment statements, branches or loop structures. For steps that can be implemented using library functions (including but not limited to APIs of standard libraries or third-party libraries), the code management system 100 generates calls to the corresponding library functions. For more complex steps, the code management system 100 generates calls to custom functions, specifically one or more function call statements. The called custom function may come from other locations in the project, or it may not exist yet.
  • the code management system 100 can generate the declaration and implementation code of the custom function based on the call to the custom function and the context. Specifically, the code management system 100 can apply software analysis technology to convert the context information of the function call (such as the comments that modify the function, the actual parameters passed into the function, the use of the function return value, etc.) into the information in the function declaration (such as function-level comments, parameter list and type, return value type, etc.), and generate the signature part of the function definition accordingly.
  • The generated signature can then be used as a subtask description and input into the task description decomposition model for further decomposition. If its granularity is already atomic enough, it can be input directly into the code generation model for implementation, and the generated code is added to the function definition as the function body.
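  • Deriving a function signature from a call statement and its context can be sketched with Python's standard `ast` module. This is a simplified illustration under stated assumptions, not the software analysis technology of the application: the helper name `reverse_generate_stub` is hypothetical, parameter names are taken from the variables passed at the call site, all types are left as `Any`, and the return type is inferred only from whether the result is assigned.

```python
import ast


def reverse_generate_stub(call_line: str, comment: str = "") -> str:
    """Build a function stub from one call statement such as 'x = f(a, b=1)'."""
    stmt = ast.parse(call_line).body[0]
    call = getattr(stmt, "value", None)
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a call to a plain function name")
    params: list[str] = []
    for i, arg in enumerate(call.args):
        # Reuse the variable name passed at the call site when it is usable.
        name = arg.id if isinstance(arg, ast.Name) else f"arg{i}"
        params.append(name if name not in params else f"arg{i}")
    # Keyword arguments already carry their parameter names.
    params += [kw.arg for kw in call.keywords if kw.arg is not None]
    # The return value counts as used only when the call result is assigned.
    returns = "Any" if isinstance(stmt, ast.Assign) else "None"
    param_list = ", ".join(f"{p}: Any" for p in params)
    signature = f"def {call.func.id}({param_list}) -> {returns}:"
    doc = f'    """{comment}"""\n' if comment else ""
    return f"{signature}\n{doc}    raise NotImplementedError\n"


stub = reverse_generate_stub(
    "branch = get_default_branch(repo_url, timeout=30)",
    comment="Get the default branch of a repo on GitHub",
)
```

  • The stub refers to `Any`, so the file that receives it needs `from typing import Any`. The `raise NotImplementedError` line marks where a generated function body would be placed.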
  • S212 The code management system 100 presents the codes of the multiple subtasks to the user.
  • S214 The code management system 100 receives user feedback on the codes of the multiple subtasks.
  • S216 The code management system 100 updates the codes of the subtasks according to the user's feedback on the codes of the multiple subtasks.
  • the code management system 100 may also introduce human-computer interaction, where users may provide feedback on automatically generated codes, such as confirming, modifying or supplementing the automatically generated codes, thereby ensuring the accuracy of the codes.
  • steps S212 to S216 are optional steps of the embodiment of the present application, and the code management method of the embodiment of the present application may not perform the above steps S212 to S216.
  • in that case, the code management system 100 may directly use the automatically generated code as the implementation code of the subtask.
  • the code management method provided in the embodiment of the present application automatically decomposes the abstract high-level task description into subtask descriptions, and generates corresponding codes for the fine-grained subtask descriptions, which can improve the accuracy of the code.
  • the method can introduce human-computer interaction, and the user can confirm the automatically generated subtask descriptions, allowing the user to modify and supplement them, so that the input of the AI model used to generate code is more accurate, further improving the accuracy of the code.
  • This method can dynamically determine the generation of code snippets, library function calls, or custom function calls based on the subtask granularity confirmed by the user.
  • the final code naturally has a clearer structure and comments by using different structures, without the need for manual annotations by users, thus reducing interaction costs.
  • the comments, form, parameters, return type, etc. of the function call expected by the user in the generated code can be analyzed to automatically generate a function declaration, and then the function declaration is decomposed as a task description.
  • the decomposed subtask granularity is more atomic, which is conducive to maximizing the advantages of the AI model in generating general code.
  • the embodiment of the present application innovatively introduces the decomposition of task descriptions, user interaction and feedback, and reverse generation technology of functions before and after the general code generation process, which is more effective than similar tools when facing complex multi-step tasks.
  • the process of code generation by the code management system 100 can be divided into the following two stages: a task description decomposition stage of human-computer interaction and a code generation stage with reverse generation as the core.
  • the user provides the initial task description (file/class/function-level comments), and the task description decomposition model automatically decomposes the original comments to generate more fine-grained, more imperative, subtask descriptions, and presents them to the user in the form of code block/line-level comments.
  • the user can read the generated subtask descriptions and choose to confirm, modify or supplement. When the modification or supplement is completed, the user can continue to confirm. After the user completes the confirmation, the code generation phase begins.
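The confirm, modify, and supplement actions described above can be pictured as a small transformation over the list of generated subtask descriptions. The following is a minimal sketch under stated assumptions: the `Feedback` record and the `apply_feedback` function are hypothetical names introduced here for illustration and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    """One user action on the generated subtask descriptions (hypothetical shape)."""
    action: str                  # "confirm", "modify", or "supplement"
    index: Optional[int] = None  # position the action applies to
    text: Optional[str] = None   # new wording for "modify" or "supplement"


def apply_feedback(reference: List[str], feedback: List[Feedback]) -> List[str]:
    """Return the confirmed subtask descriptions after applying feedback in order."""
    result = list(reference)  # work on a copy so the reference descriptions stay intact
    for fb in feedback:
        if fb.action == "confirm":
            continue  # the description is accepted as generated
        if fb.action == "modify":
            result[fb.index] = fb.text
        elif fb.action == "supplement":
            # Insert a new step after the given position (append when no index is given).
            pos = len(result) if fb.index is None else fb.index + 1
            result.insert(pos, fb.text)
        else:
            raise ValueError(f"unknown feedback action: {fb.action}")
    return result


reference = ["clone the repo", "run git command", "print branch name"]
confirmed = apply_feedback(
    reference,
    [
        Feedback("confirm", index=0),
        Feedback("modify", index=2, text="return branch name"),
        Feedback("supplement", index=0, text="change into the repo directory"),
    ],
)
```

Only the confirmed list would be passed on to the code generation stage; the reference list is kept unchanged so the user can compare the two.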
  • the code management system 100 implements the code for the subtask according to the subtask description confirmed by the user.
  • the step-by-step subtask description can be gradually input into a code snippet completion algorithm from top to bottom, and the algorithm can automatically select the generated code form according to the granularity of the step.
  • the code form can include code snippets, calls to library functions, or calls to custom functions. The user can read the code and choose to confirm, modify, or supplement.
  • the called custom function may come from other locations in the project, or it may not exist.
  • the code management system 100 can use a function reverse generation algorithm to reverse-generate the declaration and definition of a function that is called but does not exist.
  • the code management system 100 can use the function reverse generation algorithm to first generate a function declaration, which includes at least one of a function-level comment, a parameter list and types, and a return value type.
  • the code management system 100 can generate the signature of the function definition based on the function declaration.
  • the signature of the function definition can be input as a subtask description into the task description decomposition model for further decomposition.
  • if the custom function is atomic enough, the signature can instead be input directly into the code generation model, and the code generated by the code generation model is added to the function definition as the function body.
  • the main implementation form of this embodiment is as an extended function or plug-in of a code editor or IDE, so the front-end interface is a code generation auxiliary tool mainly embedded in the IDE.
  • the code generation tool assumes that the repo variable is an object and simply returns its branch name attribute; however, the function actually accepts the full name of a GitHub repository, which may not exist locally on the user's machine, and this is not reflected in the existing signature and comments. Therefore, it is necessary to further clarify the user's needs and conditions through human-computer interaction. For example, the user can click "Generate" to trigger code generation and enter the human-computer interaction interface.
  • the code management system 100 will first try to generate a number of more specific step-by-step comments (such as block-level and line-level comments in the function body) from it.
  • the task of obtaining the default branch name of a GitHub repository is divided into three steps: downloading the repository to the local computer, running the Git command, and printing out the default branch name.
  • this decomposition solution is too heavy for the lightweight operation of only obtaining the default branch name, and still does not meet user needs.
  • the tool will try to generate an unsigned comment again.
  • Different decomposition schemes: to bring the generated decomposition scheme closer to actual needs, users can modify the function signature and function comments to add more information and conditions, such as adding parameter and return value types or specifying conditions that must be met.
  • the tool will generate another scheme, which includes three steps: creating a request in GitHub GraphQL format, sending the request and obtaining a response, and parsing the response and returning the branch name field contained therein.
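As an illustration of what the three steps of this second scheme might look like once implemented, here is a minimal sketch. It is not the code the tool generates. The function names are hypothetical, the transport is injected as a callable so the example runs offline, and the canned response is invented test data. The query uses the `repository(owner:, name:)` field and `defaultBranchRef { name }` from the GitHub GraphQL schema.

```python
import json
from typing import Callable, Dict


def build_request(full_name: str) -> Dict[str, object]:
    """Step 1: create a request body in GitHub GraphQL format."""
    owner, name = full_name.split("/", 1)
    query = (
        "query($owner: String!, $name: String!) {"
        " repository(owner: $owner, name: $name) {"
        " defaultBranchRef { name } } }"
    )
    return {"query": query, "variables": {"owner": owner, "name": name}}


def get_default_branch(full_name: str, send: Callable[[str], str]) -> str:
    """Steps 2 and 3: send the request, then parse the branch name from the response."""
    payload = json.dumps(build_request(full_name))
    response = json.loads(send(payload))
    return response["data"]["repository"]["defaultBranchRef"]["name"]


def fake_send(payload: str) -> str:
    """Stand-in transport returning canned data.

    A real implementation would POST the payload, with an access token,
    to the GitHub GraphQL endpoint instead.
    """
    return json.dumps(
        {"data": {"repository": {"defaultBranchRef": {"name": "main"}}}}
    )


branch = get_default_branch("example-org/example-repo", fake_send)
```

Unlike the first scheme, nothing is cloned to the local machine, which matches the lightweight intent of the task.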
  • decomposing the task description into subtask descriptions can be achieved through a task description decomposition model.
  • the task description decomposition model is obtained by training an AI model.
  • comments and codes often appear alternately, and there is a nested relationship between comments at different levels, which records the step decomposition process of the developer when writing code.
  • comments at the file, class, and function levels often introduce the overall function or usage of that part of the code from a higher level of abstraction, while line-level and block-level comments mostly explain the function of the code snippets they annotate, serving as step descriptions or detailed supplements to the function comment that contains them.
  • the code management system 100 can first extract task description samples and subtask description samples from the programming language corpus, for example by extracting comments at all levels from a large body of source code: high-level comments are task description samples, and the sub-comments of high-level comments are subtask description samples.
  • the task description decomposition model is trained by a generative pre-training method.
  • the task description decomposition model takes task descriptions as input and subtask descriptions as output.
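The sample extraction described above can be sketched for Python source files as follows. This is an illustrative approximation, not the disclosed procedure: it treats a function docstring as the high-level comment and the `#` comments inside the function body as its sub-comments, and the function name `extract_samples` is an assumption made here. It relies on `end_lineno`, so it needs Python 3.8 or later.

```python
import ast
import io
import tokenize
from typing import Dict, List


def extract_samples(source: str) -> List[Dict[str, object]]:
    """Pair each function docstring (task) with the comments in its body (subtasks)."""
    # Collect every comment token together with the line it appears on.
    comments = [
        (tok.start[0], tok.string.lstrip("#").strip())
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    ]
    samples = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        task = ast.get_docstring(node)
        if not task:
            continue
        # Lines strictly after the "def" line, up to the last line of the function.
        body_lines = range(node.lineno + 1, node.end_lineno + 1)
        subtasks = [text for line, text in comments if line in body_lines]
        if subtasks:
            samples.append({"task": task, "subtasks": subtasks})
    return samples


SOURCE = '''
def get_default_branch(repo):
    """Get the default branch of a repo on GitHub."""
    # clone the repo
    path = clone(repo)
    # run git command
    out = run_git(path)
    # print branch name
    print(out)
'''

samples = extract_samples(SOURCE)
```

Each resulting pair has the input and output shape the decomposition model is described as using. Nested functions would need extra handling, since their comments would also be attributed to the enclosing function.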
  • FIG8 shows a schematic diagram of a code generation result.
  • the IDE prompts several error messages at this time because the generated code contains a call to an undefined function.
  • the user expects that a certain function can be implemented through a part of the code inside the function.
  • each function can be regarded as a subtask of the original task.
  • the code management system 100 introduces a reverse generation function. From an interactive perspective, the reverse generation function works as follows: after the user confirms that the calling code is correct, they can press a shortcut key (such as Alt+Enter) at the function call, or select a comment, right-click, and choose "Generate Code" (as shown in Figure 9). The code management system 100 can then automatically generate a function declaration for the function, including the function's signature, parameter list and types, return value type, comments, and other elements.
  • the code management system 100 automatically generates an import statement for the function instead of re-implementing it; otherwise, the function declaration part of the function can be input into the task description decomposition model for further decomposition, or directly input into the code generation model to generate the implementation code.
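The choice between importing and re-implementing can be sketched as a lookup against an index of functions already defined in the project. This reading assumes, in line with the earlier remark that a called custom function may come from elsewhere in the project, that the import path is taken when the function already exists. The index format and the name `resolve_undefined_call` are hypothetical.

```python
from typing import Dict, Optional


def resolve_undefined_call(
    func_name: str, project_index: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Decide between importing an existing function and reverse-generating a new one.

    project_index maps function names to the module that defines them
    (an assumed, simplified representation of the project).
    """
    module = project_index.get(func_name)
    if module is not None:
        # The function exists elsewhere in the project: import it instead of re-implementing.
        return {"action": "import", "statement": f"from {module} import {func_name}"}
    # The function does not exist: its declaration has to be reverse-generated.
    return {"action": "declare", "statement": None}


index = {"clone_repo": "utils.git_tools"}
existing = resolve_undefined_call("clone_repo", index)
missing = resolve_undefined_call("fetch_default_branch", index)
```

In the "declare" case, the generated declaration would then go to the decomposition model or directly to the code generation model, as the text describes.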
  • the key to the reverse generation of the code management system 100 is to analyze the function call to obtain the information that the signature part of its definition should contain.
  • the various elements of the signature part and the corresponding implementation methods are as follows:
  • Function comments: parse the function call statement, locate the line-level comment that annotates the statement, reformat it as a function-level comment, copy it to the function definition location, and adapt it to the format of the surrounding code;
  • Function name and parameter types: use program slicing from software analysis to extract the context code that has data-flow and control-flow dependencies on the function call, especially the statements related to the function's arguments; use type inference to derive the parameter types, match them one-to-one with the parameter names, and insert them at the function definition location in a syntactically correct form;
  • Return type: use backward program slicing and type inference to infer the type the function is expected to return from how the return value is defined and used at the call site, and insert it at the function definition location in a syntactically correct form.
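The three elements listed above can be illustrated with a deliberately simplified sketch. It does not implement program slicing or real type inference. As a stand-in, it infers parameter types only from literals assigned to the argument names, takes the return type from an annotation at the call site, and reuses the comment on the line directly above the call. The name `reverse_generate_declaration` is hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import Dict


def reverse_generate_declaration(source: str, func_name: str) -> str:
    """Build a function stub from one top-level call site and its surrounding context."""
    tree = ast.parse(source)
    lines = source.splitlines()

    # Toy "type inference": remember the type of every literal assigned to a name.
    literal_types: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    literal_types[target.id] = type(node.value.value).__name__

    for stmt in tree.body:
        call = getattr(stmt, "value", None)
        if not (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == func_name
        ):
            continue
        # Parameters: reuse the argument names and look up their inferred types.
        params = []
        for i, arg in enumerate(call.args):
            name = arg.id if isinstance(arg, ast.Name) else f"arg{i}"
            params.append(f"{name}: {literal_types.get(name, 'object')}")
        # Return type: taken from the annotation at the call site, if there is one.
        returns = ast.unparse(stmt.annotation) if isinstance(stmt, ast.AnnAssign) else "object"
        # Function comment: the line-level comment directly above the call.
        above = lines[stmt.lineno - 2].strip() if stmt.lineno >= 2 else ""
        doc = above.lstrip("#").strip() if above.startswith("#") else ""
        return (
            f"def {func_name}({', '.join(params)}) -> {returns}:\n"
            f'    """{doc}"""\n'
            f"    raise NotImplementedError\n"
        )
    raise LookupError(f"no top-level call to {func_name} found")


CONTEXT = '''repo_name = "example-org/example-repo"
timeout = 10
# get the default branch name of the repo
branch: str = fetch_default_branch(repo_name, timeout)
'''

stub = reverse_generate_declaration(CONTEXT, "fetch_default_branch")
```

The resulting stub carries the comment, parameter names and types, and return type that a rule-based signature generator would leave out. That is the information the text says makes the downstream model input more accurate.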
  • JetBrains IDEs detect calls to non-existent functions, prompt the user, and can automatically generate a signature, but they only generate the signature part according to simple rules and cannot include important information such as comments and data types.
  • in terms of implementation form, the reverse generation function in the embodiment of the present application is compatible with and reuses this IDE capability, and further introduces software analysis technology to supply more of the important information, so that the input to the code generation model or the task description decomposition model is more accurate.
  • the implementation code i.e., function body
  • the decomposed subtask is easier to describe clearly for the developer, and it is also easier to generate correct code for the model.
  • during generation, the user can also use unit tests for each subtask to debug and modify the results generated by the model, achieving divide-and-conquer, step-by-step generation.
  • the code management system 100 is provided to other tools in the form of cloud services and is called through an API interface.
  • the API can provide the following capabilities:
  • Code completion based on task granularity: the input is the decomposed comments, and the output is the generated code.
  • the code can include code snippets (such as simple statements, code blocks), statements, calls to library functions, and calls to custom functions.
  • the cloud service of the embodiment of the present application is in fact independent of any specific code generation technology; the above capabilities are common across different code generation technologies and can be integrated by different tools to improve the overall user experience of code generation.
  • the capability 1 provided in the embodiment of the present application can be used before actually triggering code generation: first decompose the task description through user interaction, and after confirming the subtask description, the user can select one or more code generation tools (such as Copilot, Tabnine, etc.) or completion tools (such as IDE built-in completion and recommendation tools, etc.) to implement the code.
  • capability 2 can be used to select the tool and sort the results (for example, for the part that can be implemented with a simple line of code snippet, directly call the line-level completion and sort the results first; for the part that needs to be implemented using the library API, sort the API association results in the IDE first).
  • capability 3 can automatically generate function declarations and comments from the function call statement and its context, and trigger code generation again as a subtask, thereby gradually making the code more complete, and the correctness of each step can be verified through unit testing, thereby ultimately solving the original problem.
  • the embodiment of the present application also provides a code management system 100 as described above.
  • the code management system 100 is introduced below in conjunction with the accompanying drawings.
  • the system 100 includes:
  • An interaction module 102 is used to receive a task description input by a user
  • a decomposition module 104 used to decompose the task description into multiple subtask descriptions
  • the generating module 106 is configured to generate a plurality of subtask codes according to the plurality of subtask descriptions, wherein the plurality of subtask codes correspond to the plurality of subtask descriptions in a one-to-one manner.
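The division of labor among the three modules can be sketched as three small classes wired into one pipeline. This is a structural illustration only. The class and method names are assumptions, and the decomposition and code generation models are replaced by injected callables so that the example runs without any model.

```python
from typing import Callable, List, Tuple


class InteractionModule:
    """Receives the task description input by the user."""

    def receive(self, raw_input: str) -> str:
        # The description is entered as a comment, so drop the leading "#".
        return raw_input.lstrip("#").strip()


class DecompositionModule:
    """Decomposes a task description into subtask descriptions."""

    def __init__(self, decompose: Callable[[str], List[str]]):
        self._decompose = decompose  # stand-in for the task description decomposition model

    def run(self, task_description: str) -> List[str]:
        return self._decompose(task_description)


class GenerationModule:
    """Generates exactly one piece of code per subtask description."""

    def __init__(self, generate: Callable[[str], str]):
        self._generate = generate  # stand-in for the code generation model

    def run(self, subtasks: List[str]) -> List[Tuple[str, str]]:
        # A list of pairs keeps the one-to-one correspondence even if descriptions repeat.
        return [(sub, self._generate(sub)) for sub in subtasks]


class CodeManagementSystem:
    def __init__(self, decompose: Callable[[str], List[str]], generate: Callable[[str], str]):
        self.interaction = InteractionModule()
        self.decomposition = DecompositionModule(decompose)
        self.generation = GenerationModule(generate)

    def run(self, raw_input: str) -> List[Tuple[str, str]]:
        task = self.interaction.receive(raw_input)
        subtasks = self.decomposition.run(task)
        return self.generation.run(subtasks)


system = CodeManagementSystem(
    decompose=lambda task: ["clone the repo", "run git command", "print branch name"],
    generate=lambda sub: f"# {sub}\npass",
)
pairs = system.run("# Get the default branch of a repo on GitHub")
```

Because each module only depends on a callable, the same wiring would apply whether the modules run in one process or, as described later, on separate computing devices.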
  • the above-mentioned interaction module 102, decomposition module 104 and generation module 106 can be implemented by hardware modules or by software modules. Among them, the interaction module 102 can be implemented by a transceiver or software on a transceiver. The decomposition module 104 and the generation module 106 can be implemented by a computing device or a computing engine on a computing device. Below, the decomposition module 104 is taken as an example for description.
  • the decomposition module 104 may be an application or an application module, such as a computing engine, running on a computing device or a computing device cluster.
  • the decomposition module 104 may include at least one computing device, such as a server, etc.
  • the decomposition module 104 may also be a device implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), etc.
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the decomposition module 104 is specifically configured to:
  • the task description is decomposed into reference descriptions of multiple subtasks
  • the user's feedback on the reference descriptions of the multiple subtasks includes: confirmation, modification, or supplementation.
  • the task description decomposition model is trained in the following way:
  • the task description decomposition model is trained through the generative pre-training method.
  • the task description decomposition model takes the task description as input and the subtask description as output.
  • the decomposition module 104 is specifically configured to:
  • the target subtask description in the plurality of first subtask descriptions is decomposed into a plurality of second subtask descriptions.
  • the interaction module 102 is further configured to:
  • An annotation of the code of the subtask is presented to the user, wherein the annotation of the code of the subtask includes a subtask description.
  • the generating module 106 is specifically configured to:
  • One or more of a code snippet, a call to a library function, or a call to a user-defined function is generated according to the multiple subtask descriptions.
  • the generating module 106 is further configured to:
  • the generating module 106 is specifically configured to:
  • the declaration of the custom function includes one or more of a comment, a parameter list, a parameter type, and a return value type of the custom function;
  • the interaction module 102 is further configured to:
  • the generating module 106 is specifically configured to:
  • the present application also provides a computing device 1200.
  • the computing device 1200 includes: a bus 1202, a processor 1204, a memory 1206, and a communication interface 1208.
  • the processor 1204, the memory 1206, and the communication interface 1208 communicate with each other through the bus 1202.
  • the computing device 1200 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 1200.
  • the bus 1202 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • for ease of representation, the bus is shown as a single line in FIG. 12, but this does not mean that there is only one bus or one type of bus.
  • the bus 1202 may include a path for transmitting information between various components of the computing device 1200 (e.g., the memory 1206, the processor 1204, and the communication interface 1208).
  • Processor 1204 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • the memory 1206 may include a volatile memory (volatile memory), such as a random access memory (RAM).
  • the memory 1206 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
  • the memory 1206 stores executable program code, and the processor 1204 executes the executable program code to implement the aforementioned code management method.
  • the memory 1206 stores instructions for the code management system 100 to execute the code management method.
  • the communication interface 1208 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 1200 and other devices or a communication network.
  • the embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device can be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.
  • the computing device cluster includes at least one computing device 1200.
  • the memory 1206 in one or more computing devices 1200 in the computing device cluster may store the same instructions of the code management system 100 for executing the code management method.
  • one or more computing devices 1200 in the computing device cluster may also be used to execute some instructions of the code management system 100 for executing the code management method.
  • a combination of one or more computing devices 1200 may jointly execute instructions of the code management system 100 for executing the code management method.
  • the memory 1206 in different computing devices 1200 in the computing device cluster may store different instructions for executing partial functions of the code management system 100 .
  • FIG14 shows a possible implementation.
  • two computing devices 1200A and 1200B are connected via a communication interface 1208.
  • the memory in the computing device 1200A stores instructions for executing the functions of the interaction module 102.
  • the memory in the computing device 1200B stores instructions for executing the functions of the decomposition module 104 and the generation module 106.
  • the memories 1206 of the computing devices 1200A and 1200B jointly store instructions for the code management system 100 to execute the code management method.
  • the connection mode between the computing devices shown in FIG. 14 reflects the fact that the code management method provided in this application needs to receive the task description input by the user and decompose the task description to generate code. Therefore, the functions implemented by the interaction module 102 are executed by the computing device 1200A, and the functions implemented by the decomposition module 104 and the generation module 106 are executed by the computing device 1200B.
  • the functions of the computing device 1200A shown in FIG. 14 may also be performed by multiple computing devices 1200.
  • the functions of the computing device 1200B may likewise be performed by multiple computing devices 1200.
  • one or more computing devices in the computing device cluster can be connected via a network.
  • the network can be a wide area network or a local area network, etc.
  • FIG. 15 shows a possible implementation. As shown in FIG. 15, two computing devices 1200C and 1200D are connected via a network. Specifically, each computing device connects to the network through its communication interface.
  • the memory 1206 in the computing device 1200C stores instructions for executing the functions of the interaction module 102.
  • the memory 1206 in the computing device 1200D stores instructions for executing the functions of the decomposition module 104 and the generation module 106.
  • the connection mode between the computing devices shown in FIG. 15 reflects the fact that the code management method provided in the present application needs to receive the task description input by the user and decompose the task description to generate code. Therefore, the functions implemented by the interaction module 102 are executed by the computing device 1200C, and the functions implemented by the decomposition module 104 and the generation module 106 are executed by the computing device 1200D. It should be understood that the functions of the computing device 1200C shown in FIG. 15 can also be performed by multiple computing devices 1200. Similarly, the functions of the computing device 1200D can also be performed by multiple computing devices 1200.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium can be any available medium that a computing device can store, or a data storage device, such as a data center, that contains one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state hard disk).
  • the computer-readable storage medium includes instructions that instruct the computing device to execute the code management method described above as performed by the code management system.
  • the embodiment of the present application also provides a computer program product including instructions.
  • the computer program product may be software or a program product including instructions that can be run on a computing device or stored in any available medium.
  • the at least one computing device executes the above code management method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Multi Processors (AREA)
  • Stored Programmes (AREA)

Abstract

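This application provides a code management method, including: receiving a task description input by a user, decomposing the task description into multiple subtask descriptions, and generating code for multiple subtasks according to the multiple subtask descriptions, where the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions. By introducing decomposition of the task description, the method breaks the task down into multiple general tasks or atomic tasks (tasks that do not support further decomposition), which improves the accuracy of code generation on complex multi-step tasks, achieves good code generation results, and can meet business needs.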

Description

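A code management method and related device
This application claims priority to Chinese patent application No. 202211255384.2, entitled "A code management method and related device" and filed with the China National Intellectual Property Administration on October 13, 2022, the entire contents of which are incorporated herein by reference.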
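Technical Field
This application relates to the field of artificial intelligence (AI) technology, and in particular to a code management method, a system, a computing device cluster, a computer-readable storage medium, and a computer program product.
Background
Code generation, or program synthesis, has long been a hot topic of academic research in software engineering (SE) and artificial intelligence, and has attracted close attention from industry because of its great commercial value. In recent years, thanks to advances in AI research on natural language processing (NLP) and programming language processing (PLP), the combination of the two fields has gradually moved code generation from academic research toward practical application. To improve software development efficiency, a variety of AI-based code generation tools have emerged.
Currently, AI-based code generation tools usually rely on a causal language model (CLM) obtained by taking a large-scale pre-trained language model (PLM) and continuing to train it on a massive programming language corpus (such as code). From a natural language description input by the user, a CLM can generate code in a specific programming language, thereby fulfilling the requirement the user expressed in natural language.
However, these AI-based code generation tools depend heavily on the input natural language description. The quality, level, and detail of that description greatly affect the generation result, so the overall accuracy of the code generated by AI-based tools is low and hardly meets business needs.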
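Summary
This application provides a code management method. By introducing decomposition of the task description, the method breaks a task down into multiple general tasks or atomic tasks (tasks that do not support further decomposition), which improves the accuracy of code generation on complex multi-step tasks, achieves good code generation results, and can meet business needs. This application also provides a corresponding code management system, computing device cluster, computer-readable storage medium, and computer program product.
In a first aspect, this application provides a code management method. The method may be executed by a code management system. The code management system may be a software system, which may be deployed in a computing device cluster. The computing device cluster executes the program code of the software system, thereby executing the code management method of this application. In some embodiments, the code management system may also be a hardware system with a code management function, which executes the code management method of this application when it runs. For example, the code management system may be a computing device cluster with a code management function.
Specifically, the code management system may receive a task description input by a user, decompose the task description into multiple subtask descriptions, and then generate code for multiple subtasks according to the multiple subtask descriptions, where the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions.
Unlike traditional code generation tools, which generate code from left to right, the method more naturally follows the whole-to-part, divide-and-conquer thinking of the development process. The decomposed natural language descriptions and corresponding subtasks are easier for the user to understand and easier to generate, and they also allow code reuse, which improves the efficiency and quality of code generation.
In some possible implementations, the code management system may use a task description decomposition model to decompose the task description into reference descriptions of multiple subtasks, and obtain the multiple subtask descriptions according to the user's feedback on the reference descriptions.
The method uses the task description decomposition model to automatically decompose the task description and generate finer-grained, more imperative subtask descriptions, thereby further improving the accuracy of code generation.
In some possible implementations, the user's feedback on the reference descriptions of the multiple subtasks may include confirmation, modification, or supplementation. The method introduces user feedback, with the user confirming, modifying, or supplementing the decomposition result, so that the resulting subtask descriptions are more precise and the accuracy of the generated code is further improved.
In some possible implementations, the code management system may extract task description samples and subtask description samples from a programming language corpus and, based on these samples, train the task description decomposition model by a generative pre-training method. The task description decomposition model takes a task description as input and subtask descriptions as output.
Because comments and code often alternate during software development, and comments at different levels are nested, the method extracts samples from the programming language corpus to train the task description decomposition model, which yields a better training result and thus improves the accuracy of task description decomposition.
In some possible implementations, the code management system may decompose the task description into multiple first subtask descriptions and present them to the user. When the user triggers a decomposition operation, the code management system may decompose a target subtask description among the multiple first subtask descriptions into multiple second subtask descriptions.
Considering that high-level tasks may be nested, the method introduces human-computer interaction and lets the user decide whether to continue decomposing a subtask description, thereby achieving more precise task description decomposition.
In some possible implementations, the code management system may also present comments on the code of a subtask to the user. The comments on the code of the subtask may include the subtask description. This lets the user see directly which subtask description the code of a subtask corresponds to, which makes it easier for the user to give feedback on that code.
In some possible implementations, the code management system may generate, according to the multiple subtask descriptions, one or more of a code snippet, a call to a library function, or a call to a custom function. Based on reverse generation of code and using a code snippet completion algorithm, the method can automatically select the form of the generated code according to the granularity of the subtask description.
In some possible implementations, when the custom function is undefined, the code management system may also generate the declaration and implementation code of the custom function according to the call to the custom function and its context. Through a function reverse generation algorithm, the method reverse-generates the declaration and definition of a function that is called but does not yet exist, making the code generation process better aligned with best practices in software development and refactoring.
In some possible implementations, the code management system may generate a declaration of the custom function according to the call to the custom function and its context. The declaration may include one or more of the custom function's comment, parameter list, parameter types, and return value type. The code management system may then generate the implementation code of the custom function according to its declaration. The method can automatically generate the declaration of the custom function from the function call statement and its context, and from it the implementation code, to complete code generation.
In some possible implementations, the code management system may also receive the user's feedback on the declaration of the custom function and update the declaration according to that feedback. The method introduces user feedback so that the user can confirm, modify, or supplement the generated declaration, thereby ensuring its accuracy.
In some possible implementations, when the user triggers a decomposition operation, the code management system may decompose the declaration of the custom function and generate the implementation code of the custom function according to the decomposition result. The method can use the declaration of the custom function as a subtask to trigger code generation again, gradually making the code more complete.
In some possible implementations, the code management system may store the code of the multiple subtasks in an output path specified by the user, making it easier for the user to manage the generated code.
In some possible implementations, the code management system may be an integrated development environment (IDE). The IDE may be a local IDE or a cloud IDE. The IDE has a capability or plug-in for code generation based on task description decomposition. When that capability or plug-in is triggered, the IDE performs the steps of receiving the task description input by the user, decomposing the task description into multiple subtask descriptions, and generating the code of the subtasks according to the multiple subtask descriptions. This makes software development more convenient for developers and improves development efficiency.
In some possible implementations, the code management system may be a cloud service with a code generation interface. When the code generation interface is called, the cloud service may perform the steps of receiving the task description input by the user, decomposing the task description into multiple subtask descriptions, and generating the code of the subtasks according to the multiple subtask descriptions. In this way, the cloud service can provide code generation services to a large number of developers and meet business needs.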
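In a second aspect, this application provides a code management system. The system includes:
an interaction module, configured to receive a task description input by a user;
a decomposition module, configured to decompose the task description into multiple subtask descriptions;
a generation module, configured to generate code for multiple subtasks according to the multiple subtask descriptions, where the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions.
In some possible implementations, the decomposition module is specifically configured to:
decompose the task description into reference descriptions of multiple subtasks through a task description decomposition model;
obtain the multiple subtask descriptions according to the user's feedback on the reference descriptions of the multiple subtasks.
In some possible implementations, the user's feedback on the reference descriptions of the multiple subtasks includes: confirmation, modification, or supplementation.
In some possible implementations, the task description decomposition model is trained as follows:
extracting task description samples and subtask description samples from a programming language corpus;
training the task description decomposition model by a generative pre-training method according to the task description samples and the subtask description samples, where the task description decomposition model takes a task description as input and subtask descriptions as output.
In some possible implementations, the decomposition module is specifically configured to:
decompose the task description into multiple first subtask descriptions;
present the multiple first subtask descriptions to the user;
when the user triggers a decomposition operation, decompose a target subtask description among the multiple first subtask descriptions into multiple second subtask descriptions.
In some possible implementations, the interaction module is further configured to:
present comments on the code of the subtask to the user, where the comments on the code of the subtask include the subtask description.
In some possible implementations, the generation module is specifically configured to:
generate, according to the multiple subtask descriptions, one or more of a code snippet, a call to a library function, or a call to a custom function.
In some possible implementations, when the custom function is undefined, the generation module is further configured to:
generate the declaration and implementation code of the custom function according to the call to the custom function and its context.
In some possible implementations, the generation module is specifically configured to:
generate a declaration of the custom function according to the call to the custom function and its context, where the declaration includes one or more of the custom function's comment, parameter list, parameter types, and return value type;
generate the implementation code of the custom function according to the declaration of the custom function.
In some possible implementations, the interaction module is further configured to:
receive the user's feedback on the declaration of the custom function;
update the declaration of the custom function according to the user's feedback on the declaration.
In some possible implementations, the generation module is specifically configured to:
when the user triggers a decomposition operation, decompose the declaration of the custom function;
generate the implementation code of the custom function according to the decomposition result.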
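In a third aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory. The at least one processor and the at least one memory communicate with each other. The at least one processor is configured to execute instructions stored in the at least one memory, so that the computing device or computing device cluster executes the code management method according to the first aspect or any implementation of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium storing instructions that instruct a computing device or a computing device cluster to execute the code management method according to the first aspect or any implementation of the first aspect.
In a fifth aspect, this application provides a computer program product containing instructions which, when run on a computing device or a computing device cluster, cause the computing device or computing device cluster to execute the code management method according to the first aspect or any implementation of the first aspect.
On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.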
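Brief Description of the Drawings
To describe the technical methods of the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below.
FIG. 1 is a schematic architecture diagram of a code management system according to an embodiment of this application;
FIG. 2 is a flowchart of a code management method according to an embodiment of this application;
FIG. 3 is a schematic diagram of an application scenario of a code management method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a front-end interface according to an embodiment of this application;
FIG. 5 is a schematic diagram of a human-computer interaction interface according to an embodiment of this application;
FIG. 6 is a schematic diagram of a human-computer interaction interface after modification or filling according to an embodiment of this application;
FIG. 7 is a schematic diagram of another front-end interface according to an embodiment of this application;
FIG. 8 is a schematic diagram of a code generation result according to an embodiment of this application;
FIG. 9 is a schematic diagram of triggering reverse generation of an undefined function according to an embodiment of this application;
FIG. 10 is a schematic diagram of generating the function definition of an undefined function by an import statement or by re-implementation according to an embodiment of this application;
FIG. 11 is a schematic diagram of another application scenario of a code management method according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a computing device according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a computing device cluster according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a computing device cluster according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a computing device cluster according to an embodiment of this application.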
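Detailed Description
The terms "first" and "second" in the embodiments of this application are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features.
Some technical terms used in the embodiments of this application are introduced first.
Code generation refers to technology that uses artificial intelligence (AI) to assist developers in developing code. Code generation can be divided into the following categories: generating code from code (also called code2code) and generating code from text (also called text2code).
text2code specifically generates code in a particular programming language from a natural language description, thereby fulfilling the requirement the user expressed in natural language. By analogy with how a developer writes code, a text2code code generator works as if the developer first writes a code comment and the code generator then generates a code snippet implementing the functionality described by that comment, presents it as a recommendation, and leaves the developer to accept or reject it, or to accept it and then modify it further.
However, such code generation tools depend heavily on the input natural language description; the quality, level, and detail of the description greatly affect the generation result. Results on tasks that require multiple steps are generally poor, and related research shows that adding one step can reduce the accuracy of a code generation tool by as much as 70%. The overall accuracy of code generated by code generation tools is low and hardly meets business needs.
In view of this, this application provides a code management method. The method may be executed by a code management system. The code management system may be a software system deployed in a computing device cluster, which executes the program code of the software system and thereby the code management method of this application. The code management system is intended for scenarios in which a developer writes code in a code editor or integrated development environment (IDE); accordingly, it may serve users directly in the form of an IDE plug-in. It may also be provided to other tools as a cloud service or capability and called through an application programming interface (API). In some possible implementations, the code management system may also be a hardware system that executes the code management method of this application when it runs. For ease of description, the following takes the code management system as a software system by way of example.
Specifically, the code management system receives a task description input by a user, decomposes the task description into multiple subtask descriptions, and then generates code for multiple subtasks according to the multiple subtask descriptions, where the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions.
By introducing decomposition of the task description, the method breaks the task down into multiple general tasks or atomic tasks (tasks that do not support further decomposition), which improves the accuracy of code generation on complex multi-step tasks, achieves good code generation results, and can meet business needs. Further, the method may also introduce user feedback, with the user confirming, modifying, or supplementing the decomposition result, so that the resulting subtask descriptions are more precise and the accuracy of the generated code is further improved.
In addition, unlike traditional code generation tools, which generate code from left to right, the method more naturally follows the whole-to-part, divide-and-conquer thinking of the development process. The decomposed natural language descriptions and corresponding subtasks are easier for the user to understand and easier to generate, and they also allow code reuse, which improves the efficiency and quality of code generation.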
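To make the technical solution of this application clearer and easier to understand, the architecture of the code management system is introduced below with reference to the drawings.
Referring to the schematic architecture diagram of the code management system shown in FIG. 1, the code management system 100 includes an interaction module 102, a decomposition module 104, and a generation module 106. The functions of the modules are described below.
The interaction module 102 is configured to receive a task description input by a user. The task description may be a natural language description; because it is the user's original input, it is also called the original description. For example, the original description may be "Get the default branch of a repo on GitHub". The interaction module 102 may receive the task description through a code editing interface in an interactive interface. The interactive interface may be a graphical user interface (GUI) or a command user interface (CUI).
The decomposition module 104 is configured to decompose the task description into multiple subtask descriptions. Because subtask descriptions are obtained by decomposing the task description, they may also be called decomposed descriptions. A task description can usually serve as a file-, class-, or function-level comment, and a subtask description can usually serve as a code-block- or line-level comment. For example, the subtask descriptions may be "clone the repo", "run git command", and "print branch name".
In a specific implementation, the decomposition module 104 may decompose the task description into multiple subtask descriptions through the task description decomposition model. In some possible implementations, the decomposition module 104 may decompose the task description into reference descriptions of multiple subtasks through the task description decomposition model; the interaction module 102 is further configured to present the reference descriptions to the user and receive the user's feedback on them; and the decomposition module 104 obtains the multiple subtask descriptions according to that feedback. The user's feedback on the reference descriptions may include confirmation, modification, or supplementation.
The generation module 106 is configured to generate code for multiple subtasks according to the multiple subtask descriptions. The code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions. Further, as with the decomposition module 104, the interaction module 102 is also configured to present the code of the subtasks generated by the generation module 106 to the user and receive the user's feedback on that code, which may include confirmation, modification, or supplementation; the generation module 106 may then update the code of the multiple subtasks according to the feedback.
Based on the code management system 100 provided above, the embodiments of this application also provide a code management method, which is described in detail below with reference to the drawings.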
参见图2所示的代码管理方法的流程图,该方法包括:
S202:代码管理系统100接收用户输入的任务描述。
具体地,用户输入的任务描述可以是自然语言描述,该任务描述可以是较高层级的任务描述,例如为文件、类、函数级生物描述,用于生成文件、函数或类。该任务描述采用了自然语言描述任务的功能或具体实现,因而可以作为代码的注释。为了将代码与注释区分,任务描述可以包括表征注释的关键字,例如为“#”。
具体实现时,代码管理系统100可以向用户呈现代码编辑界面,该代码编辑界面可以是GUI或者是CUI,用户可以通过GUI或CUI,输入以表征注释的关键字为起始字符的自然语言描述,代码管理系统100可以接收上述自然语言描述。
S204:代码管理系统100将所述任务描述分解为多个第一子任务描述。
具体地,代码管理系统100可以通过任务描述分解模型对任务描述进行分解。任务描述分解模型自动分解任务描述(也即原始描述、原注释),生成更细粒度的、更加命令式的子任务描述。
需要说明,代码管理系统100可以直接将任务描述分解模型自动分解得到的描述作为第一子任务描述,也可以将任务描述分解模型自动分解得到的描述作为参考描述,引入人机交互,由用户基于参考描述确定第一子任务描述。
S206:代码管理系统100向所述用户呈现所述多个第一子任务描述。当用户触发分解操作,执行S208;否则执行S210。
具体地,当代码管理系统100直接将任务描述分解模型自动分解得到的描述作为第一子任务描述时,代码管理系统100向用户呈现任务描述分解模型自动分解得到的描述。当代码管理系统100引入人机交互时,代码管理系统100向用户呈现任务描述分解模型自动分解得到的描述,接收用户对该描述的反馈,例如是对该描述的确认、修改或补充,代码管理系统100可以根据用户对该描述的反馈,获得多个第一子任务描述,向用户呈现多个第一子任务描述。
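Because high-level tasks may be nested, the user can judge whether any of the first subtask descriptions can be decomposed further. When the first subtask descriptions include a target subtask description that can be decomposed further, the user may trigger a decomposition operation, and the code management system 100 performs S208 in response. When no such target subtask description exists, that is, all first subtask descriptions are atomic task descriptions, the code management system 100 may directly perform S210.
S208: The code management system 100 decomposes the target subtask description among the multiple first subtask descriptions into multiple second subtask descriptions.
A target subtask description that can be decomposed further is itself a high-level task description. The code management system 100 may input it into the task description decomposition model to obtain finer-grained subtask descriptions. As when decomposing the task description into first subtask descriptions, the code management system 100 may directly use the model's output as the second subtask descriptions, or it may introduce human-computer interaction and obtain the second subtask descriptions from the user's feedback on that output.
When the second subtask descriptions include no target subtask description that can be decomposed further, that is, all second subtask descriptions are atomic task descriptions, the code management system 100 may perform S210. Otherwise, it may continue to decompose those second subtask descriptions that support decomposition.
S204 to S208 above are some specific implementations by which the code management system 100 decomposes a task description into multiple subtask descriptions. In other possible implementations of the embodiments of this application, the task description may be decomposed in other ways.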
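The nested decomposition in S204 to S208 can be sketched as a small recursive routine. The names `decompose` and `wants_split` are hypothetical: the first stands in for the task description decomposition model, the second for the user's decision to trigger a decomposition operation. The `max_depth` guard is an addition for this sketch and is not part of the described method.

```python
from typing import Callable, List


def decompose_recursively(
    description: str,
    decompose: Callable[[str], List[str]],
    wants_split: Callable[[str], bool],
    max_depth: int = 3,
) -> List[str]:
    """Return the atomic subtask descriptions, in order."""
    result: List[str] = []
    for sub in decompose(description):
        if max_depth > 1 and wants_split(sub):
            # Target subtask description: replace it with its own subtasks (S208).
            result.extend(
                decompose_recursively(sub, decompose, wants_split, max_depth - 1)
            )
        else:
            # Atomic description: keep it as-is and pass it on to S210.
            result.append(sub)
    return result
```

The returned list contains the first subtask descriptions other than any target subtask description, with each target replaced in place by the descriptions obtained from decomposing it.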
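S210: The code management system 100 generates code for multiple subtasks according to the multiple subtask descriptions.
When no first subtask description can be decomposed further, the multiple subtask descriptions are the multiple first subtask descriptions. When a target subtask description among them can be decomposed further, the multiple subtask descriptions are the first subtask descriptions other than the target subtask description, together with the second subtask descriptions obtained by decomposing the target subtask description.
The code management system 100 may determine the implementation form of a subtask and then generate the subtask's code in that form. For a simple step, the code management system 100 directly generates a code fragment, which may be a simple statement or a code block; code blocks include but are not limited to variable declarations, assignment statements, and branch or loop structures. For a step that can be implemented with a library function (including but not limited to APIs of the standard library or third-party libraries), the code management system 100 generates a call to that library function. For a more complex step, the code management system 100 generates a call to a custom function, specifically one or more function call statements. The called custom function may come from elsewhere in the project, or it may not exist yet.
When the custom function is undefined, the code management system 100 may generate its declaration and implementation code according to the call to the custom function and the call's context. Specifically, the code management system 100 may apply software analysis techniques to convert the context information of the function call (such as the comment annotating the call, the actual arguments passed in, and the use of the return value) into the information in a function declaration (such as the function-level comment, the parameter list and types, and the return type), and generate the function signature part of the function definition accordingly. This part may be treated as a subtask and input into the task description decomposition model for further decomposition. If its granularity is already sufficiently atomic, it may instead be input directly into the code generation model, and the generated code is added to the function definition as its function body.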
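S212: The code management system 100 presents the code of the multiple subtasks to the user.
S214: The code management system 100 receives the user's feedback on the code of the multiple subtasks.
S216: The code management system 100 updates the code of the subtasks according to the user's feedback on the code of the multiple subtasks.
As with feedback on subtask descriptions, the code management system 100 may also introduce human-computer interaction here, so that the user gives feedback on the automatically generated code, for example by confirming, modifying, or supplementing it, which safeguards the correctness of the code.
For the implementation of any subtask, the user can at any time perform routine development operations such as modification, testing, and debugging, thereby ensuring that this part of the code is correct. Finally, once all subtasks are implemented, the code as a whole constitutes one implementation of the task described by the original description.
It should be noted that S212 to S216 are optional steps in the embodiments of this application; the code management method may be performed without them. For example, the code management system 100 may directly use the automatically generated code as the implementation code of the subtasks.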
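Based on the foregoing, the code management method provided in the embodiments of this application automatically decomposes an abstract high-level task description into subtask descriptions and generates code separately for each fine-grained subtask description, which can improve code correctness. In addition, the method can introduce human-computer interaction: the user confirms the automatically generated subtask descriptions and may modify or supplement them, so that the input to the AI model used for code generation is more precise, which further improves code correctness.
According to the subtask granularity confirmed by the user, the method can dynamically decide whether to generate a code fragment, a library function call, or a custom function call. The resulting code naturally has a clear structure and comments, so the user does not need to write comments manually, which reduces interaction cost. Further, when a custom function call refers to an undefined function, the method can analyze the comment, form, parameters, return type, and other aspects of the function call that the user expects in the generated code, automatically generate a function declaration, and then decompose that function declaration as a task description. The subtasks obtained in this way are more atomic, which helps make the most of the AI model's strength in generating general-purpose code.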
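Next, the application of the code management method in different scenarios is described by example.
First, refer to the schematic diagram of an application scenario of the code management method shown in FIG. 3. In this scenario, the code management system 100 serves the user directly as an IDE plug-in. Unlike comparable code generation tools, the embodiments of this application introduce, before and after the usual code generation flow, decomposition of the task description, user interaction and feedback, and reverse generation of functions, which makes the approach more effective than comparable tools on complex multi-step tasks.
Specifically, code generation by the code management system 100 can be divided into two stages: an interactive task description decomposition stage and a code generation stage centered on reverse generation.
In the interactive task description decomposition stage, the user provides an initial task description (a file-, class-, or function-level comment). The task description decomposition model automatically decomposes the original comment, generates finer-grained, more imperative subtask descriptions, and presents them to the user as block- or line-level comments. The user can read the generated subtask descriptions and choose to confirm, modify, or supplement them; after modifying or supplementing, the user can then confirm. Once the user has confirmed, the code generation stage begins.
In the code generation stage centered on reverse generation, the code management system 100 implements the subtasks in code according to the subtask descriptions confirmed by the user. The step-by-step subtask descriptions can be input from top to bottom into a code fragment completion algorithm, which automatically selects the form of the generated code according to the granularity of each step. The form may be a code fragment, a call to a library function, or a call to a custom function. The user can read the code and choose to confirm, modify, or supplement it.
Further, the called custom function may come from elsewhere in the project, or it may not exist. When it does not exist, the code management system 100 may use a function reverse-generation algorithm to generate the declaration and definition of the called but not yet existing function. Specifically, the code management system 100 first generates the function declaration, which includes at least one of a function-level comment, a parameter list and types, and a return type. The code management system 100 may then generate the signature of the function definition according to the function declaration. The signature may be input into the task description decomposition model as a subtask description for further decomposition. Alternatively, if the custom function is sufficiently atomic, the signature may be input directly into the code generation model, and the code that the model generates is added to the function definition as the function body.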
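The implementation and working process of the task description decomposition stage and the code generation stage are described in detail below in terms of the front-end interface, human-computer interaction, and the technical solution.
Referring to the schematic diagram of a front-end interface shown in FIG. 4, as with comparable tools, this embodiment is mainly implemented as an extension or plug-in of a code editor or IDE, so the front-end interface is mainly a code generation aid embedded in the IDE.
Take a JetBrains IDE, the Python language, and function-level generation as an example. The user wants to generate a function named get_branch_name whose main purpose is to "get the default branch of a GitHub repository". It accepts one parameter, repo, as input and should return the default branch name as output. However, this description is unclear; if code is generated from it directly, a variety of different results may be produced that do not meet expectations.
In one such result, the code generation tool assumes that the variable repo is an object and simply returns its branch-name attribute. In fact, the function accepts the full name of a GitHub repository, and that repository does not necessarily exist on the user's local machine, but the existing signature and comment do not reflect this. It is therefore necessary to clarify the user's requirements and conditions through human-computer interaction. For example, the user can click "Generate" to trigger code generation and enter the human-computer interaction interface.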
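Referring to the schematic diagram of the human-computer interaction interface shown in FIG. 5, when the user gives a relatively high-level task description (the function-level comment in this example), the code management system 100 first tries to generate from it several more specific step-by-step comments (such as block- and line-level comments in the function body). In FIG. 5, the task of getting the default branch name of a GitHub repository is split into three steps: download the repository to the local machine, run a Git command, and print the default branch name. However, downloading takes time and disk space, and the user may not have Git installed locally. For a lightweight operation such as getting only the default branch name, this decomposition is too heavyweight and still does not meet the user's needs.
In this case, the user can delete all generated block- and line-level comments and press Enter again, and the tool tries to generate a different decomposition. To bring the generated decomposition closer to the real requirement, the user can modify the function signature and function comment to supply more information and conditions, for example by adding parameter and return types or specifying conditions that must be met. As shown in the schematic diagram of the human-computer interaction interface in FIG. 6, after the user adds string types for the parameter and return value to the signature, adds to the comment that the repository need not be cloned, and presses Enter, the tool generates another plan with three steps: create a request in GitHub GraphQL format, send the request and obtain the response, and parse the response and return the branch-name field it contains.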
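For orientation, the three-step plan could end up as code along the following lines. This is a hedged sketch and not the output shown in the application's figures: the helper names `build_request`, `send_request`, and `parse_response` are hypothetical, and the extra `token` parameter is an assumption added here because GitHub's GraphQL endpoint requires authentication. The query relies on the `defaultBranchRef` field of the `Repository` object in GitHub's GraphQL schema.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef { name }
  }
}
"""


def build_request(repo: str) -> dict:
    # Create a GitHub GraphQL request for a repo given as "owner/name".
    owner, name = repo.split("/", 1)
    return {"query": QUERY, "variables": {"owner": owner, "name": name}}


def send_request(payload: dict, token: str) -> dict:
    # Send the request and get the response (needs network access and a token).
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_response(response: dict) -> str:
    # Parse the response and return the branch name field it contains.
    return response["data"]["repository"]["defaultBranchRef"]["name"]


def get_branch_name(repo: str, token: str) -> str:
    """Get the default branch of a repo on GitHub, without cloning it."""
    return parse_response(send_request(build_request(repo), token))
```

Each step maps to one helper, so the request construction and response parsing can be unit-tested without network access, in line with the divide-and-conquer workflow the method describes.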
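Decomposing a task description into subtask descriptions can be implemented by the task description decomposition model, which is obtained by training an AI model. In software development, comments and code often alternate, and comments at different levels are nested, recording how developers break work into steps as they write code. For example, file-, class-, and function-level comments tend to describe, at a higher level of abstraction, the overall purpose or usage of the code they cover, whereas line- and block-level comments mostly explain the function of the code fragment they annotate and serve as step descriptions or supplementary details for the enclosing function's comment. The code management system 100 may therefore first extract task description samples and subtask description samples from a programming-language corpus, for example by extracting comments at all levels from a large body of source code, with high-level comments as task description samples and their child comments as subtask description samples. It then trains the task description decomposition model on these samples by using a generative pre-training method. The model takes a task description as input and produces subtask descriptions as output.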
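Next, referring to the schematic diagram of another front-end interface shown in FIG. 7, after the user confirms the subtask descriptions (also called step-by-step comments), the user can press a shortcut key (such as Alt+Enter) on the line of a step-by-step comment, or select a comment, right-click, and choose "Generate code". That comment is then input into the algorithm for code generation.
FIG. 8 shows a schematic diagram of a code generation result. As shown in FIG. 8, the IDE now reports several errors because the generated code contains calls to undefined functions. The user expects each such function to implement some functionality internally, so each function can be regarded as a subtask of the original task.
Normally, a developer must write a function's declaration or definition before calling it, and most code generation tools must first scan the functions already defined in the context before they can recommend code that calls an existing function. In practice, however, implementing the main logic first and temporarily treating sub-functions as black boxes is more in line with best practices in software development and refactoring.
To handle calls to undefined functions, the code management system 100 introduces a reverse generation function. From an interaction perspective, it works as follows: after confirming that the generated calling code is correct, the user can press a shortcut key (such as Alt+Enter) at the function call, or select a comment, right-click, and choose "Generate code" (as shown in FIG. 9). The code management system 100 then automatically generates the function declaration, including the function's signature, parameter list and types, return type, and comment. As shown in FIG. 10, if the function comes from another file, the code management system 100 automatically generates an import statement for it instead of reimplementing it. Otherwise, the function declaration can be input into the task description decomposition model for further decomposition, or input directly into the code generation model to generate the implementation code.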
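The key to reverse generation in the code management system 100 is to derive, from the function call, the information that the signature part of the function's definition should contain. In this technical solution, the elements of the signature part and how each is obtained are as follows:
(1) Function comment: parse the function call statement, locate the line-level comment that annotates it, format that comment as a function-level comment, copy it to the location of the function definition, and adapt it to the format of the surrounding code;
(2) Function name and parameter types: use program slicing, a software analysis technique, to extract the context code that has data-flow and control-flow dependencies on the function call, especially statements related to the function's arguments; use type inference to derive the parameter types, match them one-to-one with the parameter names, and place them at the location of the function definition in syntactically valid form;
(3) Return type: use backward program slicing and type inference to infer the expected return type from how the return value is defined and used at the call site, and place it at the location of the function definition in syntactically valid form.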
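The three elements above can be illustrated with a deliberately simplified sketch built on Python's `ast` module (Python 3.9 or later, for `ast.unparse`). It is not real program slicing or type inference: it looks only at top-level statements, infers argument types only from earlier constant assignments or literal arguments, and reads the return type only from an annotated assignment at the call site. The function name `reverse_generate` and the `argN` placeholder parameter names are hypothetical; when the fallback type `Any` appears in a stub, the stub would also need `from typing import Any`.

```python
import ast
from typing import Dict, Optional


def reverse_generate(source: str, func_name: str) -> Optional[str]:
    """Build a stub declaration for an undefined function from its first call site."""
    tree = ast.parse(source)
    lines = source.splitlines()
    known: Dict[str, str] = {}  # variable name -> inferred type name

    for stmt in tree.body:
        value = getattr(stmt, "value", None)
        is_call = (
            isinstance(value, ast.Call)
            and isinstance(value.func, ast.Name)
            and value.func.id == func_name
        )
        if is_call:
            # (2) Parameter names and types, taken from the actual arguments.
            params = []
            for i, arg in enumerate(value.args):
                if isinstance(arg, ast.Name):
                    params.append(f"{arg.id}: {known.get(arg.id, 'Any')}")
                elif isinstance(arg, ast.Constant):
                    params.append(f"arg{i}: {type(arg.value).__name__}")
                else:
                    params.append(f"arg{i}: Any")
            # (3) Return type, taken from how the call site uses the result.
            if isinstance(stmt, ast.AnnAssign):
                returns = ast.unparse(stmt.annotation)
            else:
                returns = "Any"
            # (1) Function comment, taken from the line comment just above the call.
            above = lines[stmt.lineno - 2].strip() if stmt.lineno >= 2 else ""
            doc = above.lstrip("#").strip() if above.startswith("#") else "TODO: describe"
            signature = ", ".join(params)
            return f'def {func_name}({signature}) -> {returns}:\n    """{doc}"""\n    ...\n'
        # Record the types of simple constant assignments seen before the call.
        if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    known[target.id] = type(stmt.value.value).__name__
    return None
```

The generated stub carries the comment, parameter types, and return type forward, so it can serve as the next task description to decompose or as direct input to a code generation model.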
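It should be noted that JetBrains IDEs can detect and flag nonexistent functions and automatically generate signatures for them, but they generate only a simple rule-based signature and cannot include important information such as comments and data types. The reverse generation function in the embodiments of this application is compatible with, and can reuse, this IDE capability in its presentation, and it further introduces software analysis techniques to supply more of this important information, so that the input to the code generation model or the task description decomposition model is more accurate. After the user confirms the subtask's function name, actual arguments, return value, and other information, these are input into the model to generate the implementation code (that is, the function body). Compared with the original task, a decomposed subtask is easier for the developer to describe clearly and easier for the model to implement correctly. During generation, the user can also use unit tests for individual subtasks to modify and debug the model's output, achieving divide-and-conquer, step-by-step generation.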
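Next, refer to the schematic diagram of another application scenario of the code management method shown in FIG. 11. In this scenario, the code management system 100 is provided to other tools as a cloud service and is invoked through an API. The API may provide the following capabilities:
1. Decompose and confirm a task description: the input is a file-, class-, or function-level comment, and the output is the decomposed block- or line-level comments.
2. Complete code according to task granularity: the input is the decomposed comments, and the output is the generated code, which may include code fragments (such as simple statements or code blocks), calls to library functions, and calls to custom functions.
3. Reverse-generate a function definition from a call to a custom function: the input is the call to the custom function and its context, and the output is a complete function definition (including the function declaration and implementation code).
Because the cloud service in the embodiments of this application is independent of any specific code generation technology, these capabilities are general across code generation technologies and can be integrated by different tools to improve the overall user experience of code generation.
As shown in FIG. 11, capability 1 can be used before code generation is actually triggered: the task description is first decomposed through user interaction, and after confirming the subtask descriptions the user can choose one or more code generation tools (such as Copilot or Tabnine) or completion tools (such as the IDE's built-in completion and recommendation tools) to implement the code. Such tools usually present their recommendations as a list, and capability 2 can be used to select the tool and rank the results. For example, for a part that a simple one-line code fragment can implement, line-level completion is invoked directly and its results are ranked first; for a part that must be implemented with a library API, the IDE's API suggestions are ranked first. When the user accepts code generated by a tool but the code contains calls to functions that are not yet implemented, capability 3 can automatically generate a function declaration and comment from the function call statement and its context, and trigger code generation again with it as a subtask. The code thus becomes progressively more complete, the correctness of each step can be verified through unit tests, and the original problem is eventually solved.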
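Based on the code management method provided in the embodiments of this application, the embodiments of this application further provide the code management system 100 described above, which is introduced below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the code management system 100 shown in FIG. 1, the system 100 includes:
an interaction module 102, configured to receive a task description input by a user;
a decomposition module 104, configured to decompose the task description into multiple subtask descriptions; and
a generation module 106, configured to generate code for multiple subtasks according to the multiple subtask descriptions, where the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions.
The interaction module 102, the decomposition module 104, and the generation module 106 may be implemented as hardware modules or as software modules. The interaction module 102 may be implemented by a transceiver or by software on a transceiver. The decomposition module 104 and the generation module 106 may be implemented by a computing device or by a computing engine on a computing device. The decomposition module 104 is used as an example below.
When implemented in software, the decomposition module 104 may be an application or application module, such as a computing engine, running on a computing device or a computing device cluster.
When implemented in hardware, the decomposition module 104 may include at least one computing device, such as a server. Alternatively, the decomposition module 104 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.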
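In some possible implementations, the decomposition module 104 is specifically configured to:
decompose the task description into reference descriptions of multiple subtasks by using a task description decomposition model; and
obtain the multiple subtask descriptions according to the user's feedback on the reference descriptions of the multiple subtasks.
In some possible implementations, the user's feedback on the reference descriptions of the multiple subtasks includes confirmation, modification, or supplementation.
In some possible implementations, the task description decomposition model is trained as follows:
extracting task description samples and subtask description samples from a programming-language corpus; and
training the task description decomposition model on the task description samples and the subtask description samples by using a generative pre-training method, where the task description decomposition model takes a task description as input and produces subtask descriptions as output.
In some possible implementations, the decomposition module 104 is specifically configured to:
decompose the task description into multiple first subtask descriptions;
present the multiple first subtask descriptions to the user; and
when the user triggers a decomposition operation, decompose a target subtask description among the multiple first subtask descriptions into multiple second subtask descriptions.
In some possible implementations, the interaction module 102 is further configured to:
present comments of the subtask code to the user, where the comments of the subtask code include the subtask descriptions.
In some possible implementations, the generation module 106 is specifically configured to:
generate, according to the multiple subtask descriptions, one or more of a code fragment, a call to a library function, or a call to a custom function.
In some possible implementations, when the custom function is undefined, the generation module 106 is further configured to:
generate the declaration and implementation code of the custom function according to the call to the custom function and its context.
In some possible implementations, the generation module 106 is specifically configured to:
generate the declaration of the custom function according to the call to the custom function and its context, where the declaration includes one or more of the custom function's comment, parameter list, parameter types, and return type; and
generate the implementation code of the custom function according to the declaration of the custom function.
In some possible implementations, the interaction module 102 is further configured to:
receive the user's feedback on the declaration of the custom function; and
update the declaration of the custom function according to the user's feedback on the declaration.
In some possible implementations, the generation module 106 is specifically configured to:
when the user triggers a decomposition operation, decompose the declaration of the custom function; and
generate the implementation code of the custom function according to the decomposition result.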
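This application further provides a computing device 1200. As shown in FIG. 12, the computing device 1200 includes a bus 1202, a processor 1204, a memory 1206, and a communication interface 1208. The processor 1204, the memory 1206, and the communication interface 1208 communicate with one another through the bus 1202. The computing device 1200 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 1200.
The bus 1202 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one line is used in FIG. 12, but this does not mean that there is only one bus or one type of bus. The bus 1202 may include a path for transferring information between the components of the computing device 1200 (for example, the memory 1206, the processor 1204, and the communication interface 1208).
The processor 1204 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), or other processors.
The memory 1206 may include volatile memory, such as random access memory (RAM). The memory 1206 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 1206 stores executable program code, and the processor 1204 executes that code to implement the code management method described above. Specifically, the memory 1206 stores the instructions that the code management system 100 uses to perform the code management method.
The communication interface 1208 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 1200 and other devices or a communication network.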
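The embodiments of this application further provide a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example a central server, an edge server, or a local server in an on-premises data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop, or a smartphone.
As shown in FIG. 13, the computing device cluster includes at least one computing device 1200. The memories 1206 of one or more computing devices 1200 in the cluster may store the same instructions that the code management system 100 uses to perform the code management method.
In some possible implementations, one or more computing devices 1200 in the cluster may each execute only part of the instructions that the code management system 100 uses to perform the code management method. In other words, a combination of one or more computing devices 1200 may jointly execute those instructions.
It should be noted that the memories 1206 of different computing devices 1200 in the cluster may store different instructions, each used to perform part of the functions of the code management system 100.
FIG. 14 shows one possible implementation. As shown in FIG. 14, two computing devices 1200A and 1200B are connected through the communication interface 1208. The memory of the computing device 1200A stores instructions for performing the functions of the interaction module 102. The memory of the computing device 1200B stores instructions for performing the functions of the decomposition module 104 and the generation module 106. In other words, the memories 1206 of the computing devices 1200A and 1200B together store the instructions that the code management system 100 uses to perform the code management method.
The connection arrangement of the computing device cluster shown in FIG. 14 reflects the fact that the code management method provided in this application needs to receive a task description input by the user and decompose it in order to generate code. The functions of the interaction module 102 are therefore assigned to the computing device 1200A, and the functions of the decomposition module 104 and the generation module 106 are assigned to the computing device 1200B.
It should be understood that the functions of the computing device 1200A shown in FIG. 14 may also be performed by multiple computing devices 1200. Likewise, the functions of the computing device 1200B may also be performed by multiple computing devices 1200.
In some possible implementations, one or more computing devices in the cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 15 shows one possible implementation. As shown in FIG. 15, two computing devices 1200C and 1200D are connected through a network; specifically, each connects to the network through its communication interface. In this type of implementation, the memory 1206 of the computing device 1200C stores instructions for performing the functions of the interaction module 102, and the memory 1206 of the computing device 1200D stores instructions for performing the functions of the decomposition module 104 and the generation module 106.
The connection arrangement of the computing device cluster shown in FIG. 15 reflects the fact that the code management method provided in this application needs to receive a task description input by the user and decompose it in order to generate code. The functions of the interaction module 102 are therefore assigned to the computing device 1200C, and the functions of the decomposition module 104 and the generation module 106 are assigned to the computing device 1200D. It should be understood that the functions of the computing device 1200C shown in FIG. 15 may also be performed by multiple computing devices 1200. Likewise, the functions of the computing device 1200D may also be performed by multiple computing devices 1200.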
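The embodiments of this application further provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the foregoing code management method of the code management system.
The embodiments of this application further provide a computer program product containing instructions. The computer program product may be a software or program product that contains instructions and that can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, it causes the at least one computing device to perform the foregoing code management method.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of protection of the technical solutions of the embodiments of the present invention.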

Claims (25)

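  1. A code management method, wherein the method comprises:
    receiving a task description input by a user;
    decomposing the task description into multiple subtask descriptions; and
    generating code for multiple subtasks according to the multiple subtask descriptions, wherein the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions.
  2. The method according to claim 1, wherein the decomposing the task description into multiple subtask descriptions comprises:
    decomposing the task description into reference descriptions of multiple subtasks by using a task description decomposition model; and
    obtaining the multiple subtask descriptions according to the user's feedback on the reference descriptions of the multiple subtasks.
  3. The method according to claim 2, wherein the user's feedback on the reference descriptions of the multiple subtasks comprises confirmation, modification, or supplementation.
  4. The method according to claim 2 or 3, wherein the task description decomposition model is trained as follows:
    extracting task description samples and subtask description samples from a programming-language corpus; and
    training the task description decomposition model on the task description samples and the subtask description samples by using a generative pre-training method, wherein the task description decomposition model takes a task description as input and produces subtask descriptions as output.
  5. The method according to any one of claims 1 to 4, wherein the decomposing the task description into multiple subtask descriptions comprises:
    decomposing the task description into multiple first subtask descriptions;
    presenting the multiple first subtask descriptions to the user; and
    when the user triggers a decomposition operation, decomposing a target subtask description among the multiple first subtask descriptions into multiple second subtask descriptions.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    presenting comments of the code of the subtasks to the user, wherein the comments of the code of the subtasks comprise the subtask descriptions.
  7. The method according to any one of claims 1 to 6, wherein the generating code for multiple subtasks according to the multiple subtask descriptions comprises:
    generating, according to the multiple subtask descriptions, one or more of a code fragment, a call to a library function, or a call to a custom function.
  8. The method according to claim 7, wherein when the custom function is undefined, the method further comprises:
    generating a declaration and implementation code of the custom function according to the call to the custom function and its context.
  9. The method according to claim 8, wherein the generating a declaration and implementation code of the custom function according to the call to the custom function and its context comprises:
    generating the declaration of the custom function according to the call to the custom function and its context, wherein the declaration of the custom function comprises one or more of a comment, a parameter list, parameter types, and a return type of the custom function; and
    generating the implementation code of the custom function according to the declaration of the custom function.
  10. The method according to claim 9, wherein the method further comprises:
    receiving the user's feedback on the declaration of the custom function; and
    updating the declaration of the custom function according to the user's feedback on the declaration of the custom function.
  11. The method according to claim 9, wherein the generating the implementation code of the custom function according to the declaration of the custom function comprises:
    when the user triggers a decomposition operation, decomposing the declaration of the custom function; and
    generating the implementation code of the custom function according to the decomposition result.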
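  12. A code management system, wherein the system comprises:
    an interaction module, configured to receive a task description input by a user;
    a decomposition module, configured to decompose the task description into multiple subtask descriptions; and
    a generation module, configured to generate code for multiple subtasks according to the multiple subtask descriptions, wherein the code of the multiple subtasks corresponds one-to-one to the multiple subtask descriptions.
  13. The system according to claim 12, wherein the decomposition module is specifically configured to:
    decompose the task description into reference descriptions of multiple subtasks by using a task description decomposition model; and
    obtain the multiple subtask descriptions according to the user's feedback on the reference descriptions of the multiple subtasks.
  14. The system according to claim 13, wherein the user's feedback on the reference descriptions of the multiple subtasks comprises confirmation, modification, or supplementation.
  15. The system according to claim 13 or 14, wherein the task description decomposition model is trained as follows:
    extracting task description samples and subtask description samples from a programming-language corpus; and
    training the task description decomposition model on the task description samples and the subtask description samples by using a generative pre-training method, wherein the task description decomposition model takes a task description as input and produces subtask descriptions as output.
  16. The system according to any one of claims 12 to 15, wherein the decomposition module is specifically configured to:
    decompose the task description into multiple first subtask descriptions;
    present the multiple first subtask descriptions to the user; and
    when the user triggers a decomposition operation, decompose a target subtask description among the multiple first subtask descriptions into multiple second subtask descriptions.
  17. The system according to any one of claims 12 to 16, wherein the interaction module is further configured to:
    present comments of the code of the subtasks to the user, wherein the comments of the code of the subtasks comprise the subtask descriptions.
  18. The system according to any one of claims 12 to 17, wherein the generation module is specifically configured to:
    generate, according to the multiple subtask descriptions, one or more of a code fragment, a call to a library function, or a call to a custom function.
  19. The system according to claim 18, wherein the generation module is further configured to:
    when the custom function is undefined, generate a declaration and implementation code of the custom function according to the call to the custom function and its context.
  20. The system according to claim 19, wherein the generation module is specifically configured to:
    generate the declaration of the custom function according to the call to the custom function and its context, wherein the declaration of the custom function comprises one or more of a comment, a parameter list, parameter types, and a return type of the custom function; and
    generate the implementation code of the custom function according to the declaration of the custom function.
  21. The system according to claim 20, wherein the interaction module is further configured to:
    receive the user's feedback on the declaration of the custom function; and
    update the declaration of the custom function according to the user's feedback on the declaration of the custom function.
  22. The system according to claim 20, wherein the generation module is specifically configured to:
    when the user triggers a decomposition operation, decompose the declaration of the custom function; and
    generate the implementation code of the custom function according to the decomposition result.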
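  23. A computing device cluster, wherein the computing device cluster comprises at least one computing device, the at least one computing device comprises at least one processor and at least one memory, and the at least one memory stores computer-readable instructions; and the at least one processor executes the computer-readable instructions, so that the computing device cluster performs the method according to any one of claims 1 to 11.
  24. A computer-readable storage medium, comprising computer-readable instructions, wherein the computer-readable instructions are used to implement the method according to any one of claims 1 to 11.
  25. A computer program product, comprising computer-readable instructions, wherein the computer-readable instructions are used to implement the method according to any one of claims 1 to 11.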
PCT/CN2023/101370 2022-10-13 2023-06-20 Code management method and related device WO2024078000A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211255384.2A CN117931190A (zh) 2022-10-13 2022-10-13 Code management method and related device
CN202211255384.2 2022-10-13

Publications (1)

Publication Number Publication Date
WO2024078000A1 true WO2024078000A1 (zh) 2024-04-18

Family

ID=90668637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101370 WO2024078000A1 (zh) 2022-10-13 2023-06-20 Code management method and related device

Country Status (2)

Country Link
CN (1) CN117931190A (zh)
WO (1) WO2024078000A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2136322A1 (de) * 2008-06-18 2009-12-23 Marc Thom Kollaboratives Bearbeitungsverfahren und -system
CN104461708A (zh) * 2014-12-03 2015-03-25 国家电网公司 任务信息处理方法和系统
US20170060540A1 (en) * 2015-08-26 2017-03-02 International Business Machines Corporation Aligning Natural Language to Linking Code Snippets to Perform a Complicated Task
WO2018236674A1 (en) * 2017-06-23 2018-12-27 Bonsai Al, Inc. HIERARCHICAL DECOMPOSITION DEEPENING REINFORCEMENT LEARNING FOR A MODEL OF ARTIFICIAL INTELLIGENCE
CN110941427A (zh) * 2019-11-15 2020-03-31 珠海豹趣科技有限公司 代码生成方法及代码生成器
CN114493358A (zh) * 2022-02-17 2022-05-13 数字浙江技术运营有限公司 指标分解方法、装置和电子设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, WENFENG; HUANG, ZHUO; GUO, BO: "Method and Software Realization for Multi-Phased Complex Mission Description", ORDNANCE INDUSTRY AUTOMATION, vol. 25, no. 10, 28 October 2006 (2006-10-28), pages 19 - 20, 24, XP009554464, ISSN: 1006-1576 *

Also Published As

Publication number Publication date
CN117931190A (zh) 2024-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876210

Country of ref document: EP

Kind code of ref document: A1