WO2024103764A1 - Code generation method and apparatus based on cloud services - Google Patents

Code generation method and apparatus based on cloud services

Info

Publication number
WO2024103764A1
Authority
WO
WIPO (PCT)
Prior art keywords
context information
information
code
code generation
programming project
Prior art date
Application number
PCT/CN2023/103999
Other languages
English (en)
French (fr)
Inventor
申博
张嘉鑫
梁广泰
王千祥
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202211600744.8A (CN118092923A)
Application filed by 华为云计算技术有限公司
Publication of WO2024103764A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation

Definitions

  • the present application relates to the field of software development technology, and in particular to a code generation method and device based on cloud services.
  • Code generation or program synthesis technology has always been a hot topic in academic research in the fields of software engineering (SE) and artificial intelligence (AI), and has attracted much attention from the industry due to its huge commercial value.
  • SE software engineering
  • AI artificial intelligence
  • NLP natural language processing
  • PLP programming language processing
  • code generation tools based on artificial intelligence, as auxiliary tools for improving software development efficiency, have become one of the most closely watched artificial intelligence applications in recent years.
  • the current code generation technology is mainly aimed at code completion and generation of line-level code. Compared with the programming environment and tools that have been developed for many years, it is still in its infancy, and its technology and product form still need to be continuously improved in practice.
  • This application provides a code generation method and device based on cloud services. This application can take project-level context into account when generating code, which helps improve the quality of the generated code.
  • the technical solutions provided by this application are as follows:
  • the present application provides a code generation method based on cloud services.
  • the method can be executed by a cloud platform.
  • the method includes: receiving a code generation request, the code generation request is used to request the generation of a first executable code for implementing a first method in a programming project; based on the code generation request, obtaining first context information required for generating the first executable code from information of the programming project; based on the first context information and the code generation request, generating the first executable code.
  • the first context information required for generating the first executable code can be obtained from the information of the programming project based on the code generation request; then, the first executable code is generated based on the first context information and the code generation request.
  • the first context information includes project-level context information from the entire programming project, which can reflect the logical structure of the code and the programming project, so that the code generation process can make more use of the background knowledge required by human developers in actual programming and the overall logic of the programming project, thereby improving the ability to generate code and helping to improve the actual use experience of the code generation technology.
  • a computing device can generate an executable code of a first method according to the first context information of the first method to be generated.
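  • by way of example and not limitation, the three steps of the method (receiving the request, obtaining the first context information, and generating the first executable code) might be sketched as follows. The request fields, the name-matching retrieval rule, and the `model` callable are all assumptions introduced for illustration; the application does not prescribe them.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CodeGenerationRequest:
    # Hypothetical fields: the application only states that the request asks
    # for executable code implementing a first method in a programming project.
    method_signature: str
    task_description: str


def obtain_first_context(request: CodeGenerationRequest,
                         project_info: Dict[str, str]) -> str:
    """Placeholder retrieval (assumption): keep every project entry whose
    name is mentioned in the signature of the method to be generated."""
    parts = [text for name, text in project_info.items()
             if name in request.method_signature]
    return "\n".join(parts)


def generate_first_executable_code(request: CodeGenerationRequest,
                                   project_info: Dict[str, str],
                                   model: Callable[[str], str]) -> str:
    """Obtain the context, assemble the model input, and call the model."""
    context = obtain_first_context(request, project_info)
    prompt = f"{context}\n// {request.task_description}\n{request.method_signature}"
    return model(prompt)
```

  • in this sketch the code generation model is passed in as a plain callable, so the retrieval step can be exercised without any particular model.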
  • the hierarchical units involved in the programming project are, from large to small, programming projects, code modules, code packages, classes, and methods.
  • the first context information can be obtained in different hierarchical units. Therefore, a user can send a range indication to the computing device through a client to indicate the acquisition scope of the first context information.
  • the method further includes: receiving a range indication, the range indication is used to indicate the acquisition scope of the first context information.
  • the first context information required to generate the first executable code is obtained from the information of the programming project, including: when the range indication is used to indicate that the acquisition scope of the first context information is a programming project, based on the code generation request, the first context information is obtained from the information of the programming project.
  • the computing device can acquire the first context information in the entire programming project. In this way, the computing device can consider the logical structure of the code and the programming project when generating the first executable code, which helps to improve the code generation capability.
  • the user can indicate whether to view the acquired first context information according to the needs.
  • the user can perform a corresponding operation to trigger a preview indication, and convey the needs to the computing device through the preview indication.
  • the method also includes: receiving a preview indication, the preview indication is used to indicate whether to preview the first context information; when the preview indication is used to indicate previewing the first context information, displaying the first context information; and receiving a consent indication indicating consent to the first context information.
  • generating the first executable code includes: after receiving the consent indication, generating the first executable code based on the first context information and the code generation request.
  • the executable code for implementing the method is usually stored in a source code file, so the first context information may include the content in the source code file to which the first method belongs, and may also include the content outside the source code file to which the first method belongs.
  • the first context information may be obtained from the source code file to which the first method belongs and outside the source code file.
  • obtaining the first context information required to generate the first executable code includes: performing processes such as expanding the context outside the execution file and reorganizing the code inside the execution file, and obtaining the first context information according to the processing results.
  • the implementation process of reorganizing the code in the execution file and obtaining the first context information according to the reorganization result includes: based on the code generation request, obtaining the source code file to which the first method belongs from the files of the programming project; obtaining, in the source code file, the context information whose writing position is after the first method; moving that context information to a writing position before the first method, so that the moved context information becomes preceding context information of the first method; and obtaining the first context information based on the context information of the first method.
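  • by way of example and not limitation, the reorganization of in-file code might be sketched as follows, assuming the source code file has already been split into member-level blocks (fields, methods, and so on). The block representation and the function name are hypothetical.

```python
from typing import List, Tuple


def reorganize_in_file_context(blocks: List[str],
                               target_index: int) -> Tuple[List[str], str]:
    """Move every block written after the target method to before it.

    Returns the reordered context blocks and the target method block, so the
    target method ends up last and all other content precedes it.
    """
    preceding = blocks[:target_index]
    target = blocks[target_index]
    following = blocks[target_index + 1:]
    # Content that originally followed the method is appended to the
    # preceding context, keeping the original relative order.
    return preceding + following, target
```

  • placing all in-file content before the first method suits a left-to-right generation model, which can only condition on text that precedes the generation point.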
  • the implementation process of expanding the context outside the file and obtaining the first context information according to the expansion result includes: based on the code generation request, obtaining the source code file to which the first method belongs in the file of the programming project; obtaining external information outside the source code file used by the source code file in the information of the programming project; supplementing the external information in the source code file; and obtaining the first context information based on the supplemented source code file.
  • the method may further include: obtaining, in the information of the programming project, a permission scope that the first method has permission to access.
  • obtaining external information located outside the source code file and used by the source code file includes: obtaining the external information in the permission scope.
  • obtaining the scope of permissions that the first method has permission to access includes: obtaining the scope of permissions based on the position of the first method in the programming project, the access control permissions of the target class to which the first method belongs, and at least one of the hierarchy and reference relationship of the target class.
  • the position of the first method in the programming project can be represented by a generation point.
  • the hierarchy of the target class represents the hierarchy of the target class in the hierarchical unit of the programming project.
  • the reference relationship of the target class represents the content referenced by the target class.
  • the access control permissions of the target class can be set by the developer to limit the access control permissions of the executable code in the target class.
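  • by way of example and not limitation, restricting external information to the permission scope might be sketched as follows. The visibility rules below follow Java-like access modifiers, which is an assumption; the application does not name a programming language, and the `Member` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Member:
    name: str
    owner_class: str
    package: str
    visibility: str  # "public", "protected", "package", or "private"


def accessible_members(members: Iterable[Member],
                       target_class: str,
                       target_package: str,
                       superclasses: Set[str]) -> List[Member]:
    """Keep only the members that code in the target class may access."""
    result = []
    for m in members:
        same_class = m.owner_class == target_class
        same_package = m.package == target_package
        inherited = m.owner_class in superclasses
        if (m.visibility == "public"
                or same_class  # private members are visible in their own class
                or (m.visibility == "package" and same_package)
                or (m.visibility == "protected" and (same_package or inherited))):
            result.append(m)
    return result
```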
  • the method before generating the first executable code based on the first context information and the code generation request, the method also includes: removing target information in the first context information to obtain updated first context information, wherein the target information includes one or more of the following: code comments, variable assignments, method bodies, and information indicating the underlying logic of the code.
  • information irrelevant to the generation of the first executable code, as well as privacy-related and/or sensitive information in the first context information, can be deleted, so that the hierarchical structure of the content and the relevant signature information are retained in the first context information.
  • in this way, on the one hand, the first context information is compressed, so that context information containing more valuable content can be input under the same input length; on the other hand, the usability of the generated code, user privacy, and code security can be guaranteed.
  • the user can determine whether to remove the target information in the first context information according to application requirements, and the method further includes: receiving a removal instruction, the removal instruction is used to indicate whether to remove the target information in the first context information. Accordingly, removing the target information in the first context information to obtain updated first context information includes: when the removal instruction is used to indicate the removal of the target information in the first context information, removing the target information in the first context information to obtain updated first context information.
  • the removal instruction may indicate that a removal operation needs to be performed.
  • the information may be removed according to a preset policy without the user indicating the specific type of target information to be removed, that is, without indicating whether the target information is implementation details of a calling method, variable assignment, or sensitive information contained in the code.
  • the removal instruction may not only indicate that a removal operation needs to be performed, but may also indicate the specific type of target information to be removed.
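  • by way of example and not limitation, the two forms of the removal instruction might be sketched as follows: either the instruction only says that removal is needed and a preset policy applies, or it also names the types of target information to remove. The `ContextItem` type, the kind labels, and the default policy are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class ContextItem:
    kind: str  # e.g. "signature", "comment", "assignment", "method_body"
    text: str


# Hypothetical preset policy used when no specific types are indicated.
DEFAULT_POLICY: FrozenSet[str] = frozenset({"comment", "assignment", "method_body"})


def remove_target_information(items: Iterable[ContextItem],
                              remove: bool,
                              kinds: Optional[FrozenSet[str]] = None) -> List[ContextItem]:
    """Return the updated first context information."""
    if not remove:
        return list(items)  # the instruction indicates no removal
    to_remove = DEFAULT_POLICY if kinds is None else kinds
    return [item for item in items if item.kind not in to_remove]
```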
  • the first context information can also be abstracted to abstract the first context information into a grammatically compliant interface declaration form, so as to achieve the purpose of reusing the programming language grammar knowledge learned by the pre-trained model during the training process. Then, before the first executable code is generated based on the first context information and the code generation request, the method also includes: abstracting the first context information into an interface declaration form to obtain updated first context information.
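  • by way of example and not limitation, abstracting context into a grammatically valid interface declaration form might be sketched as follows. The sketch uses Python source and the standard `ast` module (Python 3.9 or later for `ast.unparse`) purely for illustration; the application does not specify a language or a tool.

```python
import ast


def to_interface_declaration(source: str) -> str:
    """Replace every function body with `...`, keeping only signatures.

    Comments are dropped as a side effect, because the parsed syntax tree
    does not retain them.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # A body consisting of a single Ellipsis expression is still
            # valid syntax, so the result remains parseable.
            node.body = [ast.Expr(value=ast.Constant(value=...))]
    return ast.unparse(tree)
```

  • because the output is still valid source text, a model pre-trained on source code can read it with the grammar knowledge it already has.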
  • the multiple contents in the first context information can also be sorted, so that the multiple contents in the first context information can be input into the code generation model in the sorted order, thereby reducing the probability that the content with a greater correlation with the first method is truncated due to the input length limit when input into the code generation model, so that context information with higher importance can be input into the code generation model.
  • before generating the first executable code based on the first context information and the code generation request, the method also includes: obtaining the correlation between the multiple contents in the first context information and the first method; sorting the multiple contents based on the corresponding correlations of the multiple contents to obtain the updated first context information.
  • the plurality of contents are sorted based on their corresponding relevance, including: the plurality of contents are arranged before the task description of the first method, and the distance from any content to the task description is inversely correlated with the corresponding relevance of the content.
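  • by way of example and not limitation, the ordering rule might be sketched as follows: contents are sorted by ascending relevance so that the most relevant content sits closest to the task description. Truncating from the left when the input is too long, and measuring length in characters, are assumptions made for this sketch.

```python
from typing import List, Tuple


def build_model_input(contents: List[Tuple[str, float]],
                      task_description: str,
                      max_chars: int) -> str:
    """Arrange contents before the task description, most relevant last.

    `contents` holds (text, relevance) pairs.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    ordered = sorted(contents, key=lambda item: item[1])  # ascending relevance
    text = "\n".join([item[0] for item in ordered] + [task_description])
    # Left truncation (assumption): the least relevant content is cut first.
    return text[-max_chars:]
```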
  • the implementation methods may include one or more of the following combinations: obtaining a first similarity between the identifier of each content and the relevant information of the first method, the first similarity being positively correlated with the correlation; obtaining the distance between each content and the first method in the hierarchy of the programming project, the distance being negatively correlated with the correlation; obtaining a second similarity between each content and the context information called by the associated content of the first method, the second similarity being positively correlated with the correlation, and the associated content of the first method includes one or more of the following: an associated class of the target class to which the first method belongs, and other methods in the target class.
  • the identifier includes one or more of the following: variable name, method name, package name, class name, and constant name.
  • the related information includes one or more of the following: method description, method name, return type, and parameter type.
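  • by way of example and not limitation, combining the three signals into one relevance value might be sketched as follows. Token-level Jaccard similarity, the reciprocal distance term, and the weights are all assumptions; the application only states the direction of each correlation.

```python
import re
from typing import Iterable, Set, Tuple


def identifier_tokens(identifier: str) -> Set[str]:
    """Split camelCase or snake_case text into lowercase word tokens."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {word.lower() for word in words}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def relevance(content_identifier: str,
              method_info: str,
              hierarchy_distance: int,
              called_identifiers: Iterable[str] = (),
              weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    tokens = identifier_tokens(content_identifier)
    # First similarity: identifier versus method name, description, types.
    first = jaccard(tokens, identifier_tokens(method_info))
    # Hierarchy distance: a larger distance yields a smaller contribution.
    closeness = 1.0 / (1.0 + hierarchy_distance)
    # Second similarity: versus context called by associated content.
    second = max((jaccard(tokens, identifier_tokens(other))
                  for other in called_identifiers), default=0.0)
    w_first, w_close, w_second = weights
    return w_first * first + w_close * closeness + w_second * second
```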
  • the method before obtaining the first context information required for generating the first executable code from the information of the programming project based on the code generation request, the method further includes: generating a logical structure diagram of the programming project, the logical structure diagram being used to indicate the association relationship between the various contents in the programming project. Accordingly, obtaining the first context information required for generating the first executable code from the information of the programming project based on the code generation request includes: analyzing the logical structure diagram based on the code generation request, and obtaining the first context information required for generating the first executable code from the information of the programming project.
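  • by way of example and not limitation, analyzing a logical structure diagram might be sketched as a breadth-first traversal over an adjacency map that records association relationships between project contents. The graph encoding, the node names, and the depth limit are assumptions.

```python
from collections import deque
from typing import Dict, List


def collect_context(graph: Dict[str, List[str]],
                    start: str,
                    max_depth: int) -> List[str]:
    """Return contents reachable from `start` within `max_depth` hops,
    in breadth-first order (closer contents first)."""
    seen = {start}
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order
```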
  • generating a first executable code based on the first context information and the code generation request includes: inputting the first context information and the code generation request into a pre-trained model to obtain the first executable code output by the pre-trained model.
  • before generating the first executable code based on the first context information and the code generation request, the method also includes: obtaining a second executable code that implements the second method in a successfully compiled programming project; obtaining second context information of the second executable code; using the second context information as the input of the model to be trained and the second executable code as the expected output of the model to be trained, training the model to be trained, and obtaining a pre-trained model.
  • the input of the model to be trained may also include the method annotation.
  • the input of the model to be trained may also include the method signature.
  • the method signature is used to indicate how the method is used.
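  • by way of example and not limitation, assembling one training sample might be sketched as follows: the second context information, optionally followed by the method annotation and the method signature, forms the model input, and the second executable code is the expected output. The `TrainingSample` type and the newline-joined layout are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSample:
    model_input: str
    expected_output: str


def build_training_sample(second_context: str,
                          second_executable_code: str,
                          method_annotation: str = "",
                          method_signature: str = "") -> TrainingSample:
    """Join the non-empty input parts; the method code is the target."""
    parts = [second_context, method_annotation, method_signature]
    model_input = "\n".join(part for part in parts if part)
    return TrainingSample(model_input, second_executable_code)
```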
  • obtaining a second executable code that implements a second method in a successfully compiled programming project includes: obtaining all second methods in the successfully compiled programming project; screening all second methods to obtain second methods that pass the screening, and the second methods that pass the screening are used to express operation logic; and obtaining a second executable code that implements the second methods that pass the screening.
  • the second method that fails the screening has one or more of the following characteristics: the method body of the second method is empty, the second method has a special purpose, and the method body of the second method does not include an operation expression.
  • the special purpose includes one or more of the following: getting, setting, constructing, and returning.
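  • by way of example and not limitation, the screening of second methods might be sketched as follows for Python source, using the standard `ast` module. The naming conventions used to detect getters, setters, and constructors, and the set of node types treated as operation expressions, are assumptions.

```python
import ast
from typing import List

# Node types treated as "operation expressions" in this sketch (assumption).
OPERATION_NODES = (ast.BinOp, ast.BoolOp, ast.UnaryOp, ast.Compare, ast.AugAssign)
# Hypothetical naming conventions for special-purpose methods.
SPECIAL_PREFIXES = ("get_", "set_")


def _body_without_docstring(func: ast.FunctionDef) -> list:
    body = list(func.body)
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]
    return body


def passes_screening(func: ast.FunctionDef) -> bool:
    body = _body_without_docstring(func)
    if not body or all(isinstance(stmt, ast.Pass) for stmt in body):
        return False  # empty method body
    if func.name == "__init__" or func.name.startswith(SPECIAL_PREFIXES):
        return False  # constructor, getter, or setter
    # Methods that merely return a value contain no operation expression.
    return any(isinstance(node, OPERATION_NODES)
               for stmt in body for node in ast.walk(stmt))


def screen_methods(source: str) -> List[str]:
    """Return the names of the methods that pass the screening."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef) and passes_screening(node)]
```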
  • the context information is relevant information required to perform the task.
  • the context information includes one or more of the following: functions, access rights, and calling methods of defined classes, variables, and methods.
  • the present application provides a cloud service-based code generation device, which is configured on a cloud platform.
  • the cloud service-based code generation device includes: a receiving module, which is used to receive a code generation request, and the code generation request is used to request the generation of a first executable code for implementing a first method in a programming project; a first acquisition module, which is used to obtain first context information required to generate the first executable code from information of the programming project based on the code generation request; and a generation module, which is used to generate the first executable code based on the first context information and the code generation request.
  • the receiving module is also used to receive a range indication, where the range indication is used to indicate a range for obtaining the first context information; the first obtaining module is specifically used to: when the range indication is used to indicate that the range for obtaining the first context information is a programming project, based on a code generation request, obtain the first context information from the information of the programming project.
  • the receiving module is also used to receive a preview indication, where the preview indication is used to indicate whether to preview the first context information;
  • the cloud service-based code generation device also includes: a display module, used to display the first context information when the preview indication is used to indicate a preview of the first context information;
  • the receiving module is also used to receive a consent indication indicating consent to the first context information;
  • the generating module is specifically used to: after receiving the consent indication, generate a first executable code based on the first context information and a code generation request.
  • the first acquisition module is specifically used to: based on a code generation request, obtain the source code file to which the first method belongs in the files of the programming project; obtain external information located outside the source code file and used by the source code file in the information of the programming project; supplement the external information in the source code file; and obtain first context information based on the supplemented source code file.
  • the first acquisition module is specifically used to: obtain, from the information of the programming project, a permission scope that the first method has permission to access; and obtain external information from the permission scope.
  • the first acquisition module is specifically used to obtain the permission scope based on the position of the first method in the programming project, the access control permission of the target class to which the first method belongs, and at least one of the hierarchy and reference relationship of the target class.
  • the first acquisition module is further used to: remove target information from the first context information to obtain updated first context information, where the target information includes one or more of the following: code comments, variable assignments, method bodies, and information indicating the underlying logic of the code.
  • the receiving module is also used to receive a removal indication, where the removal indication is used to indicate whether to remove the target information in the first context information; the first acquisition module is specifically used to: when the removal indication is used to indicate the removal of the target information in the first context information, remove the target information in the first context information to obtain the updated first context information.
  • the first acquisition module is further used to: abstract the first context information into an interface declaration form to obtain updated first context information.
  • the first acquisition module is specifically used to: based on a code generation request, obtain the source code file to which the first method belongs in the files of the programming project; in the source code file, obtain the context information whose writing position is after the first method; adjust the writing position of the context information to before the first method, so that the context information after the adjusted position becomes the context information of the first method; based on the context information of the first method, obtain the first context information.
  • the first acquisition module is further used to: obtain correlations between multiple contents in the first context information and the first method; and sort the multiple contents based on the correlations corresponding to the multiple contents to obtain updated first context information.
  • the first acquisition module is specifically configured to: arrange the multiple contents before the task description of the first method, and the distance from any content to the task description is inversely correlated with the relevance corresponding to the content.
  • obtaining the correlation between multiple contents in the first context information and the first method includes one or more combinations of the following: obtaining a first similarity between the identifier of each content and the relevant information of the first method, the first similarity is positively correlated with the correlation; obtaining the distance between each content and the first method in the hierarchy of the programming project, the distance is negatively correlated with the correlation; obtaining a second similarity between each content and the context information called by the associated content of the first method, the second similarity is positively correlated with the correlation, and the associated content of the first method includes one or more of the following: an associated class of the target class to which the first method belongs, and other methods in the target class.
  • the identifier includes one or more of the following: variable name, method name, package name, class name, and constant name.
  • the relevant information includes one or more of the following: method description, method name, return type, and parameter type.
  • the first acquisition module is also used to: generate a logical structure diagram of the programming project, the logical structure diagram is used to indicate the relationship between the contents of the programming project; the first acquisition module is specifically used to: analyze the logical structure diagram based on the code generation request, and obtain the first context information required to generate the first executable code from the information of the programming project.
  • the generation module is specifically used to: input the first context information and the code generation request into the pre-trained model to obtain the first executable code output by the pre-trained model.
  • the cloud service-based code generation device also includes: a second acquisition module, used to obtain a second executable code that implements a second method in a successfully compiled programming project; a third acquisition module, used to obtain second context information of the second executable code; a training module, used to use the second context information as the input of the model to be trained and the second executable code as the expected output of the model to be trained, to train the model to be trained, and obtain a pre-trained model.
  • the input of the model to be trained also includes one or more of the following: method annotation and method signature of the second method.
  • the second acquisition module is specifically used to: acquire all second methods in the successfully compiled programming project; screen all second methods to obtain the second methods that pass the screening, where the second methods that pass the screening are used to express operation logic; and obtain a second executable code that implements the second methods that pass the screening.
  • the second method that fails the screening has one or more of the following characteristics: the method body of the second method is empty, the second method has a special purpose, and the method body of the second method does not include an operation expression.
  • special purposes include one or more of the following: get, set, construct, and return.
  • the context information includes one or more of the following: functions, access rights, and calling methods of defined classes, variables, and methods.
  • the present application provides a computing device including a memory and a processor, wherein the memory stores program instructions, and the processor runs the program instructions to execute the method provided in the first aspect of the present application and any possible implementation thereof.
  • the present application provides a computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory, the processor of at least one computing device being used to execute instructions stored in the memory of at least one computing device, so that the computing device cluster executes the method provided in the first aspect of the present application and any possible implementation thereof.
  • the present application provides a computer-readable storage medium, which is a non-volatile computer-readable storage medium, and the computer-readable storage medium includes program instructions.
  • when the program instructions are executed on a computing device, the computing device executes the method provided in the first aspect of the present application and any possible implementation thereof.
  • the present application provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to execute the method provided in the first aspect of the present application and any possible implementation thereof.
  • FIG1 is a schematic diagram of an implementation scenario involved in a cloud service-based code generation method provided in an embodiment of the present application
  • FIG2 is a schematic diagram of an implementation scenario involved in another cloud service-based code generation method provided in an embodiment of the present application
  • FIG3 is a schematic diagram of an implementation scenario involved in another cloud service-based code generation method provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of an implementation process of a cloud service-based code generation method provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a front-end interface of an integrated development environment provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of an interface provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of an implementation process of obtaining first context information provided by an embodiment of the present application.
  • FIG8 is a schematic diagram of a logic structure diagram provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an implementation process of reorganizing code in an execution file and obtaining first context information according to the reorganization result provided by an embodiment of the present application;
  • FIG10 is a schematic diagram of a source code file provided in an embodiment of the present application.
  • FIG11 is a schematic diagram of a method for partitioning the contents of the source code file shown in FIG10 into code within the file, provided by an embodiment of the present application;
  • FIG12 is a schematic diagram of adjusting the writing position of the following information in FIG11 according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an implementation process of expanding an external context of a file and acquiring first context information according to the expansion result, provided by an embodiment of the present application;
  • FIG14 is a schematic diagram of a source code file shown in FIG12 after external information is supplemented, provided by an embodiment of the present application;
  • FIG. 15 is a schematic diagram of an embodiment of the present application, in which the first context information obtained from the source code file shown in FIG. 14 is removed from the target information, abstracted into an interface declaration form, and sorted;
  • FIG16 is a schematic diagram of triggering a code generation request in an integrated development environment provided by an embodiment of the present application.
  • FIG17 is a schematic diagram of obtaining first context information provided by an embodiment of the present application.
  • FIG18 is a schematic diagram of a preview interface of first context information provided in an embodiment of the present application.
  • FIG19 is a schematic diagram of a generative pre-trained decoder model provided in an embodiment of the present application.
  • FIG20 is a schematic diagram of a preview interface of a first executable code provided in an embodiment of the present application.
  • FIG. 21 is a schematic diagram of inserting the first executable code, at the generation point, into the source code file to which the first method belongs, provided by an embodiment of the present application;
  • FIG22 is a flow chart of a training method provided in an embodiment of the present application.
  • FIG. 23 is a flowchart of obtaining a second executable code for implementing a second method in a successfully compiled programming project provided by an embodiment of the present application;
  • FIG24 is a schematic diagram of context information annotated with placeholders provided in an embodiment of the present application.
  • FIG25 is a schematic diagram of a process of a cloud service-based code generation method provided in an embodiment of the present application.
  • FIG26 is a schematic diagram of the structure of a cloud service-based code generation device provided in an embodiment of the present application.
  • FIG27 is a schematic diagram of the structure of another cloud service-based code generation device provided in an embodiment of the present application.
  • FIG. 28 is a schematic diagram of the structure of a computing device cluster provided in an embodiment of the present application.
  • Code generation technology can indeed significantly reduce the cost of developers frequently switching between actual code writing and searching for knowledge, consulting documents, and finding reusable components, thereby improving development efficiency to a certain extent.
  • However, because current code generation technology mainly relies on natural language processing techniques when generating code, it does not fully consider the logical structure of the code and of the programming project. As a result, its generation capability is poor and the actual user experience is unsatisfactory.
  • the embodiment of the present application provides a code generation method based on cloud services.
  • the method includes: receiving a code generation request, the code generation request is used to request the generation of a first executable code for implementing a first method in a programming project; based on the code generation request, obtaining first context information required for generating the first executable code from information of the programming project; then, based on the first context information and the code generation request, generating the first executable code.
  • the first context information includes project-level context information from the entire programming project, which can reflect the logical structure of the code and the programming project, so that the code generation process can make better use of the background knowledge required by human developers in actual programming and the overall logic of the programming project, thereby improving the ability to generate code and helping to improve the actual usage experience of the code generation technology.
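  • The three steps described above (receive the request, obtain the first context information, generate the code) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the names `CodeGenerationRequest`, `obtain_first_context`, `generate_first_executable_code` and `handle_request` are hypothetical, the rule used to select context is a placeholder, and `model` stands for any callable that maps a prompt string to generated text.

```python
from dataclasses import dataclass


@dataclass
class CodeGenerationRequest:
    """Hypothetical request: which project it targets and what to generate."""
    project_root: str       # location of the programming project
    task_description: str   # what the first method should do


def obtain_first_context(request, project_info):
    """Collect project-level context relevant to the request.

    Placeholder selection rule (an assumption for this sketch): keep every
    project entry whose name is mentioned in the task description.
    """
    description = request.task_description.lower()
    return [
        declaration
        for name, declaration in project_info.items()
        if name.lower() in description
    ]


def generate_first_executable_code(request, first_context, model):
    """Build a prompt from the context and the task description, then call the model."""
    prompt = "\n".join(first_context) + "\n# " + request.task_description + "\n"
    return model(prompt)


def handle_request(request, project_info, model):
    """Receive the request, obtain the first context information, generate the code."""
    first_context = obtain_first_context(request, project_info)
    return generate_first_executable_code(request, first_context, model)
```

  • In this sketch `project_info` is a plain dictionary from unit names to declarations; a real system would derive it from the files of the programming project.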
  • the programming project is used to implement user services.
  • the information of the programming project can be regarded as the information in the project source code folder, which is a collection of subfolders and/or source code files, wherein the subfolders can also include subfolders and/or source code files, and so on, the lowest level subfolders include multiple source code files.
  • the hierarchical units in the programming project are: programming projects, code modules (some programming projects may not have), code packages, classes and methods (also called class methods).
  • the programming project includes one or more code modules, each code module includes one or more code packages, a code package includes one or more classes, a class includes one or more methods, and each method is used to implement one or more operation logics.
  • the project source code folder of the programming project includes one or more subfolders (referred to as first-level subfolders for ease of description), and the one or more first-level subfolders correspond to one or more code modules.
  • Each first-level subfolder also includes one or more subfolders (referred to as second-level subfolders for ease of description), and the one or more second-level subfolders correspond to one or more code packages.
  • Each second-level subfolder also includes one or more subfolders (referred to as third-level subfolders for ease of description), and the one or more third-level subfolders correspond to one or more classes.
  • Each third-level subfolder also includes one or more source code files, each of which is used to record executable code for implementing methods in a class. The computing device executes all executable code recorded in the source code file according to the relationship between the project source code folder, subfolders, and source code files of the programming project, thereby realizing the user's business.
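  • The hierarchy of programming project, code module, code package, class and method can be modelled with a few nested records. The sketch below is illustrative only; the class names and the `qualified_method_names` helper are hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str


@dataclass
class Class:
    name: str
    methods: list = field(default_factory=list)


@dataclass
class CodePackage:
    name: str
    classes: list = field(default_factory=list)


@dataclass
class CodeModule:
    name: str
    packages: list = field(default_factory=list)


@dataclass
class ProgrammingProject:
    name: str
    modules: list = field(default_factory=list)


def qualified_method_names(project):
    """Yield 'project/module/package/Class.method' for every method in the project."""
    for module in project.modules:
        for package in module.packages:
            for cls in package.classes:
                for method in cls.methods:
                    yield (
                        f"{project.name}/{module.name}/{package.name}/"
                        f"{cls.name}.{method.name}"
                    )
```

  • Walking the nested records from the project down to the methods mirrors walking the project source code folder down through its first-level, second-level and third-level subfolders.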
  • the names of programming projects, code modules, code packages, classes and methods and the relations therebetween in the present application are exemplary descriptions and are not intended to limit the present application.
  • the names of programming projects, code modules, code packages, classes and methods and the relations therebetween may change with application scenarios.
  • For example, in some scenarios methods are also referred to as functions. However, as long as code is generated using the idea of the embodiments of the present application, such a scenario should also fall within the scope of protection of the present application.
  • FIG1 is a schematic diagram of an implementation scenario involved in a cloud service-based code generation method provided in an embodiment of the present application.
  • the implementation scenario includes: a computing device 10.
  • the computing device 10 is used to execute the cloud service-based code generation method provided in an embodiment of the present application.
  • the computing device 10 is used to receive a code generation request, obtain, based on the code generation request, the first context information required to generate a first executable code from the information of a programming project, and then generate the first executable code based on the first context information and the code generation request.
  • the computing device 10 can be implemented by a physical machine, a physical machine cluster including multiple physical machines, a graphics card, an artificial intelligence computing chip, a bare metal server, a cloud server, a virtual machine or a container.
  • the computing device 10 can be independently deployed on a physical machine, a physical machine cluster, a bare metal server, a cloud server, a virtual machine or a container.
  • the computing device 10 can be distributedly deployed on one or more of multiple physical machines, multiple physical machine clusters, multiple bare metal servers, multiple cloud servers, multiple virtual machines and multiple containers.
  • FIG. 1 is a schematic diagram of the structure of a computing device provided in the embodiment of the present application.
  • the computing device 10 includes a processor 101, a memory 102, a communication interface 103, and a bus 104. Among them, the processor 101, the memory 102, and the communication interface 103 are connected to each other through the bus 104.
  • the computing device 10 can be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 10.
  • Processor 101 may include a general-purpose processor and/or a dedicated hardware chip.
  • a general-purpose processor may include any one or more of a central processing unit (CPU), a microprocessor (MP) or a graphics processing unit (GPU).
  • the CPU is, for example, a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a dedicated hardware chip is a hardware module for high-performance processing.
  • a dedicated hardware chip includes at least one of a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or a network processor (NP).
  • Processor 101 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, some or all of the functions of the cloud service-based code generation method of the present application may be completed by hardware integrated logic circuits or software instructions in processor 101.
  • the memory 102 is used to store computer programs, which include an operating system 102a and executable code (i.e., program instructions) 102b.
  • the memory 102 is, for example, a read-only memory or another type of static storage device that can store static information and instructions, a random access memory or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory, a read-only optical disc or other optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired executable code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 102 is used to store an outbound port queue, etc.
  • the memory 102 is, for example, independent and connected to the processor 101 via the bus 104. Or the memory 102 and the processor 101 are integrated together.
  • the memory 102 may include a volatile memory, such as a random access memory (RAM).
  • the memory 102 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 102 can store executable code.
  • the processor 101 is used to implement some or all of the functions of the cloud service-based code generation method provided in the embodiment of the present application. That is, the memory 102 stores instructions for implementing some or all of the functions of the cloud service-based code generation method.
  • the memory 102 may also include software modules and data required for other running processes such as the operating system.
  • the communication interface 103 uses a transceiver module such as, but not limited to, a network interface card and a transceiver to achieve communication with other devices or communication networks.
  • the communication interface 103 can be any one or any combination of the following devices: a network interface (such as an Ethernet interface), a wireless network card, and other devices with network access functions.
  • the bus 104 is any type of communication bus for interconnecting the internal devices of the computing device (e.g., the memory 102, the processor 101, and the communication interface 103).
  • the bus 104 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
  • the bus may be divided into an address bus, a data bus, a control bus, and the like.
  • FIG1 is represented by only one line, but it does not mean that there is only one bus or one type of bus.
  • the bus 104 may include a path for transmitting information between the various components of the computing device 10 (e.g., the memory 102, the processor 101, and the communication interface 103).
  • the embodiment of the present application takes the above-mentioned devices inside the computing device as an example of interconnecting through the bus 104.
  • the above-mentioned devices inside the computing device 10 may also communicate with each other through connection methods other than the bus 104.
  • the above-mentioned devices inside the computing device 10 are interconnected through an internal logical interface.
  • the above-mentioned multiple devices can be respectively arranged on independent chips, or at least partially or completely arranged on the same chip. Whether to independently arrange each device on different chips or to integrate and arrange it on one or more chips often depends on the needs of product design.
  • the embodiments of the present application do not limit the specific implementation form of the above-mentioned devices.
  • the descriptions of the processes corresponding to the above-mentioned figures have different focuses. For the parts not described in detail in a certain process, please refer to the relevant descriptions of other processes.
  • the cloud service-based code generation method provided in the embodiment of the present application has multiple application scenarios, and the following two examples are used to illustrate them:
  • the cloud service-based code generation method provided in the embodiment of the present application can be provided to users in the form of a code generation service.
  • the code generation service can be provided in the form of an application programming interface (API).
  • the service provider has a large number of basic resources, such as computing resources, storage resources, and network resources, and the service provider can use the basic resources to provide code generation services.
  • the service provider can provide the code generation service to the user.
  • the user can purchase the code generation service from the management platform of the service provider through the client, and send the information of the programming project (such as all files of the programming project) and the code generation request to the management platform of the service provider.
  • the management platform of the service provider can assign the task of generating the first executable code to the computing device of the service provider, and send the information of the programming project and the code generation request to the computing device to which the task is assigned.
  • the computing device is used to execute the cloud service-based code generation method provided in the embodiment of the present application based on the information of the programming project and the code generation request, generate a first executable code, and provide the generated first executable code to the client to complete the process of providing code generation services to users.
  • the user can send a code generation request to the management platform through the client, and the management platform can provide the user's client with a tool for obtaining the first context information from the information of the programming project, so that the client obtains the first context information from the information of the programming project based on the tool and sends the first context information to the management platform.
  • the service provider's computing device is used to generate a first executable code based on the first context information and the code generation request, and provide the generated first executable code to the client, so as to complete the process of providing code generation service to the user.
  • the implementation scenario may also include: a client 20, which is used to implement interaction between the user and the management platform.
  • the client 20 can be a mobile phone, a tablet computer, a personal computer, a virtual machine, a container, a laptop computer, a multimedia player, a smart home appliance, an artificial intelligence device, a smart wearable device, an e-reader, a smart vehicle device or an Internet of Things device, etc.
  • the computing device 10 can be a server.
  • the server can be a single server, a server cluster composed of several servers, or a cloud computing service center.
  • a large number of basic resources owned by a cloud service provider are deployed in the cloud computing service center.
  • computing resources, storage resources, and network resources are deployed in the cloud computing service center.
  • the cloud computing service center can use this large amount of basic resources to run the file system and implement the cloud service-based code generation method provided in the embodiment of the present application.
  • When the server is implemented through a cloud computing service center, the code generation function that the server provides to the user can be abstracted by the cloud service provider into a code generation cloud service on the cloud platform. After the user purchases the code generation cloud service on the cloud platform, the cloud platform can use the resources in the cloud computing service center and the cloud service-based code generation method provided in the embodiment of the present application to provide the code generation cloud service.
  • the code generation cloud service can be provided as an independent cloud service or as an additional service of other cloud services.
  • the cloud platform can be a cloud platform of a central cloud, or a cloud platform of an edge cloud.
  • the cloud platform can also be a cloud platform including a central cloud and an edge cloud.
  • the computing device 10 can be partially deployed in the cloud platform of the edge cloud and partially deployed in the cloud platform of the central cloud, and the embodiment of the present application does not make specific limitations on it.
  • the computing device 10 can also be implemented through other resource platforms besides the cloud platform, and the embodiment of the present application does not make specific limitations on it.
  • the server can be implemented through resources in other resource platforms and provide users with related services for generating code.
  • the cloud service-based code generation method provided in the embodiment of the present application can be provided to the user in the form of an application package.
  • a service provider can provide the user with an application package for implementing the cloud service-based code generation method provided in the embodiment of the present application.
  • the computing device 10 is a computing device owned by the user. After the user obtains the application package from the service provider, the user can install the application package in the computing device 10 to obtain an application for executing the cloud service-based code generation method.
  • the application can execute the cloud service-based code generation method based on the code generation request to generate a first executable code.
  • the cloud service-based code generation method can be provided to the user in the form of a plug-in for a code editor or an integrated development environment (IDE).
  • the computing device 10 may be a client.
  • the implementation scenario may further include: a computing device 30, which is used to provide the computing device 10 with an application package for implementing the cloud service-based code generation method provided in the embodiment of the present application.
  • the computing device 30 may be a server, etc.
  • the implementation process of the cloud service-based code generation method may include the following steps:
  • Step 401 Receive a code generation request, where the code generation request is used to request generation of a first executable code for implementing a first method in a programming project.
  • a code generation request is triggered when the developer's operation meets the triggering condition.
  • the code generation request is used to request the computing device to perform a generation task and obtain a first executable code.
  • the code generation request carries a task description of the generation task, and the computing device specifically performs the generation task based on the task description, thereby generating the first executable code.
  • the task description is equivalent to a "problem statement": it indicates the characteristics (such as the function) of the executable code to be generated.
  • the process in which the computing device generates executable code based on the task description can be regarded as the process of solving the problem posed by that "problem statement".
  • the task description will also indicate a generation point, which is used to indicate the location in the programming project for inserting the generated first executable code.
  • the generation point can be the location that triggers the code generation request.
  • the developer can perform a specified operation at the location where code writing is to be continued in order to trigger the code generation request, and that location is the generation point.
  • for example, when the developer adds a comment, the integrated development environment detects the operation of adding the comment and can trigger the code generation request, and the end of the comment is the generation point.
  • FIG5 is a schematic diagram of a front-end interface of an integrated development environment provided in an embodiment of the present application. As shown in FIG5, the input cursor is located at the position of the black selection box in FIG5 (i.e., at the beginning of the 19th line where the cursor is located in FIG5), and this position is the generation point.
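  • As an illustration of the comment-triggered case, the sketch below locates a generation point at the end of the last comment line of a source text. It is a simplified assumption rather than the behavior of any particular integrated development environment: the `GenerationPoint` type, the `generation_point_after_comment` function and the default `//` comment prefix are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationPoint:
    line: int    # 1-based line number in the source code file
    column: int  # 0-based column offset within that line


def generation_point_after_comment(source, comment_prefix="//"):
    """Return the position at the end of the last comment line, or None.

    Only whole-line comments are recognised in this sketch; trailing
    comments after code on the same line are ignored.
    """
    lines = source.splitlines()
    for index in range(len(lines) - 1, -1, -1):
        if lines[index].lstrip().startswith(comment_prefix):
            return GenerationPoint(line=index + 1, column=len(lines[index]))
    return None
```

  • A cursor-triggered request, as in FIG5, would instead take the generation point directly from the editor's cursor position.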
  • the method in the integrated development environment can be implemented by one or more lines of executable code, and the method is used to implement a computing task.
  • the cloud service-based code generation method provided in the embodiment of the present application can also be used to generate executable code for other units. For example, it can be used to generate executable code for code lines, code packages, or even code modules, which is not specifically limited in the embodiment of the present application.
  • Step 402 Receive a range indication, where the range indication is used to indicate a range for obtaining the first context information.
  • a computing device may generate an executable code of a first method according to the first context information of the first method to be generated.
  • the hierarchical units involved in the programming project are, from large to small, programming projects, code modules, code packages, classes, and methods.
  • the first context information can be obtained in different hierarchical units. Therefore, a user may send a range indication to a computing device through a client to indicate the acquisition range of the first context information.
  • the cloud service-based code generation method provided in an embodiment of the present application may also include: a computing device receives a range indication.
  • the alternative acquisition range of the first context information may include: a programming project, a code module, a code package, a class, and a source code file to which the first method belongs.
  • a standard library or a third-party library may also be used, so the alternative acquisition range of the first context information also includes a standard library or a third-party library.
  • FIG6 is a schematic diagram of an interface for a user to select an acquisition scope provided in an embodiment of the present application.
  • Context-aware Scope represents an option for the acquisition scope
  • the options for the acquisition scope include automatic (auto), programming project (repository), code package (package), source code file (file) and class (class) to which the first method belongs.
  • the user can select the acquisition scope of the first context information from the multiple options.
  • automatic means adaptively acquiring the first context information in the entire programming project according to the generation point.
  • Programming project, code package, source code file and class indicate acquiring the first context information within their respective limited scopes.
  • the acquisition scope can be set to automatic by default. When the user considers it necessary, another acquisition scope can be selected through the interface.
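  • The scope options and the default can be sketched as an enumeration. The option strings follow the labels shown in FIG6 (auto, repository, package, file, class); the `ContextScope` and `resolve_scope` names are hypothetical and serve only to illustrate how a missing range indication could fall back to the automatic option.

```python
from enum import Enum


class ContextScope(Enum):
    AUTO = "auto"              # adapt the scope to the generation point
    REPOSITORY = "repository"  # the whole programming project
    PACKAGE = "package"        # the code package of the first method
    FILE = "file"              # the source code file of the first method
    CLASS = "class"            # the class of the first method


def resolve_scope(range_indication=None):
    """Map an optional range indication to a scope, defaulting to AUTO."""
    if range_indication is None:
        return ContextScope.AUTO
    # Raises ValueError for a label that is not one of the options above.
    return ContextScope(range_indication.lower())
```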
  • the context information is the relevant information required to perform the task.
  • the context information includes one or more of the following: functions, access rights, and calling methods of defined classes, variables, and methods.
  • Step 403 Receive a removal instruction, where the removal instruction is used to indicate whether to remove target information in the first context information, where the target information includes one or more of the following: code comments, variable assignments, method bodies, and information indicating underlying logic of the code.
  • the first context information may include the implementation details of the calling method or the variable assignment.
  • the comments, constants, strings, code underlying logic and other locations in the code may contain sensitive data (such as credentials, passwords, secret keys and system information, etc.) or user privacy information (such as IP addresses, user names, personal information, etc.), but this information is not required for generating the first executable code.
  • the context information may contain some privacy information or sensitive information. If this information is leaked, it will affect the security and privacy of the user's code assets. Therefore, the user can determine whether to remove the target information in the first context information according to the application requirements to avoid the leakage of the target information.
  • the target information may include one or more of the following: the implementation details of the calling method, the method body, the variable assignment, the code comments and the information indicating the underlying logic of the code.
  • the removal instruction may indicate that a removal operation needs to be performed, and the computing device receives the removal instruction to perform the removal operation.
  • information removal can be performed according to a preset strategy without the user indicating the specific type of target information to be removed, that is, without indicating whether the target information is implementation details of a calling method, variable assignment, or sensitive information contained in the code.
  • the removal instruction can not only indicate that a removal operation needs to be performed, but also indicate the specific type of target information to be removed.
  • FIG6 is a schematic diagram of an interface for a removal indication provided in an embodiment of the present application.
  • the user can select to turn on the option of the removal function to indicate that the target information in the first context information is to be removed, or the user can select to turn off the option of the removal function to indicate that the target information in the first context information does not need to be removed.
  • the removal indication only indicates whether the removal operation needs to be performed.
  • Context Desensitization in FIG6 represents the option of the context desensitization function, that is, the removal operation in FIG6 is implemented by desensitization processing.
  • Step 404 Receive a preview indication, where the preview indication is used to indicate whether to preview the first context information.
  • the user can indicate whether to view the obtained first context information according to needs.
  • the user can perform corresponding operations to trigger a preview indication, and convey his needs to the computing device through the preview indication.
  • FIG6 is a schematic diagram of a preview indication interface provided in an embodiment of the present application.
  • the user can select to turn on the preview function option to indicate that the first context information needs to be previewed, or the user can select to turn off the preview function option to indicate that the first context information does not need to be previewed.
  • "Context-Preview" in FIG6 represents the option of previewing the first context information.
  • the range indication, removal indication, and preview indication may be carried in the code generation request, or may be sent independently of the code generation request, and the embodiments of the present application do not specifically limit it.
  • the computing device may provide the function of receiving some or all of the range indication, removal indication, and preview indication, or may not provide them, and the specific implementation may be set according to the application requirements. Accordingly, when the computing device does not provide some or all of the functions of the range indication, removal indication, and preview indication, there is no need to perform the corresponding steps in the process of generating the first executable code.
  • Step 405 Generate a logical structure diagram of the programming project, where the logical structure diagram is used to indicate the relationship between various contents in the programming project.
  • the computing device can analyze the various contents in the programming project and generate a logical structure diagram of the programming project based on the association relationship of the various contents.
  • the logical structure diagram is a data structure in a graph format.
  • the various contents in the programming project can be obtained based on semantic division.
  • the various contents can be the contents of hierarchical units such as programming projects, code modules, code packages, classes, and methods.
  • the logical structure diagram can be represented by a project structure diagram with a tree structure, each hierarchical unit in the programming project can be expanded, and the root node, intermediate node and leaf node of the logical structure diagram are used to represent the corresponding hierarchical unit, and according to the association relationship between the hierarchical units, edges are added between the root node, the intermediate node and the leaf node to obtain a logical structure diagram representing the hierarchy and reference relationship between the hierarchical units in the programming project.
  • the root node represents the programming project.
  • the hierarchical unit represented by the leaf node is the same as the hierarchical unit to which the executable code to be generated belongs.
  • the hierarchical unit of the executable code to be generated by the cloud service-based code generation method is a method
  • the hierarchical unit represented by the leaf node is a method
  • the intermediate node represents a hierarchical unit with a granularity between the programming project and the method.
  • the hierarchical units represented by different intermediate nodes can be code modules, code packages and classes, respectively.
  • the logical structure diagram can be represented by a B+ tree diagram.
  • the programming project, code module, code package, class file and source code file can be expanded respectively, the root node of the B+ tree is used to represent the programming project, the leaf node of the B+ tree is used to represent the method, and the intermediate nodes of different levels of the B+ tree are used to represent the different hierarchical units between the programming project and the method, and according to the association relationship between the programming project, code module, code package, class file and the method in the source code file, add edges between the root node, the intermediate node and the leaf node to obtain a B+ tree diagram representing the hierarchy and reference relationship between the programming project, code module, code package, class file and method.
  • Figure 7 is a schematic diagram of the implementation process of the computing device obtaining the first context information.
  • the computing device can determine the programming project to which the first method belongs based on the code generation request, and then generate the logical structure diagram of the programming project.
  • FIG8 is a schematic diagram of the generated logical structure diagram.
  • the wireframes of the same shape in FIG8 represent hierarchical units of the same granularity, and the granularities of the hierarchical units decrease in sequence from top to bottom in FIG8.
  • the solid arrows in FIG8 represent the relationship between different hierarchical units, the dotted arrows in FIG8 represent the relationship between hierarchical units of the same granularity, and the import in FIG8 is an import statement in Java.
  • Step 406 When the scope indication is used to indicate that the acquisition scope of the first context information is a programming project, based on the logical structure diagram of the programming project, obtain the permission scope that the first method has permission to access in the information of the programming project.
  • the permission range that the first method has permission to access can be divided, and then the first context information is obtained within the permission range.
  • the first context information only needs to be obtained within the permission range that the first method has permission to access, which can avoid introducing content that the first method does not have permission to access into the first method, thereby ensuring the efficiency and effectiveness of the obtained first context information.
  • the implementation process of step 406 includes: obtaining the permission scope based on at least one of the position of the first method in the programming project, the hierarchy of the target class to which the first method belongs, the reference relationship of the target class, and the access control authority of the target class.
  • the position of the first method in the programming project can be represented by a generation point.
  • the hierarchy of the target class represents the hierarchy of the target class in the hierarchical unit of the programming project.
  • the reference relationship of the target class represents the content referenced by the target class.
  • the access control authority of the target class can be set by the developer to limit access to the executable code in the target class. For example, in the Java language, the access control authority of an accessed object can be public, protected, private or default.
  • When the access control authority of the accessed object is public, the accessed object can be accessed by accessing objects anywhere in the programming project. When it is private, the accessed object can only be accessed by accessing objects within the scope of the accessed object; for example, when the access control authority of a class is private, the information of the class can only be accessed by accessing objects, such as methods, that belong to the class. When it is default, the accessed object follows the default provisions of the programming language; the Java language stipulates that the contents in the same hierarchical unit can use each other directly.
  • for example, when multiple source code files belong to the same code package, the Java language stipulates that these source code files can use each other directly, that is, they have the right to access each other.
  • Protected access adds, on top of default access, the right of subclasses to access the accessed object.
  • Subclasses are classes obtained through inheritance.
  • Table 1 is the provision for access rights in the Java language, where "Yes” means having access rights and "No” means having no access rights.
  • “Yes” in the second row and second column of Table 1 means that the accessing object that belongs to the same class as the accessed object has access rights to the accessed object.
  • the position of the first method in the programming project, the hierarchy of the target class, the reference relationship, and the scope of authorized access can be determined first, and then the scope that the target class needs to access is determined based on the position of the first method in the programming project, the hierarchy of the target class, and the reference relationship, and the intersection of the scope that the target class needs to access and the scope of authorized access is determined as the scope of authority that the first method has access to.
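  • the access rules referred to above can be sketched as a lookup table. The sketch below encodes the standard Java rules for the four access levels; the column names (`same_class`, `same_package`, `subclass`, `elsewhere`) and the function `can_access` are illustrative assumptions and are not copied from Table 1.

```python
# Rows: access modifier of the accessed object.
# Columns: where the accessing object sits relative to the accessed object.
# "subclass" means a subclass located in a different package.
ACCESS_TABLE = {
    "public":    {"same_class": True, "same_package": True,  "subclass": True,  "elsewhere": True},
    "protected": {"same_class": True, "same_package": True,  "subclass": True,  "elsewhere": False},
    "default":   {"same_class": True, "same_package": True,  "subclass": False, "elsewhere": False},
    "private":   {"same_class": True, "same_package": False, "subclass": False, "elsewhere": False},
}


def can_access(modifier, relation):
    """Return True if an accessing object in `relation` may use an object declared with `modifier`."""
    return ACCESS_TABLE[modifier][relation]


# Example queries against the table.
private_in_same_class = can_access("private", "same_class")
default_from_subclass = can_access("default", "subclass")
```

  • a lookup of this kind could be applied to every candidate node of the logical structure diagram to decide whether the first method is permitted to access it.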
  • when a logical structure diagram of the programming project has been generated, the nodes of the logical structure diagram and the relationships between the nodes can be analyzed according to the above description to determine the scope of authority, and the scope of authority can be regarded as a subgraph of the logical structure diagram.
  • the scope of authority that the first method has access to is divided in the logical structure diagram, and the scope of authority is as shown in the dotted box circled in Figure 8.
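  • the intersection described above can be sketched with sets of node names. The node names and the function `permission_scope` are hypothetical and serve only to show that the scope of authority is the part of the needed range that is also authorized.

```python
def permission_scope(needed, permitted):
    """Scope of authority = nodes the target class needs AND is authorized to access."""
    return needed & permitted


# Hypothetical node names from a logical structure diagram.
needed = {"Server", "Config", "Logger", "Secrets.token"}
permitted = {"Server", "Config", "Logger", "Utils"}

scope = permission_scope(needed, permitted)
```

  • `Secrets.token` is needed but not authorized, and `Utils` is authorized but not needed, so neither ends up in the resulting subgraph.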
  • Step 407 Analyze the subgraph within the authority range in the logic structure diagram based on the code generation request, and obtain the first context information required to generate the first executable code from the information of the programming project.
  • the computing device can acquire the first context information in the entire programming project. In this way, the computing device can consider the logical structure of the code and the programming project when generating the first executable code, which helps to improve the code generation capability.
  • the executable code for implementing the method is usually stored in a source code file, so the first context information may include the content in the source code file to which the first method belongs, and may also include the content outside the source code file to which the first method belongs.
  • the first context information can be obtained from the source code file to which the first method belongs and outside the source code file.
  • obtaining the first context information required to generate the first executable code includes: performing processing such as expanding the context outside the source code file and reorganizing the code inside the source code file, and obtaining the first context information according to the processing results.
  • the implementation process of reorganizing the code in the source code file and obtaining the first context information according to the reorganization result includes:
  • Step 407a1 based on the code generation request, obtain the source code file to which the first method belongs from the files of the programming project.
  • the executable code of the method is stored in the source code file, and the source code file to which the first method belongs can be located in the file of the programming project based on the code generation request.
  • the process of locating the source code file to which the first method belongs is actually the process of determining the source code file where the generation point is located based on the generation point indicated by the code generation request. For example, as shown in FIG7, after receiving the code generation request, the computing device can determine the source code file to which the first method belongs based on the code generation request, and the content of the source code file is shown in FIG10.
  • Step 407a2 in the source code file to which the first method belongs, obtain the following information, that is, the context information whose writing position is after the first method.
  • the writing position of the first method in the source code file can be obtained first, and then the following information whose writing position is after the first method in the source code file is obtained.
  • the position of the generation point in the source code file is the writing position of the first method in the source code file.
  • all information whose writing position is located after the generation point is the following information of the first method.
  • all information whose writing position is located before the generation point is the above information of the first method.
  • the above information generally includes the package and import statements, the signature of the class in which the first method is located, and partial declarations of that class (such as declarations of member variables, constructors and some methods).
  • the following information generally includes the declarations of other methods of the class in which the first method is located.
  • the process of obtaining the following information is in effect a process of partitioning the content of the source code file, that is, file code partitioning, which divides the content of the source code file into the above information of the first method, the first method, and the following information of the first method.
  • the content before the arrows pointing to 5.1, 5.2 and 5.3 in Figure 11 is the above information
  • the content before the arrow pointing to 5.4 is the first method
  • the content before the arrow pointing to 5.5 is the following information.
  • Figures 11, 12, 14 and 15 are related. For convenience of description, a number is marked before each box in these figures to identify the box, and an arrow is marked after each box to indicate the content of the box after processing, which shows the relationship between the figures.
  • Step 407a3 Adjust the writing position of the following information to before the first method, so that the following information after the adjustment becomes the previous information of the first method.
  • the order in which methods are written in source code files is not very important, and exchanging the order of different methods will not affect the execution and semantics of executable programs.
  • the order between methods in compiled languages such as Java is not important, and exchanging the order does not affect program execution and semantics.
  • the first context information is usually perceived in a top-to-bottom order of natural language. The order change of different contents in the first context information may affect semantics, and even fail to perceive the following information, resulting in difficulty in obtaining and utilizing global information such as background knowledge, structure, and code of programming projects.
  • for example, when a pre-trained language model is used to generate the first executable code, the model usually adopts a top-to-bottom training order similar to natural language and cannot perceive the following information. Therefore, after the following information of the first method is obtained, it can be moved to before the first method, so that the following information of the first method becomes above information of the first method, thereby reorganizing the code in the source code file.
  • after the adjustment, the positions of the above information, the first method and the former following information in the source code file are shown in FIG12; the following information indicated by 5.5 in FIG12 has been moved to before the writing position of the first method indicated by 5.4 in FIG12.
  • the above steps 407a1 to 407a3 are the process of reorganizing the code in the source code file.
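  • a minimal sketch of steps 407a1 to 407a3, assuming the source code file has already been split into ordered blocks. Only the labels 5.1 to 5.5 come from the description; the function `reorganize` and the list representation are assumptions made for illustration.

```python
def reorganize(blocks, method_index):
    """Partition the file at the first method and move the following information in front of it.

    blocks:       ordered blocks of the source code file
    method_index: position of the block that holds the first method (the generation point)
    """
    above = blocks[:method_index]            # above information
    first_method = blocks[method_index]      # the method to be generated
    following = blocks[method_index + 1:]    # following information
    # After reorganization every piece of context precedes the first method.
    return above + following + [first_method]


# 5.1 to 5.3: above information, 5.4: first method, 5.5: following information.
blocks = ["5.1", "5.2", "5.3", "5.4", "5.5"]
reordered = reorganize(blocks, blocks.index("5.4"))
```

  • placing the first method last means that a model reading strictly from top to bottom sees all of the file's context before it reaches the generation point.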
  • Step 407a4 Based on the above information of the first method, obtain first context information.
  • the first context information can be obtained based on the above information in the reorganized source code file. For example, all the above information in the reorganized source code file can be determined as the first context information. For another example, when other processing (such as context expansion outside the file) needs to be performed on the reorganized source code file, that processing can be performed first, and the first context information can be obtained based on the processed source code file.
  • the implementation process of expanding the context outside the file and acquiring the first context information according to the expansion result includes:
  • Step 407b1 Based on the code generation request, obtain the source code file to which the first method belongs from the files of the programming project.
  • for the implementation process of step 407b1, please refer to the implementation process of step 405a1, which will not be repeated here.
  • Step 407b2 Analyze the subgraphs in the logic structure diagram that are within the authority range based on the code generation request, and obtain external information outside the source code file that is used by the source code file from the information of the programming project.
  • the external information can be determined according to the association relationship between the source code file and other hierarchical units in the programming project. For example, when the logical structure diagram is analyzed based on the code generation request, the external information can be determined according to the edge between the node used to represent the content in the source code file and other nodes in the logical structure diagram.
  • the content outside the source code file is the external information.
  • when the executable code in the source code file indicates that content in other hierarchical units needs to be used, the indicated content can be determined as external information.
  • for example, an import statement in Java indicates that content in other hierarchical units needs to be used, so the information imported from outside the source code file can be determined according to the import statement and taken as external information.
  • the content in other hierarchical units that the executable code in the current source code file needs to use may not be explicitly indicated.
  • in this case, the content in other hierarchical units that the source code file needs to use has to be determined from what the executable code in the source code file actually refers to.
  • for example, the contents in the same hierarchical unit can use each other directly (just as multiple source code files in a code package can use each other directly) without being imported through import statements, so the external information has to be determined from what the executable code in the source code file actually refers to.
  • when the scope of authority has been determined, analyzing the logical structure diagram based on the code generation request and obtaining, from the information of the programming project, the external information used by the source code file and located outside the source code file includes: obtaining the external information within the scope of authority. That is, the external information is obtained from the information that is outside the source code file to which the first method belongs and within the scope of authority, and no external information is obtained outside the scope of authority.
  • obtaining external information within the scope of authority differs from obtaining external information in the entire programming project only in the size of the candidate range; the way external information is obtained within the candidate range is the same, so it is not repeated here.
  • Step 407b3 Supplement the external information in the source code file to which the first method belongs.
  • the external information can be supplemented in the source code file to which the first method belongs, so as to achieve the purpose of expanding the context based on the external information outside the source code file, so as to generate the first executable code based on the external information and the source code file.
  • the implementation method of supplementing the external information in the source code file may include: at the location where the external information is used in the source code file, replacing the content that indicates the external information with the external information itself. For example, when an import statement is used in a source code file to import external information, the import statement can be replaced with the external information.
  • FIG14 is a schematic diagram after the external information is supplemented in the source code file shown in FIG12. According to FIG14, it can be seen that the external information is supplemented to part 5.1 in FIG12, and part 5.1 after the external information is supplemented is 6.1 in FIG14.
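  • the replacement of import statements by external information can be sketched as follows. The regular expression, the function `expand_imports`, the package names and the shape of `external_info` (a mapping from an imported name to the lines that stand in for it) are assumptions for illustration; imports with no entry in the mapping are left untouched.

```python
import re

# Matches a simple single-type Java import such as "import com.example.conf.Config;".
IMPORT_RE = re.compile(r"^import\s+([\w.]+);\s*$")


def expand_imports(source_lines, external_info):
    """Replace each import that has known external information with that information."""
    expanded = []
    for line in source_lines:
        match = IMPORT_RE.match(line)
        if match and match.group(1) in external_info:
            expanded.extend(external_info[match.group(1)])
        else:
            expanded.append(line)
    return expanded


source = [
    "package com.example.net;",
    "import com.example.conf.Config;",
    "import java.util.List;",
    "public class Server {",
    "}",
]
external_info = {
    "com.example.conf.Config": [
        "class Config {",
        "    public String readConfig();",
        "}",
    ],
}
supplemented = expand_imports(source, external_info)
```

  • in this sketch only the project-internal import is expanded, while the standard library import is kept as written.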
  • Step 407b4 Obtain first context information based on the supplemented source code file.
  • the first context information can be obtained based on the supplemented source code file. For example, all the information in the supplemented source code file can be determined as the first context information. For another example, when it is necessary to perform other processing on the supplemented source code file, other processing (such as reorganization of the code in the file) can also be performed on the supplemented source code file, and the first context information can be obtained based on the source code file that has undergone other processing.
  • Step 408 When the removal instruction is used to indicate removal of target information in the first context information, remove the target information from the first context information to obtain updated first context information.
  • the target information can also be removed in the first context information to update the first context information.
  • information irrelevant to the generation of the first executable code, as well as private and/or sensitive information in the first context information, can be deleted, so that the hierarchical structure of the content and the content related to signature information are retained in the first context information.
  • in this way, on the one hand, the first context information is compressed, so that context information containing more valuable content can be input under the same input length; on the other hand, the availability of the generated code, user privacy and code security can be guaranteed.
  • when the code generation model is used to generate the first executable code based on the first context information, the input length of the code generation model (also called the window size, usually 1024 to 4096) is limited. If the first context information obtained by the above steps were used directly, redundant information would be introduced into the code generation model, limiting the amount of useful context information available to it and reducing the efficiency with which context information is used. By compressing the first context information, information that contributes little to generating the first executable code can be deleted, so that more useful context information can be input within the limited input length, thereby improving the efficiency of using the context information and thus the code generation capability.
  • for example, the target information such as “private void clean() {...}” and “public String readConfig() {...}” in 6.1 of FIG14, the IP address “192.168.11.34” in 6.3, and the identifier “/**start listening to 8080**/” in 6.5 is removed.
  • a pre-set removal rule may be used to remove the target information.
  • the removal rule indicates that all content in the first context information that matches the target information is removed.
  • a third-party tool such as ShiftLeft may be used to remove the target information.
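  • a pre-set removal rule can be sketched as a list of regular expressions applied in turn. The two rules below (IPv4-looking literals and block comments) and the function `remove_target_information` are illustrative assumptions; they do not reproduce the rules of the application or the behaviour of any third-party tool.

```python
import re

# Hypothetical removal rules: each pattern describes one kind of target information.
REMOVAL_RULES = [
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4-looking literals
    re.compile(r"/\*.*?\*/", re.DOTALL),         # block comments
]


def remove_target_information(text, rules=REMOVAL_RULES):
    """Delete every span of `text` that matches one of the removal rules."""
    for rule in rules:
        text = rule.sub("", text)
    return text


snippet = 'String host = "192.168.11.34"; /**start listening to 8080**/ listen();'
cleaned = remove_target_information(snippet)
```

  • after the rules are applied, the address literal and the comment are gone while the surrounding statement structure is kept.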
  • Step 409 abstract the first context information into an interface declaration form to obtain updated first context information.
  • the first context information may be abstracted into an interface declaration form that conforms to the syntax of the programming language, so that the programming language grammar knowledge learned by the pre-trained model during training can be reused.
  • abstracting the first context information into an interface declaration form may include: converting a class (such as class) in the first context information into an interface (such as interface). This process is also called standardizing the first context information according to the interface representation. As shown in Figure 15, the class in 6.1 in Figure 14 is replaced with interface.
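  • a rough sketch of this abstraction, assuming simple Java text in which method bodies contain no nested braces and the word `class` appears only as a keyword. The function `abstract_to_interface` and the sample class are hypothetical; a real implementation would more likely work on a parsed syntax tree than on regular expressions.

```python
import re


def abstract_to_interface(source):
    """Turn a simple class into an interface-style declaration."""
    # Replace flat method bodies "(...) { ... }" with a terminating semicolon.
    without_bodies = re.sub(r"\)\s*\{[^{}]*\}", ");", source)
    # Convert the class keyword into the interface keyword.
    return re.sub(r"\bclass\b", "interface", without_bodies)


java_class = "\n".join([
    "public class Config {",
    "    public String readConfig() { return path; }",
    "    public int port() { return 8080; }",
    "}",
])
declaration = abstract_to_interface(java_class)
```

  • the result keeps only signatures, which is shorter than the original class while still being syntactically familiar to a model trained on source code.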
  • Step 410 sort the multiple contents based on the relevance between the multiple contents in the first context information and the first method to obtain updated first context information.
  • the multiple contents in the first context information can also be sorted, so that the multiple contents in the first context information are input to the code generation model in the sorted order, thereby reducing the probability that the content with greater relevance to the first method is truncated due to the input length limit when inputting the code generation model, so that the context information with higher importance can be input into the code generation model.
  • when the first executable code is generated using the code generation model, not only the first context information but also the task description of the first method needs to be input into the code generation model.
  • when the code generation model reads the input information, it first reads the task description, then reads the content close to the task description, and then reads the content far from the task description. Therefore, in an implementable manner, sorting the multiple contents based on their relevance to the first method includes: arranging the multiple contents before the task description of the first method, where the distance from any content to the task description is negatively correlated with the relevance of that content. Furthermore, a relevance threshold may be set; when the relevance of any content to the first method is less than the relevance threshold, the content may be deleted from the first context information.
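  • the ordering rule can be sketched as follows, assuming each content already carries a numeric relevance score. The function `order_for_prompt`, the scores and the threshold value are hypothetical.

```python
def order_for_prompt(contents, task_description, threshold=0.0):
    """Arrange contents before the task description, most relevant content closest to it.

    contents: list of (text, relevance) pairs
    """
    # Drop contents whose relevance is below the threshold.
    kept = [(text, relevance) for text, relevance in contents if relevance >= threshold]
    # Ascending relevance: the least relevant content ends up farthest from the task description.
    kept.sort(key=lambda item: item[1])
    return [text for text, _ in kept] + [task_description]


contents = [
    ("import java.util.List;", 0.2),
    ("interface Config {...}", 0.9),
    ("interface Logger {...}", 0.5),
    ("interface Unused {...}", 0.05),
]
prompt_parts = order_for_prompt(contents, "void init() ???", threshold=0.1)
```

  • if the input has to be truncated from the top, the content lost first is the content with the lowest relevance.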
  • there are multiple implementations for obtaining the correlation between the multiple contents in the first context information and the first method, which are described below using the following implementations as examples.
  • the implementations may include one or a combination of the following:
  • obtaining the correlation between multiple contents in the first context information and the first method includes: obtaining the first similarity between the identifier of each content and the relevant information of the first method.
  • the first similarity between any content and the first method is positively correlated with its correlation, that is, the greater the first similarity between the content and the first method, the greater the correlation between the content and the first method.
  • the identifier includes one or more of the following: variable name, method name, package name, class name and constant name.
  • the relevant information of the first method includes one or more of the following: method description of the first method, method name of the first method, return type of the first method and parameter type of the first method.
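  • the application does not fix a particular similarity measure. As one possible sketch, the identifiers and the relevant information of the first method can be split into word tokens and compared with Jaccard similarity; the tokenization, the measure and all names below are assumptions.

```python
import re


def split_identifier(identifier):
    """Split camelCase, snake_case or free text into lowercase word tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {part.lower() for part in parts}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def first_similarity(content_identifiers, method_info):
    """Similarity between a content's identifiers and the first method's relevant information."""
    content_tokens = set().union(*(split_identifier(i) for i in content_identifiers))
    method_tokens = set().union(*(split_identifier(i) for i in method_info))
    return jaccard(content_tokens, method_tokens)


# Relevant information of the first method: its name and its description.
method_info = ["initServer", "load config and start server"]

config_score = first_similarity(["readConfig", "configPath"], method_info)
cache_score = first_similarity(["cleanCache"], method_info)
```

  • here the configuration-related content shares the token `config` with the method information and therefore scores higher than the cache-related content, which shares none.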
  • obtaining the correlation between multiple contents and the first method in the first context information includes: obtaining the distance between each content and the first method in the programming project.
  • the distance between any content and the first method in the programming project is inversely correlated with its correlation, that is, the greater the distance between the content and the first method in the programming project, the smaller the correlation between the content and the first method.
  • the distance difference between different code modules, different code packages, different classes and different methods in the programming project can be pre-set.
  • the hierarchical unit where the content and the first method are located can be determined, and then the hierarchical unit spanned by the two can be determined according to the hierarchical unit where the content is located and the hierarchical unit where the first method is located, and then the distance between the content and the first method in the programming project can be obtained according to the distance difference corresponding to the spanned hierarchical unit.
  • the sum of the distance differences corresponding to the hierarchical units spanned by the two can be determined as the distance between the content and the first method in the programming project.
  • the distance between the content and the first method in the programming project can be determined according to the number of jumps that the node on the logical structure diagram of the content passes through to reach the node on the logical structure diagram of the first method.
  • the distance between the content and the first method in the programming project can be determined directly by the number of jumps that the node on the logical structure diagram of the content passes through to reach the node on the logical structure diagram of the first method.
  • weights can also be set for different hierarchical units, and the distance between the content and the first method in the programming project can be determined according to the weight of the jumps that the node on the logical structure diagram of the content passes through to reach the node on the logical structure diagram of the first method.
  • the number of hops from the node of the first content on the logical structure diagram to the node of the first method on the logical structure diagram is at least 3, and when the second content and the first method have an inheritance relationship, the number of hops from the node of the second content on the logical structure diagram to the node of the first method on the logical structure diagram may be 2, then the hierarchical distance between the second content and the first method in the programming project is smaller than the hierarchical distance between the first content and the first method in the programming project, and the second content is more correlated with the first method than the first content.
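  • the hop-count variant can be sketched with a breadth-first search over the logical structure diagram treated as an undirected graph. The node names and edges below are hypothetical and were chosen so that a class related by inheritance is 2 hops away and an unrelated class in the same package is 3 hops away.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hop_distance(graph, start, goal):
    """Fewest edges between `start` and `goal`, or None if they are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph([
    ("init", "Server"),        # method belongs to class
    ("Server", "net"),         # class belongs to package
    ("net", "Other"),          # another class in the same package
    ("Server", "BaseServer"),  # inheritance edge
])
inherited_hops = hop_distance(graph, "BaseServer", "init")
unrelated_hops = hop_distance(graph, "Other", "init")
```

  • a weighted variant would replace the constant increment of 1 with a weight that depends on the kind of hierarchical unit being crossed.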
  • obtaining the correlation between the multiple contents in the first context information and the first method includes: obtaining a second similarity between each content and the context information called by the associated content of the first method.
  • the second similarity is positively correlated with the relevance, that is, the greater the second similarity between the content and the context information called by the associated content of the first method, the greater the relevance of the content to the first method.
  • the associated content of the first method includes one or more of the following: an associated class of the target class to which the first method belongs, and other methods in the target class except the first method.
  • an associated class of the target class is a class whose implemented service is associated with the service implemented by the target class.
  • for example, the class used to obtain the names of the students in a school class, the class used to obtain the grades of the students in the school class, and the class used to obtain the name of the head teacher of the school class all need to obtain relevant information about the school class, so the three are associated classes.
  • the process of obtaining the second similarity based on the context information called by the associated class is equivalent to obtaining the second similarity between the content in the first context information and the context information called by the associated class, under the premise that the target class to which the first method belongs has a large correlation with its associated class, to infer the probability that the first method calls the content, thereby obtaining the correlation between the content in the first context information and the first method.
  • this implementation method can also be called obtaining the coupling degree between the content in the first context information and the first method.
  • the other method can be a method represented by a brother node of the leaf node representing the first method.
  • the process of obtaining the second similarity based on the context information called by the other method is equivalent to obtaining the second similarity between the content in the first context information and the context information called by the other method, under the premise that the first method and the other method have a greater correlation, to infer the probability that the first method calls the content, thereby obtaining the correlation between the content in the first context information and the first method.
  • This implementation can also be called obtaining the coupling degree between the content in the first context information and the first method.
  • Figure 15 is a schematic diagram of the first context information obtained from the source code file shown in Figure 14 after removing the target information, abstracting it into an interface declaration form, and sorting it.
  • the first context information obtained includes: project-level context, dependency library context, and file-level context.
  • the project-level context is the content of 7.1 in Figure 15
  • the dependency library context is the content of 7.2 in Figure 15
  • the file-level context is the content of 7.3 and 7.5 in Figure 15.
  • the task description of 7.4 in Figure 15 is also obtained, and the "???" in the task description is used to mark the generation point.
  • the contents included in the first context information may be of different types.
  • different processing methods in the above steps 407 to 410 may be adopted for different types of content to obtain effective and accurate first context information.
  • the source code file to which the first method belongs may include: project internal import statements, standard library and third-party library import statements, the class where the generation point is located, and other classes defined in the file header.
  • For the standard library and third-party library import statements, they can be directly retained as part of the first context information.
  • For the class where the generation point is located and other classes defined in the file, they can be abstracted into the form of interface declarations and sorted according to their relevance to the first method.
  • For the project internal import statements, they can be processed using the context expansion outside the file.
  • For example, for project internal import statements (such as package and import statements) and other classes in the project that the current class depends on (such as interfaces or enumeration types), their contents can be expanded to analyze the member variable names, method signatures, constants and their access control keywords defined therein. Specific assignment statements, initialization blocks, method bodies and other parts can then be removed, and the remaining content can be abstracted into interface declaration form and sorted according to its relevance to the first method.
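  • The abstraction into interface declaration form can be sketched as follows. This is a deliberately naive Python illustration, not the parser an implementation would use: the function name to_interface_form is hypothetical, and the sketch assumes a single top-level Java class with no nested classes, no initialization blocks, and no braces or equals signs inside string literals or comments.

```python
import re


def to_interface_form(java_class_source):
    """Keep signatures and field declarations; drop method bodies and initializers."""
    out = []
    depth = 0
    for ch in java_class_source:
        if ch == "{":
            depth += 1
            if depth == 1:
                out.append(ch)  # opening brace of the class body is kept
            elif depth == 2:
                # a method body starts here: end the signature with ';'
                while out and out[-1].isspace():
                    out.pop()
                out.append(";")
        elif ch == "}":
            if depth == 1:
                out.append(ch)  # closing brace of the class body is kept
            depth -= 1
        elif depth <= 1:
            out.append(ch)  # text outside method bodies is kept
    text = "".join(out)
    # remove field assignments such as ' = "0.0.0.0";'
    return re.sub(r"\s*=\s*[^;]*;", ";", text)


source = (
    "class Server {\n"
    '    private String ip = "0.0.0.0";\n'
    "    public void start() {\n"
    "        run();\n"
    "    }\n"
    "}"
)
print(to_interface_form(source))
```

  • For the sample class, the sketch keeps the declaration of ip and the signature of start() and drops the assignment and the method body.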
  • FIG. 16 is a schematic diagram of triggering a code generation request in an integrated development environment.
  • the generation point in FIG. 16 is located after the signature of the init() method (i.e., at the beginning of the 16th line where the cursor is located in FIG. 16 ).
  • usually, the context whose writing position in the current source code file lies within a certain range of the generation point is sent to the code generation model as text in the request, so that the code generation model can use this context to generate executable code.
  • for example, if the input window of the code generation model is 1024 tokens, at most 1024 tokens before and after the cursor position are input into the code generation model.
  • contexts include method declarations, comments, member variable declarations above init(), class declaration statements, import statements, etc., and some models (such as Meta's InCoder) also allow other content under init() to be used as input.
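  • The windowing just described can be sketched in Python as follows. The function name window_around_cursor, the allow_suffix flag, and the way the budget is split between the text before and after the cursor are assumptions for illustration; actual models and plug-ins may divide the window differently.

```python
def window_around_cursor(tokens, cursor, window=1024, allow_suffix=False):
    """Return (prefix, suffix) token lists that fit in the model's input window.

    tokens: the tokenized source file; cursor: index of the generation point.
    Without suffix support, the whole budget goes to tokens before the cursor.
    With suffix support (hypothetical split), up to half the budget is spent
    on the prefix and the remainder on tokens after the cursor.
    """
    if not allow_suffix:
        return tokens[max(0, cursor - window):cursor], []
    half = window // 2
    prefix = tokens[max(0, cursor - half):cursor]
    suffix = tokens[cursor:cursor + (window - len(prefix))]
    return prefix, suffix
```

  • In both modes the number of tokens sent never exceeds the window, which is why content outside the current file cannot reach the model in this scheme.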
  • since the information obtained is at most the content of the current source code file, when the code generation model generates executable code based on this context, it tends to follow the code implementation scheme that appears most frequently in the training data, such as directly using ServerSocket, ServerHandler and other classes from scratch.
  • the executable code generated in this way is too low-level and may contain code from other projects and unimported dependencies, which does not meet user expectations and leads to a poor user experience.
  • FIG17 is a schematic diagram of the first context information obtained according to the configuration using the method for obtaining context information provided in an embodiment of the present application.
  • the computing device first analyzes the location of the current generation point, and then classifies the content in the source code file where the first method is located, and performs the corresponding processing methods in the above steps 407 to 410 on the corresponding content according to the type of content.
  • the processing methods include: directly using some content in the source code file as the first context information; for some content in the source code file, using the context expansion outside the file, removing the target information from the expanded content, abstracting it into the interface declaration form, and sorting it; and for some content in the source code file, using the code reorganization inside the file, removing the target information, and abstracting it into the interface declaration form.
  • Step 411 When the preview indication is used to indicate previewing the first context information, display the first context information.
  • the computing device may display the first context information to the user, so that the user can view the first context information.
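  • when the first context information has not been updated after being acquired, the first context information used to generate the first executable code is the first context information acquired in step 407.
  • when the first context information has been updated, the first context information used to generate the first executable code is the updated first context information.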
  • Figure 18 is a schematic diagram of a preview interface for the first context information provided by an embodiment of the present application.
  • the preview interface can be displayed on the right side of the front-end interface shown in Figure 5.
  • the first context information obtained through the above process includes: (1) the createServer() method declaration in the util.Helper class; (2) the Java standard library imported by the current file; (3) the member variable ip declared by the current class Server; (4) the method start() defined after the generation point of the current class Server.
  • This context information is prepended to the comment of the init() method and input into the code generation model as the request content, making it easier for the code generation model to generate code implementations that are relevant to the current project context and that reuse existing packages as much as possible, thereby better meeting user expectations.
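  • A minimal Python sketch of assembling such a request is given below. The function name build_request is hypothetical, and the four declaration strings are illustrative stand-ins for the four kinds of context listed above; the exact declarations appear only in the figures and are not reproduced here.

```python
def build_request(context_items, method_comment, method_signature):
    """Place the context before the method comment, then the signature."""
    context_block = "\n".join(context_items)
    return f"{context_block}\n{method_comment}\n{method_signature}"


# Illustrative stand-ins (assumed text, not taken from the figures).
context_items = [
    "util.Helper: static Server createServer(String ip);",  # project-level
    "import java.net.ServerSocket;",                         # dependency library
    "private String ip;",                                    # member variable
    "public void start();",                                  # method after the generation point
]
request = build_request(
    context_items,
    "// Initialize the server",
    "public void init() {",
)
print(request)
```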
  • Step 412 Receive a consent indication indicating consent to the first context information.
  • the user can review the first context information.
  • the user can perform a specified operation to trigger a consent indication to inform the computing device that the user agrees to the first context information.
  • the computing device will receive the consent indication.
  • for example, an accept button is set in the upper left corner of the preview interface.
  • the user can click the accept button to trigger the consent indication to inform the computing device that the user agrees to the first context information.
  • a refresh button is set in the upper right corner of the preview interface. When the user believes that the first context information does not meet its application requirements, the user can click the refresh button to trigger the computing device to re-acquire the first context information.
  • Step 413 After receiving the consent indication, generate a first executable code based on the first context information and the code generation request.
  • the computing device may generate the first executable code based on the first context information and the code generation request.
  • the code generation request carries a task description, and the computing device generates the first executable code based on the first context information and the code generation request, mainly based on the first context information and the task description.
  • the implementation process of step 413 may include: inputting the first context information and the code generation request into the code generation model to obtain the first executable code output by the code generation model.
  • the code generation model can be a pre-trained model.
  • the code generation model used in the embodiment of the present application can be a pre-trained language model (PLM), such as a generative pre-trained transformer (GPT) model.
  • a context embedding layer is introduced into the generative pre-trained decoder model, which is used to assign values to different types of content in the input data, such as assigning 0 to the content belonging to the context, 1 to the code comment content, and 2 to the code snippet, so that the generative pre-trained decoder model can distinguish the characteristics of different types of content.
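  • The type assignment used by the context embedding layer can be sketched in Python as follows. The values 0, 1 and 2 follow the description above; the function name assign_type_ids and the input layout are assumptions. How the resulting ids are turned into embeddings inside the model is not specified here.

```python
CONTEXT, COMMENT, CODE = 0, 1, 2


def assign_type_ids(segments):
    """Flatten typed segments into tokens plus a parallel list of type ids.

    segments: list of (type_id, tokens) pairs in input order.
    """
    tokens, type_ids = [], []
    for type_id, segment_tokens in segments:
        tokens.extend(segment_tokens)
        type_ids.extend([type_id] * len(segment_tokens))
    return tokens, type_ids


tokens, type_ids = assign_type_ids([
    (CONTEXT, ["private", "String", "ip", ";"]),
    (COMMENT, ["//", "Initialize", "the", "server"]),
    (CODE, ["public", "void", "init", "(", ")", "{"]),
])
```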
  • the preview indication can also indicate whether to preview the generated first executable code.
  • the computing device can display the first executable code to the user so that the user can view the first executable code.
  • the user can review the first executable code.
  • the user can perform a specified operation to trigger a consent indication to inform the computing device that the user agrees to the first executable code. Accordingly, after the user triggers the consent indication, the computing device will receive the consent indication and can insert the first executable code into the generation point.
  • Figure 20 is a schematic diagram of a preview interface of a first executable code provided in an embodiment of the present application.
  • an accept button is provided in the upper left corner of the preview interface.
  • the accept button can be clicked to trigger a consent indication to inform the computing device that the user agrees to the first executable code.
  • the computing device can insert the first executable code into the source code file to which the first method belongs at the generation point.
  • the source code after the first executable code is inserted is shown in Figure 21, and the executable code in the dotted box in Figure 21 is the first executable code.
  • the first context information required for generating the first executable code can be obtained from the information of the programming project based on the code generation request; then, the first executable code is generated based on the first context information and the code generation request.
  • the first context information includes project-level context information from the entire programming project, which can reflect the logical structure of the code and the programming project, so that the code generation process can make more use of the background knowledge required by human developers in actual programming and the overall logic of the programming project, thereby improving the ability to generate code and helping to improve the actual usage experience of code generation technology.
  • the process of obtaining the first context information can be independent of the specific code generation process, that is, the process of obtaining the first context information can be decoupled from the process of generating the first executable code based on the first context information. Therefore, the ability to obtain context information is universal among different code generation technologies and can be integrated by different tools to enhance the user experience of different code generation technologies.
  • the process of obtaining context information can be used not only to generate executable code for the implementation method, but also to generate code for other units such as line-level code completion, and can be used not only for object-oriented language code generation, but also for process-oriented language code generation.
  • the cloud service-based code generation method provided in the embodiment of the present application can also be implemented in multiple forms, for example, in client code editors (such as VSCode), integrated development environments (such as the JetBrains series), code generation plug-in tools (such as Copilot, Tabnine, and AiXCoder), and auxiliary coding functions of cloud code editors (such as SourceGraph) and development environments (such as GitHub Codespace).
  • the method includes the following steps:
  • Step 2201 Obtain a second executable code for implementing a second method in a successfully compiled programming project.
  • obtaining a second executable code for implementing the second method in a successfully compiled programming project includes:
  • Step 22011 obtain all second methods in the programming project that has been successfully compiled.
  • methods are usually represented by method bodies. Therefore, when obtaining the second method in a programming project, all method bodies can be determined in the programming project first to obtain all second methods represented by method bodies.
  • the successfully compiled programming project can come from the accumulation of historical development processes, or from some public data sources, for example, from an open source software code repository (such as GitHub).
  • a logical structure diagram of the successfully compiled programming project can be generated, and the leaf nodes of the logical structure diagram are used to represent the methods in the programming project, and then the methods represented by all the leaf nodes of the logical structure diagram are determined as the second method.
  • for the implementation process of generating the logical structure diagram, refer to the implementation process of step 405, which will not be repeated here.
  • Step 22012 Filter all second methods to obtain the second methods that pass the filter.
  • the second methods that pass the filter are used to express the operation logic.
  • the second methods can be screened to obtain the second methods that meet the standards. Since only methods related to business implementation are helpful in generating code, the second methods can be screened according to whether they express business logic with practical significance. Only methods that express specific operation logic can express business logic with practical significance. Therefore, when screening, the second methods that express operation logic can be retained, and the second methods that do not express operation logic can be filtered out.
  • the second method that fails to pass the screening may have one or more of the following characteristics: the method body of the second method is empty, the second method has a special purpose, and the method body of the second method does not include an operation expression.
  • the special purpose may include one or more of the following: getting, setting, constructing, and returning. For example, taking Java language code as an example, the getter method is used to get, the setter method is used to set, the toString method is used to convert an object into a string and return the result, and the hashCode method is used to return the hash value of the object.
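  • The screening rules above can be sketched in Python as follows. The method representation (a dict with name and body), the regular expressions, and the reading of "operation expression" as an arithmetic, comparison or logical operator are all assumptions made for illustration; a real implementation would more likely inspect a syntax tree.

```python
import re

# Java-style getter/setter names plus the two examples named in the text.
SPECIAL_NAME = re.compile(r"^(get|set)[A-Z]\w*$|^(toString|hashCode)$")
# Naive operator check; plain assignment '=' alone is not counted.
OPERATOR = re.compile(r"[+\-*/%<>!&|^]|==")


def passes_filter(method, class_name):
    """Return True if the method appears to express operation logic."""
    body = method["body"].strip()
    if not body:
        return False  # empty method body
    name = method["name"]
    if name == class_name or SPECIAL_NAME.match(name):
        return False  # constructor, getter, setter, toString, hashCode
    return bool(OPERATOR.search(body))  # needs an operation expression


methods = [
    {"name": "getIp", "body": "return ip;"},
    {"name": "init", "body": ""},
    {"name": "Server", "body": "this.ip = ip;"},
    {"name": "label", "body": "return name;"},
    {"name": "retry", "body": "while (attempts < max) { attempts += 1; }"},
]
kept = [m["name"] for m in methods if passes_filter(m, "Server")]
```

  • With these sample methods, only retry is kept; the getter, the empty method, the constructor, and the method without an operator are filtered out.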
  • Step 22013 obtain the second executable code that implements the second method that passes the screening.
  • the second executable code of the second method that passes the screening can be obtained, so as to train the model to be trained using the second executable code.
  • the code in the method body of the second method that passes the screening can be determined as the second executable code for implementing the second method.
  • Step 2202 Obtain second context information of the second executable code.
  • the implementation process of obtaining the second context information of the second executable code can refer to the implementation process of obtaining the first context information required to generate the first executable code in the aforementioned description, so as to align the training task and the reasoning task of the model.
  • the scope of permissions that the second method has access to can also be obtained in advance, and then the second context information can be obtained within the scope of permissions.
  • the second context information can also be updated according to some strategies, such as removing the target information in the second context information, abstracting the second context information into an interface declaration form, and sorting multiple contents based on the relevance of multiple contents in the second context information to the second method.
  • for the implementation process of the above related processes for obtaining the second context information, refer to the relevant description of obtaining the first context information above.
  • the second context information of the second executable code can be obtained by analyzing the logical structure diagram.
  • the implementation process can also refer to the implementation process of obtaining the first context information based on the logical structure diagram above, which will not be repeated here.
  • Step 2203 Use the second context information as the input of the model to be trained, use the second executable code as the expected output of the model to be trained, train the model to be trained, and obtain a pre-trained model.
  • the second context information can be used as the input of the model to be trained, and the second executable code can be used as the expected output of the model to be trained, and the model to be trained can be trained to obtain a pre-trained model.
  • the second context information and the second executable code of the same second method can be formed into a piece of training data.
  • the input of the model to be trained can also include the method annotation.
  • the input of the model to be trained can also include the method signature.
  • the method signature is used to indicate how the method is used.
  • the pre-trained model used in the embodiment of the present application can be a generative pre-trained decoder model.
  • different types of context information can be spliced according to the source of the context information, in the order of project-level context, file-level context, class-level context, method annotation, and method code snippet, and different types of context information are marked with placeholders (such as <context>, <comment>, <java>, etc.), so as to formulate different loss update strategies for different types of information during training.
  • the placeholder can be located at the starting position of the context information of the corresponding type.
  • the context information marked with placeholders is shown in Figure 24.
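  • The splicing order and placeholder marking can be sketched in Python as follows. The function name splice_training_sample is hypothetical, and placing one <context> marker at the start of each context level is an assumption; Figure 24 defines the actual layout.

```python
def splice_training_sample(project_ctx, file_ctx, class_ctx, annotation, code):
    """Join the parts in the described order, each led by its placeholder."""
    parts = []
    for ctx in (project_ctx, file_ctx, class_ctx):
        if ctx:  # skip context levels that are empty
            parts.append("<context>" + ctx)
    parts.append("<comment>" + annotation)
    parts.append("<java>" + code)
    return "\n".join(parts)


sample = splice_training_sample(
    "util.Helper: static Server createServer(String ip);",
    "import java.net.ServerSocket;",
    "private String ip;",
    "// Initialize the server",
    "server = Helper.createServer(ip);",
)
```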
  • the embodiments of the present application may adopt a pretrain-finetune paradigm for model training and optimization.
  • in the pre-training stage, the model to be trained may learn the grammar and patterns of the language through unsupervised training on a large corpus.
  • in the fine-tuning stage, the data needs to be processed in a targeted manner according to the downstream task objectives, and the model to be trained is optimized in a targeted manner through supervised learning.
  • the causal language modeling (CLM) method can be adopted: the model is trained on the task of predicting the next token (next token prediction) from the existing tokens, but the loss value is calculated only for the target code part to be predicted (corresponding to the part after <java> in Figure 24). Only the loss of this part is used to update the model weights, so as to specifically optimize the model's ability to predict the code implementation given the project-level context information and the functional description of the current method.
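  • The restricted loss can be sketched in plain Python as follows. The sketch assumes that the per-token log-probabilities have already been produced by the model, and the function name masked_clm_loss is hypothetical; in a deep learning framework the same effect is usually obtained by masking the labels of the non-target positions.

```python
import math


def masked_clm_loss(token_log_probs, tokens, target_marker="<java>"):
    """Average negative log-likelihood over the tokens after the marker.

    token_log_probs[i] is the log-probability the model assigned to
    tokens[i] given the tokens before it. Positions up to and including
    the marker (context, comment, placeholders) contribute no loss.
    """
    start = tokens.index(target_marker) + 1
    target = token_log_probs[start:]
    if not target:
        return 0.0
    return -sum(target) / len(target)


tokens = ["<context>", "ip", "<comment>", "init", "<java>", "server", "=", "x"]
log_probs = [math.log(0.5)] * 5 + [math.log(0.25)] * 3
loss = masked_clm_loss(log_probs, tokens)
```

  • In this example only the three tokens after <java> contribute, so the loss depends only on how well the code part is predicted.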
  • the computing device can display the second context information according to the preview indication, and after receiving a consent indication for the second context information, use the second context information for input to the model to be trained.
  • the training data is obtained from the programming project, so that the training data is programming project-level data, which can take into account the logical structure of the code and the programming project, so that the training process can make more use of the background knowledge required by human developers in actual programming.
  • when the pre-trained model obtained by this training method is used to generate code, the ability of the pre-trained model to generate code can be improved, which helps to improve the actual use experience of generating code with the pre-trained model.
  • because the training method aligns the training task and the reasoning task of the model, it can improve the trained model's ability to perceive and utilize context. Using the trained pre-trained model to generate code can make fuller use of the model's performance, which can further improve the ability to generate code and improve the actual use experience of code generation technology.
  • the training method is positioned as a training data processing scheme and format that is common across models. Therefore, the training method can be applied to pre-training code generation models from random initialization, and can also be applied to targeted tuning training of multiple existing code generation models (such as OpenAI's Codex model, Salesforce's CodeGen model, Meta's InCoder model, etc.).
  • the training method provided in the embodiment of the present application can be applied to training at different stages, for example, direct pre-training (pretrain), multi-stage pre-training (multi-stage pretrain) and fine tuning (finetune).
  • direct pre-training is to start training directly from a randomly initialized model.
  • Multi-stage pre-training is based on a certain pre-trained model, and the model is continued to be trained after changing the data and modifying the hyperparameters.
  • Fine tuning is based on a certain pre-trained model, by fixing certain neural network layers or adding parameters about prompts (prompt), and targeted training is performed on a certain data set or downstream task.
  • the operation of obtaining the first context information can be implemented by a parser, an integrated development environment or a model.
  • the operation of obtaining the first context information and the operation of generating the first executable code can be implemented by the same model, or can be implemented by two models respectively.
  • the following uses an example in which the first AI model obtains the first context information from the programming project and the second AI model generates the first executable code based on the first context information to illustrate the implementation process of the embodiment of the present application.
  • the implementation process includes a training phase and a reasoning phase. The training phase is used to train the second AI model.
  • the reasoning phase is used to obtain the first context information from the programming project using the first AI model, and then generate the first executable code based on the first context information using the second AI model.
  • the operation of obtaining the second context information from the successfully compiled programming project in the training phase can be performed by the first AI model, or it can be implemented by an AI model with similar functions to the first AI model.
  • the first AI model can be an existing model such as the CodeBERT model, or it can be a self-trained model, and the training method can be a self-encoding training method.
  • the second AI model can be a generative pre-trained decoder model, etc. The following is an explanation using the first AI model as an example.
  • In the training phase, the computing device can obtain a successfully compiled programming project from the software code warehouse, obtain a source code file data set from the successfully compiled programming project, and perform preprocessing such as extraction, screening, and deduplication on the content of the programming project in the process. Then, the source code file data set is input into the first AI model, so that the first AI model executes steps 2201 and 2202 in the training method provided in the embodiment of the present application to obtain training data for training the second AI model to be trained. Then, the computing device executes the above step 2203, inputs the training data into the second AI model to be trained, and saves the model after the training is completed, thereby obtaining a trained second AI model.
  • In the reasoning phase, when the computing device receives a code generation request from the current programming project, the relevant files of the current programming project can be input into the first AI model, so that the first AI model executes steps 405 to 410 in the code generation method based on cloud services provided in the embodiment of the present application and obtains the first context information from the relevant files of the current programming project.
  • after the computing device receives the first context information output by the first AI model, the first context information is spliced with the task description of the code generation request, and the spliced first context information and task description are input into the second AI model, so that the second AI model executes step 413 in the code generation method based on cloud services provided in the embodiment of the present application to generate the first executable code for implementing the first method.
  • after the computing device receives the first executable code output by the second AI model, it recommends the first executable code to the developer.
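  • The two-model flow in the reasoning phase can be sketched in Python as follows. Both models are represented by plain callables, and the names generate_code, context_model and generation_model are hypothetical; the stubs below only stand in for the first and second AI models so that the data flow can be followed.

```python
def generate_code(project_files, task_description, context_model, generation_model):
    """Obtain context with the first model, splice it, generate with the second."""
    first_context = context_model(project_files)
    spliced_input = first_context + "\n" + task_description
    return generation_model(spliced_input)


# Stub models for illustration only (assumed behavior).
def stub_context_model(files):
    return "\n".join(sorted(files))


def stub_generation_model(prompt):
    return f"/* generated from {len(prompt.splitlines())} input lines */"


code = generate_code(
    {"Server.java", "util/Helper.java"},
    "// Initialize the server",
    stub_context_model,
    stub_generation_model,
)
```

  • Keeping the two models behind separate callables mirrors the decoupling described earlier: the context acquisition step can be reused with a different code generation model without change.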
  • FIG. 26 is a structural schematic diagram of a cloud service-based code generation device provided by an embodiment of the present application. Based on the following multiple modules shown in Figure 26, the cloud service-based code generation device shown in Figure 26 can perform all or part of the operations shown in Figure 4 above. It should be understood that the device may include more additional modules than the modules shown or omit some of the modules shown therein, and the embodiment of the present application does not limit this.
  • the cloud service-based code generation device can be configured on a cloud platform. As shown in Figure 26, the cloud service-based code generation device 260 includes:
  • the receiving module 2601 is used to receive a code generation request, where the code generation request is used to request generation of a first executable code for implementing a first method in a programming project.
  • the first acquisition module 2602 is used to acquire first context information required to generate a first executable code from information of a programming project based on a code generation request.
  • the generating module 2603 is used to generate a first executable code based on the first context information and the code generation request.
  • the receiving module 2601 is further configured to receive a range indication, where the range indication is used to indicate a range for obtaining the first context information.
  • the receiving module 2601 is further used to receive a preview indication, where the preview indication is used to indicate whether to preview the first context information.
  • the cloud service-based code generation device 260 further includes: a display module 2604, configured to display the first context information when the preview indication is used to indicate a preview of the first context information.
  • the receiving module 2601 is further configured to receive a consent indication indicating consent to the first context information.
  • the generation module 2603 is specifically configured to: after receiving the consent indication, generate a first executable code based on the first context information and the code generation request.
  • the first acquisition module 2602 is specifically used to: based on the code generation request, acquire the source code file to which the first method belongs in the files of the programming project; acquire, from the information of the programming project, external information outside the source code file that is used by the source code file; supplement the external information in the source code file; and obtain the first context information based on the supplemented source code file.
  • the first acquisition module 2602 is specifically used to: obtain, from the information of the programming project, a permission scope that the first method has permission to access; and obtain external information from the permission scope.
  • the first acquisition module 2602 is specifically used to obtain the permission scope based on at least one of the following: the position of the first method in the programming project, the access control permission of the target class to which the first method belongs, and the hierarchy and reference relationship of the target class.
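  • As an illustration of how access control can bound the permission scope, the following Python sketch applies the standard Java access modifier rules to decide which members are visible from the first method. The function names and the member representation are hypothetical, and the rule for protected members is simplified (Java additionally restricts protected access through the subclass type).

```python
def can_access(modifier, same_class=False, same_package=False, is_subclass=False):
    """Simplified Java visibility check for a member seen from the first method."""
    if same_class or modifier == "public":
        return True
    if modifier == "protected":
        return same_package or is_subclass
    if modifier == "package-private":
        return same_package
    if modifier == "private":
        return False
    raise ValueError(f"unknown modifier: {modifier}")


def permission_scope(members, **relation):
    """Names of the members the first method is permitted to access."""
    return [m["name"] for m in members if can_access(m["modifier"], **relation)]


members = [
    {"name": "createServer", "modifier": "public"},
    {"name": "cache", "modifier": "private"},
    {"name": "reset", "modifier": "protected"},
    {"name": "poolSize", "modifier": "package-private"},
]
```

  • For example, from an unrelated class in another package only createServer is in scope, while from a class in the same package reset and poolSize are in scope as well.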
  • the first acquisition module 2602 is further used to: remove target information from the first context information to obtain updated first context information, where the target information includes one or more of the following: code comments, variable assignments, method bodies, and information indicating the underlying logic of the code.
  • the receiving module 2601 is further used to receive a removal instruction, where the removal instruction is used to indicate whether to remove the target information in the first context information.
  • the first acquisition module 2602 is specifically used to: based on the code generation request, obtain the source code file to which the first method belongs in the file of the programming project. In the source code file, obtain the context information whose writing position is after the first method. Adjust the writing position of the context information to before the first method, so that the adjusted context information becomes the context information of the first method. Based on the context information of the first method, obtain the first context information.
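  • The reorganization inside the file described above can be sketched in Python as follows. The file is modeled as an ordered list of named segments, which is an assumption for illustration, and the function name reorganize is hypothetical.

```python
def reorganize(segments, target_name):
    """Move every segment written after the target method to before it.

    segments: list of (name, text) pairs in file order.
    Returns the segments with the target method placed last, so that
    everything else becomes preceding context for the generation point.
    """
    before, after, target = [], [], None
    for name, text in segments:
        if name == target_name:
            target = (name, text)
        elif target is None:
            before.append((name, text))
        else:
            after.append((name, text))
    if target is None:
        raise ValueError(f"method not found: {target_name}")
    return before + after + [target]


segments = [
    ("ip", "private String ip;"),
    ("init", "public void init() {"),
    ("start", "public void start();"),
]
reordered = [name for name, _ in reorganize(segments, "init")]
```

  • In this example, start() is written after init() in the file but ends up before it, so a model that only reads the text before the generation point can still see its declaration.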
  • the identifier includes one or more of the following: variable name, method name, package name, class name, and constant name.
  • the first acquisition module 2602 is further used to: generate a logical structure diagram of the programming project, where the logical structure diagram is used to indicate the relationship between various contents in the programming project.
  • the first acquisition module 2602 is specifically used to: analyze the logic structure diagram based on the code generation request, and acquire the first context information required to generate the first executable code from the information of the programming project.
  • the generation module 2603 is specifically used to: input the first context information and the code generation request into the pre-trained model to obtain the first executable code output by the pre-trained model.
  • the training module 2607 is used to use the second context information as the input of the model to be trained and the second executable code as the expected output of the model to be trained, to train the model to be trained and obtain a pre-trained model.
  • the second method that fails the screening has one or more of the following characteristics: the method body of the second method is empty, the second method has a special purpose, and the method body of the second method does not include an operation expression.
  • special purposes include one or more of the following: get, set, construct, and return.
  • the context information includes one or more of the following: functions, access rights, and calling methods of defined classes, variables, and methods.
  • the first context information required for generating the first executable code can be obtained from the information of the programming project based on the code generation request; then, the first executable code is generated based on the first context information and the code generation request.
  • the first context information includes project-level context information from the entire programming project, which can reflect the logical structure of the code and the programming project, so that the code generation process can make more use of the background knowledge required by human developers in actual programming and the overall logic of the programming project, thereby improving the ability to generate code and helping to improve the actual use experience of the code generation technology.
  • the receiving module 2601 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances.
  • the receiving module 2601 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions.
  • multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
  • a VPC is set up in a region.
  • a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
  • the receiving module 2601 may include at least one computing device, such as a server, etc.
  • the receiving module 2601 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
  • the multiple computing devices included in the receiving module 2601 can be distributed in the same region or in different regions.
  • the multiple computing devices included in the receiving module 2601 can be distributed in the same availability zone (AZ) or in different AZs.
  • the multiple computing devices included in the receiving module 2601 can be distributed in the same VPC or in multiple VPCs.
  • the multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • all or part of the embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • all or part of the embodiments may be implemented in the form of a computer program product.
  • the computer program product providing the program development platform includes one or more computer instructions, and when these computer program instructions are loaded and executed on a computing device, all or part of the functions of the cloud service-based code generation method provided in the embodiments of the present application are implemented.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server or data center to another by wired or wireless means. The computer program instructions for providing the program development platform are stored in the computer-readable storage medium.
  • the embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device can be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.
  • the memory 102 of one or more computing devices 100 in the computing device cluster may also store partial instructions for executing the code generation method based on cloud services.
  • the combination of one or more computing devices 100 can jointly execute instructions for executing the code generation method based on cloud services.
  • the memory 102 in different computing devices 100 in the computing device cluster may store different instructions, which are respectively used to execute part of the functions of the code generation apparatus based on cloud services. That is, the instructions stored in the memory 102 in different computing devices 100 may implement the functions of one or more modules of the receiving module 2601, the first acquisition module 2602, the generating module 2603, the display module 2604, the second acquisition module 2605, the third acquisition module 2606 and the training module 2607.
  • one or more computing devices in the computing device cluster can be connected via a network.
  • the network can be a wide area network or a local area network, etc.
  • Figure 28 shows a possible implementation.
  • two computing devices 2800A and 2800B are connected via a network.
  • the network is connected through a communication interface in each computing device.
  • computing devices 2800A and 2800B include a bus 2802, a processor 2804, a memory 2806, and a communication interface 2808.
  • the memory 2806 in the computing device 2800A stores instructions for executing the functions of the receiving module 2601, the first acquisition module 2602, the generating module 2603, and the display module 2604.
  • the memory 2806 in the computing device 2800B stores instructions for executing the functions of the second acquisition module 2605, the third acquisition module 2606, and the training module 2607.
  • the present application also provides a computer program product including instructions.
  • the computer program product may be a software or program product including instructions that can be run on a computing device or stored in any available medium.
  • when the computer program product is run on at least one computing device, the at least one computing device implements the cloud service-based code generation method provided in the present application.
  • An embodiment of the present application also provides a chip, including a processor, for calling and executing instructions stored in the memory from the memory, so that a computing device equipped with the chip executes the cloud service-based code generation method provided in the embodiment of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • Computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, digital subscriber line) or wireless means (such as infrared, radio, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains one or more available media. Available media can be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives).
  • the computer program code for implementing the method of the embodiments of the present application can be written in one or more programming languages. The computer program code can be provided to the processor of a general-purpose computer, a special-purpose computer or other programmable apparatus, so that the program code, when executed by the computer or other programmable apparatus, causes the functions/operations specified in the flow charts and/or block diagrams to be implemented.
  • the program code can be executed completely on a computer, partially on a computer, as an independent software package, partially on a computer and partially on a remote computer or completely on a remote computer or server.
  • computer program codes or related data may be carried by any appropriate carrier to enable a device, apparatus or processor to perform the various processes and operations described above.
  • carriers include signals, computer readable media, etc.
  • signals may include electrical, optical, radio, acoustic or other forms of propagation signals, such as carrier waves, infrared signals, etc.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the module is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or modules, or it can be an electrical, mechanical or other form of connection.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
  • each functional module in each embodiment of the present application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software functional modules.
  • first, second, etc. are used to distinguish between identical or similar items with substantially the same effects and functions. It should be understood that there is no logical or temporal dependency between “first”, “second”, and “nth”, nor is the quantity and execution order limited. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, without departing from the scope of various examples, a first link may be referred to as a second link, and similarly, a second link may be referred to as a first link.
  • the size of the serial number of each process does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • determining B based on A does not mean determining B only based on A.
  • B can also be determined based on A and/or other information.
  • references to “one embodiment”, “an embodiment”, or “a possible implementation” throughout the specification mean that specific features, structures, or characteristics related to the embodiment or implementation are included in at least one embodiment of the present application. Therefore, the references to “in one embodiment” or “in an embodiment”, or “a possible implementation” throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • the information (including but not limited to user device information and user personal information), data (including but not limited to data used for analysis, stored data and displayed data) and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • the information and instructions involved in this application are all obtained with full authorization.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

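This application discloses a cloud service-based code generation method and apparatus, which belong to the field of software development technology. The method includes: receiving a code generation request, where the code generation request requests generation of first executable code that implements a first method in a programming project; obtaining, based on the code generation request and from information of the programming project, first context information required for generating the first executable code; and generating the first executable code based on the first context information and the code generation request. Because project-level context is taken into account during generation, the quality of the generated code can be improved.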

Description

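Cloud service-based code generation method and apparatus
This application claims priority to Chinese patent application No. 202211430214.3, filed on November 15, 2022 and entitled “Code generation method and apparatus”, and to Chinese patent application No. 202211600744.8, filed on December 13, 2022 and entitled “Cloud service-based code generation method and apparatus”, both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of software development technology, and in particular to a cloud service-based code generation method and apparatus.
Background
Code generation, also called program synthesis, has long been a focus of academic research in software engineering (SE) and artificial intelligence (AI), and has attracted considerable industry attention because of its commercial value. Over the past two years, advances in natural language processing (NLP) and programming language processing (PLP) have combined to move code generation from academic research toward practical use. AI-based code generation tools, which assist developers and improve development efficiency, have become one of the most closely watched applications of AI.
Current code generation technology mainly targets line-level code completion and code generation. Compared with programming environments and tools that have matured over many years, it is still at an early stage, and both the technology and its product form need continued improvement in practice.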
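Summary
This application provides a cloud service-based code generation method and apparatus. Project-level context is taken into account when code is generated, which helps improve the quality of the generated code. The technical solutions are as follows:
According to a first aspect, this application provides a cloud service-based code generation method, which may be performed by a cloud platform. The method includes: receiving a code generation request, where the code generation request requests generation of first executable code that implements a first method in a programming project; obtaining, based on the code generation request and from information of the programming project, first context information required for generating the first executable code; and generating the first executable code based on the first context information and the code generation request.
In this method, the first context information is obtained from the information of the programming project before the first executable code is generated. The first context information includes project-level context drawn from the entire programming project and reflects the logical structure of the code and of the project. The generation process can therefore draw on the background knowledge a human developer would need in actual programming and on the overall logic of the project, which improves code generation capability and the practical experience of using code generation technology.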
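In this application, a computing device may generate the executable code of the first method based on the first context information of that method. The hierarchical units of a programming project are, from largest to smallest: programming project, code module, code package, class and method. Depending on the application requirement, the first context information may be obtained from different hierarchical units, so the user may send a scope indication to the computing device through a client to specify where the first context information is obtained. The method therefore further includes: receiving a scope indication that indicates the acquisition scope of the first context information. Correspondingly, obtaining the first context information includes: when the scope indication indicates that the acquisition scope is the programming project, obtaining the first context information from the information of the programming project based on the code generation request.
When the scope indication indicates that the acquisition scope is the programming project, the computing device can obtain the first context information from the entire programming project. It can then take the logical structure of the code and of the project into account when generating the first executable code, which helps improve code generation capability.
The user may indicate whether to view the obtained first context information by performing an operation that triggers a preview indication. The method therefore further includes: receiving a preview indication that indicates whether to preview the first context information; when the preview indication indicates previewing, displaying the first context information; and receiving a consent indication that approves the first context information. Correspondingly, generating the first executable code includes: after the consent indication is received, generating the first executable code based on the first context information and the code generation request.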
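Executable code that implements a method is usually stored in a source code file. The first context information may therefore include content inside the source code file to which the first method belongs, content outside that file, or both, and may be obtained from each. In one implementation, obtaining the first context information includes: performing processing such as out-of-file context expansion and in-file code reorganization, and obtaining the first context information from the processing result.
In one implementation, in-file code reorganization proceeds as follows: based on the code generation request, obtain, from the files of the programming project, the source code file to which the first method belongs; obtain, from the source code file, the following-context information written after the first method; move the following-context information to before the first method, so that it becomes preceding-context information of the first method; and obtain the first context information based on the preceding-context information of the first method.
In one implementation, out-of-file context expansion proceeds as follows: based on the code generation request, obtain, from the files of the programming project, the source code file to which the first method belongs; obtain, from the information of the programming project, external information that the source code file uses but that is located outside the file; supplement the source code file with the external information; and obtain the first context information based on the supplemented source code file.
Optionally, the method further includes: obtaining, from the information of the programming project, the permission scope that the first method is permitted to access. Correspondingly, the external information is obtained within the permission scope. This avoids introducing content that the first method is not permitted to access, and keeps the acquisition of the first context information efficient and the result valid.
In one implementation, the permission scope is obtained based on at least one of: the position of the first method in the programming project, the access control permission of the target class to which the first method belongs, and the hierarchy level and reference relationships of the target class. The position of the first method may be represented by a generation point. The hierarchy level of the target class is its level among the hierarchical units of the project. The reference relationships of the target class are the content it references. The access control permission of the target class may be set by a developer and restricts access to the executable code in the class.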
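Optionally, before the first executable code is generated, the method further includes: removing target information from the first context information to obtain updated first context information, where the target information includes one or more of: code comments, variable assignments, method bodies and information that reveals the underlying logic of the code.
This removes information that is irrelevant to generating the first executable code, private or sensitive, while retaining the content that conveys the hierarchy and signatures in the first context information. It compresses the first context information, so that more valuable context fits into the same input length, and it also protects the usability of the generated code, user privacy and code security.
In one implementation, the user decides whether the target information needs to be removed. The method then further includes: receiving a removal indication that indicates whether to remove the target information from the first context information. Correspondingly, the target information is removed, and the updated first context information obtained, when the removal indication indicates removal.
In one implementation, the removal indication only indicates that removal is required. On receiving it, the computing device removes information according to a preset policy, and the user does not need to specify the type of target information, that is, whether it is the implementation details of called methods, variable assignments or sensitive information in the code. In another implementation, the removal indication also specifies the type of target information to remove.
When a pre-trained model is used to generate the first executable code, the first context information may first be abstracted into syntactically valid interface declarations, so that the programming language syntax learned by the model during training can be reused. The method then further includes, before the first executable code is generated: abstracting the first context information into interface declaration form to obtain updated first context information.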
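When a code generation model is used, its input length is limited. After the first context information is obtained, its items may therefore be sorted and fed to the model in sorted order. This lowers the probability that items highly relevant to the first method are truncated by the input length limit, so that the more important context reaches the model. The method then further includes, before the first executable code is generated: obtaining the relevance of each of multiple items in the first context information to the first method; and sorting the items by relevance to obtain updated first context information.
In one implementation, sorting the items by relevance includes: arranging the items before the task description of the first method, such that the distance from any item to the task description is negatively correlated with the relevance of that item.
Relevance can be obtained in several ways, which may be used alone or in combination: obtaining a first similarity between the identifier of each item and related information of the first method, where the first similarity is positively correlated with relevance; obtaining the distance between each item and the first method in the project hierarchy, where the distance is negatively correlated with relevance; and obtaining a second similarity between each item and the context information called by content associated with the first method, where the second similarity is positively correlated with relevance, and the associated content includes one or more of: classes associated with the target class to which the first method belongs, and other methods in the target class.
The identifier includes one or more of: a variable name, a method name, a package name, a class name and a constant name. The related information includes one or more of: a method description, a method name, a return type and parameter types.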
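The ordering rule above can be illustrated with a minimal sketch. It covers only the first similarity (identifier versus method name), and the Jaccard measure over word tokens, the function names and the item layout are all assumptions made for illustration; the application does not specify a similarity measure.

```python
import re

def tokens(identifier):
    """Split a camelCase or snake_case identifier into lowercase word tokens."""
    return {part.lower() for part in re.findall(r"[A-Za-z][a-z0-9]*", identifier)}

def similarity(a, b):
    """Jaccard similarity of the word tokens of two identifiers (an assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def order_context(items, method_name, task_description):
    """Put the least relevant item first and the most relevant item last,
    directly before the task description, so truncation from the front
    of the input drops the least relevant items first."""
    ranked = sorted(items, key=lambda item: similarity(item["id"], method_name))
    return [item["text"] for item in ranked] + [task_description]
```

With this layout, the distance from an item to the task description shrinks as its relevance grows, which is the negative correlation described above.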
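In one implementation, before the first context information is obtained, the method further includes: generating a logical structure graph of the programming project, where the graph indicates the relationships among the items of content in the project. Correspondingly, obtaining the first context information includes: analyzing the logical structure graph based on the code generation request, and obtaining, from the information of the programming project, the first context information required for generating the first executable code.
Optionally, generating the first executable code includes: inputting the first context information and the code generation request into a pre-trained model, and obtaining the first executable code output by the model.
In that case, before the first executable code is generated, the method further includes: obtaining second executable code that implements a second method in a successfully compiled programming project; obtaining second context information of the second executable code; and training a model to be trained, with the second context information as its input and the second executable code as its expected output, to obtain the pre-trained model.
Optionally, when the second method has a method comment, the input of the model to be trained may also include the comment. Similarly, when the second method has a method signature, the input may also include the signature, which indicates how the method is used.
In one implementation, obtaining the second executable code includes: obtaining all second methods in the successfully compiled programming project; screening them to obtain the second methods that pass, where a second method that passes the screening expresses operation logic; and obtaining the second executable code that implements the second methods that pass.
A second method that fails the screening has one or more of the following characteristics: its method body is empty, it has a special purpose, or its method body contains no operation expression. For example, special purposes include one or more of: get, set, construct and return.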
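As a rough illustration of the screening rule, the sketch below rejects a Java-style method whose body is empty, whose name marks it as a getter, setter or constructor, whose body is a bare return of a value, or whose body contains no arithmetic, comparison or logical operator. The function name, the regular expressions and the choice of which operators count as an “operation expression” are assumptions, not details taken from the application.

```python
import re

# Operators treated as evidence of an operation expression (an assumption).
OPERATORS = re.compile(r"[-+*/%<>]|==|!=|&&|\|\|")

def passes_screening(name, body, class_name):
    """Return True if a method is kept as a training sample."""
    statements = body.strip()
    if not statements:                                   # empty method body
        return False
    if name == class_name:                               # constructor
        return False
    if re.match(r"(get|set)[A-Z]", name):                # getter or setter
        return False
    if re.fullmatch(r"return\s+[\w.]+;", statements):    # bare return of a value
        return False
    return bool(OPERATORS.search(statements))            # needs an operation expression
```

A production filter would more likely inspect a parsed syntax tree than raw text; the regular expressions here only keep the sketch short.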
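Optionally, context information is the relevant information needed to perform a task. In one implementation, context information includes one or more of: the functions, access permissions and calling conventions of defined classes, variables and methods. Obtaining the first context information and generating the first executable code from it allows these to be reused, which reduces the complexity of generating the first executable code and helps improve code generation capability.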
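According to a second aspect, this application provides a cloud service-based code generation apparatus deployed on a cloud platform. The apparatus includes: a receiving module, configured to receive a code generation request, where the code generation request requests generation of first executable code that implements a first method in a programming project; a first acquisition module, configured to obtain, based on the code generation request and from information of the programming project, first context information required for generating the first executable code; and a generating module, configured to generate the first executable code based on the first context information and the code generation request.
Optionally, the receiving module is further configured to receive a scope indication that indicates the acquisition scope of the first context information; and the first acquisition module is specifically configured to: when the scope indication indicates that the acquisition scope is the programming project, obtain the first context information from the information of the programming project based on the code generation request.
Optionally, the receiving module is further configured to receive a preview indication that indicates whether to preview the first context information. The apparatus further includes a display module, configured to display the first context information when the preview indication indicates previewing. The receiving module is further configured to receive a consent indication that approves the first context information, and the generating module is specifically configured to generate the first executable code based on the first context information and the code generation request after the consent indication is received.
Optionally, the first acquisition module is specifically configured to: based on the code generation request, obtain, from the files of the programming project, the source code file to which the first method belongs; obtain, from the information of the programming project, external information that the source code file uses but that is located outside the file; supplement the source code file with the external information; and obtain the first context information based on the supplemented source code file.
Optionally, the first acquisition module is specifically configured to: obtain, from the information of the programming project, the permission scope that the first method is permitted to access; and obtain the external information within the permission scope.
Optionally, the first acquisition module is specifically configured to obtain the permission scope based on at least one of: the position of the first method in the programming project, the access control permission of the target class to which the first method belongs, and the hierarchy level and reference relationships of the target class.
Optionally, the first acquisition module is further configured to remove target information from the first context information to obtain updated first context information, where the target information includes one or more of: code comments, variable assignments, method bodies and information that reveals the underlying logic of the code.
Optionally, the receiving module is further configured to receive a removal indication that indicates whether to remove the target information from the first context information; and the first acquisition module is specifically configured to: when the removal indication indicates removal, remove the target information from the first context information to obtain updated first context information.
Optionally, the first acquisition module is further configured to abstract the first context information into interface declaration form to obtain updated first context information.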
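Optionally, the first acquisition module is specifically configured to: based on the code generation request, obtain, from the files of the programming project, the source code file to which the first method belongs; obtain, from the source code file, the following-context information written after the first method; move the following-context information to before the first method, so that it becomes preceding-context information of the first method; and obtain the first context information based on the preceding-context information of the first method.
Optionally, the first acquisition module is further configured to: obtain the relevance of each of multiple items in the first context information to the first method; and sort the items by relevance to obtain updated first context information.
Optionally, the first acquisition module is specifically configured to arrange the items before the task description of the first method, such that the distance from any item to the task description is negatively correlated with the relevance of that item.
Optionally, obtaining the relevance of the items in the first context information to the first method includes one or a combination of: obtaining a first similarity between the identifier of each item and related information of the first method, where the first similarity is positively correlated with relevance; obtaining the distance between each item and the first method in the project hierarchy, where the distance is negatively correlated with relevance; and obtaining a second similarity between each item and the context information called by content associated with the first method, where the second similarity is positively correlated with relevance, and the associated content includes one or more of: classes associated with the target class to which the first method belongs, and other methods in the target class.
Optionally, the identifier includes one or more of: a variable name, a method name, a package name, a class name and a constant name.
Optionally, the related information includes one or more of: a method description, a method name, a return type and parameter types.
Optionally, the first acquisition module is further configured to generate a logical structure graph of the programming project, where the graph indicates the relationships among the items of content in the project; and the first acquisition module is specifically configured to analyze the logical structure graph based on the code generation request, and obtain, from the information of the programming project, the first context information required for generating the first executable code.
Optionally, the generating module is specifically configured to input the first context information and the code generation request into a pre-trained model, and obtain the first executable code output by the model.
Optionally, the apparatus further includes: a second acquisition module, configured to obtain second executable code that implements a second method in a successfully compiled programming project; a third acquisition module, configured to obtain second context information of the second executable code; and a training module, configured to train a model to be trained, with the second context information as its input and the second executable code as its expected output, to obtain the pre-trained model.
Optionally, the input of the model to be trained further includes one or more of: a method comment and a method signature of the second method.
Optionally, the second acquisition module is specifically configured to: obtain all second methods in the successfully compiled programming project; screen them to obtain the second methods that pass, where a second method that passes the screening expresses operation logic; and obtain the second executable code that implements the second methods that pass.
Optionally, a second method that fails the screening has one or more of the following characteristics: its method body is empty, it has a special purpose, or its method body contains no operation expression.
Optionally, special purposes include one or more of: get, set, construct and return.
Optionally, context information includes one or more of: the functions, access permissions and calling conventions of defined classes, variables and methods.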
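According to a third aspect, this application provides a computing device including a memory and a processor. The memory stores program instructions, and the processor runs the program instructions to perform the method provided in the first aspect or any possible implementation thereof.
According to a fourth aspect, this application provides a computing device cluster including at least one computing device, each of which includes a processor and a memory. The processor of the at least one computing device executes instructions stored in the memory of the at least one computing device, so that the cluster performs the method provided in the first aspect or any possible implementation thereof.
According to a fifth aspect, this application provides a non-volatile computer-readable storage medium that includes program instructions. When the program instructions run on a computing device, the computing device performs the method provided in the first aspect or any possible implementation thereof.
According to a sixth aspect, this application provides a computer program product containing instructions. When the computer program product runs on a computer, the computer performs the method provided in the first aspect or any possible implementation thereof.
Brief Description of the Drawings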
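Figure 1 is a schematic diagram of an implementation scenario of a cloud service-based code generation method according to an embodiment of this application;
Figure 2 is a schematic diagram of another implementation scenario of the cloud service-based code generation method according to an embodiment of this application;
Figure 3 is a schematic diagram of yet another implementation scenario of the cloud service-based code generation method according to an embodiment of this application;
Figure 4 is a schematic diagram of the implementation process of a cloud service-based code generation method according to an embodiment of this application;
Figure 5 is a schematic diagram of the front-end interface of an integrated development environment according to an embodiment of this application;
Figure 6 is a schematic diagram of an interface according to an embodiment of this application;
Figure 7 is a schematic diagram of a process of obtaining the first context information according to an embodiment of this application;
Figure 8 is a schematic diagram of a logical structure graph according to an embodiment of this application;
Figure 9 is a schematic diagram of a process of performing in-file code reorganization and obtaining the first context information from the result according to an embodiment of this application;
Figure 10 is a schematic diagram of a source code file according to an embodiment of this application;
Figure 11 is a schematic diagram of the source code file of Figure 10 after in-file code partitioning according to an embodiment of this application;
Figure 12 is a schematic diagram of Figure 11 after the position of the following-context information is adjusted according to an embodiment of this application;
Figure 13 is a schematic diagram of a process of performing out-of-file context expansion and obtaining the first context information from the result according to an embodiment of this application;
Figure 14 is a schematic diagram of the source code file of Figure 12 after external information is added according to an embodiment of this application;
Figure 15 is a schematic diagram of the first context information obtained from the source code file of Figure 14 after target information removal, abstraction into interface declaration form and sorting according to an embodiment of this application;
Figure 16 is a schematic diagram of triggering a code generation request in an integrated development environment according to an embodiment of this application;
Figure 17 is a schematic diagram of obtaining the first context information according to an embodiment of this application;
Figure 18 is a schematic diagram of a preview interface for the first context information according to an embodiment of this application;
Figure 19 is a schematic diagram of a generative pre-trained decoder model according to an embodiment of this application;
Figure 20 is a schematic diagram of a preview interface for the first executable code according to an embodiment of this application;
Figure 21 is a schematic diagram of the source code file to which the first method belongs after the first executable code is inserted at the generation point according to an embodiment of this application;
Figure 22 is a flowchart of a training method according to an embodiment of this application;
Figure 23 is a flowchart of obtaining second executable code that implements a second method in a successfully compiled programming project according to an embodiment of this application;
Figure 24 is a schematic diagram of context information annotated with placeholders according to an embodiment of this application;
Figure 25 is a schematic diagram of the process of a cloud service-based code generation method according to an embodiment of this application;
Figure 26 is a schematic structural diagram of a cloud service-based code generation apparatus according to an embodiment of this application;
Figure 27 is a schematic structural diagram of another cloud service-based code generation apparatus according to an embodiment of this application;
Figure 28 is a schematic structural diagram of a computing device cluster according to an embodiment of this application.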
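Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the drawings.
Current code generation technology mainly uses a model, such as an AI model, to generate code. The typical scenario is generating code in a specific programming language from a natural language description, so as to fulfil the requirement the user expressed in that description. By analogy with how a developer writes code, the developer first writes a comment and the model then generates the code fragment that implements the function the comment describes.
Code generation technology does significantly reduce the cost of switching between writing code and activities such as searching for knowledge, consulting documentation and looking for reusable components, and thereby improves development efficiency to some extent. However, current code generation relies mainly on natural language processing techniques and does not fully consider the logical structure of the code and of the programming project. As a result, its generation capability and the practical experience of using it are poor.
An embodiment of this application provides a cloud service-based code generation method. The method includes: receiving a code generation request, where the code generation request requests generation of first executable code that implements a first method in a programming project; obtaining, based on the code generation request and from information of the programming project, first context information required for generating the first executable code; and then generating the first executable code based on the first context information and the code generation request.
In this method, the first context information is obtained from the information of the programming project before the first executable code is generated. The first context information includes project-level context drawn from the entire programming project and reflects the logical structure of the code and of the project. The generation process can therefore draw on the background knowledge a human developer would need in actual programming and on the overall logic of the project, which improves code generation capability and the practical experience of using code generation technology.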
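A programming project implements a user service. The information of a programming project can be regarded as the information in the project source code folder, which is a collection of subfolders and/or source code files. A subfolder may in turn contain subfolders and/or source code files, and so on, and a lowest-level subfolder contains multiple source code files. From coarsest to finest granularity, the hierarchical units of a programming project are: programming project, code module (which some projects do not have), code package, class and method (also called class method). Taking Java as an example, a programming project includes one or more code modules, each code module includes one or more code packages, a code package includes one or more classes, a class includes one or more methods, and each method implements one or more pieces of operation logic. Correspondingly, the project source code folder includes one or more subfolders (called first-level subfolders for ease of description) that correspond to the code modules. Each first-level subfolder includes one or more second-level subfolders that correspond to the code packages. Each second-level subfolder includes one or more third-level subfolders that correspond to the classes. Each third-level subfolder includes one or more source code files, and each source code file records the executable code that implements the methods of a class. By executing all the executable code recorded in the source code files according to the relationships among the project source code folder, the subfolders and the source code files, a computing device can implement the user service.
It should be noted that the names of, and the relationships among, the programming project, code module, code package, class and method are examples and do not limit this application. In some scenarios they may vary with the application. For example, in some programming contexts a method is also called a function. Any such scenario that generates code using the idea of the embodiments of this application also falls within the scope of protection of this application.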
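Figure 1 is a schematic diagram of an implementation scenario of the cloud service-based code generation method according to an embodiment of this application. As shown in Figure 1, the scenario includes a computing device 10, which performs the method. For example, the computing device 10 receives a code generation request, obtains, based on the request and from the information of the programming project, the first context information required for generating the first executable code, and then generates the first executable code based on the first context information and the code generation request.
The computing device 10 may be implemented by a physical machine, a cluster of physical machines, a graphics card, an AI computing chip, a bare metal server, a cloud server, a virtual machine, a container or the like. It may be deployed independently on a physical machine, a physical machine cluster, a bare metal server, a cloud server, a virtual machine or a container, or deployed in a distributed manner across one or more of multiple physical machines, physical machine clusters, bare metal servers, cloud servers, virtual machines and containers.
An embodiment of this application provides a computing device that implements some or all of the functions of the cloud service-based code generation method. Figure 1 also shows the structure of such a computing device. As shown in Figure 1, the computing device 10 includes a processor 101, a memory 102, a communication interface 103 and a bus 104, and the processor 101, the memory 102 and the communication interface 103 communicate with one another through the bus 104. The computing device 10 may be a server or a terminal device. This application does not limit the number of processors or memories in the computing device 10.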
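The processor 101 may include a general-purpose processor and/or a dedicated hardware chip. The general-purpose processor may include any one or more of a central processing unit (CPU), a microprocessor (MP) and a graphics processing unit (GPU). The CPU is, for example, a single-core processor (single-CPU) or a multi-core processor (multi-CPU). The dedicated hardware chip is a high-performance processing hardware module and includes at least one of a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) and a network processor (NP). The processor 101 may also be an integrated circuit chip with signal processing capability. In implementation, some or all of the functions of the cloud service-based code generation method may be completed by integrated logic circuits of hardware in the processor 101 or by instructions in the form of software.
The memory 102 stores a computer program, which includes an operating system 102a and executable code (that is, program instructions) 102b. The memory 102 is, for example, a read-only memory or another type of static storage device that can store static information and instructions; a random access memory or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory, a compact disc read-only memory or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired executable code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto. For example, the memory 102 is used to store egress port queues and the like. The memory 102 may exist independently and be connected to the processor 101 through the bus 104, or the memory 102 and the processor 101 may be integrated. For example, the memory 102 may include volatile memory, such as random access memory (RAM). The memory 102 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
The memory 102 may store executable code. When the executable code stored in the memory 102 is executed by the processor 101, the processor 101 implements some or all of the functions of the cloud service-based code generation method. In other words, the memory 102 stores instructions for implementing some or all of those functions. For how the processor 101 performs this process, refer to the relevant descriptions in the foregoing embodiments. The memory 102 may also include an operating system and other software modules and data needed by running processes.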
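The communication interface 103 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to communicate with other devices or communication networks. For example, the communication interface 103 may be any one or any combination of devices with a network access function, such as a network interface (for example, an Ethernet interface) or a wireless network card.
The bus 104 is any type of communication bus that interconnects the internal components of the computing device (for example, the memory 102, the processor 101 and the communication interface 103). For example, the bus 104 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus. Buses can be classified into address buses, data buses, control buses and so on. For ease of illustration, only one line is drawn in Figure 1, but this does not mean that there is only one bus or one type of bus. The bus 104 may include a path for transferring information among the components of the computing device 10. The embodiments of this application take interconnection through the bus 104 as an example. Optionally, the components inside the computing device 10 may also communicate with one another through connections other than the bus 104, for example through internal logical interfaces.
It should be noted that the components above may each be placed on a separate chip, or some or all of them may be placed on the same chip. Whether the components are placed on separate chips or integrated on one or more chips usually depends on product design requirements, and the embodiments of this application do not limit the specific implementation. The descriptions of the processes corresponding to the drawings each have their own emphasis; for parts not detailed in one process, refer to the relevant descriptions of the other processes.
The cloud service-based code generation method has many application scenarios. The following two are described as examples: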
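In the first application scenario, the method is offered to users as a code generation service, which may be invoked through an application programming interface (API). The service provider owns a large pool of basic resources, such as computing, storage and network resources, and uses them to provide the service. After a user purchases the code generation service, the service provider can provide it to the user. The user may purchase the service from the service provider's management platform through a client, and send the information of the programming project (for example, all files of the project) and the code generation request to the management platform. The management platform assigns the task of generating the first executable code to one of the service provider's computing devices and sends the project information and the code generation request to it. That computing device performs the method based on the project information and the request, generates the first executable code and provides it to the client, which completes the delivery of the service. Alternatively, for reasons such as information security, after the user purchases the service, the user may send only the code generation request to the management platform through the client. The management platform provides the client with a tool for obtaining the first context information from the project information. The client uses the tool to obtain the first context information and sends it to the management platform, and the service provider's computing device generates the first executable code based on the first context information and the code generation request and provides it to the client.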
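In this case, as shown in Figure 2, the scenario may further include a client 20, which handles the interaction between the user and the management platform. Optionally, the client 20 may be a mobile phone, a tablet computer, a personal computer, a virtual machine, a container, a laptop computer, a multimedia player, a smart home appliance, an AI device, a smart wearable device, an e-reader, a smart in-vehicle device, an Internet of Things device or the like. The computing device 10 may then be a server, which may be a single server, a server cluster consisting of several servers, or a cloud computing service center. A cloud computing service center hosts the large pool of basic resources owned by the cloud service provider, such as computing, storage and network resources, and can use them to run a file system and implement the method.
When the server is implemented by a cloud computing service center, the code generation function it provides to users can be abstracted by the cloud service provider on the cloud platform into a code generation cloud service. After a user purchases this cloud service on the cloud platform, the cloud platform uses the resources of the cloud computing center to provide it by means of the method. The code generation cloud service may be offered as an independent cloud service or as an add-on to another cloud service.
Optionally, the cloud platform may be the platform of a central cloud or of an edge cloud. When the computing device 10 is deployed in a distributed manner, the cloud platform may also include both a central cloud and an edge cloud, in which case the computing device 10 may be deployed partly on the edge cloud platform and partly on the central cloud platform. The embodiments of this application do not limit this. It should be noted that, in the environment shown in Figure 2, the computing device 10 may also be implemented by a resource platform other than a cloud platform. In that case, the server is implemented with the resources of that platform and provides code generation services to users.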
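In the second application scenario, the method is offered to users as an application package. For example, the service provider supplies the user with an application package that implements the method, and the computing device 10 is a computing device owned by the user. After obtaining the package from the service provider, the user installs it on the computing device 10 to obtain an application that performs the method. When the computing device 10 receives the user's code generation request, the application performs the method based on the request to generate the first executable code. In one implementation, the method is made available to the user as a plug-in for a code editor or an integrated development environment (IDE).
In this case the computing device 10 may be a client. As shown in Figure 3, the scenario may further include a computing device 30, which provides the computing device 10 with the application package that implements the method. Optionally, the computing device 30 may be a server or the like.
It should be understood that the above is an illustrative description of application scenarios and does not limit them. A person of ordinary skill in the art will appreciate that the scenarios can be adjusted as service requirements change, and the embodiments of this application do not enumerate them one by one.
The implementation process of the cloud service-based code generation method is described below. As shown in Figure 4, it may include the following steps: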
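Step 401: Receive a code generation request, where the code generation request requests generation of first executable code that implements a first method in a programming project.
When a developer writes executable code in a code editor or an integrated development environment, a code generation request is triggered once the developer's operation meets a trigger condition. The request asks the computing device to perform a generation task and produce the first executable code. The request carries a task description of the generation task, and the computing device performs the task based on that description. The task description is comparable to a problem statement: it indicates the characteristics (such as the function) of the executable code to be generated, and generating code from it can be regarded as solving the stated problem. The task description also indicates a generation point, which is the position in the programming project where the generated first executable code is to be inserted. Optionally, the generation point is the position where the request was triggered. For example, while writing executable code, the developer may perform a specified operation at the position where the first executable code is to be written, and that position is the generation point. As another example, after the developer adds a comment to a line of code, the integrated development environment detects the operation and triggers a request, and the end of the comment is the generation point. As a further example, when the input cursor stays at a code-writing position for longer than a specified threshold, a request is triggered, and the position where the cursor paused is the generation point. Figure 5 shows the front-end interface of an integrated development environment according to an embodiment of this application. As shown in Figure 5, the input cursor is at the black selection box (the start of line 19 in Figure 5), and that position is the generation point.
A method in the integrated development environment may be implemented by one or more lines of executable code and carries out one computing task. It should be noted that the method of the embodiments of this application can also be used to generate executable code in other units, such as a line of code, a code package or even a code module. The embodiments of this application do not limit this.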
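Step 402: Receive a scope indication, where the scope indication indicates the acquisition scope of the first context information.
In the embodiments of this application, the computing device may generate the executable code of the first method based on the first context information of that method. As described above, the hierarchical units of a programming project are, from largest to smallest: programming project, code module, code package, class and method, and the first context information may be obtained from different hierarchical units depending on the application requirement. The user may therefore send a scope indication to the computing device through the client to specify the acquisition scope, and the method may correspondingly further include: the computing device receives the scope indication. Candidate acquisition scopes include the programming project, code module, code package, class and source code file to which the first method belongs. In addition, because standard libraries or third-party libraries may be used when executable code is written in an integrated development environment, the candidate scopes also include standard libraries and third-party libraries.
For example, Figure 6 shows an interface in which the user selects the acquisition scope. As shown in Figure 6, “Context-aware Scope” is the option for the acquisition scope, and its candidates are auto, repository (the programming project), package (the code package), file (the source code file to which the first method belongs) and class. The user selects the acquisition scope from these candidates. “Auto” means that the first context information is obtained adaptively from the entire programming project according to the generation point. Repository, package, file and class mean that the first context information is obtained within the respective scope. The acquisition scope may default to auto, and the user can select another scope in this interface when needed.
Context information is the relevant information needed to perform a task. In one implementation, context information includes one or more of: the functions, access permissions and calling conventions of defined classes, variables and methods. Obtaining the first context information and generating the first executable code from it allows these to be reused, which reduces the complexity of generating the first executable code and helps improve code generation capability.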
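Step 403: Receive a removal indication, where the removal indication indicates whether to remove target information from the first context information, and the target information includes one or more of: code comments, variable assignments, method bodies and information that reveals the underlying logic of the code.
When code is reused, the implementation details of a called method or the values assigned to variables generally do not matter much. In object-oriented programming, for example, abstraction, encapsulation and inheritance mean that the programmer rarely needs to care about them, yet the first context information may include them. In addition, inspection of real code shows that comments, constants, strings and underlying logic may contain sensitive data (such as credentials, passwords, keys and system information) or private user information (such as IP addresses, user names and personal information), none of which is needed to generate the first executable code. If such information were leaked, the security of the user's code assets and the user's privacy would be affected. The user can therefore decide, according to the application requirement, whether the target information needs to be removed from the first context information to prevent leakage. The target information may include one or more of: the implementation details of called methods, method bodies, variable assignments, code comments and information that reveals the underlying logic of the code.
In one implementation, the removal indication only indicates that removal is required. On receiving it, the computing device removes information according to a preset policy, and the user does not need to specify the type of target information, that is, whether it is the implementation details of called methods, variable assignments or sensitive information in the code. In another implementation, the removal indication also specifies the type of target information to remove.
For example, Figure 6 also shows the interface for the removal indication. As shown in Figure 6, the user can switch the removal option on to indicate that the target information is to be removed from the first context information, or switch it off to indicate that no removal is needed. In this case the removal indication only indicates whether removal is required. “Context Desensitization” in Figure 6 is the option for the context desensitization function, that is, removal in Figure 6 is implemented through desensitization.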
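One possible form of such a preset policy is sketched below for Java-like source text: comments are deleted, and every brace block nested inside the class body is replaced by a semicolon, which leaves signature-only declarations. The function names are hypothetical, and the text-based approach is an assumption chosen for brevity; it does not handle comment markers inside string literals, array initializers in braces, or removal of plain variable assignments.

```python
import re

def strip_comments(java_source):
    """Remove /* ... */ and // ... comments (string literals are not handled)."""
    without_block = re.sub(r"/\*.*?\*/", "", java_source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", without_block)

def strip_bodies(java_source):
    """Replace every brace block nested inside the class body with ';'."""
    out, depth = [], 0
    for ch in java_source:
        if ch == "{":
            depth += 1
            if depth == 2:          # entering a method body: emit ';' instead
                out.append(";")
        if depth < 2:               # keep only text outside method bodies
            out.append(ch)
        if ch == "}":
            depth -= 1
    return "".join(out)
```

Applying `strip_bodies(strip_comments(source))` keeps the class structure and method signatures while dropping the parts most likely to carry secrets or low-level logic.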
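Step 404: Receive a preview indication, where the preview indication indicates whether to preview the first context information.
The user may indicate whether to view the obtained first context information. Whether or not the user wants to view it, the user performs the corresponding operation to trigger a preview indication, which conveys that choice to the computing device.
For example, Figure 6 also shows the interface for the preview indication. As shown in Figure 6, the user can switch the preview option on to indicate that the first context information is to be previewed, or switch it off to indicate that no preview is needed. “Context-Preview” in Figure 6 is the option for previewing the first context information.
It should be noted that some or all of the scope indication, the removal indication and the preview indication may be carried in the code generation request or sent separately from it. The embodiments of this application do not limit this. The computing device may or may not support receiving some or all of these indications, depending on the application requirement. When the computing device does not support one of them, the corresponding step is skipped during generation of the first executable code.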
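Step 405: Generate a logical structure graph of the programming project, where the graph indicates the relationships among the items of content in the project.
The computing device may analyze the items of content in the programming project and generate the logical structure graph from their relationships. The logical structure graph is a data structure in graph form. The items of content can be obtained by dividing the project semantically, for example into the content of hierarchical units such as the programming project, code modules, code packages, classes and methods.
In one implementation, the logical structure graph is represented by a tree-shaped project structure graph. Each hierarchical unit of the project is expanded and represented by the root node, an intermediate node or a leaf node, and edges are added among the nodes according to the relationships among the hierarchical units. The result is a graph that represents the hierarchy and reference relationships among the hierarchical units of the project. The root node represents the programming project. A leaf node represents the same hierarchical unit as the executable code to be generated. For example, if the method generates executable code at the level of a method, a leaf node represents a method. Intermediate nodes then represent hierarchical units whose granularity lies between the programming project and the method, such as code modules, code packages and classes.
Taking Java as an example, the logical structure graph may be represented as a B+ tree. To generate it, the programming project, code modules, code packages, class files and source code files are expanded. The root node of the B+ tree represents the programming project, the leaf nodes represent methods, and the intermediate nodes at different levels represent the hierarchical units between the project and the methods. Edges are added among the root, intermediate and leaf nodes according to the relationships among the project, code modules, code packages, class files and the methods in the source code files, which yields a B+ tree graph that represents the hierarchy and reference relationships among them. Figure 7 shows how the computing device obtains the first context information. As shown in Figure 7, after receiving the code generation request, the computing device determines the programming project to which the first method belongs based on the request and then generates the logical structure graph of the project. Figure 8 shows the generated graph. In Figure 8, boxes of the same shape represent hierarchical units of the same granularity, and granularity decreases from top to bottom. Solid arrows represent relationships between different hierarchical units, dashed arrows represent relationships between hierarchical units of the same granularity, and “import” is the Java import statement.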
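The tree part of such a graph can be sketched as follows. This is a plain parent map built from a nested dictionary rather than a B+ tree, and the input layout and function names are assumptions for illustration. Reference edges such as imports (the dashed arrows in Figure 8) are not modeled here; they could be kept in a separate adjacency map.

```python
def build_parent_map(project):
    """Flatten a nested {name: children} dict into child -> parent edges.

    `project` must have exactly one top-level key, the project root.
    """
    parent = {}

    def walk(name, children):
        for child, grandchildren in children.items():
            parent[child] = name
            walk(child, grandchildren)

    (root, children), = project.items()
    walk(root, children)
    return parent

def ancestors(parent, node):
    """Return the enclosing units of `node`, from nearest up to the project root."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain
```

For a method leaf, `ancestors` returns its class, code package, code module and project in that order, which is the hierarchy information that later steps use to measure distance and to delimit scope.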
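Step 406: When the scope indication indicates that the acquisition scope of the first context information is the programming project, obtain, based on the logical structure graph and from the information of the programming project, the permission scope that the first method is permitted to access.
Optionally, before the first context information is obtained, the permission scope that the first method is permitted to access is delimited first, and the first context information is then obtained within it. This avoids introducing content that the first method is not permitted to access, and keeps the acquisition of the first context information efficient and the result valid.
In one implementation, step 406 includes: obtaining the permission scope based on at least one of the position of the first method in the programming project, the hierarchy level and reference relationships of the target class to which the first method belongs, and the access control permission of the target class. The position of the first method may be represented by the generation point. The hierarchy level of the target class is its level among the hierarchical units of the project. The reference relationships of the target class are the content it references. The access control permission of the target class may be set by a developer and restricts access to the executable code in the class. Taking Java as an example, the access control permission of an accessed object may be public, protected, private or default. Public means that the accessed object can be accessed by any accessing object in the entire programming project. Private means that the accessed object can be accessed only by accessing objects within its own scope; for example, when a class is private, its information can be accessed only by accessing objects, such as methods, that belong to that class. Default means that access follows the default rules of the programming language. Java specifies that content in the same hierarchical unit can be used mutually and directly. For example, when multiple source code files belong to the same code package and none of them is private, they can use one another directly and have mutual access. Protected adds, on top of default, access by subclasses, where a subclass is a class obtained through inheritance. Table 1 lists the Java access rules, where “yes” means access is permitted and “no” means it is not. For example, the “yes” in row 2, column 2 of Table 1 means that an accessing object in the same class as the accessed object is permitted to access it.
Table 1
Optionally, to determine the permission scope, first determine the position of the first method in the programming project, the hierarchy level and reference relationships of the target class, and the range the target class is permitted to access. Then determine, from the position, hierarchy level and reference relationships, the range the target class needs to access. The intersection of the range the target class needs to access and the range it is permitted to access is the permission scope of the first method. It should be noted that, if the logical structure graph has been generated, the analysis above can be carried out on its nodes and the relationships among them, and the resulting permission scope can be regarded as a subgraph of the logical structure graph. As shown in Figure 7, after generating the logical structure graph, the computing device delimits in it the permission scope of the first method, which is the area enclosed by the dashed box in Figure 8.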
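The access rules and the intersection step can be sketched as below. The rule function encodes the standard Java member access levels in simplified form (it ignores the finer restrictions on protected access from another package), since the contents of Table 1 are not reproduced in this text. The function names and the use of plain sets for the two ranges are assumptions for illustration.

```python
def can_access(modifier, same_class, same_package, is_subclass):
    """Simplified Java member access rule for one accessor/target pair."""
    if modifier == "public":
        return True
    if modifier == "protected":
        return same_class or same_package or is_subclass
    if modifier == "default":       # package-private: no modifier written
        return same_class or same_package
    if modifier == "private":
        return same_class
    raise ValueError(f"unknown modifier: {modifier}")

def permission_scope(needed, permitted):
    """Intersect what the target class needs to reach with what it may reach."""
    return needed & permitted
```

For example, a class that references `StringUtil`, `Secret` and `Order` but is only permitted to reach `StringUtil` and `Order` ends up with those two in its permission scope.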
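Step 407: Based on the code generation request, analyze the subgraph of the logical structure graph that lies within the permission scope, and obtain, from the information of the programming project, the first context information required for generating the first executable code.
When the scope indication indicates that the acquisition scope is the programming project, the computing device can obtain the first context information from the entire programming project. It can then take the logical structure of the code and of the project into account when generating the first executable code, which helps improve code generation capability.
As described above, executable code that implements a method is usually stored in a source code file. The first context information may therefore include content inside the source code file to which the first method belongs, content outside that file, or both, and may be obtained from each. In one implementation, obtaining the first context information includes: performing processing such as out-of-file context expansion and in-file code reorganization, and obtaining the first context information from the processing result.
In one implementation, as shown in Figure 9, performing in-file code reorganization and obtaining the first context information from the result includes: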
步骤407a1、基于代码生成请求,在编程项目的文件中,获取第一方法所属的源代码文件。
根据前面描述可知,方法的可执行代码保存在源代码文件中,则可以基于代码生成请求,在编程项目的文件中,定位第一方法所属的源代码文件。该定位第一方法所属的源代码文件的过程,实际是基于代码生成请求指示的生成点,确定该生成点所在的源代码文件的过程。示例地,如图7所示,计算设备接收到代码生成请求后,可以基于代码生成请求确定第一方法所属的源代码文件,该源代码文件的内容如图10所示。
步骤407a2、在第一方法所属的源代码文件中,获取撰写位置位于第一方法后的下文信息。
获取第一方法所属的源代码文件后,可以获取该第一方法在该源代码文件中的撰写位置,然后在源代码文件中获取撰写位置位于该第一方法后的下文信息。其中,生成点在该源代码文件中的位置即为第一方法在源代码文件中的撰写位置。在源代码文件中,撰写位置位于该生成点后的所有信息均为该第一方法的下文信息。相应的,撰写位置位于该生成点前的所有信息均为该第一方法的上文信息。一般地,上文信息一般包括代码包和import语句,所在类的签名,所在类的部分声明(如成员变量、构造函数、部分方法等声明)等。而下文信息一般包括所在类的其他方法声明。该获取下文信息的过程实际是对源代码文件中内容进行分区的过程,即文件内代码分区的过程,该过程将源代码文件中内容分为第一方法的上文信息和下文信息。示例地,如图11所示,对图10所示的源代码文件的内容进行文件内代码分区后,该源代码文件的内容分成了三部分:上文信息、第一方法和下文信息。图11中指向5.1、5.2和5.3的箭头前面的内容为上文信息,指向5.4的箭头前面的内容为第一方法,指向5.5的箭头前面的内容为下文信息。其中,由于图11、图12、图14和图15具有关联性,为便于描述,在图11、图12、图14和图15的每个方框前标注了数字,在方框后标注了箭头以指示图之间的关联性,且图11、图12、图14和图15中任一方框前的数字用于指示该方框,方框后的箭头用于指示经过处理后该方框的内容。
步骤407a3、将下文信息的撰写位置调整至第一方法之前,使得调整位置后的下文信息成为第一方法的上文信息。
在一些可执行代码(如编译型语言的可执行代码)中,方法在源代码文件中的撰写顺序不是很重要,交换不同方法的顺序不会影响可执行程序的执行和语义。例如,在Java等编译型语言中的方法之间顺序并不重要,交换顺序并不影响程序执行和语义。但是,在基于第一上下文信息生成第一可执行代码时,通常是按照自然语言的从上到下的顺序对第一上下文信息进行感知的,第一上下文信息中不同内容的顺序改变可能会影响语义,甚至无法感知下文信息,导致难以获得和利用编程项目的背景知识、结构和代码等全局信息。例如,当使用预训练语言模型生成第一可执行代码时,预训练语言模型通常采用类似自然语言的从上到下的训练顺序,无法对下文进行感知。因此,在获取到第一方法的下文信息后,可以将下文信息调整至第一方法之前,使得第一方法的下文信息变成第一方法的上文信息,达到在源代码文件中对代码进行重组的目的。示例地,将图11中下文信息的撰写位置调整后,该源代码文件中上文信息、第一方法和下文信息的位置如图12所示,可见图12中5.5指示的下文信息的撰写顺序已调整至图12中5.4指示的第一方法的撰写位置前。上述步骤407a1至步骤407a3即为对源代码文件中代码进行重组的过程。
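Steps 407a1 to 407a3 can be sketched as a partition followed by a reorder. This is a simplified illustration that treats the file as a list of lines and assumes the line range of the first method is already known; `partition` and `reorganize` are hypothetical names.

```python
def partition(lines, start, end):
    """Split file lines into preceding context, the target method (lines start..end-1),
    and following context."""
    return lines[:start], lines[start:end], lines[end:]


def reorganize(lines, start, end):
    """Move everything written after the target method in front of it, so that a
    left-to-right model sees it as preceding context."""
    before, target, after = partition(lines, start, end)
    return before + after + target


source = [
    "import java.net.ServerSocket;",
    "private String ip;",
    "public void init() {",
    "}",
    "public void start() {}",
]
reordered = reorganize(source, 2, 4)
```

The sketch ignores the enclosing class braces; a real implementation would have to keep the moved declarations inside the class body.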
步骤407a4、基于第一方法的上文信息,获取第一上下文信息。
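After the code in the source code file has been reorganized, the first context information can be obtained based on the preceding context in the reorganized source code file. For example, all of the preceding context in the reorganized source code file may be determined as the first context information. As another example, when the reorganized source code file needs further processing (such as out-of-file context expansion), that processing may be performed first, and the first context information is then obtained based on the processed source code file.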
在一种实现方式中,如图13所示,通过文件外上下文展开,并根据展开结果获取第一上下文信息的实现过程包括:
步骤407b1、基于代码生成请求,在编程项目的文件中,获取第一方法所属的源代码文件。
For the implementation of step 407b1, refer to the implementation of step 407a1; details are not repeated here.
步骤407b2、基于代码生成请求对逻辑结构图中位于权限范围内的子图进行分析,在编程项目的信息中,获取源代码文件使用到的位于源代码文件外的外部信息。
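Among the information of the programming project, some information may be used by the source code file to which the first method belongs, and some may not. For out-of-file context expansion, the external information used by that source code file must first be identified among the information of the programming project, and the context is then expanded based on this external information. In one implementation, the external information can be determined from the associations between the source code file and other hierarchical units in the programming project. For example, when the logical structure graph is analyzed based on the code generation request, the external information can be determined from the edges between the nodes that represent content in the source code file and other nodes: when a node representing content in the source code file is connected by an edge to a node representing content outside the source code file, that outside content is the external information. As another example, when the executable code in the source code file indicates that content from other hierarchical units is needed, the indicated content can be determined as the external information. Taking Java code as an example, the import statement indicates that content from other hierarchical units is needed, so the external information can be determined by following the import statement to the information it imports from outside the source code file. It should be noted that, in some programming scenarios, the executable code in the current source code file may use content from other hierarchical units without indicating it explicitly; in that case the external information must be determined from what the executable code in the source code file actually refers to. For example, in Java, content within the same hierarchical unit can be used directly (for instance, multiple source code files in the same code package can use each other directly) without an import statement, so the external information has to be determined from the actual content of the executable code in the source code file.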
需要说明的是,若确定了第一方法有权限访问的权限范围,则在编程项目的信息中,获取源代码文件使用到的位于源代码文件外的外部信息,包括:在该权限范围中,获取外部信息。也即是,在获取外部信息时,是在第一方法所属的源代码文件外且位于该权限范围内的信息中,获取该外部信息,而无需在该权限范围外获取外部信息。类似地,当基于编程项目的逻辑结构图获取外部信息时,此时是基于代码生成请求对逻辑结构图中位于权限范围内的子图进行分析,在编程项目的信息中,获取源代码文件使用到的位于源代码文件外的外部信息。其中,在权限范围内获取外部信息的实现过程,相对于在整个编程项目中获取外部信息的实现过程,区别仅在于获取外部信息的备选范围的大小,在备选范围内获取外部信息的实现方式相同,因此,此处对在权限范围内获取外部信息的实现方式不再赘述。通过在第一方法有权限访问的权限范围内获取外部信息,能够避免向该第一方法中引入第一方法没有权限访问的内容,保证获取的第一上下文信息的效率和有效性。
步骤407b3、在第一方法所属的源代码文件中补充外部信息。
在确定外部信息后,就可以在第一方法所属的源代码文件中补充外部信息,达到基于源代码文件外的外部信息对上下文展开的目的,以便于基于该外部信息和该源代码文件生成第一可执行代码。在一种可实现方式中,在源代码文件中补充外部信息的实现方式可以包括:在该源代码文件中使用该外部信息的位置处,替换指示该外部信息的内容。例如,当源代码文件中使用import语句导入外部信息时,可以使用该外部信息替换该import语句。示例地,图14为在图12所示的源代码文件中补充外部信息后的示意图,根据图14可见向图12中5.1部分补充了外部信息,补充外部信息后的5.1即为图14中的6.1。
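The import-replacement variant of step 407b3 can be sketched with a regular expression. The function name, the shape of `project_declarations`, and the sample declarations are assumptions; wildcard and static imports are not handled.

```python
import re


def expand_imports(source, project_declarations):
    """Replace each import of a project-internal class with that class's declarations.

    `project_declarations` maps a fully qualified class name to declaration text.
    Imports not found in it (standard or third-party libraries) are left unchanged.
    """
    def substitute(match):
        return project_declarations.get(match.group(1), match.group(0))

    return re.sub(r"^import\s+([\w.]+);$", substitute, source, flags=re.MULTILINE)


source = "import util.Helper;\nimport java.net.ServerSocket;\nclass Server {}"
declarations = {
    "util.Helper": "class Helper { public static Server createServer(String ip); }",
}
expanded = expand_imports(source, declarations)
```

Leaving unknown imports untouched matches the later remark in the text that standard-library and third-party imports can be kept as they are.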
步骤407b4、基于经过补充的源代码文件,获取第一上下文信息。
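After the external information has been added to the source code file, the first context information can be obtained based on the supplemented source code file. For example, all of the information in the supplemented source code file may be determined as the first context information. As another example, when the supplemented source code file needs further processing (such as in-file code reorganization), that processing may be performed first, and the first context information is then obtained based on the processed source code file.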
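It should be noted that, in addition to out-of-file context expansion and in-file code reorganization, other analysis approaches may be combined when obtaining the first context information. For example: using program analysis tools such as Eclipse JDT and Tree-sitter; and/or accessing, through the application programming interface provided by an integrated development environment, the intermediate data that the environment produces when analyzing the project (such as IntelliJ's PSI); and/or reusing the project index information cached by the integrated development environment. These allow the first context information to be obtained more quickly and accurately and give the code generation model more to base its output on.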
步骤408、当移除指示用于指示移除第一上下文信息中的目标信息时,在第一上下文信息中移除目标信息,得到更新后的第一上下文信息。
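When the removal indication indicates that the target information in the first context information is to be removed, the target information may be removed from the first context information after it is obtained, so as to update the first context information. In this way, information that is irrelevant to generating the first executable code, privacy-related information, and/or sensitive information can be deleted from the first context information, while the content that reflects the hierarchy and signature information of the first context is retained. On the one hand, this compresses the first context information, so that context with more valuable content fits within the same input length; on the other hand, it helps ensure the usability of the generated code, user privacy, and code security. When a code generation model generates the first executable code from the first context information, the input length of the model (also called the window size, typically 1024 to 4096 tokens) is limited. With a limited input length, feeding in the first context information obtained in the preceding steps directly would bring redundant information into the model and limit the amount of context the model can use, reducing how efficiently the context is used. Compressing the first context information removes information that contributes little to generating the first executable code, so that more useful context fits within the limited input length; this improves the efficiency of context use and thereby the code generation capability. As shown in Figure 15, the removal operation removes target information such as "private void clean(){…}" and "public String readConfig(){…}" in 6.1 of Figure 14, the IP address "192.168.11.34" in 6.3, and the comment "/**start listening to 8080**/" in 6.5.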
在一种实现方式中,可以采用预先设定的移除规则对目标信息进行移除。该移除规则指示将第一上下文信息中所有符合目标信息的内容进行移除。或者,可以使用第三方工具(如ShiftLeft)对目标信息进行移除。
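A rule-based removal pass might look like the sketch below. The three rules and the `<ip>` placeholder are assumptions chosen to match the kinds of target information mentioned above (comments and an IP address); a production rule set would need to be string-literal aware, since the line-comment rule here would also cut text such as `http://` inside a string.

```python
import re

# Each rule is (pattern, replacement); rules are applied in order.
RULES = [
    (re.compile(r"/\*.*?\*/", re.DOTALL), ""),             # block and Javadoc comments
    (re.compile(r"//[^\n]*"), ""),                         # line comments
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),  # IPv4 literals
]


def remove_target_info(context):
    """Apply every removal rule to the context text and return the result."""
    for pattern, replacement in RULES:
        context = pattern.sub(replacement, context)
    return context


sample = 'String ip = "192.168.11.34"; /**start listening to 8080**/\nint port; // default'
cleaned = remove_target_info(sample)
```

Replacing the address with a placeholder instead of deleting it keeps the declaration syntactically intact, which is one possible design choice rather than something the text prescribes.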
步骤409、将第一上下文信息抽象为接口声明形式,得到更新后的第一上下文信息。
当使用预训练模型生成第一可执行代码时,在基于第一上下文信息和代码生成请求,生成第一可执行代码之前,还可以对第一上下文信息进行抽象化处理,将第一上下文信息抽象为符合语法的接口声明(interface declaration)形式,以达到复用预训练模型在训练过程中学习到的编程语言语法知识的目的。在一种实现方式中,以Java语言代码为例,将第一上下文信息抽象为接口声明形式可以包括:将第一上下文信息中的类(如class)变成接口(如interface)。其中,该过程也称为将第一上下文信息按照接口表示进行标准化。如图15所示,将图14中6.1中的class替换成了interface。
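The abstraction into interface declaration form can be approximated by renaming `class` to `interface` and cutting every member body down to a terminating semicolon. The brace-depth scan below is a rough sketch under strong assumptions: no braces inside string literals, no array initializers, and no nested types. `to_interface` is a hypothetical name.

```python
import re


def to_interface(source):
    """Rewrite class declarations as interface declarations and drop member bodies."""
    source = re.sub(r"\bclass\b", "interface", source)
    out, depth = [], 0
    for ch in source:
        if ch == "{":
            depth += 1
            if depth == 1:
                out.append(ch)          # keep the braces of the type itself
            elif depth == 2:
                while out and out[-1] == " ":
                    out.pop()
                out.append(";")         # a member body starts: end the signature instead
        elif ch == "}":
            if depth == 1:
                out.append(ch)
            depth -= 1
        elif depth <= 1:
            out.append(ch)              # text inside member bodies (depth >= 2) is skipped
    return "".join(out)


declaration = "class Helper { public Server createServer(String ip) { return new Server(ip); } }"
abstracted = to_interface(declaration)
```

A parser-based tool would be more robust than character scanning; the sketch only shows the shape of the transformation.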
步骤410、基于第一上下文信息中多个内容与第一方法的相关性,对多个内容进行排序,得到更新后的第一上下文信息。
当使用代码生成模型生成第一可执行代码时,由于代码生成模型的输入长度有限制,因此在获取第一上下文信息后,还可以对第一上下文信息中的多个内容进行排序,以便于按照排序后的先后顺序向代码生成模型输入第一上下文信息中的多个内容,从而降低与第一方法具有较大的相关性的内容在输入代码生成模型时,因输入长度限制被截断的概率,以便于具有较高重要性的上下文信息能够输入代码生成模型。并且,当使用代码生成模型生成第一可执行代码时,不仅需要向代码生成模型输入第一上下文信息,还需要向代码生成模型输入第一方法的任务描述。而代码生成模型读取输入的信息时,先读取任务描述,然后按照内容到任务描述的距离,优先读取距离任务描述近的内容,然后读取距离任务描述远的内容。因此,在一种可实现方式中,基于多个内容与第一方法的相关性,对多个内容进行排序,包括:将多个内容排列在第一方法的任务描述之前,且任一内容到任务描述的距离与内容对应的相关性反相关。并且,还可以设置相关性阈值,当任一内容与第一方法的相关性小于相关性阈值时,可以在第一上下文信息中删除该内容。
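The ordering rule above (more relevant content sits closer to the task description, and content below a threshold is dropped) can be sketched as follows. The function name, the whitespace-based token count used for the budget, and the sample scores are assumptions.

```python
def order_context(items, task_description, threshold=0.0, budget=None):
    """Order context so that the most relevant item is adjacent to the task description.

    `items` is a list of (text, relevance). Items below `threshold` are deleted.
    If `budget` is given, the least relevant items are dropped until the remaining
    context fits, using whitespace-separated words as a stand-in for model tokens.
    """
    kept = [item for item in items if item[1] >= threshold]
    kept.sort(key=lambda item: item[1])  # ascending: least relevant ends up farthest away
    if budget is not None:
        while kept and sum(len(text.split()) for text, _ in kept) > budget:
            kept.pop(0)                  # truncate from the far, least relevant end
    return [text for text, _ in kept] + [task_description]


items = [("A", 0.9), ("B", 0.2), ("C", 0.5), ("D", 0.05)]
ordered = order_context(items, "TASK", threshold=0.1)
trimmed = order_context(items, "TASK", threshold=0.1, budget=2)
```

Because truncation removes items from the start of the sequence, the content that survives a tight input window is the content with the highest relevance.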
本申请实施例中,获取第一上下文信息中多个内容与第一方法的相关性的实现方式有多种,下面以以下几种实现方式为例对其进行说明。并且,其实现方式可以包括以下一种或多种的组合:
在第一种实现方式中,获取第一上下文信息中多个内容与第一方法的相关性,包括:获取每个内容的标识符与第一方法的相关信息的第一相似度。其中,任一内容与第一方法的第一相似度与其相关性正相关,即当该内容与第一方法的第一相似度越大时,该内容与第一方法的相关性越大。可选地,标识符包括以下一种或多种:变量名、方法名、包名、类名和常量名。第一方法的相关信息包括以下一种或多种:第一方法的方法描述、第一方法的方法名、第一方法的返回类型和第一方法的参数类型。
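One simple way to realize the first similarity is a Jaccard overlap between the word parts of an identifier and the word parts of the first method's related information. The text does not name a similarity measure, so Jaccard overlap, the splitting rule, and both function names are assumptions.

```python
import re


def subtokens(identifier):
    """Split camelCase, snake_case or dotted names into lowercase word parts."""
    return {part.lower() for part in re.findall(r"[A-Za-z][a-z]*|\d+", identifier)}


def first_similarity(identifier, method_info):
    """Jaccard overlap between an identifier and the method's related information
    (for example its description, name, return type and parameter types)."""
    a = subtokens(identifier)
    b = set().union(*(subtokens(text) for text in method_info))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


score = first_similarity("createServer", ["init", "start the server"])
```

An embedding-based similarity could be substituted without changing how the score is used for ranking.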
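In a second implementation, obtaining the relevance between the multiple pieces of content in the first context information and the first method includes: obtaining the hierarchical distance between each piece of content and the first method in the programming project. The hierarchical distance between any piece of content and the first method is inversely correlated with its relevance; that is, the greater the hierarchical distance between the content and the first method, the lower the relevance between the content and the first method. Optionally, distance values between different code modules, different code packages, different classes, and different methods in the programming project can be set in advance. To obtain the hierarchical distance between a piece of content and the first method, the hierarchical units in which the content and the first method are located are determined, the hierarchical units crossed between the two are determined from them, and the distance is obtained from the distance values of the crossed units. For example, the sum of the distance values of the crossed hierarchical units may be determined as the hierarchical distance between the content and the first method. Alternatively, when the first context information is obtained from the logical structure graph, the hierarchical distance can be determined from the number of hops between the node of the content and the node of the first method in the logical structure graph. For example, that number of hops may be used directly as the hierarchical distance. Alternatively, weights may be set for different hierarchical units, and the hierarchical distance may be determined from the weights of the hops traversed between the node of the content and the node of the first method. As an example, when first content and the first method belong to different classes, the number of hops from the node of the first content to the node of the first method in the logical structure graph is at least 3; when second content has an inheritance relationship with the first method, the number of hops from the node of the second content to the node of the first method may be 2. The hierarchical distance between the second content and the first method is then smaller than that between the first content and the first method, so the second content is more relevant to the first method than the first content is.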
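The hop-count variant of the hierarchical distance can be sketched as a breadth-first search over the logical structure graph. The edge list and node names below are hypothetical, and edges are treated as undirected for the purpose of counting hops.

```python
from collections import deque


def hops(edges, source, target):
    """Return the minimum number of hops between two nodes, or None if unreachable."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


hierarchy = [
    ("pkg", "Server"), ("pkg", "Helper"),
    ("Server", "init"), ("Server", "start"),
    ("Helper", "createServer"),
]
same_class = hops(hierarchy, "start", "init")
other_class = hops(hierarchy, "createServer", "init")
with_reference = hops(hierarchy + [("Server", "Helper")], "createServer", "init")
```

A method in the same class is 2 hops away, while a method in another class is 4 hops away through the package, or 3 once a direct reference edge between the two classes exists. A weighted variant would replace the queue with a priority queue and sum per-hop weights.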
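In a third implementation, obtaining the relevance between the multiple pieces of content in the first context information and the first method includes: obtaining a second similarity between each piece of content and the context information called by the associated content of the first method. The second similarity between any piece of content and the context information called by the associated content of the first method is positively correlated with its relevance; that is, the greater the second similarity, the greater the relevance between the content and the first method. Optionally, the associated content of the first method includes one or more of the following: the associated classes of the target class to which the first method belongs, and the methods in the target class other than the first method.
Here, associated classes of the target class are classes whose implemented services are related. For example, a class that obtains the names of the students in a school class, a class that obtains their grades, and a class that obtains the name of the head teacher all need information about the school class, so the three are associated classes of one another. Obtaining the second similarity from the context information called by an associated class amounts to assuming that the target class to which the first method belongs is highly related to its associated classes, and using the second similarity between content in the first context information and the context information called by the associated class to estimate the probability that the first method calls that content, which yields the relevance between the content and the first method. In some scenarios, this implementation is also described as obtaining the degree of coupling between content in the first context information and the first method.
In one implementation, when the leaf nodes of the logical structure graph represent methods, the other methods may be the methods represented by the sibling nodes of the leaf node that represents the first method. In this case, obtaining the second similarity from the context information called by the other methods amounts to assuming that the first method is highly related to those methods, and using the second similarity between content in the first context information and the context information called by those methods to estimate the probability that the first method calls that content, which yields the relevance between the content and the first method. This implementation may also be described as obtaining the degree of coupling between content in the first context information and the first method.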
示例地,图15为对图14所示的源代码文件得到的第一上下文信息执行移除目标信息、抽象为接口声明形式和排序后的示意图。如图15所示,经过移除目标信息、抽象为接口声明形式和排序后,得到的第一上下文信息包括:项目级上下文、依赖库上下文和文件级上下文,项目级上下文为图15中7.1的内容,依赖库上下文为图15中7.2的内容,文件级上下文为图15中7.3和7.5的内容。并且,如图15所示,也得到了图15中7.4的任务描述,该任务描述中的“???”用于标示生成点。
需要说明的是,在第一方法所属的源代码文件中,其包括的上下文信息中的多种内容的类型可能不同,在获取第一上下文信息的过程中,可以针对不同类型采取上述步骤407至步骤410中的不同处理方式,以得到有效且准确的第一上下文信息。例如,第一方法所属的源代码文件可能包括:项目内部导入语句、标准库和三方库导入语句、生成点所在类、文件头部的文件内定义的其他类等几部分。对于标准库和三方库导入语句,可以直接保留作为第一上下文信息的一部分。而对于生成点所在类和文件内定义的其他类,可以将其抽象为接口声明形式,并根据其与第一方法的相关性对其进行排序。对于项目内部导入语句,可以对其使用文件外上下文展开的处理。如对于语句package和import语句等项目内部导入语句,可以分析当前类所依赖的项目中的其他类(如接口或枚举类型等),并展开其内容,分析其中定义的成员变量名、方法签名、常量及其访问控制关键字,将具体的赋值语句、初始化块和方法体等部分移除,以及,抽象为接口声明形式,并根据其与第一方法的相关性对其进行排序。
作为一种示例,图16为在集成开发环境中触发代码生成请求的示意图。如图16所示,图16中生成点位于init()方法的签名之后(即图16中光标所处的第16行开始位置处)。在相关技术中,通常会将当前源代码文件内撰写位置位于生成点附近一定范围内的上下文,以文本形式作为请求发送到代码生成模型,以便于代码生成模型使用该上下文生成可执行代码。假设代码生成模型的输入窗口为1024,则最多将光标所在位置前后的1024个token输入代码生成模型。这些上下文包括init()的方法声明、注释、init()之上的成员变量声明、类声明语句、导入语句等,且部分模型(如Meta的InCoder)还允许将init()之下的其他内容作为输入。在这种输入情况下,由于所得信息最多为当前源代码文件中内容,代码生成模型在基于这些上下文生成可执行代码时,倾向于按照训练数据中出现频率较高的代码实现方案生成可执行代码,例如直接使用ServerSocket、ServerHandler等类从头实现。但是,按照此方式生成的可执行代码会过于底层,且可能会包含来自其他项目、未导入依赖的代码,不符合用户预期,导致用户体验较差。
假设按照图6所示的选择进行配置,则图17为按照该配置,采用本申请实施例提供的获取上下文信息的方式获取的第一上下文信息的示意图。如图17所示,在确定生成点后,计算设备首先会分析当前生成点所在位置,然后对第一方法所在源代码文件中的内容进行分类,并按照内容的类型对对应内容执行上述步骤407至步骤410中对应的处理方式。其处理方式包括:将源代码文件中的一些内容直接用作第一上下文信息,对源代码文件中的一些内容使用了文件外上下文展开、移除目标信息、抽象为接口声明形式和排序的处理,并移除了展开后的内容的目标信息,对源代码文件中的一些内容使用了文件内代码重组、移除目标信息和抽象为接口声明形式的处理。
步骤411、当预览指示用于指示预览第一上下文信息时,显示第一上下文信息。
当预览指示用于指示预览第一上下文信息时,在获取到用于生成第一可执行代码的第一上下文信息后,计算设备可以向用户显示第一上下文信息,以便于用户对第一上下文信息进行查看。其中,当未对第一上下文信息进行更新时,用于生成第一可执行代码的第一上下文信息为通过步骤407获取到的第一上下文信息。当对第一上下文信息进行更新时,用于生成第一可执行代码的第一上下文信息为更新后的第一上下文信息。
图18为本申请实施例提供的一种第一上下文信息的预览界面示意图。该预览界面可以显示在图5所示的前端界面的右侧。如图18所示,经过上述过程获取的第一上下文信息包含:(1)util.Helper类之中的createServer()方法声明;(2)当前文件导入的Java标准库;(3)当前类Server声明的成员变量ip;(4)当前类Server在生成点之后已定义的方法start()。这些上下文信息会被附加在init()方法的注释之前,作为请求内容输入到代码生成模型中,使代码生成模型更容易生成与当前项目上下文相关、尽量复用已有封装的代码实现,从而更加符合用户期望。
步骤412、接收指示同意第一上下文信息的同意指示。
用户在查看第一上下文信息时,可以对第一上下文信息进行审核,当用户认为第一上下文信息符合其应用需求时,可以执行指定操作以触发同意指示,以告知计算设备用户同意该第一上下文信息。相应的,在用户触发同意指示后,计算设备会接收到该同意指示。如图18所示,预览界面的左上角设置有同意按钮(accept),当用户认为第一上下文信息符合其应用需求时,可以点击同意按钮以触发同意指示,告知计算设备用户同意该第一上下文信息。相应的,预览界面的右上角设置有刷新按钮(refresh),当用户认为第一上下文信息不符合其应用需求时,可以点击刷新按钮以触发计算设备重新获取第一上下文信息。
步骤413、在接收到同意指示后,基于第一上下文信息和代码生成请求,生成第一可执行代码。
计算设备接收到同意指示后,可以基于第一上下文信息和代码生成请求,生成第一可执行代码。其中,代码生成请求携带有任务描述,计算设备基于第一上下文信息和代码生成请求,生成第一可执行代码,主要是基于第一上下文信息和任务描述生成第一可执行代码。当使用代码生成模型生成第一可执行代码时,该步骤413的实现过程可以包括:将第一上下文信息和代码生成请求输入代码生成模型,得到代码生成模型输出的第一可执行代码。
可选的,代码生成模型可以为预训练模型。在一种实现方式中,本申请实施例使用的代码生成模型可以为预训练语言模型(pre-trained language model,PLM),如生成式预训练解码器(generative pre-trained transformer,GPT)模型。并且,为了显式区分上下文、注释和代码片段之间的差异,如图19所示,生成式预训练解码器模型中引入了上下文嵌入(context embedding)层,该上下文嵌入层用于对输入数据中属于不同类型的内容进行赋值,如对属于上下文的内容赋值为0,对代码注释内容赋值为1,对代码片段赋值为2,从而使生成式预训练解码器模型能够区分不同类型的内容的特征。
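The type assignment performed for the context embedding layer can be sketched as producing one type id per input token, using the values given above (0 for context, 1 for code comments, 2 for code snippets). The function name and the segment representation are assumptions.

```python
# Type ids as described in the text: context = 0, comment = 1, code snippet = 2.
TYPE_IDS = {"context": 0, "comment": 1, "code": 2}


def type_ids(segments):
    """Return one type id per token for a list of (kind, tokens) segments."""
    return [TYPE_IDS[kind] for kind, tokens in segments for _ in tokens]


segments = [
    ("context", ["interface", "Helper"]),
    ("comment", ["/**", "init", "*/"]),
    ("code", ["void"]),
]
ids = type_ids(segments)
```

In a model, such ids would typically index a small embedding table whose vectors are added to the token embeddings; that detail is an assumption here, since the text only states that the layer assigns a value per content type.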
需要说明的是,预览指示还可以指示是否预览生成的第一可执行代码。当预览指示用于指示预览第一可执行代码时,在生成第一可执行代码后,计算设备可以向用户显示第一可执行代码,以便于用户对第一可执行代码进行查看。用户在查看第一可执行代码时,可以对第一可执行代码进行审核,当用户认为第一可执行代码符合其应用需求时,可以执行指定操作以触发同意指示,以告知计算设备用户同意该第一可执行代码。相应的,在用户触发同意指示后,计算设备会接收到该同意指示,可将该第一可执行代码插入生成点。
图20为本申请实施例提供的一种第一可执行代码的预览界面示意图。如图20所示,预览界面的左上角设置有同意按钮(accept),当用户认为第一可执行代码符合其应用需求时,可以点击同意按钮以触发同意指示,告知计算设备用户同意该第一可执行代码。计算设备收到该同意指示,可将该第一可执行代码在生成点处插入第一方法所属的源代码文件。例如,插入第一可执行代码后的源代码如图21所示,图21中虚线框内的可执行代码为第一可执行代码。
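In summary, in the cloud-service-based code generation method provided in the embodiments of this application, after a code generation request is received, the first context information needed to generate the first executable code can be obtained from the information of the programming project based on the request; the first executable code is then generated based on the first context information and the code generation request. Because the first context information is obtained from the information of the programming project first, and includes project-level context drawn from the whole programming project, it reflects the logical structure of the code and of the programming project. This lets the code generation process draw more on the background knowledge that a human developer needs in real programming and on the overall logic of the programming project, which improves the code generation capability and helps improve the practical experience of using code generation technology.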
需要说明的是,上述获取第一上下文信息的过程可以独立于具体的代码生成过程,即上述获取第一上下文信息的过程与基于第一上下文信息生成第一可执行代码的过程可以解耦。因此,以上获取上下文信息的能力在不同的代码生成技术间具有通用性,能够被不同的工具集成,以提升不同的代码生成技术的用户体验。例如,该获取上下文信息的过程,不仅可以用于生成实现方法的可执行代码,还可以用于行级代码补全等其他单元的代码的生成,且不仅可以用于面向对象语言的代码生成,也可以用于面向过程语言的代码生成。并且,随着应用场景的不同,本申请实施例提供的基于云服务的代码生成方法的实现形态也可以有多种。例如,其可能的形态可以包括:以客户端代码编辑器(如VSCode)或集成开发环境(如JetBrains系列)的新功能,随着版本更新迭代出现;以基于预训练语言模型的代码生成插件工具(如Copilot、Tabnine、AiXCoder)的新特性,随着版本更新迭代出现;以云端代码编辑器(如SourceGraph)或开发环境(如GitHub Codespace)的辅助编码功能出现。
下面对预训练模型的训练方法进行说明。如图22所示,该方法包括如下步骤:
步骤2201、获取已成功编译的编程项目中实现第二方法的第二可执行代码。
可选地,如图23所示,获取已成功编译的编程项目中实现第二方法的第二可执行代码,包括:
步骤22011、获取已成功编译的编程项目中的所有第二方法。
在可执行代码中,方法通常采用方法体表示。因此,在获取编程项目中的第二方法时,可以先在编程项目中确定所有方法体,以得到采用方法体表示的所有第二方法。在一种实现方式中,已成功编译的编程项目可以来自于历史开发过程的积累,也可以来自于一些公开的数据源,例如,可以来自于开源的软件代码仓库(如GitHub)等。
需要说明的是,在执行步骤22011之前,可以先生成已成功编译的编程项目的逻辑结构图,并使用该逻辑结构图的叶子节点表示该编程项目中的方法,然后将逻辑结构图的所有叶子节点表示的方法确定为该第二方法。其中,生成逻辑结构图的实现过程,请相应参考步骤405的实现过程,此处不再赘述。
步骤22012、对所有第二方法进行筛选,得到通过筛选的第二方法,通过筛选的第二方法用于表达运算逻辑。
在获取已成功编译的编程项目中的所有第二方法后,可以对第二方法进行筛选,以得到符合标准的第二方法。由于与实现业务有关的方法才对生成代码有帮助,因此可以按照第二方法是否表达具有实际意义的业务逻辑,对第二方法进行筛选。而表达具体的运算逻辑的方法才能表达具有实际意义的业务逻辑,因此在筛选时,可以令用于表达运算逻辑的第二方法通过筛选,令未表达运算逻辑的第二方法不通过筛选。
在一种实现方式中,未通过筛选的第二方法可以具有以下一种或多种特点:第二方法的方法体为空,第二方法具有特殊用途,第二方法的方法体不包括运算表达式。可选地,特殊用途可以包括以下一种或多种:获取、设置、构造和返回。例如,以Java语言代码为例,getter方法用于获取,setter方法用于设置,方法toString用于把一个对象转换为一个字符串,并返回结果,方法hashCode用于返回对象的哈希值,均为实现业务服务的特殊方法,均不表达具有实际意义的业务逻辑,因此这些方法均无法通过筛选。
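The screening criteria can be approximated with name and body heuristics. The regular expressions below are assumptions: special-purpose methods are recognized by conventional Java names only, constructors are not detected because that would need the class name, and "contains an operation expression" is reduced to the presence of an arithmetic, comparison or logical operator.

```python
import re

SPECIAL_NAMES = re.compile(r"^(get|set|is)[A-Z]\w*$|^(toString|hashCode|equals)$")
OPERATOR = re.compile(r"[-+*/%<>!&|^]|==")  # plain assignment "=" is deliberately excluded


def passes_filter(name, body):
    """Return True if a method looks like it expresses operation logic."""
    body = body.strip()
    if not body:                   # empty method body
        return False
    if SPECIAL_NAMES.match(name):  # getter, setter, toString and similar
        return False
    return bool(OPERATOR.search(body))


methods = {
    "getName": "return name;",
    "noop": "",
    "area": "return width * height;",
    "log": "print(msg);",
}
kept = [name for name, body in methods.items() if passes_filter(name, body)]
```

Working on an abstract syntax tree instead of raw text would avoid false matches, for example operators that appear only inside comments or string literals.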
步骤22013、获取实现通过筛选的第二方法的第二可执行代码。
对第二方法进行筛选后,可获取通过筛选的第二方法的第二可执行代码,以便于使用该第二可执行代码训练待训练模型。其中,在对第二方法进行筛选后,可将通过筛选的第二方法的方法体中的代码确定为用于实现该第二方法的第二可执行代码。
步骤2202、获取第二可执行代码的第二上下文信息。
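The process of obtaining the second context information of the second executable code can follow the process, described above, of obtaining the first context information needed to generate the first executable code, so that the model's training task and inference task are aligned. For example, before the second context information is obtained, the permission scope that the second method is permitted to access may be obtained in advance, and the second context information is then obtained within that scope. When obtaining the second context information, processing such as out-of-file context expansion and in-file code reorganization may also be performed, and the second context information obtained from the result. After the second context information is obtained, it may also be updated according to certain strategies, such as removing target information from it, abstracting it into interface declaration form, and sorting its multiple pieces of content by their relevance to the second method. For all of these processes, refer to the corresponding descriptions of obtaining the first context information above. In addition, if a logical structure graph of the programming project used for training has been generated in advance, the second context information of the second executable code can be obtained by analyzing that graph; for this, refer to the earlier description of obtaining the first context information based on the logical structure graph, which is not repeated here.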
步骤2203、将第二上下文信息作为待训练模型的输入,将第二可执行代码作为待训练模型的期望输出,对待训练模型进行训练,得到预训练模型。
在获取第二上下文信息和第二可执行代码后,就可以将第二上下文信息作为待训练模型的输入,将第二可执行代码作为待训练模型的期望输出,对待训练模型进行训练,得到预训练模型。其中,同一第二方法的第二上下文信息和第二可执行代码可以形成为一条训练数据。可选地,当第二方法有方法注释时,待训练模型的输入还可以包括方法注释。类似地,当第二方法有方法签名时,待训练模型的输入还可以包括方法签名。其中,方法签名用于指示方法的使用方式。
在一种实现方式中,本申请实施例使用的预训练模型可以为生成式预训练解码器模型。为了适配生成式预训练解码器模型的输入格式要求,需要先将第二上下文信息处理为序列形式,然后使用经过处理的第二上下文信息用于训练,以便于兼顾语言模型的特点,尽量符合编程语言本身的语法规则。在一种实现方式中,在获取第二上下文信息后,可以按照上下文信息的来源,按项目级上下文、文件级上下文、类级上下文、方法注释、方法代码片段的顺序对不同类型的上下文信息进行拼接,并使用占位符(如<context>、<comment>、<java>等)对不同类型的上下文信息进行标注,以便于在训练过程中针对不同类型的信息制定不同的损失更新策略。可选的,占位符可以位于对应类型的上下文信息的起始位置处。使用占位符标注的上下文信息如图24所示。
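The concatenation order and placeholders described above can be sketched as a single formatting function. Using one `<context>` marker for all three context levels and newline separators are assumptions, since the exact layout of Figure 24 is not reproduced in the text.

```python
def build_sequence(project_ctx, file_ctx, class_ctx, comment, code):
    """Join the parts in the order project, file, class, comment, code, placing a
    placeholder at the start of each type of content."""
    context = "\n".join(part for part in (project_ctx, file_ctx, class_ctx) if part)
    return f"<context>\n{context}\n<comment>\n{comment}\n<java>\n{code}"


sequence = build_sequence(
    "interface Helper { Server createServer(String ip); }",
    "import java.net.ServerSocket;",
    "private String ip;",
    "/** init the server */",
    "public void init() { }",
)
```

Because every training sample carries the same markers in the same order, the position of `<java>` can later be used to decide which tokens contribute to the loss.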
并且,将训练数据用于训练之前,还可以将训练数据转换为可运算张量(tensor)(如词向量)格式。其实现过程包括:先使用分词器(如Tokenizer)和词表(如Vocab)将训练数据中的每个词(Token)转换成其在词表中对应的索引,形成训练数据中每个词在词表中对应索引的序列,并通过索引得到其在词嵌入模型中对应词的向量表示(word embedding),以此类推得到多个训练数据的向量表示组成的词向量矩阵。该词向量矩阵用于输入待训练模型对待训练模型进行训练。
在训练过程中,本申请实施例可以采用预训练-精调(pretrain-finetune)范式进行模型训练和优化。其中,在预训练阶段,待训练模型可以通过在大量语料上的无监督训练过程中对语言的语法和模式进行学习。在精调阶段,待训练模型需要根据下游任务目标针对性地处理数据,通过监督学习进行针对性优化。
并且,在本申请实施例提供的训练方法中,可以采用因果语言建模(causal language model,CLM)的方式,通过根据已有词预测下一个词(next token prediction)这一任务对模型进行训练,但只计算待预测的目标代码部分(对应图24中<java>后的部分)的损失值,然后利用此部分的损失对模型权重进行更新,从而针对性地优化模型在已知项目级上下文信息和当前方法功能描述的前提下对代码实现部分的预测能力。
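Restricting the loss to the target code can be sketched as averaging the next-token negative log-likelihood only over positions after the `<java>` marker. The function signature and the sample numbers are assumptions; a training framework would compute the same quantity from logits with a mask.

```python
import math


def masked_loss(token_ids, next_token_probs, java_id):
    """Average negative log-likelihood over the tokens that follow the <java> marker.

    `next_token_probs[i]` is the probability the model assigned to `token_ids[i]`
    given the tokens before it. Positions up to and including the marker are ignored.
    """
    start = token_ids.index(java_id) + 1
    losses = [-math.log(p) for p in next_token_probs[start:]]
    return sum(losses) / len(losses)


token_ids = [1, 5, 6, 2, 7, 8]           # id 2 stands for the <java> marker
probs = [0.1, 0.1, 0.1, 0.1, 0.5, 0.25]  # only the last two positions are scored
loss = masked_loss(token_ids, probs, java_id=2)
```

The context and comment tokens still condition the prediction; they simply contribute nothing to the weight update.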
需要说明的是,在获取第二上下文信息后,若计算设备也接收到了预览指示,且预览指示需要预览第二上下文信息,则计算设备可以根据预览指示显示第二上下文信息,并在接收到该第二上下文信息的同意指示后,将该第二上下文信息用于对待训练模型的输入。
由上可知,在该训练方法中,训练数据从编程项目中获取,使得训练数据为编程项目级的数据,其能够考虑代码和编程项目的逻辑结构,使得训练过程能够更多地利用实际编程中人类开发者所需的背景知识,当将经过该训练方法训练得到的预训练模型用于生成代码时,能够提高预训练模型生成代码的能力,有助于提高预训练模型生成代码的实际使用体验。并且,由于该训练方法将模型的训练任务和推理任务对齐,能够提升经过训练的模型对上下文的感知和利用能力,使用训练得到的预训练模型生成代码能够较大程度地利用模型的性能,能够进一步提高生成代码的能力,改善代码生成技术的实际使用体验。
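In addition, as the training process described above shows, this training method is positioned as a training-data processing scheme and format that is general across models. It can therefore be applied both to pretraining a code generation model from random initialization and to targeted tuning of existing code generation models (such as OpenAI's Codex model, Salesforce's CodeGen model, and Meta's InCoder model). Depending on the optimization goal, the training method provided in the embodiments of this application can be applied at different training stages, for example direct pretraining (pretrain), multi-stage pretraining (multi-stage pretrain), and fine-tuning (finetune). In one implementation, direct pretraining trains a model starting from random initialization. Multi-stage pretraining continues training an already pretrained model after changing the data and modifying the hyperparameters. Fine-tuning takes a pretrained model and trains it on a particular dataset or downstream task, by freezing certain neural network layers or adding prompt-related parameters.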
在本申请实施例中,获取第一上下文信息的操作可以通过解析器、集成开发环境或模型实现。并且,当采用模型获取第一上下文信息时,获取第一上下文信息的操作和生成第一可执行代码的操作,可以由同一个模型实现,或者,可以分别由两个模型实现。下面以由第一AI模型从编程项目中获取第一上下文信息,由第二AI模型基于第一上下文信息生成第一可执行代码为例,对本申请实施例的实现过程进行举例说明。如图25所示,该实现过程包括训练阶段和推理阶段。训练阶段用于对第二AI模型进行训练。推理阶段用于使用第一AI模型从编程项目中获取第一上下文信息,然后使用第二AI模型基于第一上下文信息生成第一可执行代码。其中,训练阶段中从已成功编译的编程项目中获取第二上下文信息的操作,可以由第一AI模型执行,也可以由与该第一AI模型具有相似功能的AI模型实现。在一种实现方式中,第一AI模型可以是Code BERT模型等已有模型,也可以是自训练的模型,且训练方式可以为自编码的训练方式。第二AI模型可以是生成式预训练解码器模型等。下面以采用第一AI模型实现为例进行说明。
如图25所示,在训练阶段中,计算设备可以从软件代码仓中获取已成功编译的编程项目,获取已成功编译的编程项目中的源代码文件数据集,并在该过程中对编程项目的内容执行抽取、筛选和去重等预处理。然后,将源代码文件数据集输入第一AI模型,使得第一AI模型执行本申请实施例提供的训练方法中的步骤2201和步骤2202,以得到用于对待训练的第二AI模型进行训练的训练数据。然后,计算设备执行上述步骤2203,将训练数据输入待训练的第二AI模型,在训练结束后保存模型,从而得到经过训练的第二AI模型。
如图25所示,在推理阶段,当计算设备接收到来自于当前编程项目的代码生成请求后,一方面可以将当前编程项目的相关文件输入第一AI模型,使得第一AI模型执行本申请实施例提供的基于云服务的代码生成方法中的步骤405至步骤410,使用第一AI模型从当前编程项目的相关文件中获取第一上下文信息。另一方面计算设备接收到第一AI模型输出的第一上下文信息后,将该第一上下文信息与代码生成请求的任务描述进行拼接,并将拼接后的第一上下文信息和任务描述输入第二AI模型,使得第二AI模型执行本申请实施例提供的基于云服务的代码生成方法中的步骤413,以生成用于实现第一方法的第一可执行代码。然后计算设备接收到第二AI模型输出的第一可执行代码后,向开发人员推荐第一可执行代码。
以上介绍了本申请实施例的基于云服务的代码生成方法,与上述方法对应,本申请实施例还提供了基于云服务的代码生成装置。图26是本申请实施例提供的一种基于云服务的代码生成装置的结构示意图。基于图26所示的如下多个模块,该图26所示的基于云服务的代码生成装置能够执行上述图4所示的全部或部分操作。应理解到,该装置可以包括比所示模块更多的附加模块或者省略其中所示的一部分模块,本申请实施例对此并不进行限制。可选的,该基于云服务的代码生成装置可配置于云平台。如图26所示,该基于云服务的代码生成装置260包括:
接收模块2601,用于接收代码生成请求,代码生成请求用于请求生成编程项目中实现第一方法的第一可执行代码。
第一获取模块2602,用于基于代码生成请求,从编程项目的信息中,获取生成第一可执行代码所需的第一上下文信息。
生成模块2603,用于基于第一上下文信息和代码生成请求,生成第一可执行代码。
可选的,接收模块2601,还用于接收范围指示,范围指示用于指示第一上下文信息的获取范围。
相应的,第一获取模块2602,具体用于:当范围指示用于指示第一上下文信息的获取范围为编程项目时,基于代码生成请求,从编程项目的信息中,获取第一上下文信息。
可选的,接收模块2601,还用于接收预览指示,预览指示用于指示是否预览第一上下文信息。
相应的,如图27所示,该基于云服务的代码生成装置260还包括:显示模块2604,用于当预览指示用于指示预览第一上下文信息时,显示第一上下文信息。
接收模块2601,还用于接收指示同意第一上下文信息的同意指示。
生成模块2603,具体用于:在接收到同意指示后,基于第一上下文信息和代码生成请求,生成第一可执行代码。
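Optionally, the first obtaining module 2602 is specifically configured to: obtain, based on the code generation request, the source code file to which the first method belongs from the files of the programming project; obtain, from the information of the programming project, the external information that is used by the source code file and located outside it; supplement the source code file with the external information; and obtain the first context information based on the supplemented source code file.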
可选的,第一获取模块2602,具体用于:在编程项目的信息中,获取第一方法有权限访问的权限范围;在权限范围中,获取外部信息。
可选的,第一获取模块2602,具体用于:基于第一方法在编程项目中的位置、第一方法所属的目标类的访问控制权限、及目标类的层次和引用关系中的至少一个,获取权限范围。
可选的,第一获取模块2602,还用于:在第一上下文信息中移除目标信息,得到更新后的第一上下文信息,目标信息包括以下一种或多种:代码注释、变量赋值、方法体和指示代码底层逻辑的信息。
可选的,接收模块2601,还用于接收移除指示,移除指示用于指示是否移除第一上下文信息中的目标信息。
相应的,第一获取模块2602,具体用于:当移除指示用于指示移除第一上下文信息中的目标信息时,在第一上下文信息中移除目标信息,得到更新后的第一上下文信息。
可选的,第一获取模块2602,还用于:将第一上下文信息抽象为接口声明形式,得到更新后的第一上下文信息。
可选的,第一获取模块2602,具体用于:基于代码生成请求,在编程项目的文件中,获取第一方法所属的源代码文件。在源代码文件中,获取撰写位置位于第一方法后的下文信息。将下文信息的撰写位置调整至第一方法之前,使得调整位置后的下文信息成为第一方法的上文信息。基于第一方法的上文信息,获取第一上下文信息。
可选的,第一获取模块2602,还用于:获取第一上下文信息中多个内容与第一方法的相关性;基于多个内容对应的相关性,对多个内容进行排序,得到更新后的第一上下文信息。
可选的,第一获取模块2602,具体用于:将多个内容排列在第一方法的任务描述之前,且任一内容到任务描述的距离与内容对应的相关性反相关。
可选的,获取第一上下文信息中多个内容与第一方法的相关性,包括以下一种或多种的组合:获取每个内容的标识符与第一方法的相关信息的第一相似度,第一相似度与相关性正相关;获取每个内容与第一方法在编程项目中层次的距离,距离与相关性反相关;获取每个内容与第一方法的关联内容调用的上下文信息的第二相似度,第二相似度与相关性正相关,第一方法的关联内容包括以下一种或多种:第一方法所属的目标类的关联类,目标类中其它方法。
可选的,标识符包括以下一种或多种:变量名、方法名、包名、类名和常量名。
可选的,相关信息包括以下一种或多种:方法描述、方法名、返回类型和参数类型。
可选的,第一获取模块2602,还用于:生成编程项目的逻辑结构图,逻辑结构图用于指示编程项目中各项内容的关联关系。
相应的,第一获取模块2602,具体用于:基于代码生成请求对逻辑结构图进行分析,从编程项目的信息中,获取生成第一可执行代码所需的第一上下文信息。
可选的,生成模块2603,具体用于:将第一上下文信息和代码生成请求输入预训练模型,得到预训练模型输出的第一可执行代码。
可选的,如图27所示,该基于云服务的代码生成装置260还包括:
第二获取模块2605,用于获取已成功编译的编程项目中实现第二方法的第二可执行代码。
第三获取模块2606,用于获取第二可执行代码的第二上下文信息。
训练模块2607,用于将第二上下文信息作为待训练模型的输入,将第二可执行代码作为待训练模型的期望输出,对待训练模型进行训练,得到预训练模型。
可选的,待训练模型的输入还包括以下一种或多种:第二方法的方法注释和方法签名。
可选的,第二获取模块2605,具体用于:获取已成功编译的编程项目中的所有第二方法;对所有第二方法进行筛选,得到通过筛选的第二方法,通过筛选的第二方法用于表达运算逻辑;获取实现通过筛选的第二方法的第二可执行代码。
可选的,未通过筛选的第二方法具有以下一种或多种特点:第二方法的方法体为空,第二方法具有特殊用途,第二方法的方法体不包括运算表达式。
可选的,特殊用途包括以下一种或多种:获取、设置、构造和返回。
可选的,上下文信息包括以下一种或多种:已定义的类、变量和方法的功能、访问权限以及调用方式。
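In the cloud-service-based code generation apparatus provided in the embodiments of this application, after a code generation request is received, the first context information needed to generate the first executable code can be obtained from the information of the programming project based on the request; the first executable code is then generated based on the first context information and the code generation request. Because the apparatus first obtains the first context information from the information of the programming project, and that information includes project-level context drawn from the whole programming project, it reflects the logical structure of the code and of the programming project. This lets the code generation process draw more on the background knowledge that a human developer needs in real programming and on the overall logic of the programming project, which improves the code generation capability and helps improve the practical experience of using code generation technology.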
其中,接收模块2601、第一获取模块2602、生成模块2603、显示模块2604、第二获取模块2605、第三获取模块2606和训练模块2607均可以通过软件实现,或者可以通过硬件实现。示例性地,接下来以接收模块2601为例,介绍接收模块2601的实现方式。类似的,第一获取模块2602、生成模块2603、显示模块2604、第二获取模块2605、第三获取模块2606和训练模块2607的实现方式可以参考接收模块2601的实现方式。
模块作为软件功能单元的一种举例,接收模块2601可以包括运行在计算实例上的代码。其中,计算实例可以包括物理主机(计算设备)、虚拟机、容器中的至少一种。进一步地,上述计算实例可以是一台或者多台。例如,接收模块2601可以包括运行在多个主机/虚拟机/容器上的代码。需要说明的是,用于运行该代码的多个主机/虚拟机/容器可以分布在相同的区域(region)中,也可以分布在不同的region中。进一步地,用于运行该代码的多个主机/虚拟机/容器可以分布在相同的可用区(availability zone,AZ)中,也可以分布在不同的AZ中,每个AZ包括一个数据中心或多个地理位置相近的数据中心。其中,通常一个region可以包括多个AZ。
同样,用于运行该代码的多个主机/虚拟机/容器可以分布在同一个虚拟私有云(virtual private cloud,VPC)中,也可以分布在多个VPC中。其中,通常一个VPC设置在一个region内,同一region内两个VPC之间,以及不同region的VPC之间跨区通信需在每个VPC内设置通信网关,经通信网关实现VPC之间的互连。
模块作为硬件功能单元的一种举例,接收模块2601可以包括至少一个计算设备,如服务器等。或者,接收模块2601也可以是利用专用集成电路(application-specific integrated circuit,ASIC)实现、或可编程逻辑器件(programmable logic device,PLD)实现的设备等。其中,上述PLD可以是复杂程序逻辑器件(complex programmable logical device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合实现。
接收模块2601包括的多个计算设备可以分布在相同的region中,也可以分布在不同的region中。接收模块2601包括的多个计算设备可以分布在相同的AZ中,也可以分布在不同的AZ中。同样,接收模块2601包括的多个计算设备可以分布在同一个VPC中,也可以分布在多个VPC中。其中,所述多个计算设备可以是服务器、ASIC、PLD、CPLD、FPGA和GAL等计算设备的任意组合。
需要说明的是,在其他实施例中,接收模块2601、第一获取模块2602、生成模块2603、显示模块2604、第二获取模块2605、第三获取模块2606和训练模块2607中任一模块可以用于执行基于云服务的代码生成方法中的任意步骤。接收模块2601、第一获取模块2602、生成模块2603、显示模块2604、第二获取模块2605、第三获取模块2606和训练模块2607负责实现的步骤可根据需要指定,通过接收模块2601、第一获取模块2602、生成模块2603、显示模块2604、第二获取模块2605、第三获取模块2606和训练模块2607分别实现基于云服务的代码生成方法中不同的步骤来实现基于云服务的代码生成装置的全部功能。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置和模块的具体工作过程,可以参考前述方法实施例中的对应内容,在此不再赘述。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。提供程序开发平台的计算机程序产品包括一个或多个计算机指令,在计算设备上加载和执行这些计算机程序指令时,全部或部分地实现本申请实施例提供的基于云服务的代码生成方法的功能。
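In addition, the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium stores the computer program instructions that provide the program development platform.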
本申请实施例还提供了一种计算设备集群。该计算设备集群包括至少一台计算设备。该计算设备可以是服务器,例如是中心服务器、边缘服务器,或者是本地数据中心中的本地服务器。在一些实施例中,计算设备也可以是台式机、笔记本电脑或者智能手机等终端设备。
可选地,计算设备集群包括的至少一个计算设备的结构可参见图1示出的计算设备100。计算设备集群中的一个或多个计算设备100中的存储器102中可以存有相同的用于执行基于云服务的代码生成方法的指令。
在一些可能的实现方式中,该计算设备集群中的一个或多个计算设备100的存储器102中也可以分别存有用于执行基于云服务的代码生成方法的部分指令。换言之,一个或多个计算设备100的组合可以共同执行用于执行基于云服务的代码生成方法的指令。
需要说明的是,计算设备集群中的不同的计算设备100中的存储器102可以存储不同的指令,分别用于执行基于云服务的代码生成装置的部分功能。也即,不同的计算设备100中的存储器102存储的指令可以实现接收模块2601、第一获取模块2602、生成模块2603、显示模块2604、第二获取模块2605、第三获取模块2606和训练模块2607中的一个或多个模块的功能。
在一些可能的实现方式中,计算设备集群中的一个或多个计算设备可以通过网络连接。其中,所述网络可以是广域网或局域网等等。图28示出了一种可能的实现方式。如图28所示,两个计算设备2800A和2800B之间通过网络进行连接。具体地,通过各个计算设备中的通信接口与所述网络进行连接。在这一类可能的实现方式中,计算设备2800A和2800B包括总线2802、处理器2804、存储器2806和通信接口2808。计算设备2800A中的存储器2806中存有执行接收模块2601、第一获取模块2602、生成模块2603和显示模块2604的功能的指令。同时,计算设备2800B中的存储器2806中存有执行第二获取模块2605、第三获取模块2606和训练模块2607的功能的指令。
应理解,图28中示出的计算设备2800A的功能也可以由多个计算设备2800完成。同样,计算设备2800B的功能也可以由多个计算设备2800完成。且用于实现基于云服务的代码生成方法的模块在计算设备中的部署方式也可以根据应用需求进行调整。
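An embodiment of this application further provides a computer-readable storage medium, which is a non-volatile computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can store, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the cloud-service-based code generation method provided in the embodiments of this application.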
本申请实施例还提供了一种包含指令的计算机程序产品。计算机程序产品可以是包含指令的,能够运行在计算设备上或被储存在任何可用介质中的软件或程序产品。当计算机程序产品在至少一个计算设备上运行时,使得计算机实现本申请实施例提供的基于云服务的代码生成方法。
本申请实施例还提供了一种芯片,包括处理器,用于从存储器中调用并运行存储器中存储的指令,使得安装有芯片的计算设备执行本申请实施例提供的基于云服务的代码生成方法。
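The foregoing embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk).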
为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各实施例的步骤及组成。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。本领域普通技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
用于实现本申请实施例的方法的计算机程序代码可以用一种或多种编程语言编写。这些计算机程序代码可以提供给通用计算机、专用计算机或其他可编程的规则查找装置的处理器,使得程序代码在被计算机或其他可编程的规则查找装置执行的时候,引起在流程图和/或框图中规定的功能/操作被实施。程序代码可以完全在计算机上、部分在计算机上、作为独立的软件包、部分在计算机上且部分在远程计算机上或完全在远程计算机或服务器上执行。
在本申请实施例的上下文中,计算机程序代码或者相关数据可以由任意适当载体承载,以使得设备、装置或者处理器能够执行上文描述的各种处理和操作。载体的示例包括信号、计算机可读介质等等。信号的示例可以包括电、光、无线电、声音或其它形式的传播信号,诸如载波、红外信号等。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、设备和模块的具体工作过程,可以参见前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,该模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、设备或模块的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。
该作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本申请实施例方案的目的。
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以是两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
本申请中术语“第一”“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对数量和执行顺序进行限定。还应理解,尽管以下描述使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。这些术语只是用于将一元素与另一元素区别分开。例如,在不脱离各种示例的范围的情况下,第一链路可以被称为第二链路,并且类似地,第二链路可以被称为第一链路。
还应理解,在本申请的各个实施例中,各个过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本申请中术语“至少一个”的含义是指一个或多个,本申请中术语“多个”的含义是指两个或两个以上,例如,多个第二报文是指两个或两个以上的第二报文。本文中术语“系统”和“网络”经常可互换使用。
应理解,在本文中对各种示例的描述中所使用的术语只是为了描述特定示例,而并非旨在进行限制。如在对各种示例的描述和所附权利要求书中所使用的那样,单数形式“一个(“a”,“an”)”和“该”旨在也包括复数形式,除非上下文另外明确地指示。
还应理解,术语“包括”(也称“includes”、“including”、“comprises”和/或“comprising”)当在本说明书中使用时指定存在所陈述的特征、整数、步骤、操作、元素、和/或部件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元素、部件、和/或其分组。
还应理解,根据上下文,短语“若确定...”或“若检测到[所陈述的条件或事件]”可被解释为意指“在确定...时”或“响应于确定...”或“在检测到[所陈述的条件或事件]时”或“响应于检测到[所陈述的条件或事件]”。
应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。
还应理解,说明书通篇中提到的“一个实施例”、“一实施例”、“一种可能的实现方式”意味着与实施例或实现方式有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”、“一种可能的实现方式”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。
需要说明的是,本申请所涉及的信息(包括但不限于用户设备信息、用户个人信息等)、数据(包括但不限于用于分析的数据、存储的数据、展示的数据等)以及信号,均为经用户授权或者经过各方充分授权的,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。例如,本申请中涉及到的信息和指令等都是在充分授权的情况下获取的。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的保护范围。

Claims (49)

  1. 一种基于云服务的代码生成方法,其特征在于,由云平台执行,所述方法包括:
    接收代码生成请求,所述代码生成请求用于请求生成编程项目中实现第一方法的第一可执行代码;
    基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息;
    基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    接收范围指示,所述范围指示用于指示所述第一上下文信息的获取范围;
    所述基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息,包括:
    当所述范围指示用于指示所述第一上下文信息的获取范围为所述编程项目时,基于所述代码生成请求,从所述编程项目的信息中,获取所述第一上下文信息。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    接收预览指示,所述预览指示用于指示是否预览所述第一上下文信息;
    当所述预览指示用于指示预览所述第一上下文信息时,显示所述第一上下文信息;
    接收指示同意所述第一上下文信息的同意指示;
    所述基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码,包括:
    在接收到所述同意指示后,基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息,包括:
    基于所述代码生成请求,在所述编程项目的文件中,获取所述第一方法所属的源代码文件;
    在所述编程项目的信息中,获取所述源代码文件使用到的位于所述源代码文件外的外部信息;
    在所述源代码文件中补充所述外部信息;
    基于经过补充的源代码文件,获取所述第一上下文信息。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    在所述编程项目的信息中,获取所述第一方法有权限访问的权限范围;
    所述在所述编程项目的信息中,获取所述源代码文件使用到的位于所述源代码文件外的外部信息,包括:
    在所述权限范围中,获取所述外部信息。
  6. 根据权利要求5所述的方法,其特征在于,所述在所述编程项目的信息中,获取所述第一方法有权限访问的权限范围,包括:
    基于所述第一方法在所述编程项目中的位置、所述第一方法所属的目标类的访问控制权限、及所述目标类的层次和引用关系中的至少一个,获取所述权限范围。
  7. 根据权利要求1至6任一所述的方法,其特征在于,在所述基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码之前,所述方法还包括:
    在所述第一上下文信息中移除目标信息,得到更新后的第一上下文信息,所述目标信息包括以下一种或多种:代码注释、变量赋值、方法体和指示代码底层逻辑的信息。
  8. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    接收移除指示,所述移除指示用于指示是否移除所述第一上下文信息中的所述目标信息;
    所述在所述第一上下文信息中移除目标信息,得到更新后的第一上下文信息,包括:
    当所述移除指示用于指示移除所述第一上下文信息中的所述目标信息时,在所述第一上下文信息中移除所述目标信息,得到更新后的第一上下文信息。
  9. 根据权利要求1至8任一所述的方法,其特征在于,在所述基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码之前,所述方法还包括:
    将所述第一上下文信息抽象为接口声明形式,得到更新后的第一上下文信息。
  10. 根据权利要求1至9任一所述的方法,其特征在于,所述基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息,包括:
    基于所述代码生成请求,在所述编程项目的文件中,获取所述第一方法所属的源代码文件;
    在源代码文件中,获取撰写位置位于所述第一方法后的下文信息;
    将所述下文信息的撰写位置调整至所述第一方法之前,使得调整位置后的下文信息成为所述第一方法的上文信息;
    基于所述第一方法的上文信息,获取所述第一上下文信息。
  11. 根据权利要求1至10任一所述的方法,其特征在于,在所述基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码之前,所述方法还包括:
    获取所述第一上下文信息中多个内容与所述第一方法的相关性;
    基于所述多个内容对应的相关性,对所述多个内容进行排序,得到更新后的第一上下文信息。
  12. 根据权利要求11所述的方法,其特征在于,所述基于所述多个内容对应的相关性,对所述多个内容进行排序,包括:
    将所述多个内容排列在所述第一方法的任务描述之前,且任一内容到所述任务描述的距离与所述内容对应的相关性反相关。
  13. 根据权利要求11或12所述的方法,其特征在于,所述获取所述第一上下文信息中多个内容与所述第一方法的相关性,包括以下一种或多种的组合:
    获取每个内容的标识符与所述第一方法的相关信息的第一相似度,所述第一相似度与所述相关性正相关;
    获取每个内容与所述第一方法在所述编程项目中层次的距离,所述距离与所述相关性反相关;
    获取每个内容与所述第一方法的关联内容调用的上下文信息的第二相似度,所述第二相似度与所述相关性正相关,所述第一方法的关联内容包括以下一种或多种:所述第一方法所属的目标类的关联类,所述目标类中其它方法。
  14. 根据权利要求13所述的方法,其特征在于,所述标识符包括以下一种或多种:变量名、方法名、包名、类名和常量名。
  15. 根据权利要求13所述的方法,其特征在于,所述相关信息包括以下一种或多种:方法描述、方法名、返回类型和参数类型。
  16. 根据权利要求1至15任一所述的方法,其特征在于,在所述基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息之前,所述方法还包括:
    生成所述编程项目的逻辑结构图,所述逻辑结构图用于指示所述编程项目中各项内容的关联关系;
    所述基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息,包括:
    基于所述代码生成请求对所述逻辑结构图进行分析,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息。
  17. 根据权利要求1至16任一所述的方法,其特征在于,所述基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码,包括:
    将所述第一上下文信息和所述代码生成请求输入预训练模型,得到所述预训练模型输出的所述第一可执行代码。
  18. 根据权利要求17所述的方法,其特征在于,在所述基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码之前,所述方法还包括:
    获取已成功编译的编程项目中实现第二方法的第二可执行代码;
    获取所述第二可执行代码的第二上下文信息;
    将所述第二上下文信息作为待训练模型的输入,将所述第二可执行代码作为所述待训练模型的期望输出,对所述待训练模型进行训练,得到所述预训练模型。
  19. 根据权利要求18所述的方法,其特征在于,所述待训练模型的输入还包括以下一种或多种:所述第二方法的方法注释和方法签名。
  20. 根据权利要求18所述的方法,其特征在于,所述获取已成功编译的编程项目中实现第二方法的第二可执行代码,包括:
    获取已成功编译的编程项目中的所有第二方法;
    对所述所有第二方法进行筛选,得到通过筛选的第二方法,通过筛选的第二方法用于表达运算逻辑;
    获取实现通过筛选的第二方法的第二可执行代码。
  21. 根据权利要求20所述的方法,其特征在于,未通过筛选的第二方法具有以下一种或多种特点:所述第二方法的方法体为空,所述第二方法具有特殊用途,所述第二方法的方法体不包括运算表达式。
  22. 根据权利要求21所述的方法,其特征在于,所述特殊用途包括以下一种或多种:获取、设置、构造和返回。
  23. 根据权利要求1至22任一所述的方法,其特征在于,所述上下文信息包括以下一种或多种:已定义的类、变量和方法的功能、访问权限以及调用方式。
  24. 一种基于云服务的代码生成装置,其特征在于,配置于云平台,所述装置包括:
    接收模块,用于接收代码生成请求,所述代码生成请求用于请求生成编程项目中实现第一方法的第一可执行代码;
    第一获取模块,用于基于所述代码生成请求,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息;
    生成模块,用于基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码。
  25. 根据权利要求24所述的装置,其特征在于,
    所述接收模块,还用于接收范围指示,所述范围指示用于指示所述第一上下文信息的获取范围;
    所述第一获取模块,具体用于:当所述范围指示用于指示所述第一上下文信息的获取范围为所述编程项目时,基于所述代码生成请求,从所述编程项目的信息中,获取所述第一上下文信息。
  26. 根据权利要求24或25所述的装置,其特征在于,
    所述接收模块,还用于接收预览指示,所述预览指示用于指示是否预览所述第一上下文信息;
    所述装置还包括:显示模块,用于当所述预览指示用于指示预览所述第一上下文信息时,显示所述第一上下文信息;
    所述接收模块,还用于接收指示同意所述第一上下文信息的同意指示;
    所述生成模块,具体用于:在接收到所述同意指示后,基于所述第一上下文信息和所述代码生成请求,生成所述第一可执行代码。
  27. 根据权利要求24至26任一所述的装置,其特征在于,所述第一获取模块,具体用于:
    基于所述代码生成请求,在所述编程项目的文件中,获取所述第一方法所属的源代码文件;
    在所述编程项目的信息中,获取所述源代码文件使用到的位于所述源代码文件外的外部信息;
    在所述源代码文件中补充所述外部信息;
    基于经过补充的源代码文件,获取所述第一上下文信息。
  28. 根据权利要求27所述的装置,其特征在于,所述第一获取模块,具体用于:
    在所述编程项目的信息中,获取所述第一方法有权限访问的权限范围;
    在所述权限范围中,获取所述外部信息。
  29. 根据权利要求28所述的装置,其特征在于,所述第一获取模块,具体用于:
    基于所述第一方法在所述编程项目中的位置、所述第一方法所属的目标类的访问控制权限、及所述目标类的层次和引用关系中的至少一个,获取所述权限范围。
  30. 根据权利要求24至29任一所述的装置,其特征在于,
    所述第一获取模块,还用于:在所述第一上下文信息中移除目标信息,得到更新后的第一上下文信息,所述目标信息包括以下一种或多种:代码注释、变量赋值、方法体和指示代码底层逻辑的信息。
  31. 根据权利要求30所述的装置,其特征在于,
    所述接收模块,还用于接收移除指示,所述移除指示用于指示是否移除所述第一上下文信息中的所述目标信息;
    所述第一获取模块,具体用于:当所述移除指示用于指示移除所述第一上下文信息中的所述目标信息时,在所述第一上下文信息中移除所述目标信息,得到更新后的第一上下文信息。
  32. 根据权利要求24至31任一所述的装置,其特征在于,
    所述第一获取模块,还用于:将所述第一上下文信息抽象为接口声明形式,得到更新后的第一上下文信息。
  33. 根据权利要求24至32任一所述的装置,其特征在于,所述第一获取模块,具体用于:
    基于所述代码生成请求,在所述编程项目的文件中,获取所述第一方法所属的源代码文件;
    在源代码文件中,获取撰写位置位于所述第一方法后的下文信息;
    将所述下文信息的撰写位置调整至所述第一方法之前,使得调整位置后的下文信息成为所述第一方法的上文信息;
    基于所述第一方法的上文信息,获取所述第一上下文信息。
  34. 根据权利要求24至33任一所述的装置,其特征在于,所述第一获取模块,还用于:
    获取所述第一上下文信息中多个内容与所述第一方法的相关性;
    基于所述多个内容对应的相关性,对所述多个内容进行排序,得到更新后的第一上下文信息。
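  35. The apparatus according to claim 34, wherein
    the first obtaining module is specifically configured to: arrange the multiple pieces of content before the task description of the first method, such that the distance from any piece of content to the task description is inversely correlated with the relevance of that content.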
  36. 根据权利要求34或35所述的装置,其特征在于,所述获取所述第一上下文信息中多个内容与所述第一方法的相关性,包括以下一种或多种的组合:
    获取每个内容的标识符与所述第一方法的相关信息的第一相似度,所述第一相似度与所述相关性正相关;
    获取每个内容与所述第一方法在所述编程项目中层次的距离,所述距离与所述相关性反相关;
    获取每个内容与所述第一方法的关联内容调用的上下文信息的第二相似度,所述第二相似度与所述相关性正相关,所述第一方法的关联内容包括以下一种或多种:所述第一方法所属的目标类的关联类,所述目标类中其它方法。
  37. 根据权利要求36所述的装置,其特征在于,所述标识符包括以下一种或多种:变量名、方法名、包名、类名和常量名。
  38. 根据权利要求36所述的装置,其特征在于,所述相关信息包括以下一种或多种:方法描述、方法名、返回类型和参数类型。
  39. 根据权利要求24至38任一所述的装置,其特征在于,
    所述第一获取模块,还用于:生成所述编程项目的逻辑结构图,所述逻辑结构图用于指示所述编程项目中各项内容的关联关系;
    所述第一获取模块,具体用于:基于所述代码生成请求对所述逻辑结构图进行分析,从所述编程项目的信息中,获取生成所述第一可执行代码所需的第一上下文信息。
  40. 根据权利要求24至39任一所述的装置,其特征在于,
    所述生成模块,具体用于:将所述第一上下文信息和所述代码生成请求输入预训练模型,得到所述预训练模型输出的所述第一可执行代码。
  41. 根据权利要求40所述的装置,其特征在于,所述装置还包括:
    第二获取模块,用于获取已成功编译的编程项目中实现第二方法的第二可执行代码;
    第三获取模块,用于获取所述第二可执行代码的第二上下文信息;
    训练模块,用于将所述第二上下文信息作为待训练模型的输入,将所述第二可执行代码作为所述待训练模型的期望输出,对所述待训练模型进行训练,得到所述预训练模型。
  42. 根据权利要求41所述的装置,其特征在于,所述待训练模型的输入还包括以下一种或多种:所述第二方法的方法注释和方法签名。
  43. 根据权利要求41所述的装置,其特征在于,所述第二获取模块,具体用于:
    获取已成功编译的编程项目中的所有第二方法;
    对所述所有第二方法进行筛选,得到通过筛选的第二方法,通过筛选的第二方法用于表达运算逻辑;
    获取实现通过筛选的第二方法的第二可执行代码。
  44. 根据权利要求43所述的装置,其特征在于,未通过筛选的第二方法具有以下一种或多种特点:所述第二方法的方法体为空,所述第二方法具有特殊用途,所述第二方法的方法体不包括运算表达式。
  45. 根据权利要求44所述的装置,其特征在于,所述特殊用途包括以下一种或多种:获取、设置、构造和返回。
  46. 根据权利要求24至45任一所述的装置,其特征在于,所述上下文信息包括以下一种或多种:已定义的类、变量和方法的功能、访问权限以及调用方式。
  47. 一种计算设备集群,其特征在于,包括至少一个计算设备,每个计算设备包括处理器和存储器,所述至少一个计算设备的处理器用于执行所述至少一个计算设备的存储器中存储的指令,以使得所述计算设备集群执行如权利要求1至23任一项所述的方法。
  48. 一种包含指令的计算机程序产品,其特征在于,当所述指令被计算设备集群运行时,使得所述计算设备集群执行如权利要求1至23任一项所述的方法。
  49. 一种计算机可读存储介质,其特征在于,包括计算机程序指令,当所述计算机程序指令由计算设备集群执行时,所述计算设备集群执行如权利要求1至23任一项所述的方法。
PCT/CN2023/103999 2022-11-15 2023-06-29 基于云服务的代码生成方法及装置 WO2024103764A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202211430214 2022-11-15
CN202211430214.3 2022-11-15
CN202211600744.8 2022-12-13
CN202211600744.8A CN118092923A (zh) 2022-11-15 2022-12-13 基于云服务的代码生成方法及装置

Publications (1)

Publication Number Publication Date
WO2024103764A1 (zh) 2024-05-23

Family

ID=91083707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103999 WO2024103764A1 (zh) 2022-11-15 2023-06-29 Code generation method and device based on cloud services

Country Status (1)

Country Link
WO (1) WO2024103764A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160077831A1 (en) * 2014-09-11 2016-03-17 Microsoft Corporation Accurate and performant code design using memoization
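CN106919434A (zh) * 2017-03-22 2017-07-04 恒生电子股份有限公司 Code generation method and device
CN113805882A (zh) * 2021-09-18 2021-12-17 上海波顿诺华智能科技有限公司 Application development method and device, electronic equipment, and storage medium
WO2022089188A1 (zh) * 2020-11-02 2022-05-05 华为云计算技术有限公司 Code processing method, device, equipment, and medium
CN114461223A (zh) * 2022-02-16 2022-05-10 深圳壹账通智能科技有限公司 Code generation method and device, and terminal equipment
CN115145574A (zh) * 2022-05-17 2022-10-04 拉扎斯网络科技(上海)有限公司 Code generation method and device, storage medium, and server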
