WO2022089188A1 - Code processing method, apparatus, device, and medium - Google Patents

Code processing method, apparatus, device, and medium

Info

Publication number
WO2022089188A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
user
candidate
completed
context
Prior art date
Application number
PCT/CN2021/123127
Other languages
English (en)
French (fr)
Inventor
王亚伟
帕维尔彼得罗琴科
德米特里卡彭科
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Priority to EP21884917.2A (EP4220381A4)
Priority to CN202180074635.4A (CN116406459A)
Publication of WO2022089188A1
Priority to US18/310,749 (US20230273776A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/36 Software reuse
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/33 Intelligent editors

Definitions

  • the present application relates to the technical field of software development, and in particular, to a code processing method, apparatus, device, and computer-readable storage medium.
  • Code completion means that the user enters part of the code, such as inputting a part of a keyword or a function, and the development tool can provide the user with at least one candidate to help the user complete the keyword or function. This can reduce user input operations and improve development efficiency.
  • IDE: integrated development environment
  • AI: artificial intelligence
  • The present application provides a code processing method that statically analyzes the code according to the context features of the code to be completed and predicts candidates for that code. This improves prediction accuracy and enables automatic code completion, which in turn improves development efficiency.
  • the present application also provides apparatuses, devices, computer-readable storage media, and computer program products corresponding to the above methods.
  • the present application provides a code processing method.
  • the method may be performed by a code processing system.
  • the code processing system is provided with a user interface, such as a graphical user interface (GUI) or a command user interface (CUI).
  • the code processing system may receive the code input by the user through the user interface, and then determine the contextual feature of the code to be completed according to the code input by the user.
  • the context feature refers to a feature that can express the context in which the code is located, for example, including any one or more of the type of the base class, the class name of the base class, a prefix, a return type, and a Boolean feature. Then, the code processing system determines at least one candidate for the code to be completed from the context database according to the contextual characteristics of the code to be completed.
  • the context database stores sample codes and context features of the sample codes.
  • the sample code may include any one or more identifiers such as class names, method names, function names, variable names, or parameter names in open source datasets or user-private datasets.
  • the code processing system may present the at least one candidate item to the user through a user interface such as a GUI, thereby implementing code completion.
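The lookup described above can be sketched as a keyed search. This is a minimal illustration, not the patented implementation: the `ContextFeatures` fields, the in-memory `CONTEXT_DB` dictionary, and the sample entries are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFeatures:
    """Hypothetical subset of the context features named in the text."""
    base_class: str   # class name of the base class
    return_type: str  # return type expected at the cursor position


# Hypothetical context database: context features of sample code mapped to
# the identifiers that were observed in that context.
CONTEXT_DB = {
    ContextFeatures("DocumentBuilder", "Document"): ["newDocument", "parse"],
    ContextFeatures("DocumentBuilder", "boolean"): ["isNamespaceAware", "isValidating"],
}


def find_candidates(features):
    """Return the identifiers stored under the same context features."""
    return list(CONTEXT_DB.get(features, []))


print(find_candidates(ContextFeatures("DocumentBuilder", "Document")))
```

An exact-match dictionary is the simplest possible stand-in; a real context database would presumably tolerate partial matches across features.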
  • On the one hand, the candidates for the code to be completed are obtained by static analysis of the code, such as static syntax analysis and semantic analysis, so they conform to the grammar rules.
  • On the other hand, the candidates are determined from the context database according to the context features of the code to be completed, rather than predicted by a model such as long short-term memory (LSTM), so they have a high probability of passing the compilation check. Therefore, the candidates predicted by this method have high accuracy.
  • Code completion based on these candidates can effectively reduce the number of times the user manually completes the code or corrects the completed code, which improves development efficiency and user experience.
  • In addition, this method does not require complex models, requires less computing power, and does not require graphics processing unit (GPU) resources. It can be deployed locally, which avoids freezing caused by network transmission delay and improves user experience.
  • the code processing system may also obtain statistical information of at least one candidate item in the context database, where the statistical information may include the number of invocations of the candidate item, such as the number of nested invocations, the number of loop invocations, and so on.
  • the statistical information can reflect the frequency of use of the candidates, and the code processing system can filter the at least one candidate according to the statistical information, for example, to filter candidates with a low frequency of use.
  • The code processing system can present the filtered candidates to the user through a user interface such as a GUI, so as to provide the user with candidates that have a higher frequency of use. In this way, the number of candidates can be effectively reduced, and recommending outdated or deprecated application programming interfaces (APIs) and the like can be avoided, which improves prediction accuracy.
  • the code processing system can also sort the candidates according to the statistical information.
  • the code processing system can also display the candidates in the order of the sorting results. In this way, the user can quickly learn the candidates that are ranked high and frequently used, which facilitates the user to quickly select the above candidates and improves the efficiency of code completion.
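The filtering and sorting steps can be sketched as follows. The call counts and the `min_calls` threshold are hypothetical; the text only says that low-frequency candidates are filtered and that the remainder are ordered by the statistics.

```python
def filter_and_rank(candidates, call_counts, min_calls=5):
    """Drop rarely used candidates, then order the rest by descending call count."""
    kept = [c for c in candidates if call_counts.get(c, 0) >= min_calls]
    return sorted(kept, key=lambda c: call_counts[c], reverse=True)


# Hypothetical statistics taken from a context database.
call_counts = {"parse": 120, "newDocument": 45, "setErrorHandler": 2}
ranked = filter_and_rank(["newDocument", "setErrorHandler", "parse"], call_counts)
print(ranked)
```

A fixed threshold is only one option; a relative cut-off (for example, the top k candidates) would serve the same purpose.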
  • the code processing system may also input the at least one candidate item and the context feature of the code to be completed into the evaluation model to obtain the recommendation probability of the at least one candidate item.
  • The code processing system may further screen the candidates according to the recommendation probability of the at least one candidate, for example by determining a target candidate among the at least one candidate according to the recommendation probability and presenting the target candidate to the user through the user interface.
  • the evaluation model can be obtained by training the initial model with samples collected from open source datasets or user-private datasets.
  • the initial model can be a simple model including 2 or more hidden layers.
  • The hidden layer can be a fully connected layer (Dense layer), and the activation function of the hidden layer can be a hyperbolic function such as the hyperbolic tangent function (tanh).
  • The output layer includes a loss function, which can be a cross-entropy loss function (cross entropy, XENT) or the like.
  • The evaluation model trained from the above initial model does not need to consume GPU resources and can be deployed locally (for example, on a local computing device), which reduces the transmission delay, avoids freezing caused by excessive network transmission delay, and improves the user experience.
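To make the model shape concrete, the sketch below runs a forward pass through two tanh hidden layers and evaluates a binary cross-entropy loss in plain Python. The layer sizes, the random initialization, the sigmoid output unit, and the four-element feature vector are assumptions for illustration; training is omitted.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W @ x + b), written with plain lists."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def recommend_probability(features, layers):
    """Two tanh hidden layers followed by a single sigmoid output unit."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = dense(features, w1, b1, math.tanh)
    h2 = dense(h1, w2, b2, math.tanh)
    return dense(h2, w3, b3, sigmoid)[0]


def cross_entropy(p, label):
    """Binary cross-entropy for one sample; label is 1 (recommended) or 0."""
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]
p = recommend_probability([1.0, 0.0, 0.5, 1.0], layers)
print(p, cross_entropy(p, 1))
```

A network of this size evaluates in microseconds on a CPU, which is consistent with the claim that no GPU is needed.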
  • the evaluation model can be implemented by a binary classification model.
  • The binary classification model takes a candidate and the context features of the candidate as input and outputs a recommended label. Specifically, the binary classification model matches the input candidate and its context features against existing identifiers and their context features, so as to determine the recommended label.
  • the recommended label can take values of 0, 1, or true or false. When the recommended label is 0 or false, it indicates that the candidate is not recommended, and when the recommended label is 1 or true, it indicates that the candidate is recommended.
  • the evaluation model can further filter the candidates according to the above recommended labels to improve the accuracy of the predicted candidates, thereby improving the accuracy of code completion.
  • the evaluation model may also acquire statistical information of candidates whose recommendation label is 1 or true, and determine the recommendation probability of the candidate based on the statistical information, for example, determine the recommendation probability according to the score value. In this way, the code processing system can sequentially display candidates recommended by the evaluation model according to the recommendation probability.
  • When a candidate includes a function name, the code processing system may also fill in the parameters of the at least one candidate according to the code in the code file where the code input by the user is located (hereinafter referred to as native code for convenience of description).
  • the code processing system can present at least one candidate filled with the above parameters to the user through a user interface such as a GUI, thereby realizing multi-symbol completion.
  • the code processing system can use a depth-first search algorithm to search for parameters corresponding to functions, for example, search for parameters corresponding to functions from native code, and then perform parameter filling for candidates based on the parameters obtained by the search. Further, the code processing system may also fill multiple sets of parameters for a candidate item, and obtain multiple candidate items after filling the parameters. The code processing system can sort the candidate items after filling the parameters according to the information including the distance between the parameter and the code to be completed, filter the candidate items according to the sorting result, or display the candidate items in sequence. In this way, the candidate items that are close to the user's input intention can be displayed first, which is convenient for the user to quickly select and improves the efficiency of code completion.
  • function names include method names. Therefore, the code processing system can fill in the method parameters according to the native code, thereby realizing multi-symbol completion.
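Parameter filling with distance-based ranking might look like the sketch below. It swaps the depth-first search mentioned in the text for a plain enumeration of type-compatible variables, and measures distance as the sum of line offsets from the cursor. Both choices, along with the `Variable` record, are assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Variable:
    name: str
    type: str
    line: int  # line where the variable is declared in the native code


def fill_parameters(func_name, param_types, variables, cursor_line):
    """Return filled-in calls, with the variables nearest the cursor first."""
    # For each parameter, collect the variables of the matching type that are
    # declared before the cursor.
    choices = [[v for v in variables if v.type == t and v.line < cursor_line]
               for t in param_types]
    filled = []
    for combo in product(*choices):
        distance = sum(cursor_line - v.line for v in combo)
        call = f"{func_name}({', '.join(v.name for v in combo)})"
        filled.append((distance, call))
    return [call for _, call in sorted(filled)]


variables = [Variable("path", "String", 3),
             Variable("name", "String", 8),
             Variable("count", "int", 9)]
print(fill_parameters("open", ["String", "int"], variables, cursor_line=10))
```

Each combination yields one multi-symbol candidate, so a function with several compatible variables produces several ranked candidates, as the text describes.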
  • the code processing system may perform code analysis according to contextual features of the code, and determine at least one candidate for the code to be completed from the context database.
  • the code processing system may use a depth first search (DFS) algorithm to search the context database according to the context characteristics of the code to be completed, and determine at least one candidate for the code to be completed.
  • the code processing system can search for candidates matching the context features of the code to be completed through a depth-first search algorithm. For each candidate, the code processing system can continue to search until a static function call is found.
  • This method searches for candidates in the context database in combination with the context features of the code to be completed, and the sample code and its context features in the context database are extracted from code that conforms to the grammar rules and passes the compilation check. Therefore, the candidates obtained by this method conform to the grammar rules and have a high probability of passing the compilation check.
  • The method uses the depth-first search algorithm for matching, which can find all matching candidates in the context database.
  • the sample code in the context database can also include uncommon identifiers, such as uncommon APIs. Based on this, even in complex contexts (contexts using uncommon APIs), the method can determine more accurate candidates from the context database and achieve high-precision code prediction.
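One way to read "continue to search until a static function call is found" is as a depth-first search over the calls that produce a wanted type. The sketch below follows that reading. The `PRODUCERS` table and its layout are hypothetical, and the three entries are illustrative sample data.

```python
# Hypothetical index: return type -> (owner type, method name, is_static).
PRODUCERS = {
    "Document": [("DocumentBuilder", "newDocument", False)],
    "DocumentBuilder": [("DocumentBuilderFactory", "newDocumentBuilder", False)],
    "DocumentBuilderFactory": [("DocumentBuilderFactory", "newInstance", True)],
}


def call_chains(wanted_type, visiting=frozenset()):
    """Depth-first search for call chains that yield wanted_type.

    Each branch ends as soon as it reaches a static call, because a static
    call needs no receiver object and therefore no further search.
    """
    if wanted_type in visiting:  # guard against cyclic type dependencies
        return []
    chains = []
    for owner, method, is_static in PRODUCERS.get(wanted_type, []):
        if is_static:
            chains.append(f"{owner}.{method}()")
        else:
            # An instance method needs a receiver: search for its owner type.
            for receiver in call_chains(owner, visiting | {wanted_type}):
                chains.append(f"{receiver}.{method}()")
    return chains


print(call_chains("Document"))
```

Because the search enumerates every producer, it returns all matching chains rather than only the most likely one.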
  • The code input by the user includes the prefix of the code to be completed, for example the prefix of the identifier to be completed.
  • Based on this, the code processing system can also determine the completion condition according to the input code. Specifically, the completion condition is that the candidate for the code to be completed includes the above prefix.
  • For an identifier of N characters, the prefix can be any of its first 1 to first N-1 characters.
  • The code processing system may determine, from the context database according to the context feature of the code to be completed, at least one candidate matching the prefix of the code to be completed. In this way, the candidates can be predicted more accurately, and the prediction accuracy can be improved.
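The completion condition amounts to a prefix test. A minimal sketch with hypothetical helper names:

```python
def proper_prefixes(identifier):
    """All prefixes of length 1 to N-1 for an identifier of N characters."""
    return [identifier[:i] for i in range(1, len(identifier))]


def match_prefix(candidates, prefix):
    """Keep only candidates that satisfy the completion condition."""
    return [c for c in candidates if c.startswith(prefix)]


print(proper_prefixes("add"))
print(match_prefix(["newDocument", "parse", "newInstance"], "new"))
```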
  • the context database includes at least one of a database constructed based on an open source dataset and a database constructed based on the user's private dataset.
  • The code processing system can index the code in open source datasets, such as a GitHub corpus, so as to identify identifiers in the code such as class names, method names, function names, variable names, parameter names, and operators. It then determines the context features of each identifier and stores the identifiers and their context features in a database, thereby obtaining a context database.
  • The code processing system can also index the user's private dataset, such as the code in a code repository provided by the user, so as to identify identifiers in the code such as class names, method names, function names, variable names, parameter names, and operators. It then determines the context features of each identifier and obtains a context database according to the identifiers and their context features.
  • the code processing system may construct a context database according to the open source dataset and the user's private dataset, respectively, and use the context database constructed according to the open source dataset and the context database constructed according to the user's private dataset to determine the candidates for the code to be completed. This improves prediction accuracy.
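As an illustration of the indexing step, the sketch below uses Python's standard `ast` module to pull class and method identifiers out of source text, together with two context features (base classes and return type). The record layout is an assumption, and the sample source is Python only because that keeps the example self-contained.

```python
import ast


def index_source(source):
    """Collect (identifier, context features) records from Python source text."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # ast.unparse requires Python 3.9 or later.
            bases = [ast.unparse(b) for b in node.bases]
            records.append({"identifier": node.name, "kind": "class", "bases": bases})
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    returns = ast.unparse(item.returns) if item.returns else None
                    records.append({"identifier": item.name, "kind": "method",
                                    "class": node.name, "bases": bases,
                                    "return_type": returns})
    return records


sample = '''
class Stack(Container):
    def push(self, item) -> None:
        self.items.append(item)
    def peek(self) -> int:
        return self.items[-1]
'''
records = index_source(sample)
for record in records:
    print(record)
```

Running the same function over an open source corpus and over a private repository would give the two databases that the text combines.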
  • the code to be completed includes code in a method of a class (also called a class method), and the code entered by the user includes a return type.
  • the code processing system can predict candidates for the code to be completed in the class method according to the return type, and implement the completion of the class method based on the candidates.
  • a class method can be called in different contexts.
  • the context of a class method can be different.
  • The code processing system can, for each method call, determine the role of the method call according to the context features of the method call. For example, for the getitem() method, the role of the method call includes get accessor (or read accessor); for the add() method, the role includes adder; for the removeitem() method, the role includes remover.
  • When training the evaluation model, the code processing system can also add features such as the role of the method call to improve the accuracy of the evaluation model. In this way, the evaluation model can determine the recommendation probability of the candidate item in combination with the role of the method call, so that the candidate item recommended by the evaluation model is more in line with the user's intention, thereby obtaining higher completion accuracy.
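The text derives the role from the context features of the call. As a deliberately simplified stand-in, the sketch below infers the role from the method name alone, which matches the three examples given but is an assumption, not the described mechanism.

```python
# Hypothetical name-prefix heuristic; order matters only if prefixes overlap.
ROLE_BY_PREFIX = [("get", "getter"), ("add", "adder"), ("remove", "remover")]


def method_role(method_name):
    """Map a method name to a coarse role label usable as a model feature."""
    lowered = method_name.lower()
    for prefix, role in ROLE_BY_PREFIX:
        if lowered.startswith(prefix):
            return role
    return "other"


for name in ("getitem", "add", "removeitem", "parse"):
    print(name, method_role(name))
```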
  • The code processing system can also track the data flow and filter out candidates with circular references from the at least one candidate. The filtered candidates are presented to the user through a user interface such as a GUI. In this way, circular references can be avoided and the completion accuracy can be improved.
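A very rough approximation of the circular-reference filter is to reject any candidate expression that reads the variable it would be assigned to. Real data-flow tracking is more involved; the regular-expression check and the function names below are assumptions for illustration.

```python
import re


def has_circular_reference(target_variable, candidate_expression):
    """True if the candidate expression reads the variable being assigned."""
    names = re.findall(r"[A-Za-z_]\w*", candidate_expression)
    return target_variable in names


def drop_circular(target_variable, candidates):
    """Remove candidates that would initialize a variable from itself."""
    return [c for c in candidates
            if not has_circular_reference(target_variable, c)]


# Completing the right-hand side of `Document doc = ...`
print(drop_circular("doc", ["builder.newDocument()", "doc.cloneNode(true)", "copy(doc)"]))
```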
  • the present application provides a code processing apparatus.
  • the code processing device includes:
  • an interface unit for receiving a code input by a user through a user interface
  • a feature extraction unit configured to determine the context feature of the code to be completed according to the code input by the user
  • an analysis unit configured to determine at least one candidate of the code to be completed from a context database according to the context feature of the code to be completed, where the context database stores sample code and the context feature of the sample code;
  • the interface unit is further configured to present the at least one candidate item to the user through the user interface.
  • the analysis unit is also used for:
  • the interface unit is specifically used for:
  • the filtered candidates are presented to the user through the user interface.
  • the apparatus further includes:
  • the evaluation unit is configured to input the at least one candidate item and the context feature of the code to be completed into an evaluation model, obtain a recommendation probability of the at least one candidate item, and determine a target candidate in the at least one candidate item according to the recommendation probability of the at least one candidate item;
  • the interface unit is specifically used for:
  • the target candidates are presented to the user through the user interface.
  • the apparatus further includes:
  • a parameter filling unit configured to fill in the parameters of the at least one candidate item according to the code in the file where the code input by the user is located when the candidate item includes a function name
  • the interface unit is specifically used for:
  • the at least one candidate populated with the parameters is presented to the user through the user interface.
  • the analysis unit is specifically used for:
  • a depth-first search algorithm is used to search a context database to determine at least one candidate of the code to be completed.
  • the code input by the user includes a prefix of the code to be completed
  • the analysis unit is specifically used for:
  • At least one candidate matching the prefix of the code to be completed is determined from a context database according to the contextual feature of the code to be completed.
  • the context database includes at least one of a database constructed based on an open source dataset and a database constructed based on the user's private dataset.
  • the code to be completed includes code in a method of a class, and the code input by the user includes a return type.
  • the apparatus further includes:
  • An evaluation unit configured to determine a role of a method call corresponding to the to-be-completion code according to a context feature of the to-be-completion code, where the role is used to assist in determining a recommendation probability of a candidate for the to-be-completion code.
  • the analysis unit is also used for:
  • the interface unit is specifically used for:
  • the filtered candidates are presented to the user through the user interface.
  • the present application provides an apparatus including a processor and a memory.
  • the processor and the memory communicate with each other.
  • the processor is configured to execute instructions stored in the memory to cause an apparatus to perform the method as in the first aspect or any one of the implementations of the first aspect.
  • The present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions instruct a device to execute the method described in the first aspect or any implementation of the first aspect.
  • the present application provides a computer program product comprising instructions that, when executed on a device, cause the device to perform the method described in the first aspect or any one of the implementations of the first aspect.
  • The implementations provided in the above aspects of the present application may be further combined to provide more implementations.
  • FIG. 1 is a system architecture diagram of a code processing system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an interface for displaying at least one candidate item provided by an embodiment of the present application.
  • FIG. 3A is a schematic structural diagram of a code processing system provided by an embodiment of the present application.
  • FIG. 3B is a schematic structural diagram of a code processing system provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an interface for displaying at least one candidate item provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an interface before and after a code fragment completion provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the effect of a code completion provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the effect of a code completion provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a code processing apparatus provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application.
  • The terms “first” and “second” in the embodiments of the present application are only used for the purpose of description, and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • Source code refers to code that developers write, using development tools such as code editors and integrated development environments (IDEs), in a computer language supported by those tools, such as at least one of C, Java, and Python.
  • Source code (also referred to as a source program) includes a series of human-readable computer language instructions.
  • Computer language instructions in the source code can be compiled by a compiler into computer-executable binary instructions.
  • the computer executes the binary instructions to implement corresponding functions.
  • the computer-executable binary instructions may also be referred to as object code. Code can include source code and object code.
  • the identifier is the smallest compilation unit in the source code compilation process.
  • Identifiers can include any one or more of keywords, class names, method names, function names, variable names, parameter names, and operators.
  • Keywords are words with special meanings specified in the computer language, such as break, which interrupts a loop, and return, which returns from a function.
  • a class refers to a data structure in an object-oriented computer language that describes the common attributes and methods of a created object.
  • the class name is the name of the class.
  • the method name is the name of the method described by the class.
  • the class name can be DocumentBuilder, and the method name can be newDocument.
  • A function is a block of executable code that implements a specific piece of functionality. Since methods are related to objects and classes and rely on objects to be invoked, methods can also be regarded as a special kind of function in object-oriented computer languages.
  • the function name is the name of the function, such as count, print, and so on. The function supports passing in some parameters and processing the parameters. Further, the function can also return some data, that is, the function can also include the return value.
  • the parameter name is the name of the parameter.
  • The parameters used when defining the function name and function body are called formal parameters, or parameters for short.
  • A formal parameter is a placeholder variable that does not occupy memory.
  • The parameters passed when the function is called are called actual parameters, or arguments for short.
  • An argument is a variable that occupies memory.
  • a variable is a data structure that stores variable data, which can be variable numerical values, such as function values, or variable text, such as text typed by a user.
  • the variable name refers to the name of the variable.
  • Code completion refers to predicting at least one identifier that the user intends to input based on the code (such as source code) that the user (such as a developer) has entered, and providing input suggestions for the user according to the prediction result, so that the user can directly complete the code based on these suggestions. Code completion can reduce the number of characters users type, reduce spelling mistakes, and eliminate the need for users to spend time memorizing unfamiliar class names, method names, etc., which can improve development efficiency.
  • Code completion can be divided into single-token completion (single-symbol completion) and multi-token completion (multi-symbol completion).
  • single-symbol completion refers to predicting a single identifier, and then performing completion according to the prediction result.
  • single-symbol completion may include completion of class names, method names, function names, variable names, or parameter names.
  • Multi-symbol completion refers to predicting multiple identifiers, and then performing completion based on the prediction results.
  • multi-symbol completion may include completion of multiple types of class names, method names, and parameter names, or completion of code snippets that include multiple identifiers.
  • the code fragment is a small piece of source code, and the code fragment may include some functional statements, such as class declaration, function declaration, or a code block with start and end identifiers.
  • A typical application of multi-symbol completion is the completion of methods of a class (class methods for short), especially class methods with return types.
  • AI: artificial intelligence
  • LSTM: long short-term memory
  • the code input by the user is serialized to obtain an input sequence, and then the input sequence is input into the above-mentioned deep neural network to predict a candidate for the next identifier, thereby realizing single-symbol completion.
  • A candidate with a higher probability can also be selected and merged with the input code. The result is serialized to generate a new input sequence, which is input into the above-mentioned deep neural network to predict the candidate for the next identifier, thus realizing multi-symbol completion.
  • Code modeling mainly includes sequence modeling and abstract syntax tree (AST) modeling. Sequence modeling is achieved by lexically analyzing the code to obtain the token stream.
  • AST modeling is achieved by lexical analysis and syntax analysis of the code. Even if the code generated based on this method conforms to the grammar rules, there is a high probability that it cannot be compiled. That is, the accuracy of the completion code predicted by the above method is not high, and it is difficult to meet user needs.
  • the present application provides a code processing method.
  • the method may be performed by a code processing system.
  • The code processing system receives the code input by the user through a user interface such as a graphical user interface (GUI) or a command user interface (CUI), and determines the context features of the code to be completed according to that code. It then determines at least one candidate for the code to be completed from the context database according to those context features, where the context database stores sample code and the context features of the sample code. Finally, the code processing system presents the at least one candidate to the user through a user interface such as a GUI, thereby implementing code completion.
  • On the one hand, the candidates for the code to be completed are obtained by static analysis of the code, such as static syntax analysis and semantic analysis, so they conform to the grammar rules.
  • On the other hand, the candidates are determined from the context database according to the context features of the code to be completed, rather than predicted by a model such as LSTM, so they have a higher probability of passing the compilation check. Therefore, the candidates predicted by this method have high accuracy.
  • Code completion based on these candidates can effectively reduce the number of times the user manually completes the code or corrects the completed code, which improves development efficiency and user experience.
  • In addition, this method does not require complex models, requires less computing power, and does not require graphics processing unit (GPU) resources. It can be deployed locally, which avoids freezing caused by network transmission delay and improves user experience.
  • The sample code in the above context database may include uncommon application programming interfaces (APIs). Based on this, even in complex contexts (contexts using uncommon APIs), the method can determine more accurate candidates from the context database and achieve high-precision code prediction.
  • the code processing system may also acquire statistical information of the at least one candidate item in the context database, and then filter the at least one candidate item according to the statistical information.
  • On the one hand, this can further improve the prediction accuracy; on the other hand, it can avoid recommending outdated or deprecated candidates, such as outdated or deprecated APIs.
  • The code processing system can predict not only identifiers such as method names and function names but also the parameters of methods and functions. That is, it can perform single-symbol prediction or multi-symbol prediction. Because the candidates are predicted from the context database, both single-symbol prediction and multi-symbol prediction have high accuracy.
  • the code processing system inputs the candidate item filled with parameters into the evaluation model, obtains the recommendation probability of the candidate item, and then performs accurate recommendation based on the recommendation probability, further improving the accuracy of code completion.
  • The code processing method provided by the embodiment of the present application may be provided to the user in the form of a plug-in.
  • A plug-in is a program written in accordance with a certain standard application programming interface. The program runs only on the platform it specifies (it may support multiple platforms at the same time) and cannot run independently of that platform.
  • the service provider of the development tool or a third party may release a plug-in for the development tool, such as an IDE or a code editor, to enhance the function of the development tool.
  • the present application takes the development tool as the IDE for illustration.
  • the code processing system 100 includes an IDE 102 and a completion subsystem 104 located at the back end.
  • The IDE 102 includes an IDE kernel 1022 (IDE core) and an IDE plug-in 1024 installed in the IDE 102.
  • Completion subsystem 104 includes code analysis module 1042 and context database 1044 .
  • the completion subsystem 104 may further include any one or more of a parameter filling module 1046 , an evaluation module 1048 and an indexing module 1049 .
  • the IDE kernel 1022 is used to provide native functions of the IDE 102, such as code prompting, code spelling detection, etc.
  • the IDE plug-in 1024 is used to interact with the completion subsystem 104 to implement enhanced functions, such as intelligent code completion. The interaction process is described in detail below.
  • The IDE plug-in 1024 can receive the code input by the user through the user interface and obtain the position of the input cursor in the code, where the position of the input cursor is the position of the code to be completed. From this, the context features of the code to be completed can be determined, such as any one or more of the base class type (such as public, private, or protected), the base class name, the prefix, the return type, and Boolean features.
  • the Boolean feature may include at least one of the following features:
  • The IDE plug-in 1024 can send the context features of the code to be completed to the completion subsystem 104. The completion subsystem 104 performs static analysis according to these context features, generates at least one candidate for the code to be completed, and returns the at least one candidate to the IDE plug-in 1024.
  • the IDE 102 may present at least one candidate for the code to be completed to the user, so that the user can select one candidate for code completion.
  • the context database 1044 of the completion subsystem 104 stores the sample code and the context features of the sample code.
  • the sample code may be an identifier such as a class name, method name, function name, variable name, or parameter name. It should be noted that the sample code may be a single identifier or multiple identifiers.
  • the context feature of the sample code is specifically a feature extracted based on the context of the sample code, for example, it may be a variable type, an object type, a return type, and the like.
  • the code analysis module 1042 of the completion subsystem 104 may determine at least one candidate for the code to be completed from the context database 1044 according to the contextual characteristics of the code to be completed.
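  • The lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Sample` and `ContextDatabase` classes, the in-memory dictionary, and the choice of the return type as the only context feature are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical database entry: sample code plus one context feature."""
    code: str          # the identifier(s), e.g. a method name
    return_type: str   # context feature: the type the call evaluates to


class ContextDatabase:
    """Toy in-memory stand-in for the context database 1044."""

    def __init__(self, samples):
        # Group samples by context feature so a lookup is a dictionary access.
        self._by_return_type = {}
        for sample in samples:
            self._by_return_type.setdefault(sample.return_type, []).append(sample)

    def candidates(self, return_type):
        """Return the sample codes whose context feature matches."""
        return [s.code for s in self._by_return_type.get(return_type, [])]


db = ContextDatabase([
    Sample("newDocument()", "Document"),
    Sample("parse()", "Document"),
    Sample("newDocumentBuilder()", "DocumentBuilder"),
])
document_candidates = db.candidates("Document")
print(document_candidates)
```

  • A real context database would store several context features per sample and would typically be persisted, but the principle of keying candidates by context feature is the same.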
  • this embodiment of the present application also provides a schematic diagram of an interface in which the IDE 102 presents candidate items to the user.
  • the code editing interface 200 presents the code input by the user, as shown in 202 in the figure.
  • 202 in Figure 2 shows a fragment of the code input by the user, specifically:
  • Some code fragments may also be included before the code fragment or after the code fragment, which are schematically illustrated by "" in FIG. 2 .
  • the context feature of the code to be completed may include that the return type is Document type.
  • the IDE plug-in 1024 sends the context feature to the completion subsystem 104, and the code analysis module 1042 determines at least one candidate for the code to be completed from the context database according to the context feature.
  • the code analysis module 1042 may also obtain statistical information of the candidate items in the context database 1044, and filter at least one candidate item according to the statistical information. Further, the code analysis module 1042 may also transmit the candidates (eg, filtered candidates) to the parameter filling module 1046 and the evaluation module 1048 for subsequent processing, such as parameter filling and recommendation probability evaluation.
  • The IDE 102 can obtain the above-mentioned candidates, for example the candidates filtered by the code analysis module 1042, the candidates filled by the parameter filling module 1046, or the candidates determined by the evaluation module 1048 according to the recommendation probability, and display them, as shown at 204 in FIG. 2.
  • the IDE kernel 1022 can also directly generate at least one candidate item according to the code input by the user through the text completion technology.
  • The completion method of the IDE kernel 1022 itself, such as the text completion method, can be used together with the code completion method provided by the embodiments of this application.
  • The IDE 102 may display the candidates generated by the text completion method together with the candidates generated by the code processing method of the embodiment of the present application, as shown at 204 and 206 in FIG. 2.
  • the IDE 102 when the IDE 102 displays the candidate items generated by different methods, it can also be distinguished by different display methods. For example, IDE 102 may prepend the candidate with an identification of the method used by the candidate. As shown in FIG. 2 , the IDE 102 may use a circle to identify candidates generated by the IDE kernel 1022 through the text completion method, and a plus sign with a box added to identify candidates generated through the code processing method of the embodiment of the present application. For another example, the IDE 102 can also use different colors or different fonts to distinguish candidates generated by different methods.
  • the IDE 102 may sequentially display the candidates according to the probability of each candidate. In this way, it is convenient for the user to quickly know the candidates with higher probability, and select the candidates with higher probability for code completion.
  • The code analysis module 1042 may determine a large number of candidates from the context database 1044 according to the context features of the code to be completed. Considering that some candidates are unlikely to be used, the code analysis module 1042 may also filter the candidates according to their statistical information in the context database 1044, such as the number of calls (callCount) and the number of nested calls, and filter out the candidates with lower probability.
  • The parameter filling module 1046 can also fill in parameters for a candidate, for example a filtered candidate, thereby realizing multi-symbol completion and sparing the user from manually entering function parameters. Specifically, the parameter filling module 1046 may determine appropriate parameters by searching the local code (specifically, the code in the code file that contains the code input by the user) and fill the parameters into the candidates. The parameter filling module 1046 then inputs the parameter-filled candidates into the evaluation module 1048, which evaluates them through the evaluation model and determines the recommendation probability of each parameter-filled candidate.
  • the evaluation module 1048 may send the candidates filled with parameters and their recommended probabilities to the IDE plug-in 1024 .
  • the IDE 102 may display the top N candidates according to the recommendation probability, or display the candidates whose recommendation probability is greater than the preset probability.
  • The evaluation module 1048 can also screen the parameter-filled candidates according to the recommendation probability, for example keeping the top N candidates in the ranking or the candidates whose recommendation probability is greater than the preset probability, and then return the screened candidates to the IDE plug-in 1024. The IDE 102 displays the screened candidates, or displays the screened candidates together with their recommendation probabilities.
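  • The two screening rules mentioned above (top N, or a probability threshold) can be sketched as below. The function name `select_candidates` and its parameters are hypothetical and only illustrate the idea.

```python
def select_candidates(scored, top_n=None, min_probability=None):
    """Screen (candidate, recommendation_probability) pairs.

    Candidates are ranked by descending probability; optionally only those
    above `min_probability` and/or only the first `top_n` are kept.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if min_probability is not None:
        ranked = [item for item in ranked if item[1] > min_probability]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


scored = [("a()", 0.2), ("b()", 0.9), ("c()", 0.5)]
print(select_candidates(scored, top_n=2))
print(select_candidates(scored, min_probability=0.4))
```

  • Whether the screening happens in the evaluation module 1048 or in the IDE 102 only changes where this function runs, not what it computes.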
  • the indexing module 1049 may also index local codes, such as codes in the user's code warehouse, obtain sample codes and context features of the sample codes according to the indexing results, and store them in the context database 1044 .
  • the candidates determined by the code analysis module 1042 from the context database 1044 according to the context characteristics of the code to be completed may include local code calls, such as including local variable names.
  • the indexing module 1049 enriches the context database by indexing the codes in the code warehouse of the user, thereby making the intelligent recommendation result more accurate when performing intelligent code completion.
  • The service provider of the development tool may also natively embed the relevant code of the code processing method provided by the embodiment of the present application in the kernel when developing the development tool. This removes the need to install a plug-in and avoids the security risks introduced by plug-in installation.
  • Each part of the code processing system 100 may be centrally deployed on a local computing device (a user device under the direct control of a user, such as a notebook computer, a desktop computer, or a smartphone) or in a cloud computing cluster (including at least one cloud computing device, such as a cloud server).
  • various parts of the code processing system 100 can also be deployed in a cloud computing cluster in a distributed manner. The deployment method of the code processing system 100 will be described in detail below.
  • The IDE 102 and the completion subsystem 104 may be deployed in a local computing device, such as a terminal device like a personal computer (PC). Because the code processing system 100 in the embodiment of the present application consumes little computing power when performing code analysis, and the evaluation model can be implemented with a simple network (for example, one including two hidden layers), the requirements on the computing performance of the terminal device are low, and lightweight terminal devices can meet business needs.
  • the completion subsystem 104 and the IDE 102 are deployed together on the local computing device, which can reduce the interaction time between the IDE 102 and the completion subsystem 104, and avoid network transmission delays that cause the completion function to be stuck and affect user experience.
  • the IDE 102 and the completion subsystem 104 may be deployed in a cloud computing cluster. That is, the IDE 102 is a cloud IDE, and the IDE 102 and the completion subsystem 104 are provided to users in the form of cloud services.
  • The cloud service provider may integrate the intelligent code completion service provided by the completion subsystem 104 and the code development environment service provided by the cloud IDE into one cloud service for the user, or may provide the cloud IDE and the intelligent code completion service as two separate cloud services for the user.
  • the cloud service provider can use the intelligent code completion service as a value-added service of the cloud IDE. After the user purchases or leases the value-added service, the cloud service provider combines it in the cloud IDE and provides it to the user for use.
  • the IDE 102 and the completion subsystem 104 are provided by a cloud service provider, and the IDE 102 and the completion subsystem 104 can be deployed in the same cloud computing cluster .
  • the IDE 102 and the completion subsystem 104 may also be provided by different cloud service providers and deployed in different cloud computing clusters.
  • the IDE 102 can be deployed on a local computing device, and the completion subsystem 104 can be deployed on a cloud computing cluster.
  • the IDE 102 calls the completion subsystem 104 in the cloud computing cluster to obtain at least one candidate for the code to be completed.
  • Users of the completion service can register for the cloud service in advance, which helps attract traffic to the cloud service.
  • The evaluation model deployed in the cloud computing cluster can be a custom model trained on the user's private data set, for example a data set constructed from the code warehouse provided by the user. Such a model is better suited to the user's environment and gives better recommendation results.
  • FIG. 3A to FIG. 3B are only some specific examples of the deployment methods of the code processing system 100 in the embodiments of the present application. The code processing system 100 may also be deployed in other ways, for example with the IDE 102 deployed in the cloud and the completion subsystem 104 deployed on a local computing device, which is not limited in this embodiment of the present application.
  • the method includes:
  • S402: The code processing system 100 receives the code input by the user through the user interface.
  • the code processing system 100 may receive, through a user interface (eg, GUI or CUI), codes that are typed by a user through a physical keyboard.
  • the code processing system 100 can also receive, through the user interface, the code entered by the user through the virtual keyboard in a touch-sensitive manner.
  • the code processing system 100 may also select a code file through a user interface to receive the code in the code file.
  • the code in the code file may include the code previously written by the current user, or may include the code previously written by other users.
  • the code input by the user may be code written according to a single computer language, for example, code written according to C language, code written according to Java language, or code written according to Python language.
  • The code input by the user may also be code written in a mixed programming manner using multiple computer languages, for example code written in C with embedded assembly language.
  • S404: The code processing system 100 determines the context features of the code to be completed according to the code input by the user.
  • the code processing system 100 can capture the position of the input cursor, and when the code completion function is triggered, the position of the input cursor is the completion position.
  • the completion position can be the end position of the input line, or the middle position. Of course, in some embodiments, the completion position may also be the start position of the input line.
  • Context features refer to features that can express the context in which the code is located, such as any one or more of the base class type, base class class name, prefix, return type, and Boolean features.
  • The Boolean feature is_in_API is true; a true is_in_API indicates that the current completion is a class method completion.
  • the completion type can be single-symbol completion or multi-symbol completion.
  • The code processing system can determine whether the completion type is single-symbol completion or multi-symbol completion according to a preset setting, or according to the completion type that the user sets when triggering the code completion function.
  • Trigger completion can be implemented in several ways.
  • The code processing system 100 may detect that the time since the user stopped typing has reached a preset duration, and then determine to trigger the code completion function.
  • The code processing system 100 can set a trigger condition, such as double-clicking the right mouse button or pressing a shortcut key (for example, the Tab key). When the trigger condition is met, for example when the shortcut key is pressed, the code completion function is triggered.
  • the user can also set the completion type to single-symbol completion or multi-symbol completion when triggering the code completion function.
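  • The trigger rules above can be sketched as a single check. The function name, the millisecond timestamps, and the 500 ms default are assumptions chosen for illustration; the source only states that a preset duration or an explicit user action triggers completion.

```python
def should_trigger(last_keystroke_ms, now_ms, idle_ms=500, shortcut_pressed=False):
    """Return True when completion should start.

    Completion starts either because the user pressed the configured shortcut,
    or because no key has been pressed for at least `idle_ms` milliseconds.
    """
    idle_for = now_ms - last_keystroke_ms
    return shortcut_pressed or idle_for >= idle_ms


print(should_trigger(last_keystroke_ms=10_000, now_ms=10_200))
print(should_trigger(last_keystroke_ms=10_000, now_ms=10_600))
```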
  • the input code may include a prefix of the code to be completed, such as a prefix of the identifier to be completed.
  • the code processing system 100 can also determine the completion condition according to the input code. Specifically, the completion condition is that the candidate item of the code to be completed includes the above prefix.
  • The prefix of an identifier can be any leading part of the identifier, from its first character up to its first N-1 characters.
  • S406: The code processing system 100 determines at least one candidate for the code to be completed from the context database according to the context features of the code to be completed.
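  • A short sketch of the prefix-based completion condition follows. It assumes that N is the length of the identifier, which the text above does not state explicitly, and the helper names are hypothetical.

```python
def prefixes(identifier):
    """All non-empty proper prefixes: the first 1 character up to the first N-1,
    where N is taken to be len(identifier) (an assumption)."""
    return [identifier[:i] for i in range(1, len(identifier))]


def satisfies_completion_condition(candidate, prefix):
    """The completion condition: the candidate must include the typed prefix."""
    return candidate.startswith(prefix)


candidates = ["newDocument", "newInstance", "parse"]
matching = [c for c in candidates if satisfies_completion_condition(c, "new")]
print(prefixes("parse"))
print(matching)
```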
  • the context database stores sample code and context features of the sample code.
  • Sample code can include any one or more of identifiers such as class names, method names, function names, variable names, or parameter names.
  • the sample code can come from an open source dataset or a user's private dataset.
  • The code processing system 100 can index the code in an open source data set, such as a GitHub corpus, to identify identifiers in the code such as class names, method names, function names, variable names, parameter names, and operators. It then determines the context features of each identifier and stores each identifier with its context features in a database, thereby obtaining the context database.
  • The code processing system 100 may also index a user-private data set, such as the code in the code repository provided by the user, to identify identifiers such as class names, method names, function names, variable names, parameter names, and operators. It then determines the context features of each identifier and obtains a context database from the identifiers and their context features.
  • The code processing system 100 may also construct context databases from both the open source data set and the user's private data set, for example a first context database and a second context database. The first context database stores the identifiers in the open source data set and their context features, and the second context database stores the identifiers in the user's private data set and their context features.
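  • As a rough illustration of indexing, the sketch below extracts function names together with two simple context features using Python's standard `ast` module. Indexing Python source is a substitution made for brevity (the examples in this application use Java APIs), and the feature names `return_type` and `parameter_names` are assumptions. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def index_source(source):
    """Map each function name found in `source` to simple context features."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            index[node.name] = {
                # Declared return annotation, if any, rendered back to text.
                "return_type": ast.unparse(node.returns) if node.returns else None,
                "parameter_names": [arg.arg for arg in node.args.args],
            }
    return index


# Two separate databases, mirroring the first (open source) and
# second (user-private) context databases described above.
open_source_db = index_source("def parse(path: str) -> Document: ...")
private_db = index_source("def load_config(file) -> Config: ...")
print(open_source_db)
print(private_db)
```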
  • the code processing system 100 (eg, the code analysis module 1042 in the completion subsystem 104 ) can perform code analysis according to the contextual features of the code, and determine at least one candidate for the code to be completed from the context database.
  • The code processing system 100 uses a depth-first search (DFS) algorithm to search the context database according to the context features of the code to be completed, and determines at least one candidate for the code to be completed.
  • A depth-first search algorithm is an algorithm for traversing a tree (such as an abstract syntax tree of code) or a graph. Taking a tree as an example, the nodes are traversed along the depth of the tree, and each branch is searched as deep as possible. When all edges of node v have been explored, the search backtracks to the starting node of the edge through which node v was found. This continues until all nodes reachable from the source node have been discovered. If undiscovered nodes remain, one of them is selected as a new source node and the process repeats until all nodes have been visited.
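  • The traversal just described can be sketched generically as follows. The adjacency-dictionary representation and the function name are illustrative choices, not part of the embodiment.

```python
def depth_first_order(graph, start):
    """Return all nodes of `graph` in depth-first visiting order.

    `graph` maps a node to the list of its children/neighbours. The search
    starts at `start`; if nodes remain undiscovered afterwards, each of them
    is used in turn as a new source node.
    """
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visit(neighbour)   # go as deep as possible first
        # returning here is the backtracking step

    for source in [start, *graph]:
        if source not in visited:
            visit(source)
    return order


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"], "orphan": ["o1"]}
print(depth_first_order(tree, "root"))
```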
  • the code processing system 100 may search for candidates matching the contextual features of the code to be completed through a depth-first search algorithm, and for each candidate, the code processing system 100 may continue to search until a static function call is found.
  • the code processing system 100 can search for function calls or API calls whose return value type is Document type, for example, including newDocument() or parse().
  • The code processing system 100 continues the deep search and determines that newDocument() is called on DocumentBuilder. It then searches for a call that returns DocumentBuilder, for example newDocumentBuilder(), and continues the deep search from newDocumentBuilder().
  • The code processing system 100 determines that newDocumentBuilder() is called on DocumentBuilderFactory, searches for a call that returns DocumentBuilderFactory, and finds DocumentBuilderFactory.newInstance().
  • DocumentBuilderFactory.newInstance() is a static function call, and a static function call can be used directly on the right side of the equal sign. The code processing system 100 can therefore stop the deep search for newDocument() and generate a candidate: DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument().
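  • The deep search for newDocument() can be sketched as below. The `CALLS` table is a hand-written, hypothetical stand-in for the context database, and the function name `chains_returning` is an assumption; only the three calls named in the text are modelled.

```python
# Hypothetical stand-in for the context database:
# method name -> (type it is called on, type it returns, is it static?)
CALLS = {
    "newDocument": ("DocumentBuilder", "Document", False),
    "newDocumentBuilder": ("DocumentBuilderFactory", "DocumentBuilder", False),
    "newInstance": ("DocumentBuilderFactory", "DocumentBuilderFactory", True),
}


def chains_returning(wanted_type, seen=frozenset()):
    """Depth-first search for call chains that evaluate to `wanted_type`.

    A branch ends as soon as a static call is reached, because a static call
    needs no receiver object. `seen` guards against revisiting a method.
    """
    results = []
    for name, (owner, returns, is_static) in CALLS.items():
        if returns != wanted_type or name in seen:
            continue
        if is_static:
            results.append(f"{owner}.{name}()")
        else:
            # Recurse: first find an expression that yields the receiver type.
            for head in chains_returning(owner, seen | {name}):
                results.append(f"{head}.{name}()")
    return results


print(chains_returning("Document"))
```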
  • the deep search process for parse() can refer to the deep search process for newDocument(), based on which candidates can be generated:
  • The code processing system 100 may further determine the evaluation index value of the at least one candidate item, and then filter or sort the at least one candidate item according to the evaluation index value.
  • the evaluation index value may be a score value determined according to the statistical information of the candidate item in the context database 1044, or a probability value of a recommendation probability determined based on an evaluation model.
  • Statistics may include usage information. For example, when the candidate includes a class name (typename), the statistical information may include class usage information; for example, when the candidate includes a method name, the statistical information may include method usage information.
  • the class usage information may specifically include any one or more of the following information:
  • the method usage information may specifically include any one or more of the following information:
  • the code processing system 100 (eg, the code analysis module 1042 in the completion subsystem 104 ) can determine the score value of the candidate item according to the above-mentioned usage information of the candidate item. Specifically, the code processing system 100 may assign weights to different usage information respectively, and then determine the score value of the candidate item through weighted operations (eg, weighted summation, weighted average calculation).
  • The code processing system 100 can filter the at least one candidate item according to the score value. For example, it can filter out candidates whose score value is lower than a preset value, or candidates that rank low by score value (for example, outside the top 10). This avoids recommending outdated or deprecated APIs and improves accuracy. Further, the code processing system 100 may sort the at least one candidate item by score value, so that candidates are displayed in descending order of score value.
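  • A weighted-sum score with filtering and ranking might look like the following. The two usage fields and their weights are assumptions for illustration; the text does not specify the weights or the full list of usage information.

```python
# Assumed weights for two usage statistics; real weights are not given here.
WEIGHTS = {"call_count": 1.0, "nested_call_count": 0.5}


def score(usage):
    """Weighted sum over the usage statistics of one candidate."""
    return sum(weight * usage.get(key, 0) for key, weight in WEIGHTS.items())


def filter_and_rank(usage_by_candidate, min_score=0.0, top_k=10):
    """Drop low-scoring candidates, then return the rest in descending order."""
    scored = [(score(usage), name) for name, usage in usage_by_candidate.items()]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in kept[:top_k]]


usage = {
    "newDocument()": {"call_count": 40, "nested_call_count": 10},
    "parse()": {"call_count": 90, "nested_call_count": 4},
    "oldApi()": {"call_count": 1},
}
print(filter_and_rank(usage, min_score=5))
```

  • A rarely used candidate such as the hypothetical oldApi() falls below the threshold and is not recommended, which is how outdated or deprecated APIs are kept out of the list.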
  • the code processing system 100 may also fill in parameters for at least one candidate.
  • The code processing system 100 may use a depth-first search algorithm to search for the parameters corresponding to a function, for example in the local code, and then fill in parameters for the candidates based on the parameters found.
  • When filling parameters, the code processing system 100 (e.g., the parameter filling module 1046 of the completion subsystem 104) can fill in multiple sets of parameters for one candidate, obtaining multiple parameter-filled candidates. As shown in FIG. 5, the code processing system 100 can sort the parameter-filled candidates according to information that includes the distance between the parameter and the code to be completed, and the parameter-filled candidates 502 are displayed at the top of the list.
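  • Parameter filling ordered by distance can be sketched as follows for a single-parameter function. The tuple layout of `local_variables`, the use of line numbers as the distance measure, and the function name are assumptions for illustration.

```python
def fill_parameters(function_name, parameter_type, local_variables, cursor_line):
    """Build parameter-filled candidates for a one-parameter function.

    `local_variables` is a list of (name, type, line_number) found in the
    current file. Variables of the right type declared before the cursor are
    used, nearest declaration first.
    """
    usable = [v for v in local_variables
              if v[1] == parameter_type and v[2] < cursor_line]
    usable.sort(key=lambda v: cursor_line - v[2])   # smaller distance first
    return [f"{function_name}({name})" for name, _, _ in usable]


local_variables = [("path", "String", 3), ("file", "File", 5), ("backup", "File", 2)]
filled = fill_parameters("parse", "File", local_variables, cursor_line=7)
print(filled)
```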
  • the code processing system 100 may also provide the user with a candidate filled with parameters and a candidate without parameters, so that when the candidate filled with parameters does not meet user requirements , the user can select the above-mentioned candidates for unfilled parameters and manually input the parameters to avoid unnecessary correction operations.
  • the code processing system 100 can be used not only to complete the entire line of code, but also to further complete the entire code fragment.
  • Completing a code fragment is essentially multi-symbol completion, and for its specific implementation, refer to the implementation of completing an entire line of code.
  • Figure 6 also shows an interface diagram for completing the entire code fragment.
  • Figure 6 (A) is a schematic diagram of the interface before the code fragment is completed.
  • the code input by the user includes:
  • The code processing system 100 can identify the local variables path and file and, based on the Document type, determine at least one candidate, including:
  • the code processing system 100 identifies file as a parameter of the above-mentioned candidate, and needs to create a file before this, and use path as a parameter of the file.
  • the code processing system 100 recognizes the parser configuration exception ParserConfigurationException, and completes the try catch statement in the code fragment according to the exception, as shown in (B) in FIG. 6 , the bold and italic code is the completed code.
  • the code processing system 100 may also input at least one candidate (for example, a candidate filled with parameters) and contextual features of the code to be completed into the evaluation model to obtain a recommendation probability of the at least one candidate.
  • the code processing system 100 may determine a target candidate in the at least one candidate according to the recommendation probability of the at least one candidate.
  • The target candidate is a candidate whose recommendation probability satisfies a preset condition, for example a candidate whose recommendation probability is greater than a preset probability value, or a candidate whose recommendation probability ranks high (e.g., in the top N, where N is a positive integer).
  • the evaluation model can be obtained by training the initial model with samples collected from open source datasets or user-private datasets.
  • the code processing system 100 may construct an initial model, and the initial model may be a model including two or more hidden layers.
  • the initial model may include one input layer, two hidden layers, and one output layer.
  • the hidden layer may be a fully connected layer (Dense layer), and the activation function of the hidden layer may be a hyperbolic function such as a hyperbolic tangent function TANH.
  • The output layer includes a loss function, which can be a cross-entropy loss function (cross entropy, XENT) or the like.
  • the code processing system 100 may input the samples (including identifiers and contextual features of the identifiers) collected from open source datasets or user-private datasets into the initial model for training to iteratively update parameters of the initial model.
  • When the loss function of the model satisfies the training end condition, for example when the loss tends to converge or falls below a preset value, training can be stopped.
  • The trained model can be used as the evaluation model to evaluate the probability that a parameter-filled candidate is correct. This probability can be used as the recommendation probability of the candidate.
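  • The network shape described above (one input layer, two fully connected hidden layers with tanh activation, and an output layer with a cross-entropy loss) can be sketched in plain Python as below. Only the forward pass and the loss are shown, with no training loop. The layer sizes, the random initialisation, the softmax output over two classes ("incorrect", "correct"), and the numeric feature vector are all assumptions for illustration.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b), one output per weight row."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                          # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random weights in [-0.5, 0.5] and zero biases (assumed initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Two tanh hidden layers followed by a linear output layer and softmax."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, math.tanh)
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases, lambda z: z))


def cross_entropy(probabilities, correct_index):
    """Cross-entropy loss for a single sample with a one-hot target."""
    return -math.log(probabilities[correct_index])


rng = random.Random(0)
layers = [init_layer(rng, 8, 4), init_layer(rng, 8, 8), init_layer(rng, 2, 8)]
probabilities = forward([1.0, 0.0, 0.5, 0.25], layers)   # encoded candidate + context
loss = cross_entropy(probabilities, correct_index=1)
recommendation_probability = probabilities[1]             # probability of "correct"
```

  • Training would repeatedly adjust the weights to reduce this loss over the collected samples and stop once the end condition on the loss is met.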
  • the evaluation model can also be realized by a binary classification model.
  • the binary classification model takes a candidate item (for example, a candidate item filled with parameters) and contextual features of the candidate item as input, and outputs a recommended label.
  • the binary classification model matches the input candidate and the contextual feature of the candidate with the existing identifier and the contextual feature of the identifier, so as to determine the recommended label.
  • the recommended label can take values of 0, 1, or true or false. When the recommended label is 0 or false, it indicates that the candidate is not recommended, and when the recommended label is 1 or true, it indicates that the candidate is recommended.
  • the code processing system 100 can further filter the candidate items according to the above-mentioned recommended tags, so as to improve the accuracy of predicting the candidate items, thereby improving the accuracy of code completion.
  • the evaluation model may also obtain statistical information of a candidate whose recommendation label is 1, and determine the recommendation probability of the candidate based on the statistical information, for example, determine the recommendation probability according to the score value.
  • a class method can be called in different contexts.
  • the context of a class method can be different.
  • The code processing system 100 may, for each method call, determine the role of the method call according to the context features of the method call. For example, for the getitem() method, it can be determined that the role of the method call includes get accessor (or read accessor); for the add() method, that the role includes adder; and for the removeitem() method, that the role includes remover.
  • The accuracy of the evaluation model can also be improved by adding features such as the role of the method call.
  • the evaluation model can determine the recommendation probability of the candidate item in combination with the role of the method call, so that the candidate item recommended by the evaluation model is more in line with the user's intention, thereby obtaining higher completion accuracy.
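  • As a simplified illustration of assigning roles, the sketch below uses a name-prefix heuristic. This is a substitute technique chosen for brevity: the text above determines the role from the context features of the method call, not from the method name, and the table `ROLE_BY_PREFIX` is hypothetical.

```python
# Hypothetical prefix table; the roles themselves are those named in the text.
ROLE_BY_PREFIX = [
    ("get", "get accessor"),
    ("add", "adder"),
    ("remove", "remover"),
]


def role_of(method_name):
    """Guess the role of a method call from its name (naive heuristic)."""
    for prefix, role in ROLE_BY_PREFIX:
        if method_name.startswith(prefix):
            return role
    return "unknown"


for name in ("getitem", "add", "removeitem", "parse"):
    print(name, "->", role_of(name))
```

  • The resulting role could then be encoded as one more input feature of the evaluation model.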
  • If variable A is the consumer of variable B and variable B in turn acts as the consumer of variable A, a circular reference arises, which is usually illegal. Therefore, the code processing system 100 can also track the data flow, thereby avoiding circular references and improving completion accuracy.
  • S408: The code processing system 100 presents at least one candidate to the user through the user interface.
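  • Data-flow tracking to reject circular references can be sketched as a reachability check. The `consumes` mapping and the function name are assumptions; the idea is only that a candidate is discarded if it would make a variable depend, directly or indirectly, on itself.

```python
def creates_cycle(consumes, consumer, producer):
    """Return True if letting `consumer` consume `producer` closes a cycle.

    `consumes` maps each variable to the set of variables it already consumes.
    A cycle appears when `producer` already depends on `consumer`.
    """
    stack, seen = [producer], set()
    while stack:
        current = stack.pop()
        if current == consumer:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(consumes.get(current, ()))
    return False


consumes = {"A": {"B"}}                      # A is already a consumer of B
print(creates_cycle(consumes, "B", "A"))     # would B consuming A be circular?
print(creates_cycle(consumes, "C", "A"))
```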
  • The code processing system 100 may present to the user, through a user interface such as a GUI, the at least one candidate that it determined from the context database according to the context features of the code to be completed.
  • When the code processing system 100 further filters the at least one candidate according to the statistical information, it may present the filtered candidates to the user through a user interface such as a GUI.
  • the code processing system 100 may also sort the candidate items according to the statistical information, and then display the candidate items in sequence. For example, the code processing system 100 determines the score value of the candidate items according to the statistical information, and displays the candidate items in the order of the score value.
  • When the candidate includes a function name and the code processing system 100 further fills the candidate with parameters according to the local code, the code processing system 100 can present the at least one parameter-filled candidate to the user through a user interface such as a GUI.
  • When the code processing system 100 also inputs the candidates into the evaluation model, obtains the recommendation probability of each candidate, and determines the target candidate among the at least one candidate according to those recommendation probabilities, it may present the target candidates to the user through a user interface such as a GUI. The target candidates may be displayed in order of recommendation probability.
  • When the code to be completed is code in a class method, and the code processing system 100 also determines the recommendation probability of the candidate according to the role of the method call corresponding to the code to be completed and thereby determines the target candidate, the code processing system 100 can display the target candidates determined based on the role of the method call.
  • The code processing system 100 can determine the context features of "ret", for example that the return type is string and that name and LastName are both of type string.
  • According to the distance between each variable and the code to be completed, the code processing system 100 can determine that the probability of returning LastName is higher than the probability of returning name. The code processing system 100 therefore recommends returning LastName first and returning name second.
  • When the evaluation model adopted by the code processing system 100 also takes the role of the method call into account, as shown in part (B) of the figure, the model can check, in order from near to far, whether LastName, name, and builder provide the function corresponding to the get accessor. LastName and name do not provide such a function, while builder does, so the code processing system 100 prefers builder. Because the return type is string, toString is called on builder to return a string. In this case the code processing system preferentially recommends returning builder.toString.
  • When the code processing system 100 further filters out candidates with circular references from the at least one candidate, it may present the remaining candidates to the user through a user interface such as a GUI.
  • the code processing system 100 can also receive the candidate item selected by the user, and update the context database according to the candidate item and the context feature.
  • the code processing system 100 may also update the dataset used to train the model or test the model according to the candidate selected by the user and the contextual characteristics of the candidate.
  • the above embodiment mainly takes the class method completion as an example for detailed description.
  • The context features of the code to be completed can be directly input into a pre-trained completion model to obtain candidates.
  • the completion model may specifically be a completion model based on statistical information.
  • An embodiment of the present application provides a code processing system 100. The system is configured to execute steps S402 to S408 in the foregoing method embodiments, and optionally to execute the optional implementations of those steps.
  • the system includes IDE 102 and completion subsystem 104 .
  • the components of the IDE 102 and the completion subsystem 104 and the functions of each component can be found in the description of the relevant content above, and will not be repeated here.
  • an embodiment of the present application further provides a code processing apparatus 900, where the apparatus 900 is configured to execute the foregoing code processing method.
  • the code processing apparatus 900 may include the IDE plug-in 1024 in the system architecture described in the foregoing FIG. 1 and some or all of the modules in the foregoing completion subsystem 104 .
  • The functional division of the code processing apparatus 900 may be the same as the division in the aforementioned system architecture.
  • That is, the code processing apparatus 900 includes an IDE plug-in 1024 and a completion subsystem 104, and the completion subsystem 104 further includes a code analysis module 1042 and a context database 1044. Optionally, the completion subsystem 104 may further include a parameter filling module 1046, an evaluation module 1048, and an index module 1049.
  • the code processing apparatus 900 may also have other division methods of functional units. The embodiment of the present application does not limit the division of the functional units in the apparatus 900. The following exemplarily provides a division:
  • the code processing apparatus 900 includes an interface unit 902 , a feature extraction unit 904 and an analysis unit 906 .
  • an interface unit 902 configured to receive a code input by a user through a user interface
  • a feature extraction unit 904 configured to determine the context feature of the code to be completed according to the code input by the user;
  • An analysis unit 906, configured to determine at least one candidate of the code to be completed from a context database according to the context feature of the code to be completed, where the context database stores sample code and the context feature of the sample code ;
  • the interface unit 902 is further configured to present the at least one candidate item to the user through the user interface.
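  • The analysis unit 906 is further configured to: acquire statistical information of the at least one candidate in the context database, and filter the at least one candidate according to the statistical information;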
  • the interface unit is specifically used for:
  • the filtered candidates are presented to the user through the user interface.
  • the apparatus 900 further includes:
  • An evaluation unit, configured to input the at least one candidate and the context feature of the code to be completed into an evaluation model, obtain a recommendation probability of the at least one candidate, and determine a target candidate among the at least one candidate according to the recommendation probability of the at least one candidate;
  • the interface unit 902 is specifically used for:
  • the target candidates are presented to the user through the user interface.
  • the apparatus 900 further includes:
  • a parameter filling unit used for filling the parameters of the at least one candidate item according to the code in the code file where the code input by the user is located when the candidate item includes a function name;
  • the interface unit is specifically used for:
  • the at least one candidate populated with the parameters is presented to the user through the user interface.
  • the analysis unit 906 is specifically configured to:
  • a depth-first search algorithm is used to search a context database to determine at least one candidate of the code to be completed.
  • the code input by the user includes a prefix of the code to be completed
  • the analysis unit 906 is specifically used for:
  • At least one candidate matching the prefix of the code to be completed is determined from a context database according to the contextual feature of the code to be completed.
  • the context database includes at least one of a database constructed based on an open source dataset and a database constructed based on the user's private dataset.
  • the code to be completed includes code in a method of a class, and the code input by the user includes a return type.
  • the apparatus 900 further includes:
  • An evaluation unit configured to determine a role of a method call corresponding to the to-be-completion code according to a context feature of the to-be-completion code, where the role is used to assist in determining a recommendation probability of a candidate for the to-be-completion code.
  • The analysis unit 906 is further configured to: filter out candidates with circular references from the at least one candidate;
  • the interface unit 902 is specifically used for:
  • the filtered candidates are presented to the user through the user interface.
  • The code processing apparatus 900 may correspond to executing the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules/units of the code processing apparatus 900 respectively implement the corresponding flows of the methods in the embodiment shown in FIG. 4. For brevity, details are not repeated here.
  • the above code processing apparatus 900 may be implemented by a computing device.
  • FIG. 10 provides a computing device. As shown in FIG. 10 , the computing device 1000 can specifically be used to implement the functions of the code processing apparatus 900 in the embodiment shown in FIG. 9 above.
  • Computing device 1000 includes bus 1001 , processor 1002 , display 1003 , and memory 1004 . Communication between the processor 1002 , the memory 1004 and the display 1003 is through the bus 1001 .
  • the bus 1001 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one thick line is used in FIG. 10, but it does not mean that there is only one bus or one type of bus.
  • The processor 1002 may be any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), and the like.
  • the display 1003 is an input/output (I/O) device.
  • the device can display electronic files, such as code files, on the screen for viewing by the user.
  • the display 1003 can be classified into a liquid crystal display (LCD), an organic light emitting diode (OLED) display, and the like.
  • The display 1003 can display the code input by the user through the GUI, present candidates of the code to be completed to the user through the GUI, and the like.
  • Memory 1004 may include volatile memory, such as random access memory (RAM).
  • The memory 1004 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 1004 stores executable program codes, and the processor 1002 executes the executable program codes to execute the aforementioned code processing method.
  • The processor 1002 executes the above program code to control the display 1003 to receive the code input by the user through a user interface such as a GUI, and then controls the display 1003 to transmit that code to the processor 1002 through the bus 1001. The processor 1002 can determine the context feature of the code to be completed according to the code input by the user, then determine at least one candidate for the code to be completed from the context database according to that context feature, and then control the display 1003 to present the at least one candidate to the user through a user interface such as a GUI.
  • the processor 1002 may also control other interfaces to receive user-input codes.
  • the other interface may be a microphone or the like. Specifically, the microphone can receive the code entered in the form of speech.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center that contains one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state drives), and the like.
  • the computer-readable storage medium includes instructions, the instructions instructing the computing device to execute the above-mentioned code processing method applied to the code processing apparatus.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computing device, all or part of the processes or functions described in the embodiments of the present application are generated.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave).
  • the computer program product may be a software installation package, and when any one of the aforementioned code processing methods needs to be used, the computer program product may be downloaded and executed on a computing device.

Abstract

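The present application provides a code processing method applied in the technical field of software development. The method includes: receiving code input by a user through a user interface; determining a context feature of the code to be completed according to the code input by the user; determining at least one candidate for the code to be completed from a context database according to the context feature of the code to be completed, where the context database stores sample code and the context features of the sample code; and presenting the at least one candidate to the user through the user interface. Because the candidates are obtained through static analysis of the code, such as static syntax analysis and static semantic analysis, they conform to the syntax rules and have a high probability of passing the compilation check. The candidates predicted by the method therefore have high accuracy, and code completion based on them can improve completion accuracy and efficiency.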

Description

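Code processing method, apparatus, device, and medium
This application claims priority to Russian patent application No. RU2020135915, entitled "Intelligent code completion method", filed with the Russian Intellectual Property Office on November 2, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of software development, and in particular, to a code processing method, apparatus, device, and computer-readable storage medium.
Background
During software development, many development tools, such as integrated development environments (IDEs), provide a code completion function. Code completion means that when the user enters part of the code, such as part of a keyword or a function, the development tool can provide the user with at least one candidate to help the user complete the keyword or function. This reduces user input operations and improves development efficiency.
With the progress of artificial intelligence (AI) technology, especially deep learning, in text generation, automatic code generation and completion through AI has become a popular research direction. However, current AI-based code completion predicts the code to be completed with low accuracy. In many cases the user still has to complete the code manually, or has to manually correct the prediction result after accepting it.
The industry urgently needs a code processing method with high prediction accuracy that completes code automatically and thereby improves development efficiency.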
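Summary
In view of this, the present application provides a code processing method. The method predicts candidates for the code to be completed by statically analyzing the code according to the context features of the code to be completed, which improves prediction accuracy, enables automatic code completion, and improves development efficiency. The present application also provides an apparatus, a device, a computer-readable storage medium, and a computer program product corresponding to the method.
In a first aspect, the present application provides a code processing method. The method may be executed by a code processing system. The code processing system provides a user interface, such as a graphical user interface (GUI) or a command user interface (CUI).
The code processing system may receive code input by a user through the user interface, and then determine the context features of the code to be completed according to the code input by the user. A context feature is a feature that expresses the context in which the code is located, and includes, for example, any one or more of the type of a base class, the class name of a base class, a prefix, a return type, and Boolean features. The code processing system then determines at least one candidate for the code to be completed from a context database according to the context features of the code to be completed. The context database stores sample code and the context features of the sample code. The sample code may include any one or more identifiers, such as class names, method names, function names, variable names, or parameter names, from an open source dataset or the user's private dataset. The code processing system may present the at least one candidate to the user through a user interface such as a GUI, thereby implementing code completion.
Because the candidates for the code to be completed are obtained through static analysis of the code, such as static syntax analysis and semantic analysis, they conform to the syntax rules. Moreover, the candidates are determined from the context database according to the context features of the code to be completed, rather than predicted by a model such as a long short-term memory (LSTM) network, and therefore have a high probability of passing the compilation check. The candidates predicted by the method thus have high accuracy. Code completion based on these candidates can effectively reduce the number of times the user has to complete code manually or correct the completed code, which greatly improves development efficiency and user experience.
In addition, the method does not require a complex model, has low computing power requirements, does not need graphics processing unit (GPU) resources, and can be deployed locally, which avoids stalls caused by network transmission delay and improves user experience.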
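In some possible implementations, the code processing system may also acquire statistical information of the at least one candidate in the context database. The statistical information may include the number of times a candidate is called, such as the number of nested calls and the number of calls in loops, and can reflect how frequently the candidate is used. The code processing system may filter the at least one candidate according to the statistical information, for example by filtering out candidates that are used infrequently. Correspondingly, the code processing system may present the filtered candidates to the user through a user interface such as a GUI, so as to provide the user with frequently used candidates. This effectively reduces the number of candidates, avoids recommending outdated or deprecated application programming interfaces (APIs) and the like, and improves prediction accuracy.
The code processing system may also sort the candidates according to the statistical information. Correspondingly, when providing the candidates, the code processing system may display them in the sorted order. In this way the user can quickly see the top-ranked, frequently used candidates and quickly select one of them, which improves code completion efficiency.
In some possible implementations, the code processing system may also input the at least one candidate and the context features of the code to be completed into an evaluation model to obtain the recommendation probability of the at least one candidate. Correspondingly, the code processing system may further screen the candidates according to their recommendation probabilities, for example by determining a target candidate among the at least one candidate according to the recommendation probability of the at least one candidate, and present the target candidate to the user through the user interface. This can further improve prediction accuracy as well as code completion accuracy and efficiency.
The evaluation model may be obtained by training an initial model with samples collected from an open source dataset or the user's private dataset. The initial model may be a simple model including two or more hidden layers. A hidden layer may be a fully connected layer (dense layer), and the activation function of the hidden layer may be a hyperbolic function such as the hyperbolic tangent function TANH. The output layer includes a loss function, which may be a cross-entropy loss function (cross entropy, XENT) or the like.
An evaluation model trained from the above initial model does not need to consume GPU resources and can be deployed locally (for example, on a local computing device). This reduces transmission delay, avoids stalls caused by excessive network transmission delay, and improves user experience.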
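The structure of such a model can be sketched as a forward pass through two fully connected tanh layers. The sketch rests on several assumptions that are not stated in the source: a single sigmoid output unit is used to produce the recommendation probability, the loss is the binary form of cross entropy, and the weights in `main` are arbitrary illustrative values rather than trained parameters.

```java
class EvaluationModelSketch {
    // One fully connected layer followed by the tanh activation.
    // w[i][j] connects input i to output unit j.
    static double[] denseTanh(double[] x, double[][] w, double[] b) {
        double[] out = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) s += x[i] * w[i][j];
            out[j] = Math.tanh(s);
        }
        return out;
    }

    // Two hidden layers, then a sigmoid output (an assumption of this sketch).
    static double predict(double[] x, double[][] w1, double[] b1,
                          double[][] w2, double[] b2, double[] wOut, double bOut) {
        double[] h1 = denseTanh(x, w1, b1);
        double[] h2 = denseTanh(h1, w2, b2);
        double z = bOut;
        for (int i = 0; i < h2.length; i++) z += h2[i] * wOut[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Binary cross-entropy loss for a label y in {0, 1} and a probability p in (0, 1).
    static double crossEntropy(double p, int y) {
        return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
    }

    public static void main(String[] args) {
        // Hypothetical feature vector and weights, for illustration only.
        double[] features = {1.0, 0.0};
        double[][] w1 = {{0.5, -0.2, 0.1}, {0.3, 0.8, -0.5}};
        double[] b1 = {0.0, 0.1, 0.0};
        double[][] w2 = {{0.4, -0.6}, {0.2, 0.9}, {-0.3, 0.1}};
        double[] b2 = {0.0, 0.0};
        double[] wOut = {1.2, -0.7};
        double p = predict(features, w1, b1, w2, b2, wOut, 0.0);
        System.out.println("recommendation probability: " + p);
        System.out.println("loss if the label is 1: " + crossEntropy(p, 1));
    }
}
```

A network of this size runs comfortably on a CPU, which is consistent with the statement that no GPU resources are needed.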
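In some possible implementations, the evaluation model may be implemented as a binary classification model. The binary classification model takes a candidate and the context features of the candidate as input and outputs a recommendation label. Specifically, the binary classification model matches the input candidate and its context features against existing identifiers and the context features of those identifiers, so as to determine the recommendation label. The recommendation label can take the value 0 or 1, or true or false. A value of 0 or false indicates that the candidate is not recommended, and a value of 1 or true indicates that the candidate is recommended.
The evaluation model can further filter the candidates according to the recommendation label, which improves the accuracy of the predicted candidates and thus the accuracy of code completion. The evaluation model may also acquire statistical information of the candidates whose recommendation label is 1 or true and determine the recommendation probability of those candidates based on the statistical information, for example according to a score value. In this way, the code processing system can display the candidates recommended by the evaluation model in order of recommendation probability.
In some possible implementations, when the candidate includes a function name, the code processing system may also fill in the parameters of the at least one candidate according to the code in the code file where the code input by the user is located (for ease of description, hereinafter referred to as local code). Correspondingly, the code processing system may present the at least one candidate filled with the parameters to the user through a user interface such as a GUI, thereby implementing multi-token completion.
Specifically, the code processing system may use a depth-first search algorithm to search for the parameters corresponding to the function, for example by searching the local code, and then fill the candidate with the parameters found. Further, the code processing system may fill one candidate with multiple sets of parameters to obtain multiple parameter-filled candidates. The code processing system may sort the parameter-filled candidates according to information that includes the distance between the parameters and the code to be completed, and then filter the candidates or display them in order according to the sorting result. In this way, the candidates closest to the user's input intention are displayed first, so that the user can select them quickly, which improves code completion efficiency.
It should be noted that in an object-oriented computer language, function names include method names. The code processing system can therefore fill in method parameters according to the local code, thereby implementing multi-token completion.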
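The distance-based ordering of parameter-filled candidates can be sketched as follows. This is a simplified illustration, not the described implementation: it handles a single parameter, replaces the depth-first search over local code with a linear scan over already-extracted local variables, and measures distance in source lines. `ParameterFillSketch` and `LocalVar` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ParameterFillSketch {
    // A local variable found in the code file, with the line where it is declared.
    static final class LocalVar {
        final String name;
        final String type;
        final int line;
        LocalVar(String name, String type, int line) {
            this.name = name;
            this.type = type;
            this.line = line;
        }
    }

    // Fill a one-parameter function with every visible variable of the required type,
    // ordering the results so that variables declared nearest to the completion line come first.
    static List<String> fill(String functionName, String paramType,
                             List<LocalVar> locals, int completionLine) {
        List<LocalVar> matching = new ArrayList<>();
        for (LocalVar v : locals) {
            if (v.type.equals(paramType) && v.line <= completionLine) matching.add(v);
        }
        matching.sort(Comparator.comparingInt(v -> completionLine - v.line));
        List<String> filled = new ArrayList<>();
        for (LocalVar v : matching) filled.add(functionName + "(" + v.name + ")");
        return filled;
    }

    public static void main(String[] args) {
        List<LocalVar> locals = Arrays.asList(
                new LocalVar("file", "File", 3),
                new LocalVar("path", "String", 5),
                new LocalVar("input", "File", 8));
        System.out.println(fill("parse", "File", locals, 10)); // prints [parse(input), parse(file)]
    }
}
```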
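In some possible implementations, the code processing system may perform code analysis according to the context features of the code and determine at least one candidate for the code to be completed from the context database. Specifically, the code processing system may search the context database with a depth-first search (DFS) algorithm according to the context features of the code to be completed, and determine at least one candidate for the code to be completed.
The code processing system may use the depth-first search algorithm to find candidates that match the context features of the code to be completed. For each candidate, the code processing system may continue searching until a static function call is found.
Because the method searches the context database for candidates based on the context features of the code to be completed, and the sample code and its context features in the context database are extracted from code that conforms to the syntax rules and has passed the compilation check, the candidates obtained by the method conform to the syntax rules and have a high probability of passing the compilation check.
Moreover, the method uses a depth-first search algorithm for matching and can therefore find all matching candidates in the context database. The sample code in the context database may also include rarely used identifiers, such as rarely used APIs. As a result, even in a complex context (a context that uses rarely used APIs), the method can determine relatively accurate candidates from the context database and achieve high-accuracy code prediction.
In some possible implementations, the code input by the user includes a prefix of the code to be completed, for example the prefix of the identifier to be completed. On this basis, the code processing system may also determine a completion condition according to the input code. Specifically, the completion condition is that the candidates for the code to be completed include the prefix. For an identifier of length N (N being a positive integer greater than 1), the prefix of the identifier may be any one of its first character through its first N-1 characters.
When the code input by the user includes a prefix of the code to be completed, the code processing system may determine, from the context database and according to the context features of the code to be completed, at least one candidate that matches the prefix of the code to be completed. In this way candidates can be predicted more precisely, which improves prediction accuracy.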
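In some possible implementations, the context database includes at least one of a database constructed based on an open source dataset and a database constructed based on the user's private dataset.
Taking a Java-based code completion scenario as an example, the code processing system may index the code in an open source dataset such as the GitHub corpus, so as to identify identifiers in the code such as class names, method names, function names, variable names, parameter names, and operators. It then determines the context features of each identifier and stores the identifiers and their context features in a database, thereby obtaining the context database.
The code processing system may also index the user's private dataset, such as the code in a code repository provided by the user, so as to identify identifiers in the code such as class names, method names, function names, variable names, parameter names, and operators. It then determines the context features of each identifier and obtains the context database from the identifiers and their context features.
Specifically, the code processing system may build separate context databases from the open source dataset and from the user's private dataset, and use both the context database built from the open source dataset and the context database built from the user's private dataset to determine candidates for the code to be completed, thereby improving prediction accuracy.
In some possible implementations, the code to be completed includes code in a method of a class (also called a class method), and the code input by the user includes a return type. The code processing system may predict candidates for the code to be completed in the class method according to the return type, and complete the class method based on the candidates.
In some possible implementations, a class method can be called in different environments. Correspondingly, the context of a class method can differ. On this basis, the code processing system may, for each method call, determine the role of the method call according to the context features of the method call. For example, for the getitem() method, it can be determined that the role of the method call includes get accessor (also called read accessor); for the add() method, it can be determined that the role of the method call includes adder; and for the removeitem() method, it can be determined that the role of the method call includes remover.
When training the evaluation model, the code processing system may also add features such as the role of the method call, so as to improve the accuracy of the evaluation model. The evaluation model can then determine the recommendation probability of a candidate in combination with the role of the method call, so that the candidates it recommends better match the user's intention, thereby achieving higher completion accuracy.
In some possible implementations, relationships arise between variables in code, such as a producer-consumer relationship, and in most scenarios circular references between variables are illegal. For example, it is usually illegal for variable A to be the consumer of variable B while variable B is also the consumer of variable A. Therefore, the code processing system may also track the data flow, filter out candidates with circular references from the at least one candidate, and present the filtered candidates to the user through a user interface such as a GUI. This avoids circular references and improves completion accuracy.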
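In a second aspect, the present application provides a code processing apparatus. The code processing apparatus includes:
an interface unit, configured to receive code input by a user through a user interface;
a feature extraction unit, configured to determine a context feature of the code to be completed according to the code input by the user;
an analysis unit, configured to determine at least one candidate for the code to be completed from a context database according to the context feature of the code to be completed, where the context database stores sample code and the context features of the sample code;
the interface unit being further configured to present the at least one candidate to the user through the user interface.
In some possible implementations, the analysis unit is further configured to:
acquire statistical information of the at least one candidate in the context database;
filter the at least one candidate according to the statistical information;
and the interface unit is specifically configured to:
present the filtered candidates to the user through the user interface.
In some possible implementations, the apparatus further includes:
an evaluation unit, configured to input the at least one candidate and the context feature of the code to be completed into an evaluation model, obtain a recommendation probability of the at least one candidate, and determine a target candidate among the at least one candidate according to the recommendation probability of the at least one candidate;
and the interface unit is specifically configured to:
present the target candidate to the user through the user interface.
In some possible implementations, the apparatus further includes:
a parameter filling unit, configured to, when the candidate includes a function name, fill in the parameters of the at least one candidate according to the code in the file where the code input by the user is located;
and the interface unit is specifically configured to:
present the at least one candidate filled with the parameters to the user through the user interface.
In some possible implementations, the analysis unit is specifically configured to:
search the context database with a depth-first search algorithm according to the context feature of the code to be completed, and determine at least one candidate for the code to be completed.
In some possible implementations, the code input by the user includes a prefix of the code to be completed;
and the analysis unit is specifically configured to:
determine, from the context database and according to the context feature of the code to be completed, at least one candidate that matches the prefix of the code to be completed.
In some possible implementations, the context database includes at least one of a database constructed based on an open source dataset and a database constructed based on the user's private dataset.
In some possible implementations, the code to be completed includes code in a method of a class, and the code input by the user includes a return type.
In some possible implementations, the apparatus further includes:
an evaluation unit, configured to determine, according to the context feature of the code to be completed, the role of the method call corresponding to the code to be completed, where the role is used to assist in determining the recommendation probability of a candidate for the code to be completed.
In some possible implementations, the analysis unit is further configured to:
filter out candidates with circular references from the at least one candidate;
and the interface unit is specifically configured to:
present the filtered candidates to the user through the user interface.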
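In a third aspect, the present application provides a device including a processor and a memory. The processor and the memory communicate with each other. The processor is configured to execute instructions stored in the memory, so that the device executes the method in the first aspect or any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing instructions, where the instructions instruct a device to execute the method described in the first aspect or any implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product containing instructions which, when run on a device, cause the device to execute the method described in the first aspect or any implementation of the first aspect.
On the basis of the implementations provided in the above aspects, the present application may further combine them to provide more implementations.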
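Brief Description of the Drawings
In order to describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below.
FIG. 1 is a system architecture diagram of a code processing system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an interface displaying at least one candidate according to an embodiment of the present application;
FIG. 3A is a schematic architecture diagram of a code processing system according to an embodiment of the present application;
FIG. 3B is a schematic architecture diagram of a code processing system according to an embodiment of the present application;
FIG. 4 is a flowchart of a code processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an interface displaying at least one candidate according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an interface before and after a code snippet is completed according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the effect of code completion according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the effect of code completion according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a code processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application.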
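Detailed Description
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Accordingly, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature.
Some technical terms involved in the embodiments of the present application are introduced first.
In the field of software development, source code refers to a code file written by a developer in at least one computer language, such as C, Java, or Python, supported by a development tool such as a code editor or an integrated development environment (IDE).
Source code (also called a source program) includes a series of human-readable computer language instructions. The computer language instructions in the source code can be compiled by a compiler into binary instructions executable by a computer. The computer executes the binary instructions to implement the corresponding functions. The binary instructions executable by the computer may also be called object code. Code may include source code and object code.
An identifier (token) is the smallest compilation unit in the compilation of source code. Identifiers may include any one or more of keywords, class names, method names, function names, variable names, parameter names, and operators. A keyword is a word with a special meaning defined by the computer language, such as break, which indicates a break point, and return, which indicates a return. A class is a data structure in an object-oriented computer language that describes the common attributes and methods of the objects created from it. A class name is the name of a class. A method name is the name of a method described by a class. For example, a class name may be DocumentBuilder and a method name may be newDocument.
A function is an executable code block that implements a certain functionality. Because a method is associated with an object and a class and is called through an object, a method can also be regarded as a special kind of function in an object-oriented computer language. A function name is the name of a function, such as count or print. A function can accept parameters and process them. Further, a function can return data, that is, a function can have a return value. A parameter name is the name of a parameter. The parameters used when defining the function name and the function body are called formal parameters. A formal parameter is a placeholder variable and does not occupy memory. The parameters used when a function is called are called actual parameters (arguments). An actual parameter is a variable and occupies memory. A variable is a data structure that stores changeable data. The changeable data may be a changeable value, such as a function value, or changeable text, such as text typed by the user. A variable name is the name of a variable.
Code completion refers to predicting at least one identifier that a user (for example, a developer) intends to enter based on the code (for example, source code) the user has already entered, and providing input suggestions to the user according to the prediction result, so that the user can complete the code directly from the suggestions. Code completion reduces the number of characters the user has to type, reduces spelling errors, and spares the user from memorizing unfamiliar class names, method names, and the like, which improves development efficiency.
Code completion can be divided into single-token completion and multi-token completion. Single-token completion predicts a single identifier and completes the code according to the prediction result. In some embodiments, single-token completion may include completing a class name, method name, function name, variable name, or parameter name. Multi-token completion predicts multiple identifiers and completes the code according to the prediction result. In some embodiments, multi-token completion may include completing several of a class name, a method name, and parameter names, or completing a code snippet that includes multiple identifiers. A code snippet is a small piece of source code and may include functional statements, such as a class declaration or a function declaration, or a code block with start and end identifiers. A typical application of multi-token completion is completing the method of a class (also called a class method), especially a class method with a return type.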
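With the breakthroughs of artificial intelligence (AI), especially deep learning, in natural language text generation tasks, the industry has proposed technical solutions that use AI to process computer language text (specifically, code) and implement automatic code completion. For example, a deep neural network based on long short-term memory (LSTM) is constructed. This network is a recurrent neural network over time and is suitable for processing and predicting time-ordered sequences. The code input by the user is serialized into an input sequence, and the input sequence is fed into the deep neural network to predict candidates for the next identifier, thereby implementing single-token completion. Further, a candidate with a high probability can be selected, merged with the input code, and serialized into a new input sequence, which is fed into the deep neural network to predict candidates for the next identifier, thereby implementing multi-token completion.
The key to the above approach is code modeling. At present, code modeling mainly includes sequence modeling and abstract syntax tree (AST) modeling. Sequence modeling obtains a token stream through lexical analysis of the code, but this approach easily produces code that does not conform to the syntax. AST modeling relies on lexical analysis and syntax analysis of the code, but even when the code it produces conforms to the syntax rules, there is a high probability that the code cannot be compiled. In other words, the completion code predicted by the above approach is not accurate enough to meet user needs.
In view of this, the present application provides a code processing method. The method may be executed by a code processing system. Specifically, the code processing system receives code input by a user through a user interface, such as a graphical user interface (GUI) or a command user interface (CUI), determines the context features of the code to be completed according to the code input by the user, and then determines at least one candidate for the code to be completed from a context database according to those context features. The context database stores sample code and the context features of the sample code. The code processing system presents the at least one candidate to the user through a user interface such as a GUI, thereby implementing code completion.
Because the candidates for the code to be completed are obtained through static analysis of the code, such as static syntax analysis and semantic analysis, they conform to the syntax rules. Moreover, the candidates are determined from the context database according to the context features of the code to be completed, rather than predicted by a model such as an LSTM, and therefore have a high probability of passing the compilation check. The candidates predicted by the method thus have high accuracy. Code completion based on these candidates can effectively reduce the number of times the user has to complete code manually or correct the completed code, which greatly improves development efficiency and user experience.
In addition, the method does not require a complex model, has low computing power requirements, does not need graphics processing unit (GPU) resources, and can be deployed locally, which avoids stalls caused by network transmission delay and improves user experience.
The sample code in the context database may include rarely used application programming interfaces (APIs). As a result, even in a complex context (a context that uses rarely used APIs), the method can determine relatively accurate candidates from the context database and achieve high-accuracy code prediction.
Further, the code processing system may also acquire statistical information of the at least one candidate in the context database and then filter the at least one candidate according to the statistical information. On the one hand this further improves prediction accuracy, and on the other hand it avoids recommending outdated or deprecated candidates, such as outdated or deprecated APIs.
The code processing system can predict not only identifiers such as method names and function names but also the parameters of methods and functions. That is, the code processing system can perform single-token prediction or multi-token prediction. Because the code processing system predicts candidates from the context database through static analysis, both single-token prediction and multi-token prediction have high accuracy. In addition, the code processing system inputs the parameter-filled candidates into the evaluation model to obtain their recommendation probabilities, and then makes precise recommendations based on the recommendation probabilities, which further improves code completion accuracy.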
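It should be noted that the code processing method provided in the embodiments of the present application may be provided to users in the form of a plug-in. A plug-in is a program written against an application program interface that follows a certain specification. The program runs on the platform specified for it (it may support multiple platforms at the same time) and cannot run independently of the specified platform.
Specifically, the service provider of a development tool or a third party may release a plug-in for a development tool such as an IDE or a code editor to enhance the functions of the development tool. For ease of description, the present application uses an IDE as the example development tool.
Referring to the system architecture diagram of the code processing system shown in FIG. 1, the code processing system 100 includes an IDE 102 and a completion subsystem 104 located at the back end. The IDE 102 includes an IDE core 1022 and an IDE plug-in 1024 installed in the IDE 102. The completion subsystem 104 includes a code analysis module 1042 and a context database 1044. Optionally, the completion subsystem 104 may further include any one or more of a parameter filling module 1046, an evaluation module 1048, and an index module 1049.
Specifically, the IDE core 1022 provides the native functions of the IDE 102, such as code hints and code spell checking. The IDE plug-in 1024 interacts with the completion subsystem 104 to implement enhanced functions, such as intelligent code completion. The interaction process is described in detail below.
In some embodiments, the IDE plug-in 1024 may receive code input by the user through the user interface and obtain the position of the input cursor in the code. The position of the input cursor is the position of the code to be completed, from which the context features of the code to be completed can be determined, for example any one or more of the type of a base class (such as public, private, or protected), the class name of a base class, a prefix, a return type, and Boolean features. The Boolean features may include at least one of the following: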
private boolean is_in_direct_new;
private boolean is_in_binary_op;
private boolean is_in_variable_name;
private boolean inClassDeclarationName;
private boolean is_in_interface;
private boolean is_in_for_declaration;
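The IDE plug-in 1024 may send the context features of the code to be completed to the completion subsystem 104. The completion subsystem 104 performs static analysis according to the context features of the code to be completed, generates at least one candidate for the code to be completed, and returns the at least one candidate to the IDE plug-in 1024. The IDE 102 may present the at least one candidate to the user, so that the user can select one of them to complete the code.
The context database 1044 of the completion subsystem 104 stores sample code and the context features of the sample code. The sample code may be identifiers such as class names, method names, function names, variable names, or parameter names. It should be noted that the sample code may be a single identifier or multiple identifiers. The context features of the sample code are features extracted from the context of the sample code, such as a variable type, an object type, or a return type. The code analysis module 1042 of the completion subsystem 104 may determine at least one candidate for the code to be completed from the context database 1044 according to the context features of the code to be completed.
For ease of understanding, an embodiment of the present application also provides a schematic diagram of an interface in which the IDE 102 presents candidates to the user. As shown in FIG. 2, the code editing interface 200 presents the code input by the user, as indicated by 202 in the figure. It should be noted that 202 in FIG. 2 shows a snippet of the code input by the user, specifically: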
public static void basicString(){
Document doc=
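Other code snippets may appear before or after this snippet, which is indicated by "…" in FIG. 2. The IDE plug-in 1024 obtains the position of the input cursor, which is directly after "=". The IDE plug-in 1024 may determine the context of the code to be completed from this position and extract the context features of the code to be completed from it. In this example, the context features of the code to be completed may include a return type of Document. The IDE plug-in 1024 sends the context features to the completion subsystem 104, and the code analysis module 1042 determines at least one candidate for the code to be completed from the context database according to the context features. Optionally, the code analysis module 1042 may also acquire statistical information of the candidates in the context database 1044 and filter the at least one candidate according to the statistical information. Further, the code analysis module 1042 may pass the candidates (for example, the filtered candidates) to the parameter filling module 1046 and the evaluation module 1048 for subsequent processing, such as parameter filling and recommendation probability evaluation. The IDE 102 may obtain the resulting candidates, for example the candidates filtered by the code analysis module 1042, the candidates filled with parameters by the parameter filling module 1046, or the candidates determined by the evaluation module 1048 according to the recommendation probability, and display them, as indicated by 204 in FIG. 2.
It should be noted that the IDE core 1022 can also generate at least one candidate directly from the code input by the user through text completion technology. In some embodiments, the completion methods of the IDE core 1022 itself, such as a text completion method, can be used together with the code completion method provided in the embodiments of the present application. When the text completion method provided by the IDE core 1022 is used together with the code processing method provided in the embodiments of the present application, the IDE 102 may display the candidates generated by the text completion method alongside the candidates generated by the code processing method of the embodiments of the present application, as indicated by 204 and 206 in FIG. 2.
When displaying candidates generated by different methods, the IDE 102 may also distinguish them through different display styles. For example, the IDE 102 may add, in front of a candidate, a mark for the method that produced it. As shown in FIG. 2, the IDE 102 may use a circle to mark candidates generated by the IDE core 1022 through the text completion method, and a plus sign in a box to mark candidates generated by the code processing method of the embodiments of the present application. As another example, the IDE 102 may also distinguish candidates generated by different methods through different colors or fonts.
It should be noted that the IDE 102 may display the candidates in order of their probabilities. This makes it easy for the user to see the candidates with higher probability quickly and to select one of them to complete the code.
In some possible implementations, the code analysis module 1042 may determine a large number of candidates from the context database according to the context features of the code to be completed. Considering that some candidates are unlikely to be used, the code analysis module 1042 may also filter the at least one candidate according to the statistical information of the candidates in the context database 1044, such as the number of calls callCount and the number of nested calls nestedCount, and filter out candidates with low probability.
Further, when a candidate is a function name (in an object-oriented computer language, a function name may be a method name), the parameter filling module 1046 may also fill parameters into the candidates, for example into the filtered candidates, thereby implementing multi-token completion and sparing the user from entering the function parameters manually. Specifically, the parameter filling module 1046 may determine suitable parameters by searching the local code (specifically, the code in the code file where the code input by the user is located) and fill those parameters into the candidate. The parameter filling module 1046 then inputs the parameter-filled candidates into the evaluation module 1048, which may evaluate them with the evaluation model and determine their recommendation probabilities.
The evaluation module 1048 may send each parameter-filled candidate and its recommendation probability to the IDE plug-in 1024. Correspondingly, the IDE 102 may display the top N candidates by recommendation probability, or display the candidates whose recommendation probability is greater than a preset probability. It should be noted that the evaluation module 1048 may also screen the parameter-filled candidates according to the recommendation probability, for example by selecting the top N candidates or the candidates whose recommendation probability is greater than a preset probability, and then return the selected candidates to the IDE plug-in 1024. The IDE displays the selected candidates, or displays the selected candidates together with their recommendation probabilities.
The index module 1049 may also index local code, such as the code in the user's code repository, obtain sample code and the context features of the sample code from the indexing result, and store them in the context database 1044. Correspondingly, the candidates that the code analysis module 1042 determines from the context database 1044 according to the context features of the code to be completed may include local code calls, such as local variable names. By indexing the code in the user's code repository, the index module 1049 enriches the context database, so that the recommendations made during intelligent code completion are more accurate.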
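In some possible implementations, the service provider of a development tool may also embed the code of the code processing method provided in the embodiments of the present application natively in the core when developing the tool. This removes the need to install a plug-in and avoids the security risks introduced by plug-in installation.
In the embodiments of the present application, the parts of the code processing system 100 may be deployed centrally on a local computing device (a user device under the direct control of the user, such as a laptop, a desktop computer, a smartphone, or another user terminal) or in a cloud computing cluster (including at least one cloud computing device, such as a cloud server). The parts of the code processing system 100 may also be deployed in a distributed manner in a cloud computing cluster. The deployment of the code processing system 100 is described in detail below.
In some possible implementations, the IDE 102 and the completion subsystem 104 may be deployed on a local computing device, such as a personal computer (PC) or another terminal device. Because the code processing system 100 in the embodiments of the present application consumes little computing power during code analysis, and the evaluation model can be implemented with a simple network (for example, one with two hidden layers), the requirements on the computing performance of the terminal device are low, and a lightweight terminal device can meet the service requirements. In addition, deploying the completion subsystem 104 together with the IDE 102 on the local computing device reduces the interaction time between the IDE 102 and the completion subsystem 104 and prevents network transmission delay from stalling the completion function and degrading user experience.
In other possible implementations, the IDE 102 and the completion subsystem 104 may be deployed in a cloud computing cluster. That is, the IDE 102 is a cloud IDE, and the IDE 102 and the completion subsystem 104 are provided to users as cloud services.
The cloud service provider may integrate the intelligent code completion service provided by the completion subsystem 104 and the code development environment service provided by the cloud IDE into one cloud service, or may provide the cloud IDE and intelligent code completion as two separate cloud services. In some cases, the cloud service provider may offer the intelligent code completion service as a value-added service of the cloud IDE. After the user purchases or rents the value-added service, the cloud service provider integrates it into the cloud IDE for the user.
Referring to the schematic architecture diagram of the code processing system 100 shown in FIG. 3A, the IDE 102 and the completion subsystem 104 are provided by one cloud service provider and may be deployed in the same cloud computing cluster. In other possible implementations of the embodiments of the present application, the IDE 102 and the completion subsystem 104 may also be provided by different cloud service providers and deployed in different cloud computing clusters.
Referring next to the schematic architecture diagram of the code processing system 100 shown in FIG. 3B, the IDE 102 may be deployed on a local computing device and the completion subsystem 104 may be deployed in a cloud computing cluster. When the user processes code through the IDE 102 deployed on the local computing device and code completion is triggered, the IDE 102 calls the completion subsystem 104 in the cloud computing cluster to obtain at least one candidate for the code to be completed. A user of this completion service may register for the cloud service in advance, which can help attract users to the cloud service.
Moreover, the evaluation model deployed in the cloud computing cluster may be a customized model trained on the user's private dataset, for example a dataset built from the code repository provided by the user. Such a model fits the user's environment better and gives better recommendations.
FIG. 3A and FIG. 3B are only some specific examples of how the code processing system 100 may be deployed in the embodiments of the present application. In other possible implementations of the embodiments of the present application, the code processing system 100 may be deployed in other ways, for example with the IDE 102 deployed in the cloud and the completion subsystem 104 deployed on a local computing device. This is not limited in the embodiments of the present application.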
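Next, the code processing method provided in the embodiments of the present application is described in detail from the perspective of the code processing system 100.
Referring to the flowchart of the code processing method shown in FIG. 4, the method includes:
S402: The code processing system 100 receives code input by the user through the user interface.
The code processing system 100 (for example, the IDE 102) may receive, through a user interface (such as a GUI or CUI), code typed by the user on a physical keyboard. The code processing system 100 may also receive, through the user interface, code typed by the user by touch on a virtual keyboard.
In some possible implementations, the code processing system 100 may also have a code file selected through the user interface, so as to receive the code in that code file. The code in the code file may include code previously written by the current user and may also include code previously written by other users.
The code input by the user may be written in a single computer language, for example C, Java, or Python. In some embodiments, the code input by the user may also be written in multiple computer languages in a mixed-programming manner, for example in C together with embedded assembly language.
S404: The code processing system 100 determines the context features of the code to be completed according to the code input by the user.
The code processing system 100 (for example, the IDE plug-in 1024 in the IDE 102) may capture the position of the input cursor. When the code completion function is triggered, the position of the input cursor is the completion position. It should be noted that the completion position may be at the end or in the middle of the input line. In some embodiments, the completion position may also be at the start of the input line.
In a computer language, the context of code can be understood as the setting or environment in which the code is located. Context features (features of context) are features that express the context in which the code is located, and include, for example, any one or more of the type of a base class, the class name of a base class, a prefix, a return type, and Boolean features.
Taking input code that includes "Document doc=" as an example, the completion position of the code to be completed is the position after "=", which is the end of the input. The context features of the code to be completed may include a return value type of Document and the Boolean feature is_in_API being true, where is_in_API being true indicates that the current completion is a class method completion.
The completion type may be single-token completion or multi-token completion. The code processing system may determine whether the completion type is single-token or multi-token completion according to a preset setting, or according to the completion type that the user sets when triggering the code completion function.
The completion function can be triggered in various ways. For example, the code processing system 100 may determine that the code completion function is triggered when it detects that the user has stopped typing for a preset time. As another example, the code processing system 100 may set a trigger condition, such as a double click of the right mouse button or a press of a shortcut key (for example, the Tab key), and trigger the code completion function when it detects that the trigger condition is met, that is, when it detects that the user has double-clicked the right mouse button or pressed the shortcut key. Further, the user may also set the completion type to single-token or multi-token completion when triggering the code completion function.
In some possible implementations, the input code may include a prefix of the code to be completed, for example the prefix of the identifier to be completed. On this basis, the code processing system 100 may also determine a completion condition according to the input code. Specifically, the completion condition is that the candidates for the code to be completed include the prefix. For an identifier of length N (N being a positive integer greater than 1), the prefix of the identifier may be any one of its first character through its first N-1 characters.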
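How the expected type and the prefix might be read from the text in front of the cursor can be sketched with a regular expression. This is only an illustrative assumption: the source describes static analysis rather than pattern matching, the sketch handles only the simple declaration form `Type name = prefix`, and `ContextFeatureSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextFeatureSketch {
    // Matches "<Type> <name> = <prefix>" at the end of the text before the cursor.
    static final Pattern ASSIGNMENT = Pattern.compile("(\\w+)\\s+\\w+\\s*=\\s*(\\w*)$");

    // Returns {expectedType, prefix}, or null when no declaration with "=" precedes the cursor.
    static String[] extract(String textBeforeCursor) {
        Matcher m = ASSIGNMENT.matcher(textBeforeCursor);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("Document doc=")));    // prints [Document, ]
        System.out.println(Arrays.toString(extract("Document doc=Doc"))); // prints [Document, Doc]
    }
}
```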
S406:代码处理系统100根据待补全代码的上下文特征从上下文数据库中确定待补全代码的至少一个候选项。
上下文数据库中存储有样本代码以及样本代码的上下文特征。样本代码可以包括类名、方法名、函数名、变量名或者参数名等标识符中的任意一个或多个。该样本代码可以来自开源数据集或者用户的私有数据集。
具体地,以基于Java的代码补全场景为例,代码处理系统100可以对开源数据集如GitHub corpus中的代码进行索引,从而识别代码中的类名、方法名、函数名、变量名、参数名、运算符等标识符,然后确定每一个标识符的上下文特征,在数据库中存储标识符以及标识符的上下文特征,从而得到上下文数据库。
在一些可能的实现方式中,代码处理系统100也可以对用户私有的数据集,如用户提供的代码仓中的代码进行索引,从而识别代码中的类名、方法名、函数名、变量名、参数名、运算符等标识符,然后确定每一个标识符的上下文特征,根据该标识符及其上下文特征,得到上下文数据库。
考虑到准确度,代码处理系统100也可以根据开源数据集和用户的私有数据集分别构建上下文数据库,例如构建第一上下文数据库和第二上下文数据库,第一上下文数据库用于存储开源数据集中的标识符以及标识符的上下文特征,第二上下文数据库用于存储用户私有数据集中的标识符以及标识符的上下文特征。
在构建上下文数据库后,代码处理系统100(例如是补全子系统104中的代码分析模块1042)可以根据代码的上下文特征进行代码分析,从上下文数据库中确定待补全代码的至少一个候选项。
具体地,代码处理系统100(例如是补全子系统104中的代码分析模块1042)根据待补全代码的上下文特征,利用深度优先搜索(deep first search,DFS)算法搜索上下文数据库,确定待补全代码的至少一个候选项。
深度优先搜索算法是一种用于遍历树(例如是代码的抽象语法树)或图的算法。以遍历树为例进行说明,沿着树的深度遍历树的节点,尽可能深地搜索树的分支。当节点v的所在边都己被探寻过,搜索将回溯到发现节点v的那条边的起始节点。这一过程一直进行到已发现从源节点可达的所有节点为止。如果还存在未被发现的节点,则选择其中一个作为源节点并重复以上过程,整个进程反复进行直到所有节点都被访问为止。
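The code processing system 100 may use the depth-first search algorithm to find candidates that match the context features of the code to be completed. For each candidate, the code processing system 100 may continue searching until a static function call is found.
Taking "Document doc=" as an example, the return value type is Document, so the code processing system 100 may search for function calls or API calls whose return type is Document, for example newDocument() or parse(). For newDocument(), the code processing system 100 continues the depth search: it determines that newDocument() is invoked on a DocumentBuilder and then searches for calls that return a DocumentBuilder, for example newDocumentBuilder(). The code processing system 100 then continues the depth search from newDocumentBuilder(): it determines that newDocumentBuilder() is invoked on a DocumentBuilderFactory and searches for calls that return a DocumentBuilderFactory, finding DocumentBuilderFactory.newInstance().
Because DocumentBuilderFactory.newInstance() is a static function call, which can be invoked directly on the right-hand side of the equals sign, the code processing system 100 may stop the depth search for newDocument() and generate the candidate: DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument().
The depth search for parse() can follow the depth search for newDocument(), and based on that search the following candidate can be generated:
DocumentBuilderFactory.newInstance().newDocumentBuilder().parse().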
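The backward search described above can be illustrated with a minimal Java sketch. It is not the implementation of the code analysis module 1042. The class ChainSearch, the type IndexedMethod, and the four hand-written index entries are assumptions that stand in for the context database, and argument lists are left empty because parameter filling is a separate step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ChainSearch {
    /** One indexed method: owner.name() returns returnType (hypothetical record layout). */
    static final class IndexedMethod {
        final String owner;
        final String name;
        final String returnType;
        final boolean isStatic;

        IndexedMethod(String owner, String name, String returnType, boolean isStatic) {
            this.owner = owner;
            this.name = name;
            this.returnType = returnType;
            this.isStatic = isStatic;
        }
    }

    // Hand-written stand-in for the context database (assumption, not real index data).
    static final List<IndexedMethod> INDEX = Arrays.asList(
            new IndexedMethod("DocumentBuilder", "newDocument", "Document", false),
            new IndexedMethod("DocumentBuilder", "parse", "Document", false),
            new IndexedMethod("DocumentBuilderFactory", "newDocumentBuilder", "DocumentBuilder", false),
            new IndexedMethod("DocumentBuilderFactory", "newInstance", "DocumentBuilderFactory", true));

    /** Returns call chains that produce targetType and start with a static call. */
    static List<String> candidates(String targetType) {
        List<String> out = new ArrayList<>();
        dfs(targetType, "", new HashSet<String>(), out);
        return out;
    }

    private static void dfs(String type, String suffix, Set<String> onPath, List<String> out) {
        if (!onPath.add(type)) {
            return; // this type is already on the current path, so stop to avoid looping
        }
        for (IndexedMethod m : INDEX) {
            if (!m.returnType.equals(type)) {
                continue;
            }
            String call = "." + m.name + "()" + suffix;
            if (m.isStatic) {
                out.add(m.owner + call);          // a static call ends the chain
            } else {
                dfs(m.owner, call, onPath, out);  // otherwise search for a producer of the receiver
            }
        }
        onPath.remove(type);
    }

    public static void main(String[] args) {
        candidates("Document").forEach(System.out::println);
    }
}
```

With this toy index, the search for Document yields the two chains named in the text, one ending in newDocument() and one ending in parse().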
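Further, when the code entered by the user includes a prefix of the code to be completed, the code processing system 100 (for example, the code analysis module 1042) may determine, from the candidates found by the search, at least one candidate that matches the prefix. Taking "Document doc=Doc" as an example, the prefix is Doc, so the code processing system 100 may select, from the found candidates whose return value is of type Document, those that begin with Doc, and filter out those that do not, such as DOMUtils.getOwnerDocument(new IIOMetadataNode("")).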
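A minimal Java sketch of this prefix filter follows. The class name PrefixFilter is an assumption, and the comparison is case-sensitive, which is what separates Doc from DOMUtils in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixFilter {
    /** Keeps only the candidates that start with the typed prefix (case-sensitive). */
    static List<String> filter(List<String> candidates, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.startsWith(prefix)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "DocumentBuilderFactory.newInstance().newDocumentBuilder().parse()",
                "DOMUtils.getOwnerDocument(new IIOMetadataNode(\"\"))");
        System.out.println(filter(found, "Doc"));
    }
}
```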
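In some possible implementations, the code processing system 100 may also determine an evaluation index value for the at least one candidate, and then filter or sort the at least one candidate according to that value.
The evaluation index value may be a score determined from statistical information about the candidate in the context database 1044, or a recommendation probability determined by an evaluation model. The statistical information may include usage information. For example, when the candidate includes a class name (typename), the statistical information may include class usage information; when the candidate includes a method name, the statistical information may include method usage information.
The class usage information may specifically include any one or more of the following: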
public int nestedCount;
public int extendsCount;
public int fieldCount;
public int assignCount;
public int ifCount;
public int finallyCount;
public int localVariableCount;
public int parameterCount;
public int newCount;
public int callBaseCount;
public int totalCount;
public int localCount;
public int samePackage;
Similarly, the method usage information may specifically include any one or more of the following:
public int callCount;
public int constructorCount;
public int methodCount;
public int repeatCount;
public int firstCount;
public int boolCount;
public int finallyCount;
public int nestedCount;
public int rightAssignSide;
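public int inReturn;
The code processing system 100 (for example, the code analysis module 1042 in the completion subsystem 104) may determine the score of a candidate from the usage information above. Specifically, the code processing system 100 may assign a weight to each kind of usage information and then determine the score through a weighted operation (such as a weighted sum or a weighted average).
The code processing system 100 may filter the at least one candidate by score. For example, it may filter out candidates whose score is below a preset value, or whose score ranks low (for example, outside the top 10), which avoids recommending outdated or deprecated APIs and improves accuracy. Further, the code processing system 100 may also sort the at least one candidate by score so that candidates are displayed in descending order of score.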
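The weighted scoring, filtering, and sorting can be sketched in Java as follows. The weights, the usage counts, the threshold, and the candidate legacyParse() are invented for illustration, and only two of the usage fields listed above (callCount and rightAssignSide) are used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UsageScoring {
    /** Weighted sum over usage counters; counters without a weight are ignored. */
    static double score(Map<String, Integer> usage, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            total += w.getValue() * usage.getOrDefault(w.getKey(), 0);
        }
        return total;
    }

    /** Drops candidates scoring below minScore and sorts the rest by descending score. */
    static List<String> rank(Map<String, Map<String, Integer>> usageByCandidate,
                             Map<String, Double> weights, double minScore) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> e : usageByCandidate.entrySet()) {
            if (score(e.getValue(), weights) >= minScore) {
                kept.add(e.getKey());
            }
        }
        kept.sort(Comparator.comparingDouble(
                (String c) -> score(usageByCandidate.get(c), weights)).reversed());
        return kept;
    }

    private static Map<String, Integer> usage(int callCount, int rightAssignSide) {
        Map<String, Integer> u = new HashMap<>();
        u.put("callCount", callCount);
        u.put("rightAssignSide", rightAssignSide);
        return u;
    }

    /** Demo with invented numbers: the scores are 6, 20, and 1, and the threshold is 2. */
    static List<String> demo() {
        Map<String, Double> weights = new HashMap<>();
        weights.put("callCount", 1.0);        // hypothetical weight
        weights.put("rightAssignSide", 2.0);  // hypothetical weight
        Map<String, Map<String, Integer>> usageByCandidate = new HashMap<>();
        usageByCandidate.put("newDocument()", usage(4, 1));
        usageByCandidate.put("parse()", usage(10, 5));
        usageByCandidate.put("legacyParse()", usage(1, 0)); // hypothetical rarely used API
        return rank(usageByCandidate, weights, 2.0);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, the rarely used candidate falls below the threshold and is dropped, and the remaining two are ordered by score.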
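In some possible implementations, when a candidate includes a function name, the code processing system 100 may also fill in parameters for the at least one candidate. Specifically, the code processing system 100 may use the depth-first search algorithm to search for the parameters of the function, for example in the local code, and then fill the candidate with the parameters found.
When filling in parameters, the code processing system 100 (for example, the parameter filling module 1046 of the completion subsystem 104) may fill several sets of parameters into one candidate, obtaining several parameter-filled candidates. As shown in FIG. 5, the code processing system 100 may sort these parameter-filled candidates according to information that includes the distance between a parameter and the code to be completed. For example, the code processing system 100 may display the candidate 502, which is filled with the nearer parameter, in a higher position.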
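A minimal Java sketch of distance-based ordering follows. It assumes that distance is measured in source lines between a local variable's declaration and the completion position, which the text does not specify. The class ParamFill, the type Local, and the variable backup are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ParamFill {
    /** A local variable visible at the completion position (hypothetical record layout). */
    static final class Local {
        final String name;
        final String type;
        final int line;

        Local(String name, String type, int line) {
            this.name = name;
            this.type = type;
            this.line = line;
        }
    }

    /** Fills a one-parameter function with every type-compatible local, nearest first. */
    static List<String> fill(String function, String paramType, List<Local> locals, int completionLine) {
        List<Local> matching = new ArrayList<>();
        for (Local l : locals) {
            if (l.type.equals(paramType) && l.line < completionLine) {
                matching.add(l);
            }
        }
        // A smaller line distance to the completion position ranks higher.
        matching.sort(Comparator.comparingInt((Local l) -> completionLine - l.line));
        List<String> out = new ArrayList<>();
        for (Local l : matching) {
            out.add(function + "(" + l.name + ")");
        }
        return out;
    }

    static List<String> demo() {
        List<Local> locals = Arrays.asList(
                new Local("backup", "File", 1),   // hypothetical variable
                new Local("path", "String", 2),
                new Local("file", "File", 3));
        return fill("parse", "File", locals, 4);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```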
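In some possible implementations, when a candidate includes a function name, the code processing system 100 may also present the user with both parameter-filled candidates and candidates without parameters. When the parameter-filled candidates do not meet the user's needs, the user can then choose a candidate without parameters and enter the parameters manually, avoiding unnecessary correction.
After filling in parameters, the code processing system 100 can be used not only to complete a whole line of code but also to complete a whole code snippet. Completing a code snippet is essentially multi-symbol completion, and its implementation can follow that of completing a whole line.
FIG. 6 also shows the interface for completing a whole code snippet. Part (A) of FIG. 6 is a schematic view of the interface before the snippet is completed. As shown in (A) of FIG. 6, the code entered by the user includes:
public static void documentBuilderMethod() {
String path = "/path/to/file";
Document doc =
Based on the above code, the code processing system 100 may identify the local variables path and file and, based on the Document type, determine that the at least one candidate includes:
DocumentBuilderFactory.newInstance().newDocumentBuilder().parse()
Next, the code processing system 100 identifies file as the parameter of the above candidate; file has to be created beforehand, with path as its parameter. In addition, the code processing system 100 recognizes the parser configuration exception ParserConfigurationException and completes the try-catch statement in the code snippet according to that exception. As shown in (B) of FIG. 6, the bold, italic code is the completed code.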
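In some possible implementations, the code processing system 100 may also input the at least one candidate (for example, a parameter-filled candidate) and the context features of the code to be completed into an evaluation model to obtain a recommendation probability for the at least one candidate. Correspondingly, the code processing system 100 may determine a target candidate among the at least one candidate according to the recommendation probabilities. A target candidate is a candidate whose recommendation probability meets a preset condition, for example a candidate whose recommendation probability is greater than a preset probability value or ranks near the top (for example, in the top N, N being a positive integer).
The evaluation model may be obtained by training an initial model with samples collected from the open-source data set or the user's private data set. Specifically, the code processing system 100 may construct an initial model, which may be a model with two or more hidden layers. In some embodiments, the initial model may include one input layer, two hidden layers, and one output layer. A hidden layer may be a fully connected (dense) layer, and its activation function may be a hyperbolic function such as the hyperbolic tangent (tanh). The output layer includes a loss function, which may be, for example, the cross-entropy loss (XENT).
Then, the code processing system 100 may input the samples collected from the open-source data set or the user's private data set (including identifiers and their context features) into the initial model for training, iteratively updating the parameters of the initial model. Training may stop when the loss function of the model meets a training end condition, for example when the loss converges or falls below a preset value. The trained model can serve as the evaluation model and be used to evaluate the probability that a parameter-filled candidate is correct. That probability can be used as the recommendation probability of the candidate.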
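The layer structure described above (an input layer, two dense hidden layers with tanh, and an output layer scored with cross-entropy) can be sketched as a forward pass in plain Java. This is only an illustration under stated assumptions: the weights are arbitrary constants rather than trained values, the three input features are hypothetical, the softmax over two output classes is an assumption the text does not state, and training is not shown.

```java
final class EvalModelSketch {
    /** Fully connected layer: y_j = act(b_j + sum_i x_i * w[i][j]). */
    static double[] dense(double[] x, double[][] w, double[] b, boolean useTanh) {
        double[] y = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double sum = b[j];
            for (int i = 0; i < x.length; i++) {
                sum += x[i] * w[i][j];
            }
            y[j] = useTanh ? Math.tanh(sum) : sum;
        }
        return y;
    }

    /** Numerically stable softmax. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double[] p = new double[z.length];
        double sum = 0.0;
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Cross-entropy loss for one sample whose true class is label. */
    static double crossEntropy(double[] probabilities, int label) {
        return -Math.log(probabilities[label]);
    }

    /** Returns {P(not recommended), P(recommended)} for a 3-element feature vector. */
    static double[] predict(double[] features) {
        // Arbitrary untrained weights, for illustration only.
        double[][] w1 = {{0.5, -0.3}, {0.1, 0.8}, {-0.4, 0.2}};
        double[] b1 = {0.0, 0.1};
        double[][] w2 = {{0.7, -0.6}, {0.2, 0.9}};
        double[] b2 = {0.0, 0.0};
        double[][] w3 = {{1.0, -1.0}, {-0.5, 0.5}};
        double[] b3 = {0.0, 0.0};
        double[] h1 = dense(features, w1, b1, true);
        double[] h2 = dense(h1, w2, b2, true);
        return softmax(dense(h2, w3, b3, false));
    }

    public static void main(String[] args) {
        // Hypothetical features, e.g. is_in_API, prefix match, normalized distance.
        double[] p = predict(new double[] {1.0, 0.0, 1.0});
        System.out.println("P(recommended) = " + p[1]);
        System.out.println("loss if the candidate was correct = " + crossEntropy(p, 1));
    }
}
```

In a real system the weights would be updated iteratively to reduce the cross-entropy loss over the collected samples until the end condition mentioned in the text is met.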
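The evaluation model may also be implemented as a binary classification model. The binary classification model takes a candidate (for example, a parameter-filled candidate) and the candidate's context features as input and outputs a recommendation label. Specifically, the binary classification model matches the input candidate and its context features against existing identifiers and their context features to determine the recommendation label. The recommendation label may take the value 0 or 1, or true or false. A label of 0 or false indicates that the candidate is not recommended, and a label of 1 or true indicates that it is recommended.
The code processing system 100 may further filter candidates according to the recommendation label, which improves the accuracy of the predicted candidates and thus the precision of code completion. The evaluation model may also obtain statistical information for candidates whose recommendation label is 1 and determine their recommendation probability based on that information, for example from the score.
In some possible implementations, a class method may be called in different environments, so the context of a class method may differ. On this basis, the code processing system 100 may, for each method call, determine the role of the method call according to its context features. For example, for the getitem() method, the role may be determined to include get accessor (also called read accessor); for the add() method, adder; and for the removeitem() method, remover.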
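As a rough illustration of role assignment, the Java sketch below maps method names to the three roles mentioned above. It uses a name-prefix heuristic, which is an assumption made for brevity. The text derives the role from the context features of the call, and that process is not reproduced here.

```java
import java.util.Locale;

final class MethodRoles {
    enum Role { GET_ACCESSOR, ADDER, REMOVER, UNKNOWN }

    /** Guesses a role from the method name alone (simplifying assumption). */
    static Role roleOf(String methodName) {
        String n = methodName.toLowerCase(Locale.ROOT);
        if (n.startsWith("get")) {
            return Role.GET_ACCESSOR;
        }
        if (n.startsWith("add")) {
            return Role.ADDER;
        }
        if (n.startsWith("remove")) {
            return Role.REMOVER;
        }
        return Role.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(roleOf("getitem"));
        System.out.println(roleOf("add"));
        System.out.println(roleOf("removeitem"));
    }
}
```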
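When training the evaluation model, the code processing system 100 may also add features such as the role of the method call to improve the precision of the evaluation model. The evaluation model can then take the role of the method call into account when determining the recommendation probability of a candidate, so that the recommended candidates better match the user's intent and a higher completion precision is obtained.
Variables in code can be related to one another, for example through a producer-consumer relationship, and in most scenarios a circular reference between variables is invalid. For example, it is usually invalid for variable A to be a consumer of variable B while variable B is also a consumer of variable A. The code processing system 100 may therefore also track the data flow to avoid circular references and improve completion accuracy.
S408: The code processing system 100 presents the at least one candidate to the user through the user interface.
Specifically, the code processing system 100 may present to the user, through a user interface such as a GUI, the at least one candidate that it determined from the context database according to the context features of the code to be completed. In some possible implementations, the code processing system 100 also filters the at least one candidate according to the statistical information, in which case it may present the filtered candidates through the user interface. The code processing system 100 may also sort the candidates according to the statistical information and display them in order. For example, the code processing system 100 determines the score of each candidate from the statistical information and displays the candidates in descending order of score.
In some possible implementations, the candidate includes a function name and the code processing system 100 also fills in the candidate's parameters according to the local code, in which case it may present the at least one parameter-filled candidate to the user through a user interface such as a GUI.
In some possible implementations, the code processing system 100 also inputs the candidates into the evaluation model to obtain their recommendation probabilities and determines a target candidate among the at least one candidate according to those probabilities, in which case it may present the target candidate to the user through a user interface such as a GUI. When displaying target candidates, the code processing system 100 may display them in order of recommendation probability.
When the code to be completed is code in a class method, and the code processing system 100 also determines the recommendation probability of a candidate, and in turn the target candidate, according to the role of the method call corresponding to the code to be completed, the code processing system 100 may display the target candidate determined on the basis of that role.
A specific example follows. Referring to the schematic diagram of the code completion effect shown in FIG. 7, the user completes the method getFullName(). Specifically, after the user enters "ret", the code processing system 100 may determine the context features of "ret", for example that the return type is string and that name and LastName are both of type string.
When the evaluation model used by the code processing system 100 does not take the role of the method call into account, as shown in (A) of FIG. 7, the code processing system 100 may determine from the distance between each variable and the code to be completed that returning LastName is more probable than returning name, so it recommends returning LastName first and returning name second.
When the evaluation model used by the code processing system 100 also takes the role of the method call into account, as shown in (B) of FIG. 7, the code processing system 100 may determine that the role of the method call includes get accessor. The evaluation model may check, from nearest to farthest, whether LastName, name, and builder provide the function corresponding to a get accessor. LastName and name do not, whereas builder does, so the code processing system 100 returns builder first. Because the return type is string, toString is called on builder to return a value of type string. In this case the code processing system recommends returning builder.toString first.
Although the recommendation shown in (A) of FIG. 7, such as return LastName, is syntactically correct and compiles, it still falls short of the user's intent, whereas the recommendation shown in (B) of FIG. 7 is closer to the user's real intent, which further improves accuracy. Moreover, the method corresponding to (B) of FIG. 7 can predict unknown patterns and generalizes well.
In some possible implementations, the code processing system 100 also filters out, from the at least one candidate, candidates that contain a circular reference, in which case it may present the remaining candidates to the user through a user interface such as a GUI.
A specific example follows. Referring to the schematic diagram of the code completion effect shown in FIG. 8, a frame is created first, then a panel, and the panel is added to the frame through the add method. The user then enters panel.a, which triggers a code completion with a prefix. Without data flow tracking, as shown in (A) of FIG. 8, the code processing system 100 returns add(frame) first on the basis of distance, and frame and panel form a circular reference. With data flow tracking, as shown in (B) of FIG. 8, the code processing system 100 can filter out the circular-reference candidate add(frame) and return add(label) first. Tracking the data flow thus makes code completion more accurate and improves the user experience.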
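The circular-reference filter in the FIG. 8 example can be sketched in Java as a reachability check over the add relationships recorded so far. The class CycleFilter and its method names are assumptions, and data flow tracking is reduced here to a simple container graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleFilter {
    // parent -> children, built from earlier statements such as frame.add(panel)
    private final Map<String, Set<String>> children = new HashMap<>();

    void recordAdd(String parent, String child) {
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    /** True if target can be reached from start by following recorded add edges. */
    boolean reaches(String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (current.equals(target)) {
                return true;
            }
            if (!seen.add(current)) {
                continue;
            }
            for (String c : children.getOrDefault(current, Collections.<String>emptySet())) {
                stack.push(c);
            }
        }
        return false;
    }

    /** Keeps arguments x for which receiver.add(x) would not close a cycle. */
    List<String> allowedArguments(String receiver, List<String> arguments) {
        List<String> kept = new ArrayList<>();
        for (String arg : arguments) {
            if (!reaches(arg, receiver)) {
                kept.add(arg);
            }
        }
        return kept;
    }

    static List<String> demo() {
        CycleFilter filter = new CycleFilter();
        filter.recordAdd("frame", "panel");  // frame.add(panel) was already written
        return filter.allowedArguments("panel", Arrays.asList("frame", "label"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because frame already contains panel, panel.add(frame) would close a cycle and is dropped, which leaves label as the only argument offered.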
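Further, after presenting candidates (for example, parameter-filled candidates) to the user, the code processing system 100 may receive the candidate selected by the user and update the context database according to that candidate and the context features. In some embodiments, the code processing system 100 may also update the data set used for training or testing the model according to the candidate selected by the user and its context features.
It should be noted that the above embodiments are described in detail mainly with class method completion as an example. When completing class names, variable names, methods without a return type, and the like, the context features of the code to be completed may be input directly into a pre-trained completion model to obtain candidates. The completion model may specifically be a completion model based on statistical information.
The code processing method provided by the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 8. Next, the code processing system, the code processing apparatus, and the computing device for implementing the code processing function provided by the embodiments of this application are described with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of this application provides a code processing system 100. The system is used to perform steps S402 to S408 of the foregoing method embodiment, and optionally performs the optional methods in those steps. The system includes the IDE 102 and the completion subsystem 104. For the composition of the IDE 102 and the completion subsystem 104 and the functions of their components, see the related description above; details are not repeated here.
As shown in FIG. 9, an embodiment of this application further provides a code processing apparatus 900, which is used to perform the foregoing code processing method. The code processing apparatus 900 may include the IDE plug-in 1024 of the system architecture described with FIG. 1 and some or all of the modules of the completion subsystem 104. The functional division of the code processing apparatus 900 may be the same as that in FIG. 1. For example, the code processing apparatus 900 includes the IDE plug-in 1024 and the completion subsystem 104; the completion subsystem 104 further includes the code analysis module 1042 and the context database 1044 and, optionally, the parameter filling module 1046, the evaluation module 1048, and the indexing module 1049. The code processing apparatus 900 may also be divided into functional units in other ways. The embodiments of this application do not limit the division of functional units in the apparatus 900; one division is given below as an example:
The code processing apparatus 900 includes an interface unit 902, a feature extraction unit 904, and an analysis unit 906.
The interface unit 902 is configured to receive, through a user interface, code entered by a user;
the feature extraction unit 904 is configured to determine context features of code to be completed according to the code entered by the user;
the analysis unit 906 is configured to determine at least one candidate for the code to be completed from a context database according to the context features of the code to be completed, where the context database stores sample code and context features of the sample code;
the interface unit 902 is further configured to present the at least one candidate to the user through the user interface.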
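In some possible implementations, the analysis unit 906 is further configured to:
obtain statistical information about the at least one candidate in the context database;
filter the at least one candidate according to the statistical information;
the interface unit is specifically configured to:
present the filtered candidates to the user through the user interface.
In some possible implementations, the apparatus 900 further includes:
an evaluation unit, configured to input the at least one candidate and the context features of the code to be completed into an evaluation model, obtain a recommendation probability for the at least one candidate, and determine a target candidate among the at least one candidate according to the recommendation probability of the at least one candidate;
the interface unit 902 is specifically configured to:
present the target candidate to the user through the user interface.
In some possible implementations, the apparatus 900 further includes:
a parameter filling unit, configured to, when the candidate includes a function name, fill in parameters of the at least one candidate according to the code in the code file that contains the code entered by the user;
the interface unit is specifically configured to:
present the at least one candidate filled with the parameters to the user through the user interface.
In some possible implementations, the analysis unit 906 is specifically configured to:
search the context database with a depth-first search algorithm according to the context features of the code to be completed, and determine at least one candidate for the code to be completed.
In some possible implementations, the code entered by the user includes a prefix of the code to be completed;
the analysis unit 906 is specifically configured to:
determine, from the context database according to the context features of the code to be completed, at least one candidate that matches the prefix of the code to be completed.
In some possible implementations, the context database includes at least one of a database built from an open-source data set and a database built from the user's private data set.
In some possible implementations, the code to be completed includes code in a method of a class, and the code entered by the user includes a return type.
In some possible implementations, the apparatus 900 further includes:
an evaluation unit, configured to determine, according to the context features of the code to be completed, the role of the method call corresponding to the code to be completed, where the role is used to assist in determining the recommendation probability of candidates for the code to be completed.
In some possible implementations, the analysis unit 906 is further configured to:
filter out, from the at least one candidate, candidates that contain a circular reference;
the interface unit 902 is specifically configured to:
present the filtered candidates to the user through the user interface.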
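The code processing apparatus 900 according to the embodiments of this application may correspondingly perform the method described in the embodiments of this application, and the above and other operations and/or functions of the modules/units of the code processing apparatus 900 respectively implement the corresponding procedures of the methods in the embodiment shown in FIG. 4. For brevity, details are not repeated here.
The code processing apparatus 900 may be implemented by a computing device. FIG. 10 provides a computing device. As shown in FIG. 10, the computing device 1000 may specifically be used to implement the functions of the code processing apparatus 900 in the embodiment shown in FIG. 9.
The computing device 1000 includes a bus 1001, a processor 1002, a display 1003, and a memory 1004. The processor 1002, the memory 1004, and the display 1003 communicate with one another through the bus 1001.
The bus 1001 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.
The processor 1002 may be any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The display 1003 is an input/output (I/O) device. It can display electronic files such as code files on a screen for the user to view. Depending on the manufacturing material, the display 1003 may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Specifically, the display 1003 may display the code entered by the user through the GUI, present candidates for the code to be completed to the user through the GUI, and so on.
The memory 1004 may include volatile memory, such as random access memory (RAM). The memory 1004 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1004 stores executable program code, and the processor 1002 executes that program code to perform the foregoing code processing method. Specifically, the processor 1002 executes the program code to control the display 1003 to receive the code entered by the user through a user interface such as a GUI, and then to control the display 1003 to transmit the code entered by the user to the processor 1002 through the bus 1001. The processor 1002 may determine the context features of the code to be completed according to the code entered by the user, then determine at least one candidate for the code to be completed from the context database according to those context features, and then control the display 1003 to present the at least one candidate to the user through a user interface such as a GUI.
In some possible implementations, the processor 1002 may also control other interfaces to receive the code entered by the user. The other interface may be a microphone or the like. Specifically, the microphone may receive code entered by voice.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can store, or a data storage device such as a data center that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the above code processing method applied to the code processing apparatus.
An embodiment of this application further provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the procedures or functions according to the embodiments of this application are produced in whole or in part.
The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
The computer program product may be a software installation package. When any of the foregoing code processing methods is needed, the computer program product can be downloaded and executed on a computing device.
The descriptions of the procedures or structures corresponding to the above drawings each have their own emphasis. For parts not detailed in one procedure or structure, see the related descriptions of the other procedures or structures.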

Claims (22)

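  1. A code processing method, characterized in that the method comprises:
    receiving, through a user interface, code entered by a user;
    determining context features of code to be completed according to the code entered by the user;
    determining at least one candidate for the code to be completed from a context database according to the context features of the code to be completed, wherein the context database stores sample code and context features of the sample code;
    presenting the at least one candidate to the user through the user interface.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining statistical information about the at least one candidate in the context database;
    filtering the at least one candidate according to the statistical information;
    wherein the presenting the at least one candidate to the user through the user interface comprises:
    presenting the filtered candidates to the user through the user interface.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    inputting the at least one candidate and the context features of the code to be completed into an evaluation model to obtain a recommendation probability of the at least one candidate;
    wherein the presenting the at least one candidate to the user through the user interface comprises:
    determining a target candidate among the at least one candidate according to the recommendation probability of the at least one candidate;
    presenting the target candidate to the user through the user interface.
  4. The method according to any one of claims 1 to 3, characterized in that the candidate comprises a function name, and the method further comprises:
    filling in parameters of the at least one candidate according to the code in the code file that contains the code entered by the user;
    wherein the presenting the at least one candidate to the user through the user interface comprises:
    presenting the at least one candidate filled with the parameters to the user through the user interface.
  5. The method according to any one of claims 1 to 4, characterized in that the determining at least one candidate for the code to be completed from a context database according to the context features of the code to be completed comprises:
    searching the context database with a depth-first search algorithm according to the context features of the code to be completed, and determining at least one candidate for the code to be completed.
  6. The method according to any one of claims 1 to 5, characterized in that the code entered by the user comprises a prefix of the code to be completed;
    wherein the determining at least one candidate for the code to be completed from a context database according to the context features of the code to be completed comprises:
    determining, from the context database according to the context features of the code to be completed, at least one candidate that matches the prefix of the code to be completed.
  7. The method according to any one of claims 1 to 6, characterized in that the context database comprises at least one of a database built from an open-source data set and a database built from the user's private data set.
  8. The method according to any one of claims 1 to 7, characterized in that the code to be completed comprises code in a method of a class, and the code entered by the user comprises a return type.
  9. The method according to claim 8, characterized in that the method further comprises:
    determining, according to the context features of the code to be completed, a role of the method call corresponding to the code to be completed, wherein the role is used to assist in determining the recommendation probability of candidates for the code to be completed.
  10. The method according to any one of claims 1 to 9, characterized in that the method further comprises:
    filtering out, from the at least one candidate, candidates that contain a circular reference;
    wherein the presenting the at least one candidate to the user through the user interface comprises:
    presenting the filtered candidates to the user through the user interface.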
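  11. A code processing apparatus, characterized in that the apparatus comprises:
    an interface unit, configured to receive, through a user interface, code entered by a user;
    a feature extraction unit, configured to determine context features of code to be completed according to the code entered by the user;
    an analysis unit, configured to determine at least one candidate for the code to be completed from a context database according to the context features of the code to be completed, wherein the context database stores sample code and context features of the sample code;
    wherein the interface unit is further configured to present the at least one candidate to the user through the user interface.
  12. The apparatus according to claim 11, characterized in that the analysis unit is further configured to:
    obtain statistical information about the at least one candidate in the context database;
    filter the at least one candidate according to the statistical information;
    and the interface unit is specifically configured to:
    present the filtered candidates to the user through the user interface.
  13. The apparatus according to claim 11 or 12, characterized in that the apparatus further comprises:
    an evaluation unit, configured to input the at least one candidate and the context features of the code to be completed into an evaluation model, obtain a recommendation probability of the at least one candidate, and determine a target candidate among the at least one candidate according to the recommendation probability of the at least one candidate;
    and the interface unit is specifically configured to:
    present the target candidate to the user through the user interface.
  14. The apparatus according to any one of claims 11 to 13, characterized in that the apparatus further comprises:
    a parameter filling unit, configured to, when the candidate comprises a function name, fill in parameters of the at least one candidate according to the code in the code file that contains the code entered by the user;
    and the interface unit is specifically configured to:
    present the at least one candidate filled with the parameters to the user through the user interface.
  15. The apparatus according to any one of claims 11 to 14, characterized in that the analysis unit is specifically configured to:
    search the context database with a depth-first search algorithm according to the context features of the code to be completed, and determine at least one candidate for the code to be completed.
  16. The apparatus according to any one of claims 11 to 15, characterized in that the code entered by the user comprises a prefix of the code to be completed;
    and the analysis unit is specifically configured to:
    determine, from the context database according to the context features of the code to be completed, at least one candidate that matches the prefix of the code to be completed.
  17. The apparatus according to any one of claims 11 to 16, characterized in that the context database comprises at least one of a database built from an open-source data set and a database built from the user's private data set.
  18. The apparatus according to any one of claims 11 to 17, characterized in that the code to be completed comprises code in a method of a class, and the code entered by the user comprises a return type.
  19. The apparatus according to claim 18, characterized in that the apparatus further comprises:
    an evaluation unit, configured to determine, according to the context features of the code to be completed, a role of the method call corresponding to the code to be completed, wherein the role is used to assist in determining the recommendation probability of candidates for the code to be completed.
  20. The apparatus according to any one of claims 11 to 19, characterized in that the analysis unit is further configured to:
    filter out, from the at least one candidate, candidates that contain a circular reference;
    and the interface unit is specifically configured to:
    present the filtered candidates to the user through the user interface.
  21. A computing device, characterized in that the device comprises a processor, a memory, and a display;
    wherein the processor is configured to execute instructions stored in the memory, so that the device performs the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, characterized by comprising instructions which, when run on a computing device, cause the computing device to perform the method according to any one of claims 1 to 10.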
PCT/CN2021/123127 2020-11-02 2021-10-11 Code processing method, apparatus, device and medium WO2022089188A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21884917.2A EP4220381A4 (en) 2020-11-02 2021-10-11 CODE PROCESSING METHOD, APPARATUS, DEVICE AND MEDIUM
CN202180074635.4A CN116406459A (zh) 2020-11-02 2021-10-11 Code processing method, apparatus, device and medium
US18/310,749 US20230273776A1 (en) 2020-11-02 2023-05-02 Code Processing Method and Apparatus, Device, and Medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RURU2020135915 2020-11-02
RU2020135915 2020-11-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/310,749 Continuation US20230273776A1 (en) 2020-11-02 2023-05-02 Code Processing Method and Apparatus, Device, and Medium

Publications (1)

Publication Number Publication Date
WO2022089188A1 (zh) 2022-05-05

Family

ID=81381884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/123127 WO2022089188A1 (zh) 2020-11-02 2021-10-11 Code processing method, apparatus, device and medium

Country Status (4)

Country Link
US (1) US20230273776A1 (zh)
EP (1) EP4220381A4 (zh)
CN (1) CN116406459A (zh)
WO (1) WO2022089188A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114895908A (zh) * 2022-05-17 2022-08-12 北京志凌海纳科技有限公司 基于Web应用表达式的实现方法及系统、设备和存储介质
CN116301796A (zh) * 2023-02-15 2023-06-23 四川省气象探测数据中心 一种基于人工智能技术的气象数据分析系统及方法
WO2024103764A1 (zh) * 2022-11-15 2024-05-23 华为云计算技术有限公司 基于云服务的代码生成方法及装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117289919B (zh) * 2023-11-24 2024-02-20 浙江口碑网络技术有限公司 一种数据处理方法、装置及电子设备
CN117289929B (zh) * 2023-11-24 2024-03-19 浙江口碑网络技术有限公司 一种插件框架、插件及数据处理方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563433A (zh) * 2018-03-20 2018-09-21 北京大学 一种基于lstm自动补全代码的装置
US20190332968A1 (en) * 2018-04-29 2019-10-31 Microsoft Technology Licensing, Llc. Code completion for languages with hierarchical structures
CN110502227A (zh) * 2019-08-28 2019-11-26 网易(杭州)网络有限公司 代码补全的方法及装置、存储介质、电子设备
CN110673836A (zh) * 2019-08-22 2020-01-10 阿里巴巴集团控股有限公司 一种代码补全方法、装置、计算设备及存储介质
US20200097261A1 (en) * 2018-09-22 2020-03-26 Manhattan Engineering Incorporated Code completion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4220381A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114895908A (zh) * 2022-05-17 2022-08-12 北京志凌海纳科技有限公司 基于Web应用表达式的实现方法及系统、设备和存储介质
CN114895908B (zh) * 2022-05-17 2023-02-28 北京志凌海纳科技有限公司 基于Web应用表达式的实现方法及系统、设备和存储介质
WO2024103764A1 (zh) * 2022-11-15 2024-05-23 华为云计算技术有限公司 基于云服务的代码生成方法及装置
CN116301796A (zh) * 2023-02-15 2023-06-23 四川省气象探测数据中心 一种基于人工智能技术的气象数据分析系统及方法

Also Published As

Publication number Publication date
EP4220381A1 (en) 2023-08-02
CN116406459A (zh) 2023-07-07
EP4220381A4 (en) 2024-04-03
US20230273776A1 (en) 2023-08-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21884917; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021884917; Country of ref document: EP; Effective date: 20230425)
NENP Non-entry into the national phase (Ref country code: DE)