WO2024031983A1 - Code management method and related device - Google Patents

Code management method and related device

Info

Publication number
WO2024031983A1
Authority
WO
WIPO (PCT)
Prior art keywords
party library
party
annotation
user
tag
Prior art date
Application number
PCT/CN2023/081059
Other languages
English (en)
French (fr)
Inventor
王泽宇
程啸
王梦缘
俞俊杰
梁广泰
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202211665526.2A (CN117632224A)
Application filed by 华为云计算技术有限公司
Publication of WO2024031983A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Definitions

  • This application relates to the technical field of software development, and in particular to a code management method, system, computing device cluster, computer-readable storage medium, and computer program product.
  • Mainstream code generation technology includes model-based code generation.
  • The core of model-based code generation is model driven architecture (MDA).
  • In MDA, the functionality of the software system is first defined as a platform-independent model using an appropriate specification language, and that model is then translated into one or more platform-specific models for the actual implementation.
  • The platform-specific models can then undergo a series of transformations to obtain the corresponding code files. In this way, the limitations of the design language are overcome and the efficiency of multi-platform software development is improved.
  • However, code files generated in this way may contain subtle, hard-to-detect errors, which can introduce security risks and seriously affect the security of the software.
  • This application provides a code management method that introduces a tag warehouse and, based on the tag warehouse and a comment entered by the user describing the target code to be generated, recommends at least one third-party library corresponding to the comment. Because the mapping relationship between third-party libraries and tags is extracted from a massive body of source code, the recommended third-party libraries are trustworthy and safe, do not conflict with local dependencies, and functionally meet the user's actual business needs. This avoids scenarios in which errors are introduced because different third-party libraries have the same class name, and thus reduces security risks. This application also provides corresponding systems, computing device clusters, computer-readable storage media, and computer program products.
  • In a first aspect, this application provides a code management method.
  • This method can be executed by a code management system.
  • The code management system can be a software system.
  • For example, the code management system can be a software development platform, such as an integrated development environment (IDE), that integrates third-party library recommendation capabilities, or it can be a plug-in or extension integrated into the software development platform.
  • The code management system is deployed in a computing device cluster, and the computing device cluster executes the code of the software system, thereby executing the code management method of this application.
  • The code management system can also be a hardware system, such as a computing device cluster with code management capabilities, including but not limited to a cloud server that provides code management capabilities such as third-party library recommendation.
  • The cloud server provides software as a service (SaaS). When the hardware system runs, it executes the code management method of this application.
  • Specifically, the code management system receives a comment input by the user, which is a natural language description of the target code to be generated. It then determines at least one third-party library corresponding to the comment based on the comment and a tag warehouse that stores the mapping relationship between third-party libraries and tags, and recommends the at least one third-party library to the user.
  • This method further optimizes the currently popular model-based code generation technology.
  • When the user describes in natural language the function (that is, the requirement) to be implemented, the system can prioritize its recommendations based on the mapping relationship between third-party libraries and tags in the tag warehouse.
  • When determining a third-party library, the code management system can search the tag warehouse based on the comment to obtain at least one third-party library corresponding to the comment.
  • This method supports recommending appropriate third-party libraries by searching tag warehouses, which improves the efficiency of recommending third-party libraries while ensuring the accuracy of the recommendation results.
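  • The search described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tag warehouse is an in-memory dictionary mapping each library to its tags and their frequencies; the library names, tags, frequencies, and the function name `search_tag_warehouse` are all hypothetical and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical tag warehouse: library name -> {tag: frequency of occurrence}.
TAG_WAREHOUSE = {
    "requests": {"http": 120, "client": 80, "download": 15},
    "pytest": {"test": 200, "fixture": 60, "class": 10},
    "pillow": {"image": 150, "resize": 40},
}


def search_tag_warehouse(comment, warehouse, top_k=3):
    """Rank libraries by the summed frequency of their tags found in the comment."""
    words = set(comment.lower().split())
    scores = defaultdict(int)
    for library, tags in warehouse.items():
        for tag, frequency in tags.items():
            if tag in words:
                scores[library] += frequency
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [library for library, _ in ranked[:top_k]]


recommended = search_tag_warehouse(
    "I want to test the class and try to create a http method client",
    TAG_WAREHOUSE,
)
```

  • A real deployment would more likely query a search engine than scan a dictionary; the sketch only shows how a comment can be matched against stored tags.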
  • The code management system can decompose the comment to obtain tags of the comment at multiple levels, search the tag warehouse separately according to the tags at each level to obtain the recommended third-party libraries of the comment at the multiple levels, and then obtain at least one third-party library corresponding to the comment based on the third-party libraries recommended at the multiple levels.
  • This method decomposes the annotations into multiple levels of tags, and then searches the tag warehouse separately according to the multiple levels of tags. This can achieve fine-grained third-party library recommendation and improve the accuracy of the recommendation results.
  • The multiple levels of tags may include any combination of Name-level tags, Manual-level tags, and Auto-level tags.
  • A Name-level tag is the tag that most directly reflects the information of a third-party library; it usually uses the name of the third-party library as the tag.
  • Manual-level tags carry medium weight. They are formed by breaking down the introduction on the library's official website into tags. These tags describe the functions of the third-party library and are a relatively accurate expression of it.
  • Auto-level tags carry the lowest weight but are the most numerous. Auto-level tags can be mined from massive data; they are developers' descriptions, across different scenarios, of why the third-party library is used in that scenario. A single Auto-level tag may be one-sided, but when the amount of data is large enough, the tags together can be understood as the reasons for using the third-party library in a large number of scenarios, and they serve as an indirect description of the third-party library.
  • By using tags at multiple levels, this method can mine the comment from different dimensions to capture the user's true intent and recommend a more comprehensive set of third-party libraries, which helps select an appropriate third-party library for the user's code generation.
  • The code management system may determine the third-party library that the user selects from the at least one third-party library, and then update the weights of the multiple levels according to the selected third-party library.
  • In this way, the weights are continuously updated, which avoids keeping the same weights for so long that they no longer suit the business, and thus improves the accuracy of third-party library recommendations.
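  • The application does not specify how the weights are updated. The following sketch shows one possible rule, stated as an assumption: a level whose search results contained the library the user picked has its weight increased, other levels have theirs decreased, and the weights are then renormalized. The function name, the learning rate, and the sample data are hypothetical.

```python
def update_level_weights(weights, level_hits, selected_library, learning_rate=0.1):
    """Raise the weight of levels that proposed the user's choice, then renormalize.

    weights:     level name -> current weight
    level_hits:  level name -> libraries that level recommended
    """
    updated = {}
    for level, weight in weights.items():
        hit = selected_library in level_hits.get(level, [])
        factor = 1 + learning_rate if hit else 1 - learning_rate
        updated[level] = weight * factor
    total = sum(updated.values())
    return {level: value / total for level, value in updated.items()}


weights = {"name": 0.5, "manual": 0.3, "auto": 0.2}
level_hits = {
    "name": ["requests"],
    "manual": ["requests", "httpx"],
    "auto": ["httpx"],
}
# The user picked "httpx", which only the Manual and Auto levels proposed.
new_weights = update_level_weights(weights, level_hits, "httpx")
```

  • Renormalizing keeps the weights comparable over time; other update rules (for example, ones based on click-through statistics) would fit the described behaviour equally well.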
  • The code management system can also input the comment into a classifier trained on the mapping relationship between third-party libraries and tags in the tag warehouse to obtain at least one third-party library corresponding to the comment.
  • The classifier is a complete classification model obtained by training a deep learning neural network model on third-party libraries and their tags.
  • The classifier can classify the tags of the comment to obtain the most likely third-party library.
  • Classifiers can mine the intrinsic relationship between third-party libraries and tags, and therefore achieve higher accuracy when recommending third-party libraries.
  • The code management system can search the tag warehouse according to the comment to obtain a first group of third-party libraries corresponding to the comment, input the comment into the classifier trained on the mapping relationship between third-party libraries and tags in the tag warehouse to obtain a second group of third-party libraries corresponding to the comment, and then determine, based on the first group and the second group, at least one third-party library corresponding to the comment to recommend to the user.
  • The first group of third-party libraries and the second group of third-party libraries are each assigned a weight, and a weighted operation based on these weights yields a comprehensive score for each third-party library. Based on the comprehensive score, at least one third-party library corresponding to the comment can be determined.
  • This method jointly determines at least one third-party library corresponding to the annotation by combining the search method and the classifier method to recommend it to the user, ensuring the accuracy of the recommended third-party library.
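  • The weighted combination of the two groups can be sketched as follows. The per-library scores, the weights 0.6 and 0.4, and the function name `combine_recommendations` are assumptions chosen for illustration; the application only states that each group has a weight and that a comprehensive score is computed.

```python
def combine_recommendations(search_scores, classifier_scores,
                            search_weight=0.6, classifier_weight=0.4, top_k=3):
    """Merge search and classifier results into one ranking by weighted score."""
    libraries = set(search_scores) | set(classifier_scores)
    combined = {
        library: search_weight * search_scores.get(library, 0.0)
        + classifier_weight * classifier_scores.get(library, 0.0)
        for library in libraries
    }
    # Highest comprehensive score first; ties are broken alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical scores in [0, 1] from the two recommendation paths.
search_scores = {"requests": 0.9, "httpx": 0.5}
classifier_scores = {"httpx": 0.8, "aiohttp": 0.7}
ranking = combine_recommendations(search_scores, classifier_scores)
```

  • In this sample, a library proposed by both paths can outrank one that scores highly in only a single path, which is the practical benefit of combining them.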
  • The code management system may also recommend to the user the target code generated based on the at least one third-party library. In this way, in the automatic code generation stage, the generated content can be optimized by preferring well-chosen third-party libraries, which improves the security of the generated code.
  • The code management system may, in response to the user's selection of one of the at least one third-party library, recommend to the user the target code generated based on the selected third-party library.
  • Generating code with recommended third-party libraries not only ensures code security but also meets users' personalized needs.
  • The code management system can generate target code for each of the at least one third-party library corresponding to the comment, and then recommend to the user the target code corresponding to each third-party library.
  • Users can select third-party libraries and corresponding target codes in one operation, which simplifies user operations and improves user experience.
  • When a trigger condition for third-party library recommendation is met, the code management system can determine at least one third-party library corresponding to the comment based on the comment and the tag warehouse.
  • For example, the code editing interface can include a third-party library recommendation control, and the user can click the control to trigger third-party library recommendation.
  • Alternatively, the code management system can time how long it has been since the user last entered code; when the elapsed time reaches a set threshold, third-party library recommendation is triggered.
  • This method lets users trigger the third-party library recommendation function in different ways and, based on this function, select target code generated with an appropriate third-party library, which avoids introducing security risks.
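  • The two trigger conditions above (an explicit control and an idle timer) can be sketched as a small helper. The class name `RecommendationTrigger`, the two-second default, and the injectable clock are assumptions made so that the example is self-contained and testable; they are not part of the application.

```python
import time


class RecommendationTrigger:
    """Decides when third-party library recommendation should run."""

    def __init__(self, idle_seconds=2.0, clock=time.monotonic):
        self.idle_seconds = idle_seconds
        self._clock = clock
        self._last_input = clock()

    def on_user_input(self):
        """Call on every keystroke to restart the idle timer."""
        self._last_input = self._clock()

    def on_button_click(self):
        """Clicking the recommendation control always triggers."""
        return True

    def should_trigger(self):
        """True once the user has been idle for at least idle_seconds."""
        return self._clock() - self._last_input >= self.idle_seconds


# Usage with a fake clock so the behaviour can be shown without waiting.
now = [0.0]
trigger = RecommendationTrigger(idle_seconds=2.0, clock=lambda: now[0])
trigger.on_user_input()
now[0] = 1.0
triggered_early = trigger.should_trigger()
now[0] = 2.5
triggered_late = trigger.should_trigger()
```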
  • The code management system can also obtain a source code data set that includes multiple source code files, and determine, based on the call points in those files, the source code files that call third-party libraries. It then obtains the calling patterns of those source code files and builds the tag warehouse according to the tags corresponding to each third-party library in the calling patterns and the frequency of occurrence of the tags. The tags corresponding to a third-party library are extracted from the calling patterns.
  • This method mines and analyzes the real source code files in the source code data set to obtain a large number of mapping relationships between tags and third-party libraries. The tag warehouse built in this way lays the foundation for third-party library recommendation.
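  • The mining steps above can be approximated with a simplified sketch. It makes several assumptions that are not in the application: the source files are Python, a "call point" is approximated by an import statement, the calling pattern is approximated by the comments in the same file, and `KNOWN_THIRD_PARTY` is a hypothetical allow-list of library names.

```python
import ast
import io
import re
import tokenize
from collections import Counter, defaultdict

KNOWN_THIRD_PARTY = {"requests", "numpy", "PIL"}  # hypothetical allow-list
STOPWORDS = {"the", "and", "for", "with", "from", "this", "that"}


def imported_libraries(source):
    """Return the known third-party libraries that a source file imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found & KNOWN_THIRD_PARTY


def comment_tags(source):
    """Extract candidate tags (lowercase words) from the comments in a file."""
    tags = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            words = re.findall(r"[a-z]+", token.string.lower())
            tags.extend(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return tags


def build_tag_warehouse(source_files):
    """Map each library to a Counter of tag -> frequency of occurrence."""
    warehouse = defaultdict(Counter)
    for source in source_files:
        tags = comment_tags(source)
        for library in imported_libraries(source):
            warehouse[library].update(tags)
    return warehouse


files = [
    "import requests\n# download the page over http\nrequests.get('https://example.com')\n",
    "import requests\n# http client for the api\nsession = requests.Session()\n",
    "import os\n# list files\nos.listdir('.')\n",
]
warehouse = build_tag_warehouse(files)
```

  • The sample sources are only parsed, never executed. A production pipeline would analyze call sites rather than whole-file comments, but the resulting structure (library, tag, frequency) is the same.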
  • This application also provides a code management system.
  • The system includes:
  • an interactive subsystem, used to receive a comment input by the user, where the comment is a natural language description of the target code to be generated;
  • a recommendation subsystem, configured to determine at least one third-party library corresponding to the comment based on the comment and a tag warehouse, where the tag warehouse stores a mapping relationship between third-party libraries and tags;
  • the recommendation subsystem is also used to recommend the at least one third-party library corresponding to the comment to the user.
  • The recommendation subsystem is specifically used to:
  • search the tag warehouse separately according to the tags of the comment at multiple levels to obtain the recommended third-party libraries of the comment at the multiple levels;
  • obtain at least one third-party library corresponding to the comment based on the third-party libraries recommended at the multiple levels.
  • The recommendation subsystem is also used to update the weights of the multiple levels according to the third-party library selected by the user.
  • The recommendation subsystem is specifically used to input the comment into a classifier trained on the mapping relationship between third-party libraries and tags in the tag warehouse, and obtain at least one third-party library corresponding to the comment.
  • The recommendation subsystem is also used to recommend to the user the target code generated based on the third-party library selected by the user.
  • The system further includes a generation subsystem.
  • The generation subsystem is configured to generate the target code corresponding to each third-party library in the at least one third-party library corresponding to the comment;
  • the recommendation subsystem is specifically used to recommend to the user the target code corresponding to each third-party library.
  • The recommendation subsystem is specifically used to determine at least one third-party library corresponding to the comment based on the comment and the tag warehouse.
  • The system also includes:
  • a data mining subsystem, used to obtain a source code data set that includes multiple source code files; determine, based on the call points of the multiple source code files, the source code files that call third-party libraries; obtain the calling patterns of those source code files; and construct the tag warehouse according to the tag corresponding to each third-party library in the calling patterns and the frequency of occurrence of the tag, where the tag corresponding to a third-party library is extracted from the calling patterns.
  • This application also provides a computing device cluster.
  • The computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory.
  • The at least one processor and the at least one memory communicate with each other.
  • The at least one processor is configured to execute instructions stored in the at least one memory, so that the computing device or computing device cluster executes the code management method described in the first aspect or any implementation of the first aspect.
  • This application also provides a computer-readable storage medium that stores instructions, where the instructions instruct a computing device or a computing device cluster to execute the code management method described in the first aspect or any implementation of the first aspect.
  • This application also provides a computer program product containing instructions that, when run on a computing device or a computing device cluster, cause the computing device or computing device cluster to execute the code management method described in the first aspect or any implementation of the first aspect.
  • Figure 1 is a schematic architectural diagram of a code management system provided by an embodiment of the present application.
  • Figure 2 is a schematic flow chart of a code generation method provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a shallow integration solution provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a deep integration solution provided by an embodiment of the present application.
  • Figure 5 is a flow chart of a code management method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of inputting comments through a code editing interface provided by an embodiment of the present application.
  • Figure 7 is a schematic flowchart of searching for a third-party library provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a third-party library jointly recommended by combining intelligent search and model classification provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a recommended third-party library displayed on a code editing interface provided by an embodiment of the present application.
  • Figure 10 is a schematic flowchart of introducing a third-party library provided by the embodiment of the present application.
  • Figure 11 is a schematic flowchart of a data mining subsystem building a tag warehouse provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of source code analysis performed by a source code analysis engine provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of the mapping relationship between third-party libraries and tags in a tag warehouse provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of another computing device cluster provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of another computing device cluster provided by an embodiment of the present application.
  • The terms “first” and “second” in the embodiments of this application are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Therefore, features defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • Code Generation refers to automatically generating code fragments or complete code based on known information.
  • The known information may include one or more of context (for example, code already entered) and comments (also called annotations).
  • Generating code snippets based on context is also called code completion.
  • Code completion provides real-time predictions of class names, method names, keywords, and so on to assist developers in writing application (APP) code.
  • The code generated by a code generation tool can include calls to third-party libraries.
  • Third-party libraries are reusable software components developed and released by entities other than the developers on the software development platform.
  • Typical third-party libraries include, but are not limited to, numpy, the foundational package for scientific computing and data work; python-docx for document processing; scrapy for data mining and monitoring; django for web development; and Pillow, a graphics library for the imaging domain.
  • The third-party libraries recommended by the code generation tool play a decisive role in the quality of the generated code.
  • The industry provides a variety of ways to implement third-party library recommendation, including but not limited to intelligent recommendation based on user profiles and intelligent recommendation based on developer habits.
  • Template-based code generation and model-based code generation can also implement third-party library recommendation.
  • Model-based automatic code generation is currently the mainstream automatic code generation method in the industry.
  • This method is implemented based on model driven architecture (MDA).
  • Application design refers to establishing a system model that meets user needs through a specified model language or modeling tool.
  • the structure and actions of the system model can effectively reflect system information.
  • the system model can be converted into a code file that can be used independently.
  • Model conversion includes two conversion modes: model to model and model to code.
  • Compared with traditional Unified Modeling Language (UML) models, MDA models are significantly improved in abstraction, independence, and operability. Therefore, the system model can be called repeatedly during operation and converted into description files suitable for different platforms. In this way, the MDA architecture overcomes the limitations of the design language and elevates the modeling language to the level of a programming language.
  • However, the model-based code generation scheme does not determine the version of the third-party library it introduces.
  • In addition, different third-party libraries are likely to have classes with the same name. Once the wrong library is introduced, the generated code may contain subtle, hard-to-detect errors, or may even pull in third-party libraries with security risks, which poses a huge challenge to the security of the application.
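  • To make the class-name problem concrete, the following sketch detects class names that are exported by more than one library. The library names and class lists are hypothetical placeholders, not real packages.

```python
from collections import defaultdict


def find_class_name_collisions(library_classes):
    """Return class names that appear in more than one library."""
    owners = defaultdict(set)
    for library, class_names in library_classes.items():
        for class_name in class_names:
            owners[class_name].add(library)
    return {name: sorted(libs) for name, libs in owners.items() if len(libs) > 1}


# Hypothetical exported class lists for three libraries.
library_classes = {
    "lib_a": ["HttpClient", "Response"],
    "lib_b": ["HttpClient", "Session"],
    "lib_c": ["Image"],
}
collisions = find_class_name_collisions(library_classes)
```

  • A generated snippet that refers to `HttpClient` without stating which library it comes from is ambiguous here, which is exactly the situation in which the wrong dependency can be introduced.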
  • The code management system can be a software system.
  • For example, the code management system can be a software development platform, such as an integrated development environment (IDE), that integrates third-party library recommendation capabilities, or it can be a plug-in or extension integrated into the software development platform.
  • The code management system is deployed in a computing device cluster, and the computing device cluster executes the code of the software system, thereby executing the code management method of this application.
  • The code management system can also be a hardware system, such as a computing device cluster with code management capabilities, including but not limited to a cloud server that provides code management capabilities such as third-party library recommendation.
  • The cloud server provides software as a service (SaaS). When the hardware system runs, it executes the code management method of this application.
  • Specifically, the code management system receives a comment input by the user, which is a natural language description of the target code to be generated. It then determines at least one third-party library corresponding to the comment based on the comment and a tag warehouse that stores the mapping relationship between third-party libraries and tags, and recommends the at least one third-party library to the user.
  • This method further optimizes the currently popular model-based code generation technology.
  • When the user describes in natural language the function (that is, the requirement) to be implemented, the system can prioritize its recommendations based on the mapping relationship between third-party libraries and tags in the tag warehouse.
  • In the automatic code generation stage, the generated content can be optimized by preferring well-chosen third-party libraries, which improves the security of the generated code.
  • The code management system 100 includes an interactive subsystem 102 and a recommendation subsystem 104.
  • The interactive subsystem 102 is used to receive a comment input by the user, which is a natural language description of the target code to be generated. The natural language description can express the user's requirement.
  • The recommendation subsystem 104 is configured to determine at least one third-party library corresponding to the comment based on the comment and the mapping relationship between third-party libraries and tags stored in the tag warehouse 106, and then recommend the at least one third-party library to the user.
  • Figure 1 illustrates the case where the tag warehouse 106 is built into the code management system 100.
  • In other implementations, the tag warehouse 106 can also be provided by a third party (such as another platform or developer). In that case, the code management system 100 can connect to the external tag warehouse 106 and recommend third-party libraries based on the mapping relationship between third-party libraries and tags in the tag warehouse 106.
  • The code management system 100 may also include a data mining subsystem 107.
  • The data mining subsystem 107 is used to build the tag warehouse that supports third-party library recommendation.
  • Specifically, the data mining subsystem 107 can obtain a source code data set that includes multiple source code files, determine the source code files that call third-party libraries based on the call points in those files, obtain the calling patterns of the source code files that call third-party libraries, and finally build the tag warehouse based on the tags corresponding to each third-party library in the calling patterns and the frequency of occurrence of the tags.
  • The tags corresponding to a third-party library are extracted from the calling patterns.
  • The code management system 100 may also include a code generation subsystem 108.
  • The code generation subsystem 108 is used to generate target code according to the at least one recommended third-party library.
  • The recommendation subsystem 104 is also used to recommend to the user the target code generated based on the at least one third-party library.
  • The embodiments of this application integrate the third-party library recommendation capability with the code generation capability, thereby solving the pain point of failed third-party library introduction in the code generation solution shown in Figure 2.
  • The code generation tool can generate detailed code snippets. If the code snippets use third-party libraries, users need to introduce these libraries manually. Errors may be introduced in this process, such as introducing another third-party library with the same class name, or introducing a different version of the same third-party library.
  • The code management system 100 of the embodiments of this application integrates the third-party library recommendation capability of the recommendation subsystem 104 and the code generation capability of the code generation subsystem 108 to solve the above problems.
  • The recommendation subsystem 104 and the code generation subsystem 108 can be shallowly integrated or deeply integrated.
  • the shallow integration and deep integration solutions are described in detail below.
  • For a requirement entered by the user, the recommendation subsystem 104 determines the third-party library corresponding to the requirement.
  • The third-party library is safe and trustworthy.
  • The code generation subsystem 108 (code generation tool) can generate multiple candidate code snippets for the requirement, and the recommendation subsystem 104 can determine the third-party library corresponding to the requirement. Because a code snippet usually depends on particular third-party libraries, when the libraries that a snippet depends on include a recommended third-party library, the snippet is considered a good choice and can be recommended to the user.
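  • The filtering of candidate snippets by their dependencies can be sketched as follows. The sketch assumes the candidate snippets are Python and reads their dependencies from import statements; the function names and the sample snippets are hypothetical.

```python
import ast


def snippet_dependencies(snippet):
    """Return the top-level modules that a code snippet imports."""
    deps = set()
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Import):
            deps.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module.split(".")[0])
    return deps


def filter_snippets(candidates, recommended_libraries):
    """Keep the snippets whose dependencies include a recommended library."""
    recommended = set(recommended_libraries)
    return [s for s in candidates if snippet_dependencies(s) & recommended]


# Two candidate snippets for the same requirement; they are parsed, not run.
candidates = [
    "import requests\nresponse = requests.get(url)\n",
    "import urllib3\nhttp = urllib3.PoolManager()\n",
]
kept = filter_snippets(candidates, ["requests"])
```

  • Only the first candidate survives because it depends on the recommended library; the second would be dropped or ranked lower.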
  • The method includes:
  • S502: The code management system 100 receives a comment input by the user.
  • A comment is typically a natural language description of the target code to be generated. Comments are written by users (such as developers) in natural language; for example, the user describes in natural language the function of the target code to be generated. In some examples, the comment entered by the user could be "I want to test the class and try to create a http method client".
  • The code management system 100 can provide a code editing interface.
  • The code editing interface can be a graphical user interface (GUI) or a command user interface (CUI).
  • The user can enter comments in this code editing interface, and the code management system 100 receives them through the interface.
  • The following description uses a GUI as an example of the code editing interface 600.
  • The code editing interface 600 carries a menu component 602.
  • The user can create a new code file through the controls in the menu component 602.
  • The code editing window 604 of the code editing interface 600 can display the new code file.
  • The user can enter comments in the code file in the code editing window 604.
  • Users can enter relatively clear natural language, in the form of a comment inside a specific class, to describe the function they wish to implement. For example, a user can enter the comment "I want to test the class and try to create a http method client" in the JavaFile class in order to test the JavaFile class and generate an http client method.
  • The code management system 100 determines at least one third-party library corresponding to the comment based on the comment and the tag warehouse.
  • The comment that the user inputs expresses the function that the user really wants to implement.
  • The code management system 100 (for example, the recommendation subsystem 104) can parse the comment, understand the true intent that the user expresses, and determine the third-party library corresponding to the comment based on an algorithm model.
  • This third-party library can be regarded as a third-party library that matches the user's true intent.
  • The code management system 100 can also set trigger conditions for third-party library recommendation.
  • When a trigger condition is met, the code management system 100 can determine, based on the comment and the tag warehouse, at least one third-party library corresponding to the comment.
  • The tag warehouse stores the mapping relationship between third-party libraries and tags.
  • The code management system 100 can search the tag warehouse according to the comment to obtain at least one third-party library corresponding to the comment, or input the comment into a classifier trained on the mapping relationship between third-party libraries and tags in the tag warehouse to obtain at least one third-party library corresponding to the comment.
  • the first way is search.
  • the code management system 100 may use a search engine such as Elasticsearch (ES) to search the tag warehouse, thereby determining at least one third-party library corresponding to the annotation.
  • the mapping relationship between third-party libraries and tags in the tag warehouse can be stored in the ES search engine (ES engine) in advance.
  • when users pass in comments, the comments can first go through an interceptor.
  • the interceptor can use heuristic rules to eliminate invalid comments entered by users and input comments with functional intent into the search engine.
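  • A minimal sketch of such an interceptor is shown below. The application does not list the actual heuristic rules, so the noise patterns, the `MIN_WORDS` threshold, and the function names are assumptions made purely for illustration.

```python
import re

# Hypothetical heuristic rules; the real rules are not given in this application.
MIN_WORDS = 3
NOISE_PATTERNS = [
    re.compile(r"^\s*(todo|fixme|xxx)\b", re.IGNORECASE),    # task markers
    re.compile(r"^[\W_]*$"),                                   # only punctuation or empty
    re.compile(r"^\s*(copyright|license)\b", re.IGNORECASE),  # legal headers
]


def strip_comment_markers(comment: str) -> str:
    """Remove a leading //, /*, * or # marker and a trailing */."""
    text = re.sub(r"^\s*(//+|/\*+|\*+|#+)\s*", "", comment)
    text = re.sub(r"\s*\*+/\s*$", "", text)
    return text.strip()


def has_functional_intent(comment: str) -> bool:
    """Heuristically decide whether a comment describes a desired function."""
    text = strip_comment_markers(comment)
    if any(pattern.search(text) for pattern in NOISE_PATTERNS):
        return False
    return len(text.split()) >= MIN_WORDS


def intercept(comments):
    """Keep only comments with functional intent, without comment markers."""
    return [strip_comment_markers(c) for c in comments if has_functional_intent(c)]
```

  • Only the comments that survive `intercept` would be forwarded to the search engine, so separators, task markers, and very short remarks never reach it.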
  • the code management system 100 can also split the comments to obtain labels of the comments at multiple levels.
  • the code management system 100 can separately search the tag warehouse according to the tags of the annotation at multiple levels, and obtain the recommended third-party libraries for the annotation at each level.
  • the code management system 100 may obtain at least one third-party library corresponding to the annotation based on the third-party libraries recommended at the multiple levels. Each level has a corresponding weight, and the code management system 100 can perform a weighting operation based on these weights to obtain at least one third-party library corresponding to the annotation.
  • multiple levels of tags may include any combination of Name level tags, Manual level tags, and Auto level tags.
  • the Name level tag is the tag that most intuitively reflects the information of the third-party library. It usually uses the name of the third-party library as the tag.
  • Manual-level tags are tags with medium weight. They are formed by breaking down the introduction on the official website into tags. These tags describe the third-party library from the perspective of its functions and are also a relatively accurate expression of the third-party library.
  • Auto-level tags are the tags with the lowest weight, but they are also the largest in number. Auto tags can be mined from massive data. They are developers' descriptions, in different scenarios, of why the third-party library is used in the current scenario. An Auto tag may be somewhat one-sided when it appears alone, but when the amount of data is large enough, the Auto tags can be understood as the reasons for using the third-party library across a large number of scenarios, and they form an indirect description of the third-party library.
  • the ES engine can search tags at the Name level, Manual level, and Auto level separately. Each search obtains the recommended third-party libraries of the corresponding level.
  • the recommended third-party libraries of the three levels can be arranged from high to low according to the weight.
  • the code management system 100 can filter out the third-party libraries that ultimately need to be recommended through the aggregation model.
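  • The multi-level search and weighting described above can be sketched as follows. An in-memory dictionary stands in for the ES engine, the weight values in `LEVEL_WEIGHTS` are assumptions (only their ordering, Name above Manual above Auto, follows the text), the library names and scores are toy data, and the same tokens are queried at every level as a simplification.

```python
from collections import defaultdict

# Assumed weights: the text only states that Name > Manual > Auto.
LEVEL_WEIGHTS = {"name": 1.0, "manual": 0.6, "auto": 0.3}

# Toy tag warehouse: level -> tag -> {library: relative frequency}.
TAG_WAREHOUSE = {
    "name": {"guava": {"guava": 1.0}, "okhttp": {"okhttp": 1.0}},
    "manual": {
        "http": {"okhttp": 1.0, "httpclient": 1.0},
        "google": {"guava": 1.0},
    },
    "auto": {
        "client": {"okhttp": 0.5, "httpclient": 0.7},
        "read": {"guava": 0.8},
    },
}


def search_level(level, tokens):
    """Search one level of the warehouse; returns {library: score}."""
    scores = defaultdict(float)
    for token in tokens:
        for library, score in TAG_WAREHOUSE[level].get(token, {}).items():
            scores[library] += score
    return dict(scores)


def recommend(comment, top_k=3):
    """Aggregate the per-level results with the level weights and rank them."""
    tokens = comment.lower().split()
    total = defaultdict(float)
    for level, weight in LEVEL_WEIGHTS.items():
        for library, score in search_level(level, tokens).items():
            total[library] += weight * score
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

  • In this sketch the weighted sum plays the role of the aggregation model; a real aggregation model could be considerably more elaborate.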
  • the second way is model classification.
  • the code management system 100 can use the mapping relationship between the third-party library and the tags in the tag warehouse to train the classifier. For example, the code management system 100 can construct a sample based on a third-party library and its label, and use the sample to train a neural network model through a deep learning algorithm to obtain a classifier. In this way, the code management system 100 can input the comments input by the user to the classifier, and the classifier can extract the labels of the comments and classify them, thereby outputting at least one third-party library corresponding to the comments.
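  • The classification route can be illustrated with a much simpler model than the one described. The sketch below swaps in a multinomial naive Bayes classifier over tags in place of the deep-learning neural network, purely so that the example stays self-contained; the class name `TagClassifier` and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TagClassifier:
    """Multinomial naive Bayes over tags (a stand-in for the neural network)."""

    def __init__(self):
        self.tag_counts = defaultdict(Counter)  # library -> tag -> count
        self.lib_counts = Counter()             # library -> number of samples
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (library, list_of_tags) pairs."""
        for library, tags in samples:
            self.lib_counts[library] += 1
            self.tag_counts[library].update(tags)
            self.vocab.update(tags)
        return self

    def predict(self, comment, top_k=1):
        """Return the top_k most likely libraries for a comment."""
        tokens = [t for t in comment.lower().split() if t in self.vocab]
        total = sum(self.lib_counts.values())
        scores = {}
        for library, n in self.lib_counts.items():
            counts = self.tag_counts[library]
            denom = sum(counts.values()) + len(self.vocab)
            log_p = math.log(n / total)                      # prior
            for token in tokens:
                log_p += math.log((counts[token] + 1) / denom)  # Laplace smoothing
            scores[library] = log_p
        return sorted(scores, key=lambda lib: (-scores[lib], lib))[:top_k]
```

  • Training samples would be built from the tag warehouse, one sample per third-party library and its tags, as the text describes.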
  • the code management system 100 can also combine intelligent search and model classification to jointly determine at least one third-party library corresponding to the annotation, thereby solving the problem that users cannot accurately obtain the required third-party library.
  • the comment can be input into the search module of the code management system 100 (specifically, the recommendation subsystem 104).
  • the search module can search the tag warehouse according to tags at different levels to obtain at least one third-party library corresponding to the tag.
  • the annotations can also be input into the classifier, and the classifier can classify the labels of the annotations and obtain at least one third-party library corresponding to the annotations.
  • the third-party libraries determined by the search module and by the classifier can be combined through a weighting operation that uses the respective weights of the search module and the classifier, to determine the final result.
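  • One way to read this weighting operation is a weighted sum of the two score lists. In the sketch below, the function name `combine`, the default weights, and the assumption that both sources produce scores on a comparable scale (for example between 0 and 1) are all illustrative choices, not details given in this application.

```python
def combine(search_scores, classifier_scores, w_search=0.5, w_classifier=0.5):
    """Merge two {library: score} maps into one ranked list.

    A library missing from one source contributes 0 for that source.
    """
    libraries = set(search_scores) | set(classifier_scores)
    combined = {
        lib: w_search * search_scores.get(lib, 0.0)
        + w_classifier * classifier_scores.get(lib, 0.0)
        for lib in libraries
    }
    # Highest combined score first; ties broken by name for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

  • A library found by both the search module and the classifier therefore tends to outrank one found by only a single source.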
  • the code management system 100 recommends at least one third-party library corresponding to the annotation to the user.
  • the code management system 100 can display at least one third-party library corresponding to the annotation to the user through the code management editing interface, thereby recommending at least one third-party library corresponding to the annotation to the user.
  • the code management system 100 can also determine the version of the third-party library corresponding to the annotation.
  • the code management system can also display to the user the version of at least one third-party library corresponding to the annotation.
  • 4.2.1-atlassian-1 and 2.4.1 represent the versions of third-party libraries.
  • the code management system 100 can determine the recommendation degree of each third-party library, and then sort the third-party libraries based on the recommendation degree.
  • the code management system 100 can display the third-party libraries in descending order of recommendation degree, so that highly recommended third-party libraries are placed at the top of the recommendation list.
  • the code management system 100 can also display the recommendation degree of each third-party library.
  • for example, the recommendation degree of the first third-party library can be 1.0, while the recommendation degree of the second third-party library is relatively low.
  • the versions of these third-party libraries are usually safe, bug-free, and have a high probability of achieving the functions that users want.
  • S508 The code management system 100 generates target code based on at least one recommended third-party library.
  • the code management system 100 may generate target code respectively for each third-party library in the recommended at least one third-party library.
  • the user can also select a third-party library from the at least one recommended third-party library, and the code management system 100 can then, in response to the user's selection operation on the at least one third-party library, generate the target code based on the third-party library selected by the user.
  • the code management system 100 can also add the information of the third-party library in the project file to introduce the third-party library.
  • users can trigger the add operation for a single third-party library through that library's Add control, or add all recommended third-party libraries by triggering the Add All control.
  • the code management system 100 can respond to the above operation and automatically add the corresponding third-party library information to the pom file (as shown in the marked box in the figure), further facilitating the user's development work.
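  • Adding a dependency entry to a pom file can be sketched with Python's standard `xml.etree.ElementTree` module. The function name `add_dependency` and the coordinates used in the usage note (`com.example`, `http-client-lib`) are hypothetical; how the code management system 100 actually edits the pom file is not described in this application.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
# Serialize the POM namespace as the default namespace (no ns0: prefixes).
ET.register_namespace("", POM_NS)


def add_dependency(pom_xml: str, group_id: str, artifact_id: str, version: str) -> str:
    """Return pom_xml with the dependency added; unchanged if already present."""
    root = ET.fromstring(pom_xml)
    deps = root.find(f"{{{POM_NS}}}dependencies")
    if deps is None:
        deps = ET.SubElement(root, f"{{{POM_NS}}}dependencies")
    for dep in deps.findall(f"{{{POM_NS}}}dependency"):
        same_group = dep.findtext(f"{{{POM_NS}}}groupId") == group_id
        same_artifact = dep.findtext(f"{{{POM_NS}}}artifactId") == artifact_id
        if same_group and same_artifact:
            return pom_xml  # already declared, nothing to do
    dep = ET.SubElement(deps, f"{{{POM_NS}}}dependency")
    for tag, value in (("groupId", group_id),
                       ("artifactId", artifact_id),
                       ("version", version)):
        ET.SubElement(dep, f"{{{POM_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")
```

  • For example, `add_dependency(pom_text, "com.example", "http-client-lib", "2.4.1")` would append one `<dependency>` element, and an Add All control could simply call the function once per recommended library.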
  • S510 The code management system 100 recommends the target code to the user.
  • the code management system 100 may recommend to the user the target code generated based on the third-party library selected by the user in response to the user's selection operation on the at least one third-party library.
  • the code management system 100 may recommend the target code corresponding to each third-party library in at least one third-party library to the user.
  • the code management system 100 can display the target code corresponding to each third-party library when displaying at least one third-party library to the user.
  • the above-mentioned S506 and S510 can be executed in parallel. In this way, users can directly select the appropriate code for code development.
  • the code management system 100 may also determine the third-party library selected by the user from the at least one third-party library (for example, the user directly selects the third-party library, or selects the target code and thereby selects the third-party library that the target code depends on), and then update the weights of the multiple levels according to the third-party library selected by the user, thereby improving the recommendation accuracy.
  • when the recommendation subsystem 104 uses intelligent search and model classification to jointly recommend third-party libraries, it can also update the weights of the search module and the classifier based on the third-party library selected by the user, to continuously optimize the recommendation accuracy.
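  • The application does not specify the update rule for the weights. The sketch below assumes one simple possibility: raise the weight of every level (or source) whose results contained the library the user selected, lower the others, and renormalize. The function name `update_weights` and the learning rate `lr` are hypothetical.

```python
def update_weights(weights, results, selected, lr=0.1):
    """Nudge weights toward the levels that recommended the user's choice.

    weights:  {level: weight}
    results:  {level: list of libraries that level recommended}
    selected: the library the user actually picked
    """
    updated = {}
    for level, weight in weights.items():
        hit = selected in results.get(level, [])
        updated[level] = weight * (1 + lr) if hit else weight * (1 - lr)
    total = sum(updated.values())
    # Renormalize so the weights keep summing to 1.
    return {level: weight / total for level, weight in updated.items()}
```

  • The same function could be applied to the pair of weights for the search module and the classifier by passing those two sources as the keys.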
  • S508 and S510 are optional steps in the embodiment of the present application, and the above steps may not be performed when performing the code management method in the embodiment of the present application.
  • the code management system 100 may not execute the above-mentioned S508 and S510.
  • embodiments of the present application provide a code management method.
  • This method is further optimized for the currently popular model-based code generation technology.
  • when the user has described in natural language the function (i.e., the requirement) to be implemented, third-party libraries that the user may need can be recommended first, based on the mapping relationship between third-party libraries and tags in the tag warehouse.
  • in the automatic code generation stage, the generated content can be optimized by constraining it to the selected third-party library, which can increase the security of the generated code.
  • the tag warehouse can be obtained through data mining and analysis by the data mining subsystem 107. Specifically, the data mining subsystem 107 can obtain a source code data set that includes multiple source code files, determine the source code files that call third-party libraries based on the call points of the multiple source code files, obtain the calling patterns of those source code files, and then construct the tag warehouse according to the tags corresponding to each third-party library in the calling patterns and the frequency of occurrence of the tags.
  • the data mining subsystem 107 supports data mining in multiple languages, such as Java, C, C++, Go, Python and other languages. For ease of understanding, the following uses Java language examples to illustrate.
  • the data mining subsystem 107 may include a source code analysis engine, a data acquisition module, and a tag acquisition module. The processing flow of each module is described below.
  • the data acquisition module first downloads the source code of a large number of third-party libraries.
  • the data acquisition module can download a large number of open source source code projects from open source websites to form a Java source code warehouse.
  • the data acquisition module can use the source code analysis engine to analyze and split the source code and obtain the full set of information, such as class names (ClassName) and method names (MethodName).
  • the knowledge graph includes the most granular information from third-party libraries. This completes the accumulation of original data.
  • the source code analysis engine can parse a source code file to obtain an abstract syntax tree (AST), and then perform fine-grained splitting based on each node of the AST, splitting it into the class level, field level, and method level to obtain class declarations, field declarations, method declarations, and so on. In this way, the source code analysis engine can extract all declarations and call points.
  • the source code analysis engine can adapt the call point to the data of the third-party library in the knowledge graph to obtain the calling pattern of the third-party library.
  • the above calling pattern can be used to build a training set and store it in the training set warehouse.
  • the training set warehouse can store the three coordinates (method, field, class) of the third-party library, the JavaDoc of the call point, the context of the call point, the name of the client library and other related information.
  • the tag acquisition module obtains tags for the third-party libraries in the training set to build the tag warehouse.
  • the tag acquisition module can take the name of the third-party library as the Name-level tag, break down the promotional documents of the third-party library (such as the introduction on its official website) into tags to obtain the Manual-level tags, and obtain the Auto-level tags from the Client projects in the Java source code warehouse.
  • the tag acquisition module splits the Client projects in the Java source code warehouse in sequence, down to each Java file, and then feeds all the Java files into the source code analysis engine, which disassembles them into the finest-grained atomic nodes. These nodes can be at the Method level, Field level, and so on. After the disassembly is completed, the nodes are compared with the call points in the knowledge graph. Once a mapping succeeds, the file in the Client project can be considered to call a third-party library.
  • the source code analysis engine extracts the JavaDoc of the method corresponding to the call point, and then obtains the calling pattern corresponding to the description of the third-party library and the call point.
  • the source code analysis engine stores all calling patterns to form a training set warehouse.
  • the tag acquisition module can extract the comments in the calling pattern and use a word segmenter to split them.
  • the word segmenter (tokenizer) uses open source algorithms to split natural language into fine-grained phrases, which can be called tags.
  • by counting all tags and third-party libraries, the full set of tags corresponding to each third-party library can be obtained, together with the frequency of occurrence of each tag.
  • the long tail effect of each tag is discovered based on the frequency of occurrence. When the amount of data is huge, the long tail part can be effectively cut off.
  • the remaining mapping relationship between the third-party library and the tag is a trusted mapping relationship.
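  • The tag counting and long-tail cut can be sketched as follows. The regular-expression tokenizer stands in for the open source word segmenter, and the stop-word list, the `min_share` threshold, and the sample comments are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop words; a real word segmenter would handle this differently.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "from", "into", "in", "for"}


def tokenize(comment):
    """Split a comment into lowercase word tags, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", comment.lower()) if w not in STOP_WORDS]


def build_tag_warehouse(calling_patterns, min_share=0.15):
    """Build {library: {tag: share}} and cut off the long tail.

    calling_patterns: iterable of (library, comment) pairs.
    min_share: tags below this share of a library's tag occurrences are dropped.
    """
    counts = defaultdict(Counter)
    for library, comment in calling_patterns:
        counts[library].update(tokenize(comment))
    warehouse = {}
    for library, tag_counts in counts.items():
        total = sum(tag_counts.values())
        warehouse[library] = {
            tag: n / total
            for tag, n in tag_counts.items()
            if n / total >= min_share
        }
    return warehouse
```

  • With enough data, rarely occurring tags fall below the threshold and are removed, which leaves only the library-to-tag mappings that recur across many projects.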
  • the tag acquisition module can store these newly mined call patterns to obtain a fine-grained description of each third-party library.
  • Figure 13 shows the mapping relationship between third-party libraries and tags in the tag warehouse.
  • taking the third-party library Guava as an example, its Name-level tag is guava; its Manual-level tags, mined from the description on the official website, are google and read; and its Auto-level tags are mined from a large number of third-party projects, among which read appears most frequently, while write and io appear relatively less frequently.
  • this application also provides a code management system 100.
  • the system 100 includes:
  • the interactive subsystem 102 is used to receive comments input by the user, which are natural language descriptions of the target code to be generated;
  • the recommendation subsystem 104 is configured to determine at least one third-party library corresponding to the comment based on the comment and the tag warehouse 106, which stores the mapping relationship between the third-party library and tags;
  • the recommendation subsystem 104 is also used to recommend at least one third-party library corresponding to the annotation to the user.
  • the above-mentioned interactive subsystem 102 and recommendation subsystem 104 can be implemented by hardware, or can be implemented by software.
  • for example, the interactive subsystem 102 and the recommendation subsystem 104 can be software or functional modules of software; for another example, the interactive subsystem 102 and the recommendation subsystem 104 can also be hardware with the corresponding functions, such as a cluster of computing devices on which software with the corresponding functions is deployed.
  • the recommendation subsystem 104 is used as an example below.
  • the recommendation subsystem 104 may be an application program running on a computing device, such as a computing engine, etc.
  • the application can be provided to users as a virtualized service.
  • Virtualization services can include virtual machine (VM) services, bare metal server (BMS) services, and container (container) services.
  • the VM service can be a service that uses virtualization technology to virtualize a virtual machine (VM) resource pool on multiple physical hosts (such as computing devices) to provide users with VMs for use on demand.
  • the BMS service is a service that virtualizes BMS resource pools on multiple physical hosts to provide users with BMS on demand.
  • Container service is a service that virtualizes container resource pools on multiple physical hosts to provide users with containers on demand.
  • VM is a simulated virtual computer, that is, a logical computer.
  • BMS is an elastically scalable high-performance computing service. Its computing performance is the same as that of traditional physical machines, and it has the characteristics of safe physical isolation.
  • Containers are a kernel virtualization technology that can provide lightweight virtualization to isolate user space, processes, and resources. It should be understood that the VM service, BMS service, and container service in the above virtualization services are only specific examples. In actual applications, the virtualization service can also be another lightweight or heavyweight virtualization service, which is not specifically limited here.
  • the recommendation subsystem 104 may include at least one computing device, such as a server.
  • the recommendation subsystem 104 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the recommendation subsystem 104 is specifically used to:
  • the recommendation subsystem 104 is specifically used to:
  • according to the tags of the annotation at multiple levels, search the tag warehouse 106 separately to obtain the recommended third-party libraries for the annotation at each level;
  • according to the recommended third-party libraries at the multiple levels, obtain at least one third-party library corresponding to the annotation.
  • the recommendation subsystem 104 is also used to:
  • the weights of the multiple levels are updated according to the third-party library selected by the user.
  • the recommendation subsystem 104 is specifically used to:
  • the annotation is input into a classifier trained by the mapping relationship between third-party libraries and tags in the tag warehouse, and at least one third-party library corresponding to the annotation is obtained.
  • the recommendation subsystem 104 is also used to:
  • the recommendation subsystem 104 is specifically used to:
  • the target code generated based on the third-party library selected by the user is recommended to the user.
  • system 100 also includes a generation subsystem 108;
  • the generation subsystem 108 is configured to generate the target code corresponding to each third-party library according to each third-party library in at least one third-party library corresponding to the annotation;
  • the recommendation subsystem 104 is specifically used to:
  • the generation subsystem 108 may be implemented in software or in hardware.
  • the generation subsystem 108 may be an application program running on a computing device, such as a computing engine, etc.
  • the application can be provided to users as a virtualized service.
  • the generation subsystem 108 may include at least one computing device, such as a server.
  • the generation subsystem 108 may also be a device implemented using an application specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the recommendation subsystem 104 is specifically used to:
  • At least one third-party library corresponding to the comment is determined based on the comment and the tag warehouse.
  • system 100 further includes:
  • the data mining subsystem 107 is used to: obtain a source code data set that includes multiple source code files; determine, according to the call points of the multiple source code files, the source code files that call third-party libraries; obtain the calling patterns of those source code files; and construct the tag warehouse according to the tags corresponding to each third-party library in the calling patterns and the frequency of occurrence of the tags, where the tags corresponding to a third-party library are extracted from the calling patterns.
  • the data mining subsystem 107 can be implemented in software or implemented in hardware.
  • the data mining subsystem 107 may be an application program running on a computing device, such as a computing engine, etc.
  • the application can be provided to users as a virtualized service.
  • the data mining subsystem 107 may include at least one computing device, such as a server.
  • the data mining subsystem 107 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • computing device 1400 includes: bus 1402, processor 1404, memory 1406, and communication interface 1408.
  • the processor 1404, the memory 1406 and the communication interface 1408 communicate through a bus 1402.
  • Computing device 1400 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1400.
  • the bus 1402 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 14, but it does not mean that there is only one bus or one type of bus.
  • Bus 1402 may include a path that carries information between various components of computing device 1400 (eg, memory 1406, processor 1404, communications interface 1408).
  • the processor 1404 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • Memory 1406 may include volatile memory, such as random access memory (RAM).
  • the memory 1406 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 1406 stores executable program code, and the processor 1404 executes the executable program code to implement the foregoing code management method. Specifically, the memory 1406 stores instructions for the code management system 100 to execute the code management method.
  • the communication interface 1408 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1400 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the computing device cluster includes at least one computing device 1400. The same instructions for the code management system 100 to perform the code management method may be stored in the memory 1406 of one or more computing devices 1400 in the computing device cluster.
  • one or more computing devices 1400 in the computing device cluster may also be used to execute part of the instructions of the code management system 100 for executing the code management method.
  • a combination of one or more computing devices 1400 may collectively execute instructions of the code management system 100 for performing the code management method.
  • the memory 1406 in different computing devices 1400 in the computing device cluster may store different instructions for executing part of the functions of the code management system 100 .
  • Figure 16 shows a possible implementation.
  • two computing devices 1400A and 1400B are connected through a communication interface 1408.
  • the memory in the computing device 1400A stores instructions for performing the functions of the interactive subsystem 102.
  • the memory in the computing device 1400B stores instructions for performing the functions of the recommendation subsystem 104 .
  • the memory in the computing device 1400A may also store instructions for performing the functions of the code generation subsystem 108
  • the memory in the computing device 1400B may also store instructions for performing the functions of the data mining subsystem 107.
  • the tag warehouse 106 built by the data mining subsystem 107 can be stored in the memory of the computing device 1400B.
  • the memories 1406 of the computing devices 1400A and 1400B collectively store instructions for the code management system 100 to perform the code management method.
  • the connection method between the computing devices shown in Figure 16 takes into account that recommending third-party libraries in the code management method provided by this application consumes considerable computing resources. Therefore, the functions implemented by the recommendation subsystem 104 are assigned to the computing device 1400B.
  • the functions of the computing device 1400A shown in FIG. 16 may also be performed by multiple computing devices 1400.
  • the functions of the computing device 1400B may also be performed by multiple computing devices 1400.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 17 shows a possible implementation. As shown in Figure 17, two computing devices 1400C and 1400D are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • the memory in the computing device 1400C stores instructions for executing the functions of the interactive subsystem 102.
  • the memory in the computing device 1400D stores instructions for executing the functions of the recommendation subsystem 104.
  • the memory in the computing device 1400C may also store instructions for performing the functions of the code generation subsystem 108
  • the memory in the computing device 1400D may also store instructions for performing the functions of the data mining subsystem 107.
  • the tag warehouse 106 constructed by the data mining subsystem 107 may be stored in the memory of the computing device 1400D.
  • the memories 1406 of the computing devices 1400C and 1400D jointly store instructions for the code management system 100 to perform the code management method.
  • the connection method between the computing devices shown in Figure 17 takes into account that recommending third-party libraries in the code management method provided by this application consumes considerable computing resources. Therefore, the functions implemented by the recommendation subsystem 104 are assigned to the computing device 1400D.
  • the functions of the computing device 1400C shown in FIG. 17 may also be performed by multiple computing devices 1400.
  • the functions of the computing device 1400D may also be performed by multiple computing devices 1400.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can store or a data storage device such as a data center that contains one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct the computing device to perform the above code management method applied to the code management system 100.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • when the computer program product is run on at least one computing device, the at least one computing device is caused to execute the above code management method.

Abstract

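This application provides a code management method performed by a code management system, including: receiving an annotation, input by a user, of target code to be generated; determining at least one third-party library corresponding to the annotation based on the annotation and a tag warehouse that stores the mapping relationship between third-party libraries and tags; and recommending the at least one third-party library corresponding to the annotation to the user. Because the mapping relationship between third-party libraries and tags is extracted from a massive amount of source code, the recommended third-party libraries are trustworthy and secure, do not conflict with local dependencies, and functionally meet the user's actual business needs. This avoids scenarios in which the wrong library is introduced because different third-party libraries contain classes with the same name, and reduces security risks.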

Description

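A code management method and related devices
This application claims priority to the Chinese patent application No. 202210953688.X, filed with the China National Intellectual Property Administration on August 10, 2022 and entitled "Method, apparatus, server, and storage medium for determining a third-party library", and to the Chinese patent application No. 202211665526.2, filed with the China National Intellectual Property Administration on December 23, 2022 and entitled "A code management method and related devices", both of which are incorporated herein by reference in their entirety.
Technical field
This application relates to the technical field of software development, and in particular to a code management method, system, computing device cluster, computer-readable storage medium, and computer program product.
Background
With the continuous development of computers and the Internet, many traditional industries have begun digital transformation. Digital transformation usually requires software development, which turns requirements into software that meets them. The speed and quality of software development play a crucial role in the requirement conversion rate.
When developing software, developers can use code generation technology to assist development, thereby improving development efficiency and reducing development costs. Mainstream code generation technology includes model-based code generation. The core of model-based code generation is model driven architecture (MDA). Under this architecture, the functionality of a software system is defined as a platform-independent model using an appropriate specification language, and is then translated into one or more platform-specific models for the actual implementation. The platform-specific models can be converted into the corresponding code files through a series of transformations. This breaks the limitations of the design language and improves the efficiency of multi-platform software development.
However, the code files generated by the above method may contain very subtle errors, which introduce security risks and can seriously affect the security of the software.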
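Summary of the invention
This application provides a code management method. The method introduces a tag warehouse and recommends at least one third-party library corresponding to an annotation based on the tag warehouse and the annotation, input by the user, of the target code to be generated. Because the mapping relationship between third-party libraries and tags is extracted from a massive amount of source code, the recommended third-party libraries are trustworthy and secure, do not conflict with local dependencies, and functionally meet the user's actual business needs. This avoids scenarios in which the wrong library is introduced because different third-party libraries contain classes with the same name, and reduces security risks. This application also provides a corresponding system, computing device cluster, computer-readable storage medium, and computer program product.
In a first aspect, this application provides a code management method. The method can be performed by a code management system. The code management system can be a software system. For example, the code management system can be a software development platform such as an integrated development environment (IDE) that integrates a third-party library recommendation capability, or the code management system can be a plug-in or extension integrated into a software development platform. The code management system is deployed in a computing device cluster, and the computing device cluster executes the code of the software system to perform the code management method of this application. In some possible implementations, the code management system can also be a hardware system, for example, a computing device cluster with code management capabilities, including but not limited to a cloud server that provides code management capabilities such as third-party library recommendation; the cloud server provides software as a service (SaaS). When running, the hardware system performs the code management method of this application.
Specifically, the code management system receives an annotation input by the user, where the annotation is a natural language description of the target code to be generated; then determines at least one third-party library corresponding to the annotation based on the annotation and a tag warehouse that stores the mapping relationship between third-party libraries and tags; and then recommends the at least one third-party library corresponding to the annotation to the user.
This method further optimizes the currently popular model-based code generation technology. When the user has described in natural language the function (that is, the requirement) to be implemented, the method can first recommend third-party libraries that the user may need, based on the mapping relationship between third-party libraries and tags in the tag warehouse. Because the mapping relationship between third-party libraries and tags is extracted from a massive amount of source code, the recommended third-party libraries are trustworthy and secure, do not conflict with local dependencies, and functionally meet the user's actual business needs. This avoids scenarios in which the wrong library is introduced because different third-party libraries contain classes with the same name, and reduces security risks.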
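In some possible implementations, when determining the third-party library, the code management system may search the tag repository according to the comment to obtain the at least one third-party library corresponding to the comment.
By searching the tag repository, the method recommends suitable third-party libraries more efficiently while preserving the accuracy of the recommendation.
In some possible implementations, the code management system may decompose the comment to obtain tags of the comment at multiple levels, search the tag repository separately according to the tags at each level to obtain recommended third-party libraries for the comment at each level, and then obtain the at least one third-party library corresponding to the comment from the third-party libraries recommended at the multiple levels.
Decomposing the comment into tags at multiple levels and searching the tag repository per level enables fine-grained third-party library recommendation and improves the accuracy of the results.
In some possible implementations, the tags at multiple levels may include any combination of Name-level tags, Manual-level tags, and Auto-level tags. A Name-level tag reflects the third-party library most directly, and is usually the name of the library. A Manual-level tag has a medium weight. It is formed by breaking the introduction on the library's official website into tags. These tags describe the library from a functional perspective and express it relatively accurately. An Auto-level tag has the lowest weight but is by far the most numerous. Auto-level tags may be mined from massive data, and reflect how developers describe, in different scenarios, why the library is used in that scenario. A single Auto-level tag may be one-sided, but with enough data the Auto-level tags can be read as the reasons the library is used across a large number of scenarios, which is an indirect description of the library.
Decomposing the comment into Name-level, Manual-level, or Auto-level tags makes it possible to mine the comment along different dimensions for the user's real intent, recommend a more complete set of third-party libraries, and help the user choose a suitable library for code generation.
In some possible implementations, the code management system may determine the third-party library that the user selects from the at least one third-party library, and then update the weights of the multiple levels according to the selected library.
In this way the weights are updated continuously, which avoids the mismatch between weights and business that would result from using the same weights for a long time, and improves the precision of third-party library recommendation.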
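In some possible implementations, the code management system may instead input the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository, to obtain the at least one third-party library corresponding to the comment.
The classifier uses a deep learning neural network model that is trained on third-party libraries and their tags to obtain a complete classification model. When the user inputs a comment, the classifier classifies the tags of the comment and outputs the most likely third-party libraries. Compared with search, a classifier can learn the underlying relationships between third-party libraries and tags, and therefore recommends third-party libraries with relatively high accuracy.
In some possible implementations, the code management system may search the tag repository according to the comment to obtain a first group of third-party libraries corresponding to the comment, input the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository to obtain a second group of third-party libraries corresponding to the comment, and then determine, from the first group and the second group, the at least one third-party library to recommend to the user.
For example, the first group and the second group are each assigned a weight. A weighted calculation with these weights gives a combined score for each third-party library, and the at least one third-party library corresponding to the comment is determined from the combined scores.
Combining search with the classifier to determine the at least one third-party library to recommend helps ensure the accuracy of the recommended libraries.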
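In some possible implementations, the code management system may further recommend to the user the target code generated based on the at least one third-party library. In the automatic code generation stage, the selected third-party library thus constrains and optimizes the generated content, which can improve the security of the generated code.
In some possible implementations, the code management system may, in response to the user's selection operation on the at least one third-party library, recommend to the user the target code generated based on the third-party library selected by the user.
This not only ensures code security by generating code from a recommended third-party library, but also meets the user's individual needs.
In some possible implementations, the code management system may generate target code for each of the at least one third-party library corresponding to the comment, and then recommend to the user the target code corresponding to each third-party library.
The user can then select a third-party library and its target code in a single operation, which simplifies user operations and improves the user experience.
In some possible implementations, when the user triggers third-party library recommendation, the code management system may determine the at least one third-party library corresponding to the comment according to the comment and the tag repository. For example, the code editing interface may include a third-party library recommendation control, and the user can click the control to trigger recommendation. As another example, the code management system may time how long it has been since the user last input code, and trigger recommendation when that time reaches a set duration.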
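The two trigger conditions just described (an explicit click, or an idle period after the last code input) can be sketched as below. This is only an illustration under assumptions: the class name `RecommendationTrigger`, the `idle_seconds` parameter, and the injectable `clock` are hypothetical and do not come from the application.

```python
import time


class RecommendationTrigger:
    """Decides when third-party library recommendation should run (hypothetical helper)."""

    def __init__(self, idle_seconds, clock=time.monotonic):
        self.idle_seconds = idle_seconds  # assumed name for the "set duration"
        self._clock = clock               # injectable so the logic can be tested
        self._last_input = clock()

    def on_code_input(self):
        """Record the time of the user's most recent code input."""
        self._last_input = self._clock()

    def should_trigger(self, button_clicked=False):
        """Trigger on an explicit click, or once the idle time reaches the set duration."""
        if button_clicked:
            return True
        return self._clock() - self._last_input >= self.idle_seconds
```

An editor integration would call `on_code_input()` on every keystroke and poll `should_trigger()` periodically.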
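The method lets the user trigger third-party library recommendation in different ways and, based on the recommendation, select target code generated from a suitable third-party library, which avoids introducing security risks.
In some possible implementations, the code management system may further obtain a source code dataset that includes multiple source code files, determine the source code files that call third-party libraries according to the call sites in the multiple source code files, obtain the call patterns of those source code files, and then build the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags. The tags corresponding to a third-party library are extracted from the call patterns.
By mining and analyzing real source code files in the source code dataset, the method obtains a large number of mappings between tags and third-party libraries and builds the tag repository from them, which lays the foundation for third-party library recommendation.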
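According to a second aspect, this application provides a code management system. The system includes:
an interaction subsystem, configured to receive a comment input by a user, where the comment is a natural language description of target code to be generated; and
a recommendation subsystem, configured to determine at least one third-party library corresponding to the comment according to the comment and a tag repository, where the tag repository stores mappings between third-party libraries and tags.
The recommendation subsystem is further configured to recommend the at least one third-party library corresponding to the comment to the user.
In some possible implementations, the recommendation subsystem is specifically configured to:
search the tag repository according to the comment to obtain the at least one third-party library corresponding to the comment.
In some possible implementations, the recommendation subsystem is specifically configured to:
decompose the comment to obtain tags of the comment at multiple levels;
search the tag repository separately according to the tags of the comment at the multiple levels, to obtain recommended third-party libraries for the comment at the multiple levels; and
obtain the at least one third-party library corresponding to the comment according to the third-party libraries recommended at the multiple levels.
In some possible implementations, the recommendation subsystem is further configured to:
determine the third-party library that the user selects from the at least one third-party library; and
update the weights of the multiple levels according to the third-party library selected by the user.
In some possible implementations, the recommendation subsystem is specifically configured to:
input the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository, to obtain the at least one third-party library corresponding to the comment.
In some possible implementations, the recommendation subsystem is further configured to:
recommend to the user the target code generated based on the at least one third-party library.
In some possible implementations, the recommendation subsystem is specifically configured to:
in response to the user's selection operation on the at least one third-party library, recommend to the user the target code generated based on the third-party library selected by the user.
In some possible implementations, the system further includes a generation subsystem.
The generation subsystem is configured to generate, for each of the at least one third-party library corresponding to the comment, the target code corresponding to that third-party library.
The recommendation subsystem is specifically configured to:
recommend to the user the target code corresponding to each third-party library.
In some possible implementations, the recommendation subsystem is specifically configured to:
when the user triggers third-party library recommendation, determine the at least one third-party library corresponding to the comment according to the comment and the tag repository.
In some possible implementations, the system further includes:
a data mining subsystem, configured to: obtain a source code dataset that includes multiple source code files; determine the source code files that call third-party libraries according to the call sites in the multiple source code files; obtain the call patterns of those source code files; and build the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags, where the tags corresponding to a third-party library are extracted from the call patterns.
According to a third aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory. The at least one processor and the at least one memory communicate with each other. The at least one processor is configured to execute instructions stored in the at least one memory, so that the computing device or the computing device cluster performs the code management method according to the first aspect or any implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores instructions that instruct a computing device or a computing device cluster to perform the code management method according to the first aspect or any implementation of the first aspect.
According to a fifth aspect, this application provides a computer program product containing instructions. When the computer program product runs on a computing device or a computing device cluster, the computing device or the computing device cluster is caused to perform the code management method according to the first aspect or any implementation of the first aspect.
On the basis of the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.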
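Brief Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the embodiments are briefly introduced below.
FIG. 1 is a schematic architecture diagram of a code management system according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a code generation method according to an embodiment of this application;
FIG. 3 is a schematic diagram of a shallow integration scheme according to an embodiment of this application;
FIG. 4 is a schematic diagram of a deep integration scheme according to an embodiment of this application;
FIG. 5 is a flowchart of a code management method according to an embodiment of this application;
FIG. 6 is a schematic diagram of inputting a comment through a code editing interface according to an embodiment of this application;
FIG. 7 is a schematic flowchart of obtaining third-party libraries by search according to an embodiment of this application;
FIG. 8 is a schematic diagram of recommending third-party libraries by combining intelligent search and model classification according to an embodiment of this application;
FIG. 9 is a schematic diagram of displaying recommended third-party libraries in a code editing interface according to an embodiment of this application;
FIG. 10 is a schematic flowchart of introducing a third-party library according to an embodiment of this application;
FIG. 11 is a schematic flowchart of a data mining subsystem building a tag repository according to an embodiment of this application;
FIG. 12 is a schematic diagram of source code analysis by a source code analysis engine according to an embodiment of this application;
FIG. 13 is a schematic diagram of the mappings between third-party libraries and tags in a tag repository according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a computing device according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a computing device cluster according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of another computing device cluster according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of yet another computing device cluster according to an embodiment of this application.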
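Detailed Description
The terms "first" and "second" in the embodiments of this application are used for description only, and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features.
First, some technical terms used in the embodiments of this application are introduced.
Code generation means automatically generating a code snippet or complete code from known information. The known information may include one or more of a context (for example, code that has already been input) and a comment (also called an annotation). Generating a code snippet from context is also called code completion. Code completion predicts class names, method names, keywords, and the like on the fly, and assists developers in writing the code of an application (APP).
The code generated by a code generation tool may include calls to third-party libraries. A third-party library is a reusable software component developed and released by an entity other than the developer on the software development platform. Typical third-party libraries include, but are not limited to, numpy, a foundational package for scientific computing and data work; python-docx, for processing documents; scrapy, for data mining and monitoring; django, for web development; and pillow, a graphics library for the imaging field. Compared with directly generating code for the corresponding function, calling a third-party library reduces the amount of code and makes it more concise on the one hand, and shortens code generation time and improves code generation efficiency on the other.
The third-party libraries recommended by a code generation tool play a decisive role in the quality of the generated code. The industry currently offers several ways to recommend third-party libraries, including but not limited to intelligent recommendation based on user profiles and intelligent recommendation based on developer habits. In addition, template-based code generation and model-based code generation can also recommend third-party libraries.
Model-based automatic code generation is the mainstream approach in the industry today. It is implemented on model driven architecture (MDA). MDA supports completing the application design in a modeling language and carrying out system code generation, compilation, and so on, to finally meet user requirements. Application design means building a system model that meets user requirements with a specified modeling language or modeling tool. The structure and behavior of the system model can effectively reflect system information. After a series of transformations, the system model can be converted into a code file that can be used independently. In general, model transformation has two modes: model-to-model and model-to-code. MDA is based on the Unified Modeling Language (UML) and integrates several industrial system architectures for system design, visualization services, and data conversion. Compared with traditional UML, the MDA model is significantly improved in abstraction, independence, and operability. Therefore, the system model can be invoked repeatedly at run time and can be converted into description files suitable for different platforms. On this basis, MDA removes the restrictions of the design language and raises the modeling language to the level of a programming language.
However, a model-based code generation solution does not determine the version of the third-party library that is introduced. Different third-party libraries may well share the same class name. Once the wrong one is introduced, the generated code may contain very subtle errors, or a third-party library with security risks may even be introduced, which poses a serious challenge to application security.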
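In view of this, this application provides a code management method. The method may be performed by a code management system. The code management system may be a software system. For example, it may be a software development platform, such as an integrated development environment (IDE), that integrates a third-party library recommendation capability, or it may be a plug-in or extension integrated into a software development platform. The code management system is deployed in a computing device cluster, and the cluster executes the code of the software system to perform the code management method of this application. In some possible implementations, the code management system may instead be a hardware system, for example a computing device cluster with code management capability, including but not limited to a cloud server that provides code management capabilities such as third-party library recommendation as software as a service (SaaS). When the hardware system runs, it performs the code management method of this application.
Specifically, the code management system receives a comment input by a user, where the comment is a natural language description of target code to be generated. The system then determines at least one third-party library corresponding to the comment according to the comment and a tag repository that stores mappings between third-party libraries and tags, and recommends the at least one third-party library to the user.
The method further optimizes the currently popular model-based code generation technology. Once the user has described the desired function (that is, the requirement) in natural language, the system can first recommend third-party libraries that the user is likely to need, based on the mappings between third-party libraries and tags in the tag repository. Because these mappings are extracted from a massive body of source code, the recommended third-party libraries are trustworthy and secure, do not conflict with local dependencies, and functionally meet the user's actual business needs, which avoids import errors caused by different third-party libraries sharing the same class name. Further, in the automatic code generation stage, the selected third-party library constrains and optimizes the generated content, which can improve the security of the generated code.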
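To make the technical solutions of this application clearer and easier to understand, the system architecture of the code management system is described below with reference to the accompanying drawings.
Referring to the architecture diagram of the code management system shown in FIG. 1, the code management system 100 includes an interaction subsystem 102 and a recommendation subsystem 104. The interaction subsystem 102 is configured to receive a comment input by a user, where the comment is a natural language description of target code to be generated. The natural language description can describe the user's requirement. The recommendation subsystem 104 is configured to determine at least one third-party library corresponding to the comment according to the comment and the mappings between third-party libraries and tags stored in the tag repository 106, and then recommend the at least one third-party library to the user.
FIG. 1 uses an example in which the tag repository 106 is built into the code management system 100. In some possible implementations, the tag repository 106 may instead be provided by a third party (such as another platform or developer). In that case, the code management system 100 can connect to the external tag repository 106 and recommend third-party libraries based on the mappings between third-party libraries and tags in it.
It should be noted that the code management system 100 may further include a data mining subsystem 107. The data mining subsystem 107 is configured to build the tag repository to support third-party library recommendation. The data mining subsystem 107 may obtain a source code dataset that includes multiple source code files, determine the source code files that call third-party libraries according to the call sites in the multiple source code files, obtain the call patterns of those source code files, and finally build the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags. The tags corresponding to a third-party library are extracted from the call patterns.
Further, the code management system 100 may include a code generation subsystem 108. The code generation subsystem 108 is configured to generate target code according to the recommended at least one third-party library. Correspondingly, the recommendation subsystem 104 is further configured to recommend to the user the target code generated based on the at least one third-party library.
By integrating the third-party library recommendation capability with the code generation capability, the embodiments of this application address the pain point, in the code generation solution shown in FIG. 2, that introducing third-party libraries can fail. As shown in FIG. 2, after the user expresses a requirement in natural language, the code generation tool can generate a detailed code snippet. If the snippet uses third-party libraries, the user has to introduce those libraries manually. Errors may be introduced in this process, such as introducing another third-party library that has the same class name, or introducing a different version of the same third-party library. The code management system 100 of the embodiments of this application solves this problem by integrating the third-party library recommendation capability of the recommendation subsystem 104 with the code generation capability of the code generation subsystem 108.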
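In practice, the recommendation subsystem 104 and the code generation subsystem 108 may be shallowly integrated or deeply integrated. The two integration schemes are described in detail below.
Referring to the schematic diagram of the shallow integration scheme shown in FIG. 3, when the user inputs a requirement through the interaction subsystem 102, the recommendation subsystem 104 (the library recommendation tool) determines the third-party libraries corresponding to the requirement, which are secure and trustworthy, and the code generation subsystem 108 (the code generation tool) then generates the corresponding code snippet. This addresses the problem that manually introducing a third-party library fails or creates risk.
Referring to the schematic diagram of the deep integration scheme shown in FIG. 4, when the user inputs a requirement through the interaction subsystem 102, the code generation subsystem 108 (the code generation tool) can generate multiple candidate code snippets for the requirement, and the recommendation subsystem 104 can determine the third-party libraries corresponding to the requirement. A code snippet usually depends on certain third-party libraries. When the third-party libraries that a code snippet depends on include a recommended third-party library, this indicates that the snippet is a good choice, and the snippet can be recommended to the user.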
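The filtering step of the deep integration scheme can be illustrated with a short sketch. It assumes that each candidate snippet arrives together with the set of libraries it depends on; the function name `select_snippets` and the data layout are hypothetical.

```python
def select_snippets(candidates, recommended):
    """Keep candidate snippets that depend on at least one recommended library.

    candidates:  list of (snippet_text, dependencies) pairs, where dependencies
                 is a collection of library identifiers (assumed layout)
    recommended: library identifiers returned by the recommendation step
    """
    recommended = set(recommended)
    return [text for text, deps in candidates if set(deps) & recommended]
```

A snippet whose dependencies do not overlap the recommended libraries is simply dropped in this sketch; a real system might rank it lower instead.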
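Next, the code management method of the embodiments of this application is described from the perspective of the code management system 100.
Referring to the flowchart of the code management method shown in FIG. 5, the method includes:
S502: The code management system 100 receives a comment input by a user.
A comment, also called an annotation, is usually a natural language description of the target code to be generated. It may be written by a user (such as a developer) in natural language, for example to describe the function of the target code to be generated. In some examples, the comment input by the user may be "I want to test the class and try to create a http method client".
In a specific implementation, the code management system 100 may provide a code editing interface, which may be a graphical user interface (GUI) or a command user interface (CUI). The user can input a comment through the code editing interface, and the code management system 100 receives the comment that the user inputs through it.
For ease of understanding, the following uses a GUI code editing interface as an example. Referring to the schematic diagram of inputting a comment through the code editing interface shown in FIG. 6, the code editing interface 600 carries a menu component 602. The user can create a new code file through a control in the menu component 602, and the code editing window 604 of the code editing interface 600 can display the new code file. The user can input a comment into the code file in the code editing window 604. Specifically, the user can describe the desired function inside a given class by writing reasonably explicit natural language as a comment. For example, the user may input the comment "I want to test the class and try to create a http method client" in the JavaFile class, to test the JavaFile class and generate an http client method.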
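S504: The code management system 100 determines at least one third-party library corresponding to the comment according to the comment and the tag repository.
The user inputs a comment to express the function that the user really wants to implement. The code management system 100 (for example, the recommendation subsystem 104) can parse the comment, understand the real intent that the user expresses, and determine, with an algorithm model, at least one third-party library corresponding to the comment. Such a third-party library can be regarded as one that matches the user's real intent.
The code management system 100 may also set a trigger condition for third-party library recommendation. When the user triggers third-party library recommendation, that is, when the trigger condition is met, the code management system 100 may determine the at least one third-party library corresponding to the comment according to the comment and the tag repository.
The tag repository stores mappings between third-party libraries and tags. The code management system 100 may search the tag repository according to the comment to obtain the at least one third-party library corresponding to the comment, or may input the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository to obtain the at least one third-party library corresponding to the comment.
The different ways of determining the at least one third-party library corresponding to the comment are described below.
The first way is search. Specifically, the code management system 100 may use a search engine, such as the Elasticsearch (ES) search engine, to search the tag repository and determine the at least one third-party library corresponding to the comment. Referring to the flow diagram of obtaining third-party libraries by search shown in FIG. 7, the mappings between third-party libraries and tags in the tag repository may be stored in advance in the ES search engine (ES engine). When the user passes in a comment, an interceptor may first intercept it. The interceptor can use heuristic rules to discard invalid comments input by the user and pass comments that carry functional intent to the search engine.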
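The application does not list the interceptor's heuristic rules. As a purely illustrative sketch, the function below applies two assumed rules: a comment needs at least three words, and it must not start with a housekeeping marker. The rule set, `NOISE_PREFIXES`, and `has_functional_intent` are all hypothetical.

```python
import re

# Hypothetical housekeeping markers; the application does not specify the rules.
NOISE_PREFIXES = ("todo", "fixme", "copyright", "license")


def has_functional_intent(comment):
    """Return True if the comment looks like a description of desired behavior."""
    # Drop leading comment markers such as //, #, or * and surrounding spaces.
    text = comment.strip().lstrip("/#* ").strip()
    words = re.findall(r"[A-Za-z]+", text)
    if len(words) < 3:  # too short to express an intent
        return False
    if text.lower().startswith(NOISE_PREFIXES):
        return False
    return True
```

Only comments for which this returns `True` would be forwarded to the search engine.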
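It should be noted that, before the comment enters the search engine, the code management system 100 (for example, the search engine that it invokes) may also split the comment to obtain tags of the comment at multiple levels. The code management system 100 may search the tag repository separately according to the tags at each level to obtain recommended third-party libraries for the comment at each level, and may then obtain the at least one third-party library corresponding to the comment from the libraries recommended at the multiple levels. Each level has a corresponding weight, and the code management system 100 can perform a weighted calculation with these weights to obtain the at least one third-party library corresponding to the comment.
In the example of FIG. 7, the tags at multiple levels may include any combination of Name-level tags, Manual-level tags, and Auto-level tags. A Name-level tag reflects the third-party library most directly, and is usually the name of the library. A Manual-level tag has a medium weight. It is formed by breaking the introduction on the library's official website into tags. These tags describe the library from a functional perspective and express it relatively accurately. An Auto-level tag has the lowest weight but is by far the most numerous. Auto-level tags may be mined from massive data, and reflect how developers describe, in different scenarios, why the library is used in that scenario. A single Auto-level tag may be one-sided, but with enough data the Auto-level tags can be read as the reasons the library is used across a large number of scenarios, which is an indirect description of the library.
After the comment is split into tags at the three levels above, the ES engine can search for the tags (labels) at the Name level, the Manual level, and the Auto level separately. Each search yields the recommended third-party libraries of the corresponding level. The recommended libraries of the three levels can be arranged by weight from high to low, and the code management system 100 can use an aggregation model to select the third-party libraries that are finally recommended.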
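The three-level search with weighted aggregation can be sketched in a few lines. This is a minimal in-memory illustration, not the ES-based implementation: the repository contents, the tag frequencies, the level weights, and the `search` function are all assumptions, and the same set of tags is matched at every level for simplicity.

```python
from collections import defaultdict

# Hypothetical tag repository: library -> level -> {tag: frequency}.
TAG_REPOSITORY = {
    "guava": {
        "name": {"guava": 1},
        "manual": {"google": 1, "read": 1},
        "auto": {"read": 50, "write": 20, "io": 10},
    },
    "httpclient": {
        "name": {"httpclient": 1},
        "manual": {"http": 1, "client": 1},
        "auto": {"http": 40, "request": 30, "client": 25},
    },
}
# Assumed weights: Name highest, Manual medium, Auto lowest.
LEVEL_WEIGHTS = {"name": 1.0, "manual": 0.6, "auto": 0.3}


def search(tags, repository=TAG_REPOSITORY, weights=LEVEL_WEIGHTS):
    """Score each library per level, then aggregate the levels by weight."""
    scores = defaultdict(float)
    for library, levels in repository.items():
        for level, level_tags in levels.items():
            total = sum(level_tags.values())
            # Share of this level's tag mass that the query tags match.
            hit = sum(freq for tag, freq in level_tags.items() if tag in tags)
            scores[library] += weights[level] * hit / total
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(lib, score) for lib, score in ranked if score > 0]
```

Calling `search({"http", "client"})` ranks `httpclient` first because it matches at both the Manual and Auto levels; libraries with no matching tag are left out.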
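The second way is model classification. Specifically, the code management system 100 may use the mappings between third-party libraries and tags in the tag repository to train a classifier. For example, the code management system 100 may build samples from third-party libraries and their tags, and use the samples to train a neural network model with a deep learning algorithm to obtain the classifier. The code management system 100 can then feed the comment input by the user to the classifier, which extracts the tags of the comment, classifies them, and outputs at least one third-party library corresponding to the comment.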
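To show the shape of the train-then-classify flow without a deep learning framework, the sketch below swaps in a much simpler technique: a multinomial naive Bayes classifier over tags. It is not the neural network described in the application, and the class name `TagClassifier` and the sample format are assumptions.

```python
import math
from collections import Counter, defaultdict


class TagClassifier:
    """Multinomial naive Bayes over tags (a stand-in for the neural network)."""

    def fit(self, samples):
        """samples: iterable of (tags, library) pairs built from the tag repository."""
        self.tag_counts = defaultdict(Counter)
        self.lib_counts = Counter()
        self.vocab = set()
        for tags, library in samples:
            self.lib_counts[library] += 1
            self.tag_counts[library].update(tags)
            self.vocab.update(tags)
        return self

    def predict(self, tags, top_k=1):
        """Return the top_k most likely libraries for the given tags."""
        total = sum(self.lib_counts.values())
        scores = {}
        for library, n in self.lib_counts.items():
            counts = self.tag_counts[library]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for tag in tags:
                if tag in self.vocab:    # ignore tags never seen in training
                    score += math.log((counts[tag] + 1) / denom)  # Laplace smoothing
            scores[library] = score
        return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Training on `(tags, library)` pairs and calling `predict` on the tags extracted from a comment mirrors the flow described above; a production system would replace the model while keeping the same interface.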
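In some possible implementations, the code management system 100 may also combine intelligent search with model classification to determine the at least one third-party library corresponding to the comment, which addresses the problem that users cannot accurately obtain the third-party libraries they need.
Referring to the schematic diagram of recommending third-party libraries shown in FIG. 8, when the user inputs a comment described in natural language, the comment can enter the search module of the code management system 100 (specifically, of the recommendation subsystem 104). The search module can search the tag repository according to tags at different levels to obtain at least one third-party library corresponding to the tags. At the same time, the comment can also be input into the classifier, which classifies the tags of the comment to obtain at least one third-party library corresponding to the comment. The libraries determined by the search module and by the classifier can be combined in a weighted calculation that uses the respective weights of the search module and the classifier, to determine the final result.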
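The weighted combination of the two result sets can be sketched as a weighted sum of per-library scores. The function name `combine`, the default weights of 0.5, and the assumption that both modules emit comparable scores are illustrative choices, not details from the application.

```python
def combine(search_scores, classifier_scores, w_search=0.5, w_classifier=0.5, top_k=3):
    """Merge two {library: score} mappings into one ranked list.

    A library missing from one source contributes 0.0 for that source.
    """
    libraries = set(search_scores) | set(classifier_scores)
    combined = {
        lib: w_search * search_scores.get(lib, 0.0)
        + w_classifier * classifier_scores.get(lib, 0.0)
        for lib in libraries
    }
    # Sort by descending score, breaking ties by name for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

A library that both modules agree on therefore rises above one that only a single module proposes.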
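S506: The code management system 100 recommends the at least one third-party library corresponding to the comment to the user.
The code management system 100 can display the at least one third-party library corresponding to the comment to the user through the code editing interface, and thereby recommend it to the user. When a third-party library has multiple versions, the code management system 100 can also determine the version of the third-party library that corresponds to the comment and, correspondingly, display that version to the user.
For ease of understanding, the example of testing a class and creating an http method is continued. As shown in FIG. 9, when the user inputs "I want to test the class and try to create a http client method", the code management system 100 infers that the user most likely wants to implement a method that operates on http. After filtering, the code management system 100 determines two third-party libraries, specifically:
"org.apache.httpcomponents:httpclient:4.2.1-atlassian-1"; and
"org.asynchttpclient:async-http-client:2.4.1".
Here, 4.2.1-atlassian-1 and 2.4.1 are the versions of the third-party libraries. It should be noted that the code management system 100 can determine a recommendation degree for each third-party library and sort the libraries by it. Correspondingly, when displaying the third-party libraries, the code management system 100 can list them in descending order of recommendation degree, with more highly recommended libraries nearer the top of the recommendation list. The code management system 100 can also display the recommendation degree of each library. For example, the recommendation degree of the first third-party library may be 1.0, and that of the second is somewhat lower. These versions of the third-party libraries are usually secure and free of vulnerabilities, and are very likely to implement the function the user wants.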
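The recommendation list can be illustrated by splitting each `group:artifact:version` coordinate and sorting by recommendation degree. How the degree is computed is not specified in the application; scaling by the top raw score, so that the first entry gets 1.0, is an assumption, as is the function name `to_recommendation_list`.

```python
def to_recommendation_list(scored):
    """Turn (coordinate, raw_score) pairs into display rows sorted by degree.

    Assumes at least one entry, a positive top score, and coordinates of the
    form group:artifact:version.
    """
    top = max(score for _, score in scored)
    rows = []
    for coordinate, score in scored:
        group, artifact, version = coordinate.split(":")
        rows.append({
            "group": group,
            "artifact": artifact,
            "version": version,
            "degree": round(score / top, 2),  # assumed normalization
        })
    return sorted(rows, key=lambda row: row["degree"], reverse=True)
```

The raw scores passed in would come from the search and classification steps.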
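S508: The code management system 100 generates target code based on the recommended at least one third-party library.
Specifically, the code management system 100 may generate target code separately for each of the recommended third-party libraries. In some examples, the user may instead select one third-party library from the recommended ones, and the code management system 100 may, in response to the user's selection operation on the at least one third-party library, generate the target code according to the selected library.
It should be noted that, when the user chooses to add one or more third-party libraries, the code management system 100 may also add the information of those libraries to the project file to introduce them. Referring to the flow diagram of third-party library introduction in FIG. 10, the user can trigger the add operation for one third-party library through the Add control corresponding to that library, or trigger the add operation for all recommended third-party libraries through the Add All control. In response, the code management system 100 can automatically add the information of the corresponding third-party libraries to the pom file (as shown by the marked box in the figure), which further simplifies the user's development work.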
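Adding a dependency entry to the pom file can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch assumes a pom without an XML namespace declaration (real Maven pom files usually declare one, which would require namespace-aware lookups); the function name `add_dependency` is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_dependency(pom_xml, coordinate):
    """Return pom_xml with a <dependency> for group:artifact:version added."""
    group, artifact, version = coordinate.split(":")
    project = ET.fromstring(pom_xml)
    deps = project.find("dependencies")
    if deps is None:
        deps = ET.SubElement(project, "dependencies")
    for dep in deps.findall("dependency"):
        if dep.findtext("groupId") == group and dep.findtext("artifactId") == artifact:
            return pom_xml  # already declared; leave the file unchanged
    dep = ET.SubElement(deps, "dependency")
    ET.SubElement(dep, "groupId").text = group
    ET.SubElement(dep, "artifactId").text = artifact
    ET.SubElement(dep, "version").text = version
    return ET.tostring(project, encoding="unicode")
```

An "Add All" action would simply call this once per recommended coordinate.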
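S510: The code management system 100 recommends the target code to the user.
When the user selects a third-party library, the code management system 100 may, in response to the user's selection operation on the at least one third-party library, recommend to the user the target code generated based on the selected library. When the user performs no selection operation, the code management system 100 may recommend to the user the target code corresponding to each of the at least one third-party library.
When displaying the at least one third-party library to the user, the code management system 100 may also display the target code corresponding to each library. In other words, S506 and S510 may be performed in parallel. The user can then directly pick suitable code for development.
It should be noted that, when the user selects a recommended third-party library or selects recommended target code, the code management system 100 can also determine the third-party library that the user has selected from the at least one third-party library. This may be a library that the user selects directly, or the library that the selected target code depends on. The code management system 100 then updates the weights of the multiple levels according to the selected library, which improves recommendation precision. When the recommendation subsystem 104 uses intelligent search together with model classification to recommend third-party libraries, it can also update the weights of the search module and the classifier based on the selected library, to keep improving recommendation precision.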
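The application does not give the weight update rule. One possible rule, shown purely as an assumption, rewards every level whose results contained the selected library, penalizes the others, and renormalizes so the weights sum to 1. The names `update_level_weights`, `level_hits`, and `learning_rate` are hypothetical.

```python
def update_level_weights(weights, level_hits, selected, learning_rate=0.1):
    """Nudge level weights toward the levels that recommended the selected library.

    weights:    {level: weight}
    level_hits: {level: set of libraries that the level recommended}
    selected:   the library the user picked
    """
    updated = {}
    for level, weight in weights.items():
        if selected in level_hits.get(level, set()):
            weight += learning_rate
        else:
            weight = max(weight - learning_rate, 0.0)  # never go negative
        updated[level] = weight
    total = sum(updated.values())
    if total == 0:
        return dict(weights)  # degenerate case: keep the previous weights
    return {level: w / total for level, w in updated.items()}
```

The same rule could be applied to the search-module and classifier weights by treating them as two "levels".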
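S508 and S510 are optional steps of the embodiments of this application, and the code management method of the embodiments of this application may be performed without them. For example, when the user writes code manually, the code management system 100 may skip S508 and S510.
Based on the foregoing description, the embodiments of this application provide a code management method. The method further optimizes the currently popular model-based code generation technology. Once the user has described the desired function (that is, the requirement) in natural language, the system can first recommend third-party libraries that the user is likely to need, based on the mappings between third-party libraries and tags in the tag repository. Because these mappings are extracted from a massive body of source code, the recommended third-party libraries are trustworthy and secure, do not conflict with local dependencies, and functionally meet the user's actual business needs, which avoids import errors caused by different third-party libraries sharing the same class name. Further, in the automatic code generation stage, the selected third-party library constrains and optimizes the generated content, which can improve the security of the generated code.
The key to third-party library recommendation in the embodiments of this application is the tag repository. The tag repository can be obtained through data mining and analysis by the data mining subsystem 107. Specifically, the data mining subsystem 107 may obtain a source code dataset that includes multiple source code files, determine the source code files that call third-party libraries according to the call sites in the multiple source code files, obtain the call patterns of those source code files, and then build the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags.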
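The data mining subsystem 107 supports data mining for multiple languages, for example Java, C, C++, Go, and Python. For ease of understanding, Java is used as an example below.
Referring to the flow diagram in FIG. 11 of the data mining subsystem building the tag repository, the data mining subsystem 107 may include a source code analysis engine, a data acquisition module, and a tag acquisition module. The processing flow of each module is described below.
In this solution, the data acquisition module first downloads the source code of a massive number of third-party libraries. For example, it can download a massive number of open-source projects from open-source websites to form a Java source code repository. The data acquisition module can use the source code analysis engine to split this source code and obtain the full information in it, such as class names (ClassName) and method names (MethodName). After cleaning all the information, the data acquisition module can store it in a knowledge graph. The knowledge graph contains the finest-grained information about the third-party libraries. This completes the accumulation of raw data.
For the process by which the source code analysis engine splits source code, see FIG. 12. The source code analysis engine can parse a source code file to obtain an abstract syntax tree (AST), and then split it at fine granularity based on the nodes of the AST into the class level, the field level, and the method level, which yields class declarations, field declarations, method declarations, and so on. In this way, the source code analysis engine can extract all declarations and call sites. It can then match the call sites against the third-party library data in the knowledge graph to obtain the call patterns of the third-party libraries.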
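The call patterns can be used to build a training set, which is stored in a training set repository. The training set repository can store the three coordinates (method, field, class) of a third-party library, the JavaDoc of the call site, the context of the call site, the name of the client library, and other related information.
The tag acquisition module obtains tags for the third-party libraries in the training set to build the tag repository. Specifically, the tag acquisition module can take the name of a third-party library as its Name-level tag, break the library's promotional documentation (such as the introduction on its official website) into tags to obtain Manual-level tags, and obtain Auto-level tags from the client projects in the Java source code repository.
Auto-level tags are mined as follows:
The tag acquisition module splits the client projects in the Java source code repository one by one down to individual Java files, and then feeds all Java files into the source code analysis engine, which breaks them down into the finest-grained atomic nodes. These nodes may be at the method level, the field level, and so on. After the breakdown, the nodes are compared with the call sites in the knowledge graph. Once a mapping succeeds, the file in the client project is considered to call a third-party library. The source code analysis engine extracts the JavaDoc of the method at the call site, which yields a call pattern that pairs the third-party library with the description of the call site. The source code analysis engine stores all call patterns to form the training set repository.
The tag acquisition module can extract the comments in the call patterns and split them with a tokenizer. The tokenizer uses an open-source algorithm to split natural language into fine-grained phrases, which can be called tags. Counting all tags against the third-party libraries gives the full set of tags for each third-party library and the occurrence frequency of each tag. The occurrence frequencies reveal the long tail of the tags. With a very large amount of data, the long tail can be cut off effectively, and the remaining mappings between third-party libraries and tags are the trusted mappings. Finally, the tag acquisition module can store these newly mined call patterns, which gives a fine-grained description of each third-party library.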
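The tokenize, count, and cut-the-tail steps can be sketched as follows. The regular-expression tokenizer stands in for the unnamed open-source tokenizer, and the stop-word list, the `min_share` threshold, and the function name `build_auto_tags` are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list; a real tokenizer would bring its own.
STOP_WORDS = {"the", "a", "an", "to", "and", "of", "from", "in", "into"}


def build_auto_tags(patterns, min_share=0.1):
    """Build library -> {tag: frequency} from (library, comment) call patterns.

    Tags whose share of a library's tag occurrences is below min_share are
    treated as the long tail and dropped.
    """
    counts = defaultdict(Counter)
    for library, comment in patterns:
        tokens = re.findall(r"[a-z]+", comment.lower())
        counts[library].update(t for t in tokens if t not in STOP_WORDS)
    repository = {}
    for library, counter in counts.items():
        total = sum(counter.values())
        repository[library] = {
            tag: n for tag, n in counter.items() if n / total >= min_share
        }
    return repository
```

With enough call patterns, the surviving tags approximate the "trusted" Auto-level mappings described above.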
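FIG. 13 shows the mappings between third-party libraries and tags in the tag repository. Taking the third-party library Guava as an example, its Name-level tag is guava. Its Manual-level tags, mined from the description on the official website, are google and read. Its Auto-level tags are mined from a massive number of third-party projects. Among them, read occurs most frequently, while write and io occur relatively less frequently.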
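Based on the code management method of the foregoing embodiments, this application further provides a code management system 100. As shown in FIG. 1, the system 100 includes:
an interaction subsystem 102, configured to receive a comment input by a user, where the comment is a natural language description of target code to be generated; and
a recommendation subsystem 104, configured to determine at least one third-party library corresponding to the comment according to the comment and a tag repository 106, where the tag repository 106 stores mappings between third-party libraries and tags.
The recommendation subsystem 104 is further configured to recommend the at least one third-party library corresponding to the comment to the user.
For example, the interaction subsystem 102 and the recommendation subsystem 104 may be implemented in hardware or in software. For instance, they may be software or functional modules of software. Alternatively, they may be hardware with the corresponding functions, for example a computing device cluster on which software with those functions is deployed.
For ease of description, the recommendation subsystem 104 is used as an example below.
When implemented in software, the recommendation subsystem 104 may be an application running on a computing device, such as a computing engine. The application may be provided to users as a virtualization service. Virtualization services may include virtual machine (VM) services, bare metal server (BMS) services, and container services. A VM service virtualizes a VM resource pool on multiple physical hosts (such as computing devices) by using virtualization technology, and provides VMs to users on demand. A BMS service virtualizes a BMS resource pool on multiple physical hosts and provides BMSs to users on demand. A container service virtualizes a container resource pool on multiple physical hosts and provides containers to users on demand. A VM is a simulated virtual computer, that is, a logical computer. A BMS is an elastically scalable high-performance computing service whose computing performance is no different from that of a traditional physical machine and which features secure physical isolation. A container is a kernel virtualization technology that provides lightweight virtualization to isolate user space, processes, and resources. It should be understood that the VM, BMS, and container services above are only specific examples. In practice, the virtualization service may be another lightweight or heavyweight virtualization service, which is not specifically limited here.
When implemented in hardware, the recommendation subsystem 104 may include at least one computing device, such as a server. Alternatively, the recommendation subsystem 104 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
In some possible implementations, the recommendation subsystem 104 is specifically configured to:
search the tag repository 106 according to the comment to obtain the at least one third-party library corresponding to the comment.
In some possible implementations, the recommendation subsystem 104 is specifically configured to:
decompose the comment to obtain tags of the comment at multiple levels;
search the tag repository 106 separately according to the tags of the comment at the multiple levels, to obtain recommended third-party libraries for the comment at the multiple levels; and
obtain the at least one third-party library corresponding to the comment according to the third-party libraries recommended at the multiple levels.
In some possible implementations, the recommendation subsystem 104 is further configured to:
determine the third-party library that the user selects from the at least one third-party library; and
update the weights of the multiple levels according to the third-party library selected by the user.
In some possible implementations, the recommendation subsystem 104 is specifically configured to:
input the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository, to obtain the at least one third-party library corresponding to the comment.
In some possible implementations, the recommendation subsystem 104 is further configured to:
recommend to the user the target code generated based on the at least one third-party library.
In some possible implementations, the recommendation subsystem 104 is specifically configured to:
in response to the user's selection operation on the at least one third-party library, recommend to the user the target code generated based on the third-party library selected by the user.
In some possible implementations, the system 100 further includes a generation subsystem 108.
The generation subsystem 108 is configured to generate, for each of the at least one third-party library corresponding to the comment, the target code corresponding to that third-party library.
The recommendation subsystem 104 is specifically configured to:
recommend to the user the target code corresponding to each third-party library.
Like the recommendation subsystem 104, the generation subsystem 108 may be implemented in software or in hardware.
When implemented in software, the generation subsystem 108 may be an application running on a computing device, such as a computing engine. The application may be provided to users as a virtualization service. When implemented in hardware, the generation subsystem 108 may include at least one computing device, such as a server. Alternatively, the generation subsystem 108 may be a device implemented with an ASIC or a PLD.
In some possible implementations, the recommendation subsystem 104 is specifically configured to:
when the user triggers third-party library recommendation, determine the at least one third-party library corresponding to the comment according to the comment and the tag repository.
In some possible implementations, the system 100 further includes:
a data mining subsystem 107, configured to: obtain a source code dataset that includes multiple source code files; determine the source code files that call third-party libraries according to the call sites in the multiple source code files; obtain the call patterns of those source code files; and build the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags, where the tags corresponding to a third-party library are extracted from the call patterns.
Like the recommendation subsystem 104, the data mining subsystem 107 may be implemented in software or in hardware.
When implemented in software, the data mining subsystem 107 may be an application running on a computing device, such as a computing engine. The application may be provided to users as a virtualization service. When implemented in hardware, the data mining subsystem 107 may include at least one computing device, such as a server. Alternatively, the data mining subsystem 107 may be a device implemented with an ASIC or a PLD.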
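This application further provides a computing device 1400. As shown in FIG. 14, the computing device 1400 includes a bus 1402, a processor 1404, a memory 1406, and a communication interface 1408. The processor 1404, the memory 1406, and the communication interface 1408 communicate with each other through the bus 1402. The computing device 1400 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 1400.
The bus 1402 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one line is used in FIG. 14, but this does not mean that there is only one bus or one type of bus. The bus 1402 may include a path for transferring information between the components of the computing device 1400 (for example, the memory 1406, the processor 1404, and the communication interface 1408).
The processor 1404 may include any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 1406 may include volatile memory, such as random access memory (RAM). The memory 1406 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 1406 stores executable program code, and the processor 1404 executes the executable program code to implement the foregoing code management method. Specifically, the memory 1406 stores the instructions that the code management system 100 uses to perform the code management method.
The communication interface 1408 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 1400 and other devices or communication networks.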
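An embodiment of this application further provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a notebook computer, or a smartphone.
As shown in FIG. 15, the computing device cluster includes at least one computing device 1400. The memories 1406 of one or more computing devices 1400 in the cluster may store the same instructions that the code management system 100 uses to perform the code management method.
In some possible implementations, one or more computing devices 1400 in the cluster may each execute only part of the instructions that the code management system 100 uses to perform the code management method. In other words, a combination of one or more computing devices 1400 can jointly execute the instructions that the code management system 100 uses to perform the code management method.
It should be noted that the memories 1406 of different computing devices 1400 in the cluster may store different instructions, each used to perform part of the functions of the code management system 100.
FIG. 16 shows a possible implementation. As shown in FIG. 16, two computing devices 1400A and 1400B are connected through their communication interfaces 1408. The memory of the computing device 1400A stores instructions for performing the functions of the interaction subsystem 102, and the memory of the computing device 1400B stores instructions for performing the functions of the recommendation subsystem 104. Further, the memory of the computing device 1400A may also store instructions for performing the functions of the code generation subsystem 108, and the memory of the computing device 1400B may also store instructions for performing the functions of the data mining subsystem 107. The tag repository 106 built by the data mining subsystem 107 may be stored in the memory of the computing device 1400B. In other words, the memories 1406 of the computing devices 1400A and 1400B jointly store the instructions that the code management system 100 uses to perform the code management method.
The connection manner of the computing device cluster shown in FIG. 16 takes into account that the code management method provided in this application consumes considerable computing resources for third-party library recommendation. Therefore, the functions implemented by the recommendation subsystem 104 are assigned to the computing device 1400B.
It should be understood that the functions of the computing device 1400A shown in FIG. 16 may also be performed by multiple computing devices 1400. Likewise, the functions of the computing device 1400B may also be performed by multiple computing devices 1400.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 17 shows a possible implementation. As shown in FIG. 17, two computing devices 1400C and 1400D are connected through a network. Specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory of the computing device 1400C stores instructions for performing the functions of the interaction subsystem 102, and the memory of the computing device 1400D stores instructions for performing the functions of the recommendation subsystem 104. Further, the memory of the computing device 1400C may also store instructions for performing the functions of the code generation subsystem 108, and the memory of the computing device 1400D may also store instructions for performing the functions of the data mining subsystem 107. The tag repository 106 built by the data mining subsystem 107 may be stored in the memory of the computing device 1400D. In other words, the memories 1406 of the computing devices 1400C and 1400D jointly store the instructions that the code management system 100 uses to perform the code management method.
The connection manner of the computing device cluster shown in FIG. 17 takes into account that the code management method provided in this application consumes considerable computing resources for third-party library recommendation. Therefore, the functions implemented by the recommendation subsystem 104 are assigned to the computing device 1400D.
It should be understood that the functions of the computing device 1400C shown in FIG. 17 may also be performed by multiple computing devices 1400. Likewise, the functions of the computing device 1400D may also be performed by multiple computing devices 1400.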
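An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the code management method applied to the code management system 100 described above.
An embodiment of this application further provides a computer program product containing instructions. The computer program product may be a software or program product that contains instructions and can run on a computing device or be stored in any available medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the foregoing code management method.
Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of this application.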

Claims (23)

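  1. A code management method, performed by a code management system, the method comprising:
    receiving a comment input by a user, wherein the comment is a natural language description of target code to be generated;
    determining at least one third-party library corresponding to the comment according to the comment and a tag repository, wherein the tag repository stores mappings between third-party libraries and tags; and
    recommending the at least one third-party library corresponding to the comment to the user.
  2. The method according to claim 1, wherein the determining at least one third-party library corresponding to the comment according to the comment and a tag repository comprises:
    searching the tag repository according to the comment to obtain the at least one third-party library corresponding to the comment.
  3. The method according to claim 2, wherein the searching the tag repository according to the comment to obtain the at least one third-party library corresponding to the comment comprises:
    decomposing the comment to obtain tags of the comment at multiple levels;
    searching the tag repository separately according to the tags of the comment at the multiple levels, to obtain recommended third-party libraries for the comment at the multiple levels; and
    obtaining the at least one third-party library corresponding to the comment according to the third-party libraries recommended at the multiple levels.
  4. The method according to claim 2 or 3, wherein the method further comprises:
    determining the third-party library that the user selects from the at least one third-party library; and
    updating the weights of the multiple levels according to the third-party library selected by the user.
  5. The method according to any one of claims 1 to 4, wherein the determining at least one third-party library corresponding to the comment according to the comment and a tag repository comprises:
    inputting the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository, to obtain the at least one third-party library corresponding to the comment.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    recommending to the user the target code generated based on the at least one third-party library.
  7. The method according to claim 6, wherein the recommending to the user the target code generated based on the at least one third-party library comprises:
    in response to a selection operation of the user on the at least one third-party library, recommending to the user the target code generated based on the third-party library selected by the user.
  8. The method according to claim 6, wherein the recommending to the user the target code generated based on the at least one third-party library comprises:
    generating, for each of the at least one third-party library corresponding to the comment, the target code corresponding to that third-party library; and
    recommending to the user the target code corresponding to each third-party library.
  9. The method according to any one of claims 1 to 8, wherein the determining at least one third-party library corresponding to the comment according to the comment and a tag repository comprises:
    when the user triggers third-party library recommendation, determining the at least one third-party library corresponding to the comment according to the comment and the tag repository.
  10. The method according to any one of claims 1 to 9, wherein the method further comprises:
    obtaining a source code dataset, wherein the source code dataset comprises multiple source code files;
    determining, according to call sites in the multiple source code files, the source code files that call third-party libraries;
    obtaining call patterns of the source code files according to the source code files that call third-party libraries; and
    building the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags, wherein the tags corresponding to a third-party library are extracted from the call patterns.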
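  11. A code management system, wherein the system comprises:
    an interaction subsystem, configured to receive a comment input by a user, wherein the comment is a natural language description of target code to be generated; and
    a recommendation subsystem, configured to determine at least one third-party library corresponding to the comment according to the comment and a tag repository, wherein the tag repository stores mappings between third-party libraries and tags;
    wherein the recommendation subsystem is further configured to recommend the at least one third-party library corresponding to the comment to the user.
  12. The system according to claim 11, wherein the recommendation subsystem is specifically configured to:
    search the tag repository according to the comment to obtain the at least one third-party library corresponding to the comment.
  13. The system according to claim 12, wherein the recommendation subsystem is specifically configured to:
    decompose the comment to obtain tags of the comment at multiple levels;
    search the tag repository separately according to the tags of the comment at the multiple levels, to obtain recommended third-party libraries for the comment at the multiple levels; and
    obtain the at least one third-party library corresponding to the comment according to the third-party libraries recommended at the multiple levels.
  14. The system according to claim 12 or 13, wherein the recommendation subsystem is further configured to:
    determine the third-party library that the user selects from the at least one third-party library; and
    update the weights of the multiple levels according to the third-party library selected by the user.
  15. The system according to any one of claims 11 to 14, wherein the recommendation subsystem is specifically configured to:
    input the comment into a classifier trained on the mappings between third-party libraries and tags in the tag repository, to obtain the at least one third-party library corresponding to the comment.
  16. The system according to any one of claims 11 to 15, wherein the recommendation subsystem is further configured to:
    recommend to the user the target code generated based on the at least one third-party library.
  17. The system according to claim 16, wherein the recommendation subsystem is specifically configured to:
    in response to a selection operation of the user on the at least one third-party library, recommend to the user the target code generated based on the third-party library selected by the user.
  18. The system according to claim 16, wherein the system further comprises a generation subsystem;
    the generation subsystem is configured to generate, for each of the at least one third-party library corresponding to the comment, the target code corresponding to that third-party library; and
    the recommendation subsystem is specifically configured to:
    recommend to the user the target code corresponding to each third-party library.
  19. The system according to any one of claims 11 to 18, wherein the recommendation subsystem is specifically configured to:
    when the user triggers third-party library recommendation, determine the at least one third-party library corresponding to the comment according to the comment and the tag repository.
  20. The system according to any one of claims 11 to 19, wherein the system further comprises:
    a data mining subsystem, configured to: obtain a source code dataset that comprises multiple source code files; determine, according to call sites in the multiple source code files, the source code files that call third-party libraries; obtain call patterns of the source code files according to the source code files that call third-party libraries; and build the tag repository according to the tags corresponding to each third-party library in the call patterns and the occurrence frequency of the tags, wherein the tags corresponding to a third-party library are extracted from the call patterns.
  21. A computing device cluster, wherein the computing device cluster comprises at least one computing device, the at least one computing device comprises at least one processor and at least one memory, and the at least one memory stores computer-readable instructions; and the at least one processor executes the computer-readable instructions, so that the computing device cluster performs the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, comprising computer-readable instructions, wherein the computer-readable instructions are used to implement the method according to any one of claims 1 to 10.
  23. A computer program product, comprising computer-readable instructions, wherein the computer-readable instructions are used to implement the method according to any one of claims 1 to 10.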
PCT/CN2023/081059 2022-08-10 2023-03-13 Code management method and related device WO2024031983A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210953688.X 2022-08-10
CN202210953688 2022-08-10
CN202211665526.2 2022-12-23
CN202211665526.2A CN117632224A (zh) 2022-12-23 Code management method and related device

Publications (1)

Publication Number Publication Date
WO2024031983A1 true WO2024031983A1 (zh) 2024-02-15

Family

ID=89850570

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081059 WO2024031983A1 (zh) 2022-08-10 2023-03-13 Code management method and related device

Country Status (1)

Country Link
WO (1) WO2024031983A1 (zh)


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851219

Country of ref document: EP

Kind code of ref document: A1