CN112836477B - Method and device for generating code annotation document, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112836477B
Authority
CN
China
Prior art keywords
annotation
text
code
generating
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110057846.9A
Other languages
Chinese (zh)
Other versions
CN112836477A (en)
Inventor
吴迪
凌利虎
虞佳祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
17win Network Technology Co ltd
Original Assignee
17win Network Technology Co ltd
Application filed by 17win Network Technology Co ltd
Priority to CN202110057846.9A
Publication of CN112836477A
Application granted
Publication of CN112836477B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/73 Program documentation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a method for generating a code annotation document, comprising: receiving an annotation generation request and determining the target code file corresponding to the request; parsing the target code file into a Psi tree and determining the text to be annotated according to the Psi tree; and generating annotation content corresponding to the text to be annotated by using an annotation dictionary, and generating an annotation document for the target code file from the annotation content. The method can generate the code annotation document automatically and improves the efficiency of generating it. The application also discloses a device for generating a code annotation document, an electronic device and a storage medium, which have the same beneficial effects.

Description

Method and device for generating code annotation document, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a code annotation document, an electronic device, and a storage medium.
Background
A code annotation document is an annotated description of code content. Through it, descriptive information such as classes, methods, parameters, return values and members can be extracted from the program source code to generate the API document corresponding to the source code. Good, complete and detailed code annotation documents help developers understand code more conveniently, quickly and accurately, so that readers can learn the specific role and function of a method or class without reading the source code. The code annotation document also makes it easier for API callers to write calling code.
In the related art, code annotation documents are maintained entirely by hand by developers, which is inefficient. When a project is large, maintaining the code annotation documents often takes up a considerable amount of development time.
Therefore, how to automatically generate a code annotation document and improve the generation efficiency of the code annotation document are technical problems that a person skilled in the art needs to solve at present.
Disclosure of Invention
The purpose of the application is to provide a method and a device for generating a code annotation document, an electronic device and a storage medium, which can automatically generate the code annotation document and improve the generation efficiency of the code annotation document.
In order to solve the above technical problems, the present application provides a method for generating a code annotation document, which includes:
receiving an annotation generation request and determining a target code file corresponding to the annotation generation request;
parsing the target code file into a Psi tree, and determining a text to be annotated according to the Psi tree;
and generating annotation content corresponding to the text to be annotated by utilizing an annotation dictionary, and generating an annotation document of the target code file according to the annotation content.
Optionally, determining the text to be annotated according to the Psi tree includes:
obtaining the class name from a PsiClass object in the Psi tree, and determining the PsiMethod objects in the PsiClass object;
and setting the class name and the method signature of the PsiMethod object as the text to be annotated.
Optionally, generating annotation content corresponding to the text to be annotated by using an annotation dictionary, and generating an annotation document of the target code file according to the annotation content, includes:
generating annotation content corresponding to the class name by using an annotation dictionary;
generating annotation content corresponding to the method signature by using the annotation dictionary;
and splicing the annotation content corresponding to the class name and the annotation content corresponding to the method signature to generate the annotation document of the target code file.
Optionally, generating the annotation content corresponding to the method signature by using the annotation dictionary includes:
acquiring a method name of the PsiMethod object, and generating annotation content corresponding to the method name by utilizing the annotation dictionary;
acquiring parameters of the PsiMethod object, and generating annotation content corresponding to the parameters by utilizing the annotation dictionary;
and acquiring the return value type of the PsiMethod object, and generating annotation content corresponding to the return value type by utilizing the annotation dictionary.
Optionally, the generating process of the annotation dictionary includes:
acquiring a sample code file, and determining the code text in the sample code file; wherein the code text comprises any one or a combination of class names, method names and parameter names;
generating annotation text corresponding to the code text by using the sample annotation document;
performing word segmentation processing on the code text to obtain a word segmentation text, and setting an annotation text of the word segmentation text to be blank;
combining all the code texts and the annotation texts of the word segmentation texts to obtain text corresponding relations; the text corresponding relation comprises a corresponding relation between a code writing language and an annotation language;
and generating the annotation dictionary according to the text corresponding relation.
Optionally, merging all the code texts and the word segmentation texts to obtain text corresponding relations, including:
removing invalid texts in the annotation text to obtain alternative annotation text; the invalid text comprises an annotation text with empty content and an annotation text with occurrence times smaller than a preset value;
merging annotation items of the code text and the alternative annotation text of the word segmentation text according to text content; the annotation texts in the annotation items are arranged in descending order according to the occurrence frequency;
and generating a text corresponding relation according to the text content of the code text and the word segmentation text and the corresponding annotation item.
Optionally, the generating, by using an annotation dictionary, annotation content corresponding to the text to be annotated includes:
determining a target word to be annotated currently from the text to be annotated;
judging whether dictionary entries corresponding to the target words exist in the annotation dictionary or not;
if yes, determining the annotation content according to dictionary entries corresponding to the target words;
if not, performing word segmentation on the target word to obtain target word segments, generating annotation content corresponding to each target word segment by using the annotation dictionary, and splicing the annotation content of all the target word segments to obtain the annotation content corresponding to the target word.
The application also provides a device for generating a code annotation document, comprising:
a file determining module, configured to receive an annotation generation request and determine the target code file corresponding to the annotation generation request;
a text determining module, configured to parse the target code file into a Psi tree and determine the text to be annotated according to the Psi tree;
and an annotation generation module, configured to generate annotation content corresponding to the text to be annotated by using an annotation dictionary and generate an annotation document of the target code file according to the annotation content.
The present application also provides a storage medium having stored thereon a computer program which, when executed, implements the steps performed by the above-described code annotation document generation method.
The application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps executed by the code annotation document generation method when calling the computer program in the memory.
The application provides a method for generating a code annotation document, comprising: receiving an annotation generation request and determining a target code file corresponding to the annotation generation request; parsing the target code file into a Psi tree, and determining a text to be annotated according to the Psi tree; and generating annotation content corresponding to the text to be annotated by using an annotation dictionary, and generating an annotation document of the target code file according to the annotation content.
After receiving the annotation generation request, the application determines the target code file for which a code annotation document is to be generated. By parsing the target code file into a Psi tree, the text to be annotated in the target code file can be determined; annotation content corresponding to that text is generated through the annotation dictionary, and the annotation document of the target code file is then generated by combining all the annotation content. The process requires no manual participation: the content that needs annotation in the target code file is found and annotated automatically, so the code annotation document is generated automatically and the efficiency of generating it is improved. The application also provides a device for generating a code annotation document, an electronic device and a storage medium, which have the same beneficial effects and are not described again here.
Drawings
For a clearer description of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for generating a code annotation document according to an embodiment of the present application;
FIG. 2 is a flowchart of a code annotation method according to an embodiment of the present application;
FIG. 3 is a flowchart of a dictionary generating method according to an embodiment of the present application;
FIG. 4 is a flowchart of a Javadoc generating method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for generating a code annotation document according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring now to fig. 1, fig. 1 is a flowchart of a method for generating a code annotation document according to an embodiment of the present application.
The specific steps may include:
S101: receiving an annotation generation request and determining a target code file corresponding to the annotation generation request;
The annotation generation request may be a request sent by a code submitting device or a request issued by a user. After the annotation generation request is received, it may be parsed to obtain a file identification, and the corresponding target code file may be determined based on the file identification.
S102: parsing the target code file into a Psi tree, and determining a text to be annotated according to the Psi tree;
This embodiment may be implemented as a plug-in for an IDE (Integrated Development Environment). The target code file is parsed into a Psi tree through an IDE interface, so that the class names, PsiMethod objects and PsiField objects in the Psi tree may be used as the text to be annotated.
S103: and generating annotation content corresponding to the text to be annotated by utilizing the annotation dictionary, and generating an annotation document of the target code file according to the annotation content.
After the text to be annotated is determined, the annotation dictionary may be called to generate the annotation content corresponding to it. Specifically, the annotation dictionary may include a plurality of entries, each containing a code text in the code writing language and the corresponding annotation text in the annotation language. In this embodiment, all dictionary entries of the annotation dictionary may be loaded, each entry is parsed into {Key, Value}, and the parsed entries are used to generate the annotation content corresponding to the text to be annotated. Key is the text in the code writing language, and Value is the text in the annotation language. Annotation content is the result of annotating specific content in the target code file, and this embodiment may integrate all the annotation content to obtain the annotation document of the target code file.
Upon receiving an annotation generation request, this embodiment determines the target code file for which a code annotation document is to be generated. By parsing the target code file into a Psi tree, the text to be annotated in the target code file can be determined; annotation content corresponding to that text is generated through the annotation dictionary, and the annotation document of the target code file is then generated by combining all the annotation content. The process requires no manual participation: the content that needs annotation in the target code file is found and annotated automatically, so the code annotation document is generated automatically and the efficiency of generating it is improved.
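The entry-loading step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `AnnotationDictionaryLoader`, the one-entry-per-line `Key=Value` text layout, and the English stand-in values are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader: parses "Key=Value" lines into an in-memory dictionary. */
final class AnnotationDictionaryLoader {

    /** Parses dictionary text with one entry per line; blank, malformed or value-less lines are skipped. */
    static Map<String, String> parse(String dictionaryText) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : dictionaryText.split("\\R")) {
            int separator = line.indexOf('=');
            if (separator <= 0) {
                continue; // no separator, or nothing before it
            }
            String key = line.substring(0, separator).trim();       // code writing language
            String value = line.substring(separator + 1).trim();    // annotation language
            if (!key.isEmpty() && !value.isEmpty()) {
                entries.put(key, value);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // English stand-ins are used here in place of real annotation text.
        Map<String, String> dictionary = parse("get=obtain\nFruit=fruit\nType=type\n");
        System.out.println(dictionary);
    }
}
```

Keeping the parsed entries in a map makes each later lookup of a text to be annotated a single key access.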
As a further description of the embodiment corresponding to fig. 1, determining the text to be annotated may include: obtaining the class name from a PsiClass object in the Psi tree, and determining the PsiMethod objects in the PsiClass object; and setting the class name and the method signatures of the PsiMethod objects as the text to be annotated. Accordingly, the annotation document is generated as follows: generating annotation content corresponding to the class name by using the annotation dictionary; generating annotation content corresponding to the method signature by using the annotation dictionary; and splicing the annotation content corresponding to the class name and the annotation content corresponding to the method signature to generate the annotation document of the target code file.
The method signature includes the method name, the parameter, and the return value type, so the present embodiment can generate the annotation content corresponding to the method signature by: acquiring a method name of the PsiMethod object, and generating annotation content corresponding to the method name by utilizing the annotation dictionary; acquiring parameters of the PsiMethod object, and generating annotation content corresponding to the parameters by utilizing the annotation dictionary; and acquiring the return value type of the PsiMethod object, and generating annotation content corresponding to the return value type by utilizing the annotation dictionary.
Referring to fig. 2, fig. 2 is a flowchart of a code annotation method provided by an embodiment of the present application. This embodiment describes in detail how annotation text is generated from the annotation dictionary, and applies word segmentation to text that the annotation dictionary cannot annotate directly, so as to improve the coverage of code annotation. It may be combined with the embodiment corresponding to fig. 1 to obtain a further implementation, and may include the following steps:
S201: determining the target word currently to be annotated from the text to be annotated;
S202: judging whether a dictionary entry corresponding to the target word exists in the annotation dictionary; if yes, going to S203; if not, going to S204;
S203: determining the annotation content according to the dictionary entry corresponding to the target word;
S204: performing word segmentation on the target word to obtain target word segments, generating annotation content corresponding to each target word segment by using the annotation dictionary, and splicing the annotation content of all the target word segments to obtain the annotation content corresponding to the target word.
In code, several words are often written together without separators; for example, "get Fruit Type" is written as "getFruitType". This embodiment first queries whether a dictionary entry for "getFruitType" exists in the annotation dictionary. If so, the corresponding annotation content is output. If not, "getFruitType" is segmented into the target word segments "get", "Fruit" and "Type", the dictionary entries corresponding to "get", "Fruit" and "Type" are queried separately to obtain their annotation content, and all the annotation content is spliced to obtain the annotation content corresponding to the text to be annotated. Further, if no dictionary entry corresponds to a word segment, the word segment itself may be returned. For example, if the annotation dictionary has no entries for "getFruitType" or "Fruit" but does have entries for "get" and "Type", the resulting annotation content is the annotation of "get", followed by "Fruit" itself, followed by the annotation of "Type". Looking up the whole name first and segmenting only on a miss improves both the efficiency and the accuracy of annotation document generation.
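The whole-name-first lookup with a segmentation fallback might be sketched as below. The class `DictionaryTranslator`, the use of a space as the joining separator, and the English stand-in annotation values are assumptions for illustration; the source does not specify how the pieces are joined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical translator: whole-name lookup first, per-word lookup on a miss. */
final class DictionaryTranslator {
    private final Map<String, String> dictionary;

    DictionaryTranslator(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    String translate(String targetWord) {
        // S202/S203: use the entry for the whole name when one exists.
        String whole = dictionary.get(targetWord);
        if (whole != null) {
            return whole;
        }
        // S204: split before each capital letter, annotate each segment,
        // and fall back to the segment itself when it has no entry.
        List<String> parts = new ArrayList<>();
        for (String segment : targetWord.split("(?<=[a-z0-9])(?=[A-Z])")) {
            parts.add(dictionary.getOrDefault(segment, segment));
        }
        return String.join(" ", parts); // separator is an assumption
    }

    public static void main(String[] args) {
        DictionaryTranslator translator = new DictionaryTranslator(
                Map.of("get", "obtain", "Type", "type", "getFruit", "obtain fruit"));
        System.out.println(translator.translate("getFruit"));
        System.out.println(translator.translate("getFruitType"));
    }
}
```

In this sketch `getFruit` is answered from its own entry, while `getFruitType` is assembled from the entries for `get` and `Type` with the unknown segment `Fruit` passed through unchanged.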
As a possible implementation, the present embodiment may generate the annotation dictionary by: acquiring a sample code file, and determining a code text in the sample code file; wherein the code text comprises any one or a combination of a plurality of class names, method names and parameter names; generating annotation text corresponding to the code text by using the sample annotation document; performing word segmentation processing on the code text to obtain a word segmentation text, and setting an annotation text of the word segmentation text to be blank; combining all the code texts and the annotation texts of the word segmentation texts to obtain text corresponding relations; the text corresponding relation comprises a corresponding relation between a code writing language and an annotation language; and generating the annotation dictionary according to the text corresponding relation.
The sample code file is a preset code file, and the sample annotation document is an existing annotation document. If the sample annotation document contains annotation text corresponding to a code text, that annotation text is used directly. In this embodiment, the code text may also be segmented to obtain word segmentation texts, and the annotation text corresponding to each word segmentation text may be set to empty. Each capital letter is treated as the first letter of a word, so segmentation is performed by letter case. For example, if the code text "getFruitType" exists, this embodiment may split it into "get", "Fruit" and "Type", and set the annotations of "get", "Fruit" and "Type" to empty.
Further, in this embodiment, the text correspondence may be obtained by merging all the code texts and the word segmentation texts in the following manner: removing invalid texts from the annotation texts to obtain candidate annotation texts, where the invalid texts comprise annotation texts with empty content and annotation texts whose number of occurrences is smaller than a preset value; merging, by text content, the annotation items formed by the candidate annotation texts of the code texts and the word segmentation texts, where the annotation texts in each annotation item are arranged in descending order of occurrence frequency; and generating the text correspondence according to the text content of the code texts and the word segmentation texts and the corresponding annotation items. In the above embodiment, the translation dictionary is built by analyzing existing code and extracting existing Javadoc.
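Case-based segmentation of this kind could look like the following sketch. The class name `CamelCaseSplitter` is hypothetical, and the rule implemented is only the one stated above (every capital letter starts a new word); it makes no attempt to keep acronyms such as `URL` together.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter: every uppercase letter starts a new word. */
final class CamelCaseSplitter {

    static List<String> split(String identifier) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : identifier.toCharArray()) {
            // A capital letter closes the word collected so far.
            if (Character.isUpperCase(c) && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("getFruitType")); // lower camel case name
        System.out.println(split("PersonInfo"));   // upper camel case name
    }
}
```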
The flow described in the above embodiment is explained below by a process of generating an annotation document Javadoc of Java code in an actual application.
Javadoc is a specific form of Java comment and a technology provided by the Java language. Through Javadoc, descriptive information such as classes, methods, parameters, return values and members can be extracted from program source code to generate the API document corresponding to the source code. Good, complete and detailed Javadoc helps developers understand code more conveniently, quickly and accurately, so that readers can learn the specific role and function of a method or class without reading the source code, and it makes it easier for API callers to write calling code. At present, developers often neglect Javadoc during development and do not appreciate its value: most developers attach importance to code but not to Javadoc. The main reason is that Javadoc has to be maintained by hand during development, and writing and revising it costs developers additional time.
In the related art, Javadoc is mainly maintained by hand by developers, which is inefficient. When a project is large, maintaining Javadoc often takes up considerable development time, and manual maintenance also leads to missing, incorrect or outdated Javadoc. To solve these problems, this embodiment provides a dictionary-based scheme that generates Javadoc automatically with the help of the IDE, which can help developers finish writing Javadoc quickly, improve development efficiency, and enhance the standardization and readability of code.
In the development process of the Java system, the following characteristics exist:
Characteristic 1: specific keywords such as get, delete, update, add, list and request may appear many times in a system. Within the system, each of these keywords corresponds to a specific Chinese translation.
Characteristic 2: business names related to the system may appear repeatedly. In a fruit management system, for example, the word Fruit appears repeatedly, and names containing the word Fruit appear many times, such as getFruit, deleteFruit and fruitList.
Characteristic 3: in natural language, a word often has several Chinese translations; the English word name, for example, can be translated as a person's name or as the name of a thing. Within a specific system, however, the Chinese meaning is usually fixed. For example, in a student archive management system, name usually means a person's name; in a merchandise management system, name means the name of an item. That is, within a given system an English word tends to correspond to one definite Chinese term.
Based on the characteristics listed above, this embodiment can determine the Chinese text corresponding to an English text in the system by dictionary lookup. Before any Chinese text is generated, the dictionary file may be initialized from the existing code and its corresponding Chinese comments. This initialization only needs to be performed once. Referring to fig. 3, fig. 3 is a flowchart of a dictionary generating method according to an embodiment of the present application. The specific steps for generating the dictionary shown in fig. 3 are as follows:
Step A1: the IDE plug-in scans all Java files, parses them into a PsiTree (Psi object tree) through an IDE interface, and obtains English texts such as the class names, method names and parameter names in all Java classes. The existing Javadoc is parsed to obtain the Chinese text corresponding to each English text.
Step A2: segment the English text obtained in step A1 by letter case, and set the Chinese text of each segmented English word to empty.
In Java code, developers generally name classes, methods, parameters and the like in camel case. Camel case comes in two forms: lower camel case and upper camel case. A lower camel case name starts with a lowercase letter and capitalizes the first letter of each subsequent word; it is commonly used for method names, parameter names and variable names, for example getFirstName and lastName. In upper camel case the first letter of the first word is also capitalized; it is commonly used for class names and interface names, for example PersonInfo and SchoolLocation. This embodiment can therefore segment words by letter case. For example, getFruitType may be split into "get", "Fruit" and "Type".
Step A3: merge the items obtained in step A1 and step A2. If the same English text has several Chinese texts, they are merged according to the following rules:
Rule i: discard items whose Chinese text is empty;
Rule ii: count the items whose Chinese text is not empty by number of occurrences, and discard the items that occur rarely. For example, if one Chinese translation of fruit appears 5 times and another appears 2 times, the latter is discarded and the former becomes the Chinese text corresponding to fruit.
Step A4: and arranging the combined items according to the descending order of the occurrence times, namely arranging the item with the largest occurrence times at the forefront.
The descending order makes it easier for a user to supplement the dictionary: English texts that occur frequently but have no Chinese annotation can be translated manually first.
Step A5: form the final dictionary entries in the form Key=Value, where Key is the English text and Value is the Chinese text.
Step A6: all dictionary entries are written to a dictionary file or database.
Step A7: manually supplement and refine the Chinese text of some dictionary entries as needed. Because the most frequently occurring entries are ranked first, a developer can translate only part of the entries as required.
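Steps A3 to A5 can be illustrated with the sketch below. It is a simplified reading, not the patented procedure: it keeps only the most frequent translation per English text, treats "occurs rarely" as "fewer than `minOccurrences` times", and uses English stand-ins for the Chinese texts. The names `DictionaryMerger`, `Observation` and `MergedEntry` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative merge of (English, Chinese) observations into dictionary entries. */
final class DictionaryMerger {

    /** One observed pairing of an English text with a Chinese text (hypothetical type). */
    static final class Observation {
        final String english;
        final String chinese;

        Observation(String english, String chinese) {
            this.english = english;
            this.chinese = chinese;
        }
    }

    /** A merged entry plus how often its Chinese text was observed. */
    static final class MergedEntry {
        final String key;
        final String value;
        final int occurrences;

        MergedEntry(String key, String value, int occurrences) {
            this.key = key;
            this.value = value;
            this.occurrences = occurrences;
        }

        /** Renders the entry in the Key=Value form of step A5. */
        String toLine() {
            return key + "=" + value;
        }
    }

    static List<MergedEntry> merge(List<Observation> observations, int minOccurrences) {
        // Rule i: drop empty Chinese texts while counting the rest per English text.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Observation o : observations) {
            if (o.chinese == null || o.chinese.trim().isEmpty()) {
                continue;
            }
            counts.computeIfAbsent(o.english, k -> new HashMap<>()).merge(o.chinese, 1, Integer::sum);
        }
        // Rule ii: keep the most frequent translation if it occurs often enough.
        List<MergedEntry> merged = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> perKey : counts.entrySet()) {
            Map.Entry<String, Integer> best = null;
            for (Map.Entry<String, Integer> candidate : perKey.getValue().entrySet()) {
                if (best == null || candidate.getValue() > best.getValue()) {
                    best = candidate;
                }
            }
            if (best != null && best.getValue() >= minOccurrences) {
                merged.add(new MergedEntry(perKey.getKey(), best.getKey(), best.getValue()));
            }
        }
        // Step A4: most frequent entries first; ties are ordered by key.
        merged.sort((a, b) -> a.occurrences != b.occurrences
                ? Integer.compare(b.occurrences, a.occurrences)
                : a.key.compareTo(b.key));
        return merged;
    }

    public static void main(String[] args) {
        List<Observation> observations = new ArrayList<>();
        for (int i = 0; i < 5; i++) observations.add(new Observation("fruit", "fruit (food)"));
        for (int i = 0; i < 2; i++) observations.add(new Observation("fruit", "result"));
        observations.add(new Observation("get", "")); // segmented word with empty text
        for (MergedEntry entry : merge(observations, 2)) {
            System.out.println(entry.toLine());
        }
    }
}
```

The resulting lines could then be written to a dictionary file or database as in step A6.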
Referring to fig. 4, fig. 4 is a flowchart of a Javadoc generation method provided in an embodiment of the present application. This embodiment describes the process of generating Javadoc with the dictionary after the dictionary has been initialized, and specifically includes the following steps:
Step B1: When the IDE starts, the plug-in automatically loads all dictionary entries and parses each entry into a Key and a Value. Key is the English text, and Value is the corresponding Chinese text.
Step B2: In the IDE, the user triggers the Javadoc generation function by invoking the plug-in.
Step B3: Parse the corresponding Java file into a PsiTree through the IDE interface.
Step B4: Obtain the class name of the current class from the PsiClass (Psi class) object in the PsiTree, and call the translation function to generate the Chinese text.
Step B5: Traverse all PsiMethod objects in the PsiClass, and generate the corresponding Javadoc for each method.
Specifically, the English text of each part of the method signature is obtained and translated as follows: obtain the method name, such as getFruit, through the PsiMethod object, and call the translation function to generate the Chinese text; obtain all parameters of the PsiMethod object, traverse them, and call the translation function to generate the Chinese text; obtain the return value type of the PsiMethod object, such as Fruit, and call the translation function to generate the Chinese text; then splice the translations of these parts to generate the Javadoc.
Step B6: Write the generated Javadoc into the Java file to complete the Javadoc generation flow.
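Steps B1 and B5 can be illustrated with a minimal sketch, written in Python rather than against the IDE's PSI interface. The names load_dictionary, lookup and build_method_javadoc, the @param/@return layout, and the simple whole-name lookup are assumptions made for illustration and are not taken from the embodiment; in a real plug-in the method name, parameters and return type would be read from the PsiMethod object.

```python
def load_dictionary(lines):
    """Step B1 (sketch): parse Key=Value lines into a dict."""
    dictionary = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        dictionary[key.strip()] = value.strip()
    return dictionary


def lookup(name, dictionary):
    """Placeholder translation: whole-name lookup, falling back to the name."""
    return dictionary.get(name) or name


def build_method_javadoc(method_name, param_names, return_type, dictionary):
    """Step B5 (sketch): splice the translated parts into one Javadoc comment."""
    tags = [f" * @param {p} {lookup(p, dictionary)}" for p in param_names]
    if return_type != "void":
        tags.append(f" * @return {lookup(return_type, dictionary)}")

    lines = ["/**", f" * {lookup(method_name, dictionary)}"]
    if tags:
        lines.append(" *")  # blank comment line between description and tags
        lines.extend(tags)
    lines.append(" */")
    return "\n".join(lines)
```

With a dictionary loaded from the lines "getFruit=获取水果", "fruitId=水果编号" and "Fruit=水果", calling build_method_javadoc("getFruit", ["fruitId"], "Fruit", dictionary) produces a comment whose description, @param line and @return line carry the three Chinese texts.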
In the above process, generating the Chinese text corresponding to an English text is the most critical and complex part, and may specifically be performed according to the following steps:
Step C1: Look up the Chinese text in the dictionary by the full name. For example, if the method name is getFruit, search the dictionary for getFruit directly; if the dictionary contains a Chinese text corresponding to the English text, return it directly, such as "obtain fruit".
Step C2: If no corresponding Chinese text is found in the dictionary, split the English text into several words according to letter case; for example, getFruitType can be split into the three words get, fruit and type. Obtain the Chinese text of each word from the dictionary, such as "obtain", "fruit" and "type" for get, fruit and type respectively; if a word is not found, return the word itself. Finally, combine the obtained translations into the final translation, such as "obtain fruit type".
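Steps C1 and C2 can be sketched as a two-stage lookup. The following Python is an illustration only: the names split_camel_case and translate are hypothetical, and it is an assumption of this sketch that single words are stored in the dictionary in lower case, that an entry with an empty Value counts as "not found", and that the partial translations are joined without a separator.

```python
import re

# Matches an acronym run (e.g. "URL"), a capitalized or lower-case word, or digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_camel_case(name):
    """Split an identifier by letter case, e.g. getFruitType -> get, Fruit, Type."""
    return _WORD.findall(name)


def translate(name, dictionary):
    # Step C1: try the full name first.
    full = dictionary.get(name)
    if full:
        return full

    # Step C2: fall back to word-by-word lookup; an unknown word is kept as-is.
    parts = []
    for word in split_camel_case(name):
        parts.append(dictionary.get(word.lower()) or word)
    return "".join(parts)
```

Checking the full name first lets a hand-written translation of a whole identifier take precedence over the mechanical word-by-word result, while the fallback still produces a readable annotation for identifiers that were never seen when the dictionary was built.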
In the above embodiment, the dictionary is generated according to the existing scheme, so translations do not need to be initialized manually. Based on the dictionary function, this embodiment automatically generates Javadoc from the English names of classes, methods, members and the like, which helps developers save development time, improves working efficiency, and improves the readability and maintainability of the code.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus for generating a code annotation document according to an embodiment of the present application.
The apparatus may include:
a file determining module 100, configured to receive an annotation generation request and determine the target code file corresponding to the annotation generation request;
a text determining module 200, configured to parse the target code file into a Psi tree and determine the text to be annotated according to the Psi tree;
and an annotation generation module 300, configured to generate annotation content corresponding to the text to be annotated by using an annotation dictionary, and to generate an annotation document of the target code file according to the annotation content.
Upon receiving an annotation generation request, this embodiment determines the target code file for which a code annotation document needs to be generated. By parsing the target code file into a Psi tree, the text to be annotated in the target code file can be determined; the annotation content corresponding to the text to be annotated can then be generated with the annotation dictionary, and the annotation document of the target code file is generated by combining all the annotation content. The process requires no manual participation: the content that needs annotation in the target code file is found and annotated automatically, and the code annotation document is generated automatically, which improves the efficiency of generating code annotation documents.
Further, the text determining module 200 includes:
a parsing unit, configured to parse the target code file into a Psi tree;
a text to be annotated determining unit, configured to obtain the class name from the PsiClass object in the Psi tree, determine the PsiMethod objects in the PsiClass object, and set the class name and the method signatures of the PsiMethod objects as the text to be annotated.
Further, the text to be annotated determining unit includes:
a class name annotating subunit, configured to generate annotation content corresponding to the class name by using the annotation dictionary;
a method annotating subunit, configured to generate annotation content corresponding to the method signature by using the annotation dictionary;
and a content splicing subunit, configured to splice the annotation content corresponding to the class name and the annotation content corresponding to the method signature to generate the annotation document of the target code file.
Further, the method annotating subunit is configured to obtain the method name of the PsiMethod object and generate annotation content corresponding to the method name by using the annotation dictionary; it is also configured to obtain the parameters of the PsiMethod object and generate annotation content corresponding to the parameters by using the annotation dictionary; and it is also configured to obtain the return value type of the PsiMethod object and generate annotation content corresponding to the return value type by using the annotation dictionary.
Further, the apparatus also includes:
a text annotation module, configured to obtain a sample code file and determine the code text in the sample code file, wherein the code text comprises any one or a combination of class names, method names and parameter names; also configured to generate annotation text corresponding to the code text by using the sample annotation document; and also configured to perform word segmentation on the code text to obtain word segmentation texts, and to set the annotation text of each word segmentation text to blank;
an annotation merging module, configured to merge the annotation texts of all the code texts and the word segmentation texts to obtain a text correspondence, wherein the text correspondence comprises a correspondence between the code writing language and the annotation language;
and a dictionary generating module, configured to generate the annotation dictionary according to the text correspondence.
Further, the annotation merging module is configured to remove invalid texts from the annotation texts to obtain alternative annotation texts, wherein the invalid texts comprise annotation texts with empty content and annotation texts whose number of occurrences is smaller than a preset value; it is also configured to merge the annotation entries of the alternative annotation texts of the code texts and the word segmentation texts according to text content, wherein the annotation texts in the annotation entries are arranged in descending order of occurrence frequency; and it is also configured to generate the text correspondence according to the text content of the code texts and the word segmentation texts and the corresponding annotation entries.
Further, the annotation generation module 300 is configured to determine the target word currently to be annotated from the text to be annotated; it is also configured to judge whether a dictionary entry corresponding to the target word exists in the annotation dictionary; if so, to determine the annotation content according to the dictionary entry corresponding to the target word; and if not, to perform word segmentation on the target word to obtain target sub-words, generate annotation content corresponding to each target sub-word by using the annotation dictionary, and splice the annotation content of all the target sub-words to obtain the annotation content corresponding to the target word.
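The work of the text annotation module can be illustrated with a short sketch. It is written in Python under stated assumptions: the function name collect_entries is hypothetical, the samples are assumed to have already been extracted from sample code files as (code text, annotation text) pairs, and segmented words are assumed to be recorded in lower case. The Chinese strings are placeholders.

```python
import re

# Matches an acronym run, a capitalized or lower-case word, or a digit run.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def collect_entries(samples):
    """Turn (code_text, annotation_text) samples into raw dictionary entries.

    Each code text keeps the annotation found in the sample annotation
    document; each word obtained by segmenting the code text is recorded
    with a blank annotation, to be filled in or merged later.
    """
    entries = []
    for code_text, annotation in samples:
        entries.append((code_text, annotation))
        for word in _WORD.findall(code_text):
            entries.append((word.lower(), ""))
    return entries
```

Given the samples ("getFruit", "获取水果") and ("Fruit", "水果"), the result contains the two annotated code texts plus blank entries for get and fruit, which is the kind of input the annotation merging module then merges and counts.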
Since the embodiments of the apparatus portion correspond to the embodiments of the method portion, for the apparatus embodiments refer to the description of the method embodiments, which is not repeated here.
The present application also provides a storage medium having stored thereon a computer program which, when executed, performs the steps provided by the above embodiments. The storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The application also provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the foregoing embodiments when calling the computer program in the memory. Of course the electronic device may also include various network interfaces, power supplies, etc.
In the description, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for the relevant points refer to the description of the method section. It should be noted that those skilled in the art can make various improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the present application.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (8)

1. A method of generating a code annotation document, comprising:
receiving an annotation generation request and determining a target code file corresponding to the annotation generation request;
analyzing the target code file into a Psi tree, and determining a text to be annotated according to the Psi tree;
generating annotation content corresponding to the text to be annotated by using an annotation dictionary, and generating an annotation document of the target code file according to the annotation content;
the annotation dictionary generation process comprises the following steps:
acquiring a sample code file, and determining a code text in the sample code file; wherein the code text comprises any one or a combination of a plurality of class names, method names and parameter names;
generating annotation text corresponding to the code text by using the sample annotation document;
performing word segmentation processing on the code text to obtain a word segmentation text, and setting an annotation text of the word segmentation text to be blank;
combining all the code texts and the annotation texts of the word segmentation texts to obtain text corresponding relations; the text corresponding relation comprises a corresponding relation between a code writing language and an annotation language;
generating the annotation dictionary according to the text corresponding relation;
the annotation dictionary comprises a plurality of dictionary entries, wherein each dictionary entry comprises a code text of a code writing language and an annotation text of a corresponding annotation language;
the generating the annotation content corresponding to the text to be annotated by using the annotation dictionary comprises the following steps:
determining a target word to be annotated currently from the text to be annotated;
judging whether dictionary entries corresponding to the target words exist in the annotation dictionary or not;
if yes, determining the annotation content according to dictionary entries corresponding to the target words;
if not, performing word segmentation processing on the target word to obtain target sub-words, generating annotation content corresponding to each target sub-word by using the annotation dictionary, and splicing the annotation content corresponding to all the target sub-words to obtain the annotation content corresponding to the target word.
2. The method of generating a code annotation document as claimed in claim 1, wherein determining text to be annotated from the Psi tree comprises:
obtaining a class name from a PsiClass object in the Psi tree, and determining PsiMethod objects in the PsiClass object;
and setting the class name and the method signature of the PsiMethod object as the text to be annotated.
3. The method for generating a code annotation document according to claim 2, wherein generating annotation content corresponding to the text to be annotated using an annotation dictionary, and generating an annotation document of the object code file from the annotation content, comprises:
generating annotation content corresponding to the class name by using an annotation dictionary;
generating annotation content corresponding to the method signature by using the annotation dictionary;
and splicing the annotation content corresponding to the class name and the annotation content corresponding to the method to generate the annotation document of the target code file.
4. A method of generating a code annotation document as claimed in claim 3, wherein generating annotation content corresponding to the method signature using the annotation dictionary comprises:
acquiring a method name of the PsiMethod object, and generating annotation content corresponding to the method name by utilizing the annotation dictionary;
acquiring parameters of the PsiMethod object, and generating annotation content corresponding to the parameters by utilizing the annotation dictionary;
and acquiring the return value type of the PsiMethod object, and generating annotation content corresponding to the return value type by utilizing the annotation dictionary.
5. The method for generating a code annotation document as claimed in claim 1, wherein merging all the code texts and the word segmentation texts to obtain text correspondence comprises:
removing invalid texts in the annotation text to obtain alternative annotation text; the invalid text comprises an annotation text with empty content and an annotation text with occurrence times smaller than a preset value;
merging annotation items of the code text and the alternative annotation text of the word segmentation text according to text content; the annotation texts in the annotation items are arranged in descending order according to the occurrence frequency;
and generating a text corresponding relation according to the text content of the code text and the word segmentation text and the corresponding annotation item.
6. A code annotation document generation apparatus, comprising:
the file determining module is used for receiving the annotation generation request and determining an object code file corresponding to the annotation generation request;
the text determining module is used for analyzing the target code file into a Psi tree and determining a text to be annotated according to the Psi tree;
the annotation generation module is used for generating annotation content corresponding to the text to be annotated by utilizing an annotation dictionary and generating an annotation document of the target code file according to the annotation content;
further comprises:
the text annotation module is used for acquiring a sample code file and determining code text in the sample code file; wherein the code text comprises any one or a combination of a plurality of class names, method names and parameter names; the method is also used for generating annotation text corresponding to the code text by using the sample annotation document; performing word segmentation processing on the code text to obtain a word segmentation text, and setting an annotation text of the word segmentation text to be blank;
the annotation merging module is used for merging all the code texts and the annotation texts of the word segmentation texts to obtain text corresponding relations; the text corresponding relation comprises a corresponding relation between a code writing language and an annotation language;
the dictionary generating module is used for generating the annotation dictionary according to the text corresponding relation;
the annotation dictionary comprises a plurality of dictionary entries, wherein each dictionary entry comprises a code text of a code writing language and an annotation text of a corresponding annotation language;
the annotation generation module is specifically used for determining a target word to be annotated currently from the text to be annotated; is also used for judging whether a dictionary entry corresponding to the target word exists in the annotation dictionary; if yes, determining the annotation content according to the dictionary entry corresponding to the target word; if not, performing word segmentation processing on the target word to obtain target sub-words, generating annotation content corresponding to each target sub-word by using the annotation dictionary, and splicing the annotation content corresponding to all the target sub-words to obtain the annotation content corresponding to the target word.
7. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, the processor, when invoking the computer program in the memory, performing the steps of the method for generating a code annotation document according to any of claims 1 to 5.
8. A storage medium having stored therein computer executable instructions which, when loaded and executed by a processor, implement the steps of the method of generating a code annotation document according to any of claims 1 to 5.
CN202110057846.9A 2021-01-15 2021-01-15 Method and device for generating code annotation document, electronic equipment and storage medium Active CN112836477B (en)


Publications (2)

Publication Number Publication Date
CN112836477A CN112836477A (en) 2021-05-25
CN112836477B true CN112836477B (en) 2024-02-09

Family

ID=75928509







Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant