WO2019028990A1 - Method, device, electronic device, and medium for naming code elements - Google Patents
Method, device, electronic device, and medium for naming code elements
- Publication number
- WO2019028990A1 (application PCT/CN2017/104537)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- code
- naming
- user
- usage information
- code element
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
Definitions
- the present application belongs to the field of software development technologies, and in particular, to a method, a device, an electronic device and a medium for naming code elements.
- the embodiment of the present application provides a method, a device, an electronic device, and a medium for naming code elements, so as to solve the problem that the naming efficiency of code elements in the prior art is relatively low.
- a first aspect of an embodiment of the present application provides a method for naming a code element, including:
- the determined naming of the first code element is recommended to the user to cause the user to select a naming associated with the second code element from each of the recommended naming.
- a second aspect of the embodiments of the present application provides a naming device for a code element, including:
- an extraction module configured to extract a naming and an annotation of each first code element from a preset code library, where the preset code library includes a plurality of pieces of code, and the first code element includes a variable, a constant, a function, a class, and a file;
- a first obtaining module configured to acquire element usage information input by a user, where the element usage information is used to describe a function of a second code element that the user needs to create;
- a calculation module configured to separately calculate a similarity between each of the annotations and the element usage information, obtain the annotation whose similarity with the element usage information is greater than a preset threshold, and determine the first code element corresponding to that annotation;
- a recommendation module configured to recommend the determined naming of the first code element to the user, so that the user selects a naming associated with the second code element from each recommended naming.
- a third aspect of embodiments of the present application provides an electronic device including a memory, a processor, and computer readable instructions stored on the memory and executable on the processor, the processor implementing the steps of the naming method of the code elements provided by the first aspect above when executing the computer readable instructions.
- a fourth aspect of the embodiments of the present application provides a computer readable storage medium storing computer readable instructions which, when executed by at least one processor, implement the steps of the naming method of the code elements provided by the first aspect above.
- based on code elements with similar functions, the user obtains recommended namings with higher reference value, which ensures that the user can quickly determine, from the recommended namings, the naming that best suits his or her own needs and create a new code element based on that naming.
- each recommended naming is a naming that other developers have used to keep their programs highly readable, so users do not have to spend too much time on naming.
- the embodiment of the present application therefore improves the naming efficiency of code elements.
- FIG. 1 is a flowchart of an implementation of a method for naming code elements provided by an embodiment of the present application;
- FIG. 2 is a specific implementation flowchart of step S103 of the method for naming code elements provided by an embodiment of the present application;
- FIG. 3 is a specific implementation flowchart of step S1031 of the method for naming code elements provided by an embodiment of the present application;
- FIG. 5 is a flowchart of an implementation of a method for naming code elements according to another embodiment of the present application;
- FIG. 6 is a structural block diagram of a naming device for a code element according to an embodiment of the present application.
- FIG. 7 is a structural block diagram of a naming device for a code element according to another embodiment of the present application.
- FIG. 8 is a structural block diagram of a device for naming code elements according to another embodiment of the present application.
- FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application.
- FIG. 1 shows an implementation flow of a naming method of a code element provided by an embodiment of the present application, where the method flow includes steps S101 to S104.
- the specific implementation principles of each step are as follows:
- S101 Extract a naming and a comment of each first code element from a preset code library, where the preset code library includes a plurality of pieces of code, wherein the first code element includes a variable, a constant, a function, a class, and a file.
- the code represents the computer language instructions written by the developer in a language supported by the development tool.
- the collection of multi-segment code is the code base.
- the code base contains code for implementing different functions.
- the code collection process may be, for example, storing the code uploaded by the program developer to the code base; downloading the code pre-stored in the plurality of third-party code sharing platforms, and storing the code in the code base.
- the above third-party code sharing platforms may be, for example, Bitbucket and GitHub.
- in each piece of code in the code base there are various types of code elements, including but not limited to variables, constants, functions, classes, and files.
- in order to improve the readability of the program, the program developer annotates each piece of code, and the comments are often used to describe what the code can do or the intent behind its creation.
- the comments for each code element are also stored in the code base.
- the static analysis tool can perform static analysis on each piece of code in the code library to obtain the naming and annotation of each code element in the code library. For example, a parsing tool such as clang or llvm converts each piece of code into a syntax tree, from which the types, namings, and end-of-line comments of the code elements in the code base are automatically extracted.
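- The extraction in S101 can be sketched as follows. This is only an illustration under stated assumptions: it parses Python source with the standard `ast` and `tokenize` modules instead of clang or llvm, and the function name `extract_named_elements` and the (kind, name, comment) tuple layout are hypothetical.

```python
import ast
import io
import tokenize


def extract_named_elements(source):
    """Return (kind, name, comment) triples found in one piece of Python code."""
    # Map each line number to its end-of-line comment, without the leading '#'.
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string.lstrip("#").strip()

    elements = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            kind = "function" if isinstance(node, ast.FunctionDef) else "class"
            # Prefer the docstring; fall back to a comment on the definition line.
            comment = ast.get_docstring(node) or comments.get(node.lineno, "")
            elements.append((kind, node.name, comment))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    elements.append(
                        ("variable", target.id, comments.get(node.lineno, ""))
                    )
    return elements
```

- Each triple pairs a naming with its comment, which is the input the later similarity step needs. Constants and files are not distinguished in this sketch.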
- S102 Acquire element usage information input by the user, where the element usage information is used to describe a function of a second code element that the user needs to create.
- when the user wants to create a new code element in the application under development, the user is asked to enter a usage description associated with the code element. For example, if a program developer wants to name a variable and the variable is mainly used to implement cumulative counting, the element usage information obtained from the user's input may be "counter".
- the element usage information input by the user includes, but is not limited to, different kinds of language characters such as Chinese or English.
- S103 Calculate the similarity between each of the annotations and the element usage information, obtain the annotation whose similarity with the element usage information is greater than a preset threshold, and determine the first code element corresponding to that annotation.
- the element usage information input by the user is compared with the annotation of each code element in the code base to determine the similarity between the function of the existing code element in the code base and the function of the code element required by the user.
- the annotation of each code element in the code library is sequentially acquired, and the similarity between the annotation and the element usage information input by the user is calculated, that is, the text similarity between the annotation and the element usage information is calculated.
- the methods of calculating text similarity include but are not limited to cosine distance, Euclidean distance, Jaccard distance, and probability distribution distance (K-L distance).
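- Three of the listed measures can be sketched as below (cosine similarity is treated separately in S1032). The function names are hypothetical, and note that these are distances: a smaller value means the two texts are more similar, so a threshold test on them would be inverted relative to a similarity.

```python
import math


def jaccard_distance(tokens_a, tokens_b):
    """1 - |A ∩ B| / |A ∪ B| computed over the two sets of tokens."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0  # assumption: two empty texts are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def kl_divergence(p, q):
    """K-L divergence between probability distributions p and q.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```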
- each annotation whose similarity is greater than a preset threshold is selected. If the similarity between the annotation of any code element and the element usage information input by the user is greater than the preset threshold, it indicates that the function of that code element is close to the function of the code element that the user needs to create. Therefore, the code element corresponding to each selected annotation is obtained from the code base.
- the foregoing S103 specifically includes:
- S1031 Generate, according to the annotation of the first code element and the element usage information input by the user, a first vector corresponding to the first code element and a second vector corresponding to the second code element.
- S1032 Calculate a cosine similarity between the first vector and the second vector.
- the annotation and the element usage information input by the user are each converted using a method such as Word2Vec, Doc2Vec, or the vector space model (VSM), to obtain the first vector corresponding to the annotation and the second vector corresponding to the element usage information.
- the text similarity is specifically a cosine similarity. Since the annotation of the code element in the code base has been converted into the first vector, and the element usage information input by the user has been converted into the second vector, calculating the similarity between the annotation of the code element and the element usage information input by the user amounts to calculating the cosine similarity of the first vector and the second vector:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
- where $x_i$ represents the i-th element value of the first vector, $y_i$ represents the i-th element value of the second vector, and $n$ represents the total number of elements of the first vector or the second vector.
- S1033 Determine the first vector that has a cosine similarity with the second vector that is greater than a preset threshold, and obtain the determined first code element corresponding to the first vector.
- each first vector in which the cosine similarity is greater than a preset threshold is selected.
- the code elements corresponding to the respective first vectors obtained by the screening are determined.
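- Steps S1032 and S1033 can be sketched as follows. The names `cosine_similarity` and `select_similar_elements`, the dictionary layout, and the handling of an all-zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_x * norm_y)


def select_similar_elements(first_vectors, second_vector, threshold):
    """Return namings whose first vector exceeds the similarity threshold.

    first_vectors maps the naming of a first code element to its first vector.
    """
    return [
        naming
        for naming, vector in first_vectors.items()
        if cosine_similarity(vector, second_vector) > threshold
    ]
```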
- converting the annotation and the element usage information into vectors realizes dimensionality reduction of information that was expressed in text form. Because the cosine similarity reflects the similarity between texts well and is relatively simple to calculate, the calculation efficiency of the similarity between the annotation and the element usage information is improved.
- S10311 Perform word segmentation processing on the element usage information and the annotation to obtain a plurality of first word segments corresponding to the annotation and a plurality of second word segments corresponding to the element usage information.
- the stop words in the annotation are removed, and the characters remaining in the annotation are subjected to word segmentation processing to obtain a plurality of word segments corresponding to the annotation.
- the stop words in the element usage information are removed, and the characters remaining in the element usage information are subjected to word segmentation processing to obtain a plurality of word segments corresponding to the element usage information.
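- A minimal sketch of stop-word removal and word segmentation, assuming English text split on non-alphanumeric characters. The stop-word list and the function name `segment` are hypothetical; Chinese input would need a dedicated word segmenter instead of this regular expression.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "for", "and"}


def segment(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in stop_words]
```

- The same function is applied to the annotation and to the element usage information, so both produce word segments in the same form.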
- S10312 After merging and de-duplicating the first word segments and the second word segments, input them into a pre-established bag-of-words model.
- S10313 Separately compute the term frequency-inverse document frequency (TF-IDF) information, in the annotation, of each word segment in the bag-of-words model, to generate a first vector corresponding to the annotation according to the TF-IDF information.
- S10314 Separately compute the TF-IDF information, in the element usage information, of each word segment in the bag-of-words model, to generate a second vector corresponding to the element usage information according to the TF-IDF information.
- for each word segment in the bag-of-words model, the preset weight value corresponding to the word segment is obtained, and the frequency with which the word segment occurs in the annotation and the frequency with which it occurs in the element usage information are determined. This frequency is the term frequency (TF), and the preset weight value is the inverse document frequency (IDF).
- the product of the term frequency of the word segment in the annotation and its inverse document frequency is used as the TF-IDF information of the word segment in the annotation, and this TF-IDF information is output as one element of the first vector.
- the product of the term frequency of the word segment in the element usage information and its inverse document frequency is used as the TF-IDF information of the word segment in the element usage information, and this TF-IDF information is output as one element of the second vector. Therefore, when the bag-of-words model includes N word segments, the first vector and the second vector corresponding to the bag-of-words model each include N elements.
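- The vector construction described above can be sketched as follows. It assumes the term frequency is a raw occurrence count and that a word segment with no preset weight gets an IDF of 1.0; the function names and the `idf` dictionary are hypothetical.

```python
from collections import Counter


def build_vocabulary(first_segments, second_segments):
    """Merge both segment lists and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(first_segments + second_segments))


def tfidf_vector(segments, vocabulary, idf):
    """One element per vocabulary entry: term frequency times preset IDF weight."""
    tf = Counter(segments)  # raw occurrence counts; missing words count as 0
    return [tf[word] * idf.get(word, 1.0) for word in vocabulary]


# Usage with made-up word segments and one made-up preset weight.
first = ["count", "loop", "count"]
second = ["loop", "total"]
vocabulary = build_vocabulary(first, second)
first_vector = tfidf_vector(first, vocabulary, {"count": 0.5})
second_vector = tfidf_vector(second, vocabulary, {"count": 0.5})
```

- Because both vectors are indexed by the same vocabulary, they always have the same length N and can be compared element by element.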
- an example of the implementation process of the foregoing S10311 to S10314 is as follows: for a comment corresponding to a certain code element in the code library, if the comment is “function running time: calculating the average value of the duration”, then after removing stop words and performing word segmentation, the obtained first word segments are “function/run/time/calculation/time/average”; if the element usage information input by the user is “run time”, then after removing stop words and performing word segmentation, the obtained second word segments are “run/cost/time”. From S10312 above, the bag-of-words model corresponding to the comment and the element usage information then contains seven different word segments, “function/run/time/calculation/average/cost/time”. If the TF-IDF information of each word segment in the comment is “1, 1, 2, 1, 1, 0, 0” and the TF-IDF information of each word segment in the element usage information is “0, 1, 0, 0, 1, 1, 0”, these values form the elements of the first vector and the second vector, respectively.
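- As a worked illustration, taking the two TF-IDF lists above at face value as the first vector $x$ and the second vector $y$, the cosine similarity works out as follows (the numeric result is derived here and is not stated in the source):

$$x \cdot y = 1\cdot0 + 1\cdot1 + 2\cdot0 + 1\cdot0 + 1\cdot1 + 0\cdot1 + 0\cdot0 = 2$$

$$\lVert x\rVert = \sqrt{1+1+4+1+1} = \sqrt{8}, \qquad \lVert y\rVert = \sqrt{1+1+1} = \sqrt{3}$$

$$\cos\theta = \frac{2}{\sqrt{8}\,\sqrt{3}} = \frac{2}{\sqrt{24}} \approx 0.408$$

- Whether this comment is selected then depends on whether 0.408 exceeds the preset threshold, whose value the source does not specify.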
- the higher the term frequency of a word segment in a text and the higher its inverse document frequency, the greater the importance of the word segment in the text, and the better it reflects the main content of the annotation or the element usage information.
- the bag-of-words model is used to obtain the vector corresponding to each annotation and to the element usage information, and each element in the vector is represented by the TF-IDF information of a word segment in the bag-of-words model, so that the cosine similarity calculated from the vectors is closely tied to the main content of the texts and thus accurately reflects the similarity between the annotation and the element usage information.
- S104 Recommend the determined naming of the first code element to the user, so that the user selects, from the recommended namings, a naming associated with the second code element.
- because the similarity between the annotation of the first code element determined from the code base and the element usage information of the second code element to be created by the user is high, the probability that the functions, attributes, and usage methods of the first code element and the second code element are the same is relatively high, so the naming of the first code element is also suitable for the second code element. Therefore, the naming of the determined first code element is recommended to the user.
- the number of the determined first code elements is also plural.
- the naming of each of the first code elements is displayed to recommend each naming to the user, so that the user can select the naming of the second code element that he/she needs to create from the multiple namings displayed at the current time. Or, after selecting one of the recommended namings, add extra characters (such as serial numbers) to the naming to determine the final naming to be the naming of the second code element.
- for the word segmentation result corresponding to the element usage information input by the user, if the word segments include Chinese or other non-English word segments, each such word segment is converted into an English word, one by one.
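- The word-by-word conversion can be sketched with a lookup table. The table contents, the Chinese keys, and the function name `to_english_words` are hypothetical; a real system would rely on a full dictionary or a translation service.

```python
# Hypothetical bilingual lookup table, for illustration only.
LOOKUP = {"获取": "get", "学生": "student", "成绩": "score"}


def to_english_words(segments, lookup=LOOKUP):
    """Convert each non-English word segment into an English word, one by one."""
    words = []
    for segment in segments:
        if segment.isascii():
            words.append(segment.lower())  # already English: keep it as is
        else:
            words.append(lookup[segment])  # raises KeyError if no entry exists
    return words
```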
- S107 Acquire a naming algorithm that matches the programming language used by the user at the current time.
- the programming language required by the user at the current moment is obtained, for example, by detecting the other code in the application currently being developed by the user and analyzing its syntax structure to determine the programming language in use, or by reading a programming language parameter from an instruction input by the user to obtain the programming language required for the application under development.
- programming languages include but are not limited to C, C++, Python, Linux C, and Java.
- S108 Process each of the English words according to the naming algorithm, and recommend the obtained character string to the user.
- each English word is processed to obtain a character string including each of the above English words.
- the naming algorithm may connect the English words with a preset connector to obtain a character string including each of the above English words.
- the connector includes but is not limited to an underscore, an empty connector, a dash, and the like. After that, the output string is displayed as a recommended naming.
- for example, if the element usage information input by the user is “acquire student achievement”, the result obtained after word segmentation is “acquisition/student/score”, and the English words corresponding to the word segments are get, student, and score, respectively.
- if the user's current programming language is Java, and the corresponding naming algorithm is the camel-case naming algorithm, then after the three words get, student, and score are processed, the obtained string is getStudentScore; if the user's current programming language is Linux C, and its corresponding naming algorithm is an underscore-joining algorithm, then after the three words get, student, and score are connected, the obtained string is get_student_score.
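- Steps S107 and S108 can be sketched as a mapping from programming language to naming algorithm. The mapping `NAMING_ALGORITHMS` and the function names are hypothetical; only the two language pairings mentioned above are included.

```python
def camel_case(words):
    """Lower-case the first word and capitalize each following word."""
    head, *tail = [word.lower() for word in words]
    return head + "".join(word.capitalize() for word in tail)


def join_with(connector):
    """Build a naming algorithm that joins words with a preset connector."""
    return lambda words: connector.join(word.lower() for word in words)


# Hypothetical language-to-algorithm table based on the two examples above.
NAMING_ALGORITHMS = {
    "Java": camel_case,
    "Linux C": join_with("_"),
}


def recommend_name(words, language):
    """Apply the naming algorithm matched to the user's programming language."""
    return NAMING_ALGORITHMS[language](words)
```

- Calling `recommend_name(["get", "student", "score"], "Java")` gives `getStudentScore`, and the same words with `"Linux C"` give `get_student_score`. An empty connector or a dash can be added through `join_with("")` or `join_with("-")`.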
- since the number of the first code elements stored in the code library is limited, the annotations of the respective code elements may not have a high degree of similarity with the element usage information input by the user. Therefore, the English words obtained after the translation are processed, and the string containing the English words is recommended as a naming, so that the user can directly use the string to name the code element that needs to be created, which improves the naming efficiency of code elements.
- because English words have actual semantics, they can be more easily read by others. Therefore, recommending a string with higher readability to the user and using the string as the naming of the code element further improves the code readability of the application the user needs to develop.
- FIG. 5 is a flowchart showing an implementation process of a naming method for a code element according to another embodiment of the present application. As shown in FIG. 5, after the foregoing S104, the method further includes:
- the recommended naming selected by the user is detected.
- the number of cumulative selections of the recommended name is increased by one, and the cumulative number of selections indicates the number of times the recommended naming is selected.
- each naming recommended to the user matches the element usage information entered by the user. After the element usage information is bound to each recommended naming, the element usage information, the recommended namings, and the binding relationship between them are stored in a pre-established information base.
- when element usage information input by the user is received again, each recommended naming bound to that element usage information is read directly from the information base, and the read recommended namings are displayed to the user.
- since the cumulative number of selections of each recommended naming in the information base is different, when the recommended namings are displayed to the user, the naming with a higher cumulative number of selections is displayed preferentially. That is, the read recommended namings are displayed in descending order of their cumulative number of selections.
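- The binding and ranking behaviour can be sketched with an in-memory store. The class `RecommendationStore` and its method names are hypothetical, and a real information base would presumably be persistent.

```python
from collections import defaultdict


class RecommendationStore:
    """Binds element usage information to recommended namings and counts picks."""

    def __init__(self):
        # usage info -> {recommended naming -> cumulative number of selections}
        self._counts = defaultdict(dict)

    def bind(self, usage_info, namings):
        """Store the binding; existing selection counts are left untouched."""
        for naming in namings:
            self._counts[usage_info].setdefault(naming, 0)

    def record_selection(self, usage_info, naming):
        """Increase the cumulative number of selections of a naming by one."""
        bound = self._counts[usage_info]
        bound[naming] = bound.get(naming, 0) + 1

    def recommend(self, usage_info):
        """Return bound namings, most frequently selected first."""
        bound = self._counts.get(usage_info, {})
        # sorted() is stable, so ties keep the order in which namings were bound.
        return sorted(bound, key=lambda naming: -bound[naming])
```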
- FIG. 6 is a structural block diagram of the naming device of the code element provided by the embodiment of the present application. For the convenience of description, only the parts related to the embodiment of the present application are shown.
- the apparatus includes:
- the extracting module 601 is configured to extract a naming and an annotation of each first code element from a preset code library, where the preset code library includes a plurality of pieces of code, and the first code element includes a variable, a constant, a function, a class, and a file.
- the first obtaining module 602 is configured to acquire element usage information input by the user, where the element usage information is used to describe a function of the second code element that the user needs to create.
- the calculation module 603 is configured to separately calculate a similarity between each of the annotations and the element usage information, obtain the annotation whose similarity with the element usage information is greater than a preset threshold, and determine the first code element corresponding to that annotation.
- the recommendation module 604 is configured to recommend the determined naming of the first code element to the user, so that the user selects a naming associated with the second code element from each recommended naming.
- the calculating module 603 includes:
- a generating submodule configured to generate, according to the annotation of the first code element and the element usage information input by the user, a first vector corresponding to the first code element and a second vector corresponding to the second code element.
- a calculation submodule configured to calculate a cosine similarity between the first vector and the second vector.
- an obtaining submodule configured to determine the first vector whose cosine similarity with the second vector is greater than a preset threshold, and obtain the first code element corresponding to the determined first vector.
- the generating submodule is specifically configured to:
- separately compute the term frequency-inverse document frequency (TF-IDF) information, in the element usage information, of each word segment in the bag-of-words model, to generate a second vector corresponding to the element usage information according to the TF-IDF information.
- the naming device of the code element further includes:
- a word segmentation module 605 configured to perform word segmentation processing on the element usage information to obtain a plurality of second word segments if no annotation whose similarity with the element usage information is greater than a preset threshold exists in the code library.
- the conversion module 606 is configured to convert each of the second word segments into an English word.
- the second obtaining module 607 is configured to obtain, according to the programming language used by the user at the current time, a naming algorithm that matches the programming language.
- the processing module 608 is configured to process each of the English words based on the naming algorithm, and recommend the processed character string to the user.
- the naming device of the code element further includes:
- the storage module 609 is configured to bind the element usage information to each recommended naming and store the binding.
- the statistic module 610 is configured to increase by one the cumulative number of selections of the recommended naming selected by the user at the current time, so that when the element usage information input by the user is received again, the recommended namings bound to the element usage information are recommended to the user in descending order of their cumulative number of selections.
- FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the present application.
- the electronic device 9 of this embodiment includes a processor 90, a memory 91, and computer readable instructions 92 stored in the memory 91 and executable on the processor 90, such as a code element naming program.
- the processor 90 executes the computer readable instructions 92 to implement the steps in the embodiments of the naming method of code elements described above, such as steps S101 through S104 shown in FIG. 1.
- alternatively, the processor 90, when executing the computer readable instructions 92, implements the functions of the modules/units in the apparatus embodiments described above, such as the functions of the modules 601 through 604 shown in FIG. 6.
- the computer readable instructions 92 may be partitioned into one or more modules/units that are stored in the memory 91 and executed by the processor 90 to complete this application.
- the one or more modules/units may be a series of computer readable instruction segments capable of performing a particular function, the instruction segments being used to describe the execution of the computer readable instructions 92 in the electronic device 9.
- the electronic device 9 can be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
- the electronic device may include, but is not limited to, a processor 90, a memory 91. It will be understood by those skilled in the art that FIG. 9 is merely an example of the electronic device 9, and does not constitute a limitation on the electronic device 9, and may include more or less components than those illustrated, or combine some components, or different components.
- the electronic device may further include an input and output device, a network access device, a bus, and the like.
- the processor 90 may be a central processing unit (CPU), or may be other general-purpose processors, a digital signal processor (DSP), an application specific integrated circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc.
- the general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- the memory 91 may be an internal storage unit of the electronic device 9, such as a hard disk or memory of the electronic device 9.
- the memory 91 may also be an external storage device of the electronic device 9, such as a plug-in hard disk equipped on the electronic device 9, a smart media card (SMC), a secure digital (SD) card, a flash card, etc. Further, the memory 91 may also include both an internal storage unit of the electronic device 9 and an external storage device.
- the memory 91 is configured to store the computer readable instructions and other programs and data required by the electronic device.
- the memory 91 can also be used to temporarily store data that has been output or is about to be output.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- the storage medium includes a number of instructions that cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present application.
- the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (20)
- 一种代码元素的命名方法,其特征在于,包括:从预设代码库中提取各个第一代码元素的命名及注释,所述预设的代码库中包含多段代码,其中,所述第一代码元素包括变量、常量、函数、类以及文件;获取用户输入的元素用途信息,所述元素用途信息用于描述用户所需创建的第二代码元素的功能;分别计算每一所述注释与所述元素用途信息的相似度,获取与所述元素用途信息的相似度大于预设阈值的所述注释,并确定出与该注释对应的所述第一代码元素;将确定出的所述第一代码元素的命名推荐至所述用户,以使所述用户从各个推荐命名中,选取出与所述第二代码元素相关的命名。
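- The method for naming code elements according to claim 1, wherein calculating the similarity between each comment and the element usage information, obtaining each comment whose similarity to the element usage information is greater than the preset threshold, and determining the first code element corresponding to that comment comprises: generating, from the comment of the first code element and the element usage information input by the user, a first vector corresponding to the first code element and a second vector corresponding to the second code element; calculating the cosine similarity between the first vector and the second vector; and determining each first vector whose cosine similarity to the second vector is greater than the preset threshold, and obtaining the first code element corresponding to each determined first vector.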
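- The method for naming code elements according to claim 2, wherein generating, from the comment of the first code element and the element usage information input by the user, the first vector corresponding to the first code element and the second vector corresponding to the second code element comprises: performing word segmentation on the element usage information and the comment to obtain a plurality of first word segments corresponding to the comment and a plurality of second word segments corresponding to the element usage information; merging and de-duplicating the first word segments and the second word segments, and inputting the result into a pre-established bag-of-words model; computing, for each word segment in the bag-of-words model, its term frequency-inverse document frequency (TF-IDF) information in the comment, and generating the first vector corresponding to the comment from that TF-IDF information; and computing, for each word segment in the bag-of-words model, its TF-IDF information in the element usage information, and generating the second vector corresponding to the element usage information from that TF-IDF information.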
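- The method for naming code elements according to claim 1, further comprising: if no comment in the code library has a similarity to the element usage information greater than the preset threshold, performing word segmentation on the element usage information to obtain a plurality of second word segments; converting each second word segment into an English word; obtaining, according to the programming language currently used by the user, a naming algorithm matching that programming language; and processing the English words based on the naming algorithm, and recommending the resulting character string to the user.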
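- The method for naming code elements according to claim 1, further comprising: storing the element usage information in association with each recommended name; and incrementing by one the cumulative selection count of the recommended name currently selected by the user, so that when the same element usage information is received from the user again, the recommended names bound to that element usage information are recommended to the user in descending order of their cumulative selection counts.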
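- A device for naming code elements, comprising: an extraction module configured to extract the name and comment of each first code element from a preset code library, the preset code library containing multiple pieces of code, wherein the first code elements include variables, constants, functions, classes, and files; a first obtaining module configured to obtain element usage information input by a user, the element usage information describing the function of a second code element that the user needs to create; a calculation module configured to calculate the similarity between each comment and the element usage information, obtain each comment whose similarity to the element usage information is greater than a preset threshold, and determine the first code element corresponding to that comment; and a recommendation module configured to recommend the name of each determined first code element to the user, so that the user selects, from the recommended names, a name associated with the second code element.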
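- The device for naming code elements according to claim 6, wherein the calculation module comprises: a generation submodule configured to generate, from the comment of the first code element and the element usage information input by the user, a first vector corresponding to the first code element and a second vector corresponding to the second code element; a calculation submodule configured to calculate the cosine similarity between the first vector and the second vector; and an obtaining submodule configured to determine each first vector whose cosine similarity to the second vector is greater than the preset threshold, and obtain the first code element corresponding to each determined first vector.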
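- The device for naming code elements according to claim 7, wherein the generation submodule is specifically configured to: perform word segmentation on the element usage information and the comment to obtain a plurality of first word segments corresponding to the comment and a plurality of second word segments corresponding to the element usage information; merge and de-duplicate the first word segments and the second word segments, and input the result into a pre-established bag-of-words model; compute, for each word segment in the bag-of-words model, its term frequency-inverse document frequency (TF-IDF) information in the comment, and generate the first vector corresponding to the comment from that TF-IDF information; and compute, for each word segment in the bag-of-words model, its TF-IDF information in the element usage information, and generate the second vector corresponding to the element usage information from that TF-IDF information.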
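- The device for naming code elements according to claim 6, further comprising: a word segmentation module configured to, if no comment in the code library has a similarity to the element usage information greater than the preset threshold, perform word segmentation on the element usage information to obtain a plurality of second word segments; a conversion module configured to convert each second word segment into an English word; a second obtaining module configured to obtain, according to the programming language currently used by the user, a naming algorithm matching that programming language; and a processing module configured to process the English words based on the naming algorithm and recommend the resulting character string to the user.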
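- The device for naming code elements according to claim 6, further comprising: a storage module configured to store the element usage information in association with each recommended name; and a statistics module configured to increment by one the cumulative selection count of the recommended name currently selected by the user, so that when the same element usage information is received from the user again, the recommended names bound to that element usage information are recommended to the user in descending order of their cumulative selection counts.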
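- An electronic device, comprising a memory and a processor, the memory storing computer readable instructions executable on the processor, wherein the processor, when executing the computer readable instructions, implements the following steps: extracting the name and comment of each first code element from a preset code library, the preset code library containing multiple pieces of code, wherein the first code elements include variables, constants, functions, classes, and files; obtaining element usage information input by a user, the element usage information describing the function of a second code element that the user needs to create; calculating the similarity between each comment and the element usage information, obtaining each comment whose similarity to the element usage information is greater than a preset threshold, and determining the first code element corresponding to that comment; and recommending the name of each determined first code element to the user, so that the user selects, from the recommended names, a name associated with the second code element.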
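- The electronic device according to claim 11, wherein calculating the similarity between each comment and the element usage information, obtaining each comment whose similarity to the element usage information is greater than the preset threshold, and determining the first code element corresponding to that comment comprises: generating, from the comment of the first code element and the element usage information input by the user, a first vector corresponding to the first code element and a second vector corresponding to the second code element; calculating the cosine similarity between the first vector and the second vector; and determining each first vector whose cosine similarity to the second vector is greater than the preset threshold, and obtaining the first code element corresponding to each determined first vector.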
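- The electronic device according to claim 12, wherein generating, from the comment of the first code element and the element usage information input by the user, the first vector corresponding to the first code element and the second vector corresponding to the second code element comprises: performing word segmentation on the element usage information and the comment to obtain a plurality of first word segments corresponding to the comment and a plurality of second word segments corresponding to the element usage information; merging and de-duplicating the first word segments and the second word segments, and inputting the result into a pre-established bag-of-words model; computing, for each word segment in the bag-of-words model, its term frequency-inverse document frequency (TF-IDF) information in the comment, and generating the first vector corresponding to the comment from that TF-IDF information; and computing, for each word segment in the bag-of-words model, its TF-IDF information in the element usage information, and generating the second vector corresponding to the element usage information from that TF-IDF information.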
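- The electronic device according to claim 11, wherein the processor, when executing the computer readable instructions, further implements the following steps: if no comment in the code library has a similarity to the element usage information greater than the preset threshold, performing word segmentation on the element usage information to obtain a plurality of second word segments; converting each second word segment into an English word; obtaining, according to the programming language currently used by the user, a naming algorithm matching that programming language; and processing the English words based on the naming algorithm, and recommending the resulting character string to the user.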
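- The electronic device according to claim 11, wherein the processor, when executing the computer readable instructions, further implements the following steps: storing the element usage information in association with each recommended name; and incrementing by one the cumulative selection count of the recommended name currently selected by the user, so that when the same element usage information is received from the user again, the recommended names bound to that element usage information are recommended to the user in descending order of their cumulative selection counts.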
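- A computer readable storage medium storing computer readable instructions, wherein the computer readable instructions, when executed by at least one processor, implement the following steps: extracting the name and comment of each first code element from a preset code library, the preset code library containing multiple pieces of code, wherein the first code elements include variables, constants, functions, classes, and files; obtaining element usage information input by a user, the element usage information describing the function of a second code element that the user needs to create; calculating the similarity between each comment and the element usage information, obtaining each comment whose similarity to the element usage information is greater than a preset threshold, and determining the first code element corresponding to that comment; and recommending the name of each determined first code element to the user, so that the user selects, from the recommended names, a name associated with the second code element.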
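- The computer readable storage medium according to claim 16, wherein calculating the similarity between each comment and the element usage information, obtaining each comment whose similarity to the element usage information is greater than the preset threshold, and determining the first code element corresponding to that comment comprises: generating, from the comment of the first code element and the element usage information input by the user, a first vector corresponding to the first code element and a second vector corresponding to the second code element; calculating the cosine similarity between the first vector and the second vector; and determining each first vector whose cosine similarity to the second vector is greater than the preset threshold, and obtaining the first code element corresponding to each determined first vector.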
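- The computer readable storage medium according to claim 17, wherein generating, from the comment of the first code element and the element usage information input by the user, the first vector corresponding to the first code element and the second vector corresponding to the second code element comprises: performing word segmentation on the element usage information and the comment to obtain a plurality of first word segments corresponding to the comment and a plurality of second word segments corresponding to the element usage information; merging and de-duplicating the first word segments and the second word segments, and inputting the result into a pre-established bag-of-words model; computing, for each word segment in the bag-of-words model, its term frequency-inverse document frequency (TF-IDF) information in the comment, and generating the first vector corresponding to the comment from that TF-IDF information; and computing, for each word segment in the bag-of-words model, its TF-IDF information in the element usage information, and generating the second vector corresponding to the element usage information from that TF-IDF information.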
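- The computer readable storage medium according to claim 16, wherein the computer readable instructions, when executed by at least one processor, further implement the following steps: if no comment in the code library has a similarity to the element usage information greater than the preset threshold, performing word segmentation on the element usage information to obtain a plurality of second word segments; converting each second word segment into an English word; obtaining, according to the programming language currently used by the user, a naming algorithm matching that programming language; and processing the English words based on the naming algorithm, and recommending the resulting character string to the user.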
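- The computer readable storage medium according to claim 16, wherein the computer readable instructions, when executed by at least one processor, further implement the following steps: storing the element usage information in association with each recommended name; and incrementing by one the cumulative selection count of the recommended name currently selected by the user, so that when the same element usage information is received from the user again, the recommended names bound to that element usage information are recommended to the user in descending order of their cumulative selection counts.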
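The similarity-based recommendation recited in the method claims (comment extraction, TF-IDF vectors over a merged vocabulary, cosine similarity, threshold filtering) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend_names`), the regular-expression tokenizer standing in for word segmentation, the smoothed IDF formula, the 0.3 threshold, and the sample library are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (assumed stand-in for real word segmentation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one TF-IDF vector per document over the merged, de-duplicated vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption): tokens found in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_names(code_elements, usage_info, threshold=0.3):
    """code_elements: list of (name, comment) pairs extracted from a code library.

    Returns the names whose comment is more similar to usage_info than the
    threshold, most similar first.
    """
    comments = [comment for _, comment in code_elements]
    vectors = tfidf_vectors(comments + [usage_info])
    query = vectors[-1]  # the "second vector" built from the usage information
    scored = [
        (cosine(vec, query), name)
        for vec, (name, _) in zip(vectors, code_elements)
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > threshold]


# Hypothetical library of already-named code elements and their comments.
library = [
    ("getUserName", "get the name of the user"),
    ("closeFile", "close the open file handle"),
    ("userCount", "count the number of users"),
]
print(recommend_names(library, "get the user name"))
```

When no comment clears the threshold, the function returns an empty list, which is the point at which the fallback described in the dependent claims (word segmentation, conversion to English words, and a language-specific naming algorithm) would take over.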
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710674688.5 | 2017-08-09 | ||
CN201710674688.5A CN107463683B (zh) | 2017-08-09 | 2017-08-09 | Method and terminal device for naming code elements |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019028990A1 (zh) | 2019-02-14 |
Family
ID=60548738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/104537 WO2019028990A1 (zh) | 2017-08-09 | 2017-09-29 | Method, device, electronic device and medium for naming code elements |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107463683B (zh) |
WO (1) | WO2019028990A1 (zh) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
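CN109918058B (zh) * | 2017-12-13 | 2022-08-12 | 富士通株式会社 | Information processing apparatus and method, and method for recommending code in a programming environment |
CN108170679B (zh) * | 2017-12-28 | 2021-09-03 | 中国联合网络通信集团有限公司 | Semantic matching method and system based on computer-recognizable natural language description |
CN108427580B (zh) * | 2018-01-08 | 2020-01-10 | 平安科技(深圳)有限公司 | Method for detecting duplicate naming of configuration pairs, storage medium and smart device |
CN108664237B (zh) * | 2018-05-14 | 2019-04-12 | 北京理工大学 | Non-API member recommendation method based on heuristics and neural networks |
CN108717470B (zh) * | 2018-06-14 | 2020-10-23 | 南京航空航天大学 | Code snippet recommendation method with high accuracy |
CN109828748A (zh) * | 2018-12-15 | 2019-05-31 | 深圳壹账通智能科技有限公司 | Code naming method, system, computer device and computer readable storage medium |
CN111061688B (zh) * | 2019-12-13 | 2023-06-09 | 深圳前海环融联易信息科技服务有限公司 | Method, apparatus, computer device and storage medium for compiling statistics on variable naming styles |
CN112307235B (zh) * | 2020-05-09 | 2024-02-20 | 支付宝(杭州)信息技术有限公司 | Method, apparatus and electronic device for naming front-end page elements |
CN112463162B (zh) * | 2020-12-11 | 2022-12-20 | 苏州浪潮智能科技有限公司 | Code naming recommendation method, system, storage medium and device |
CN112579098B (zh) * | 2020-12-25 | 2024-02-06 | 平安银行股份有限公司 | Software release method, apparatus, electronic device and readable storage medium |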
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100131925A1 (en) * | 2008-11-25 | 2010-05-27 | Sap Ag | Dynamic naming conventions in a source code editor |
CN102819575A (zh) * | 2012-07-20 | 2012-12-12 | 南京大学 | Personalized search method for Web service recommendation |
CN103914296A (zh) * | 2013-01-03 | 2014-07-09 | 国际商业机器公司 | Method and system for native-language IDE code assistance |
CN104809139A (zh) * | 2014-01-29 | 2015-07-29 | 日本电气株式会社 | Code file query method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2145265A4 (en) * | 2007-03-30 | 2011-09-14 | Amazon Tech Inc | CLUSTER-BASED ASSESSMENT OF USER INTERESTS |
KR102141272B1 (ko) * | 2014-06-30 | 2020-08-04 | 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 | Code recommendation technique |
2017
- 2017-08-09 CN CN201710674688.5A patent/CN107463683B/zh active Active
- 2017-09-29 WO PCT/CN2017/104537 patent/WO2019028990A1/zh active Application Filing
Non-Patent Citations (1)
Title |
---|
GAO, YUAN ET AL.: "Method Name Recommendation Based on Source Code Repository and Feature Matching", JOURNAL OF SOFTWARE, vol. 26, no. 12, 31 December 2015 (2015-12-31), pages 3062 - 3074, ISSN: 1000-9825 * |
Also Published As
Publication number | Publication date |
---|---|
CN107463683A (zh) | 2017-12-12 |
CN107463683B (zh) | 2018-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
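WO2019028990A1 (zh) | Method, device, electronic device and medium for naming code elements | |
CN107463605B (zh) | Method and apparatus for identifying low-quality news resources, computer device and readable medium | |
CN110390044B (zh) | Method and device for searching for similar web pages | |
CN109783787A (zh) | Method, apparatus and storage medium for generating structured documents | |
CN110210038B (zh) | Core entity determination method and system, server and computer readable medium | |
CN107273546B (zh) | Counterfeit application detection method and system | |
CN110895961A (zh) | Text matching method and apparatus for medical data | |
CN112328655B (zh) | Text label mining method, apparatus, device and storage medium | |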
US20200004817A1 (en) | Method, device, and program for text classification | |
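CN110427453B (zh) | Data similarity calculation method, apparatus, computer device and storage medium | |
CN111767713A (zh) | Keyword extraction method, apparatus, electronic device and storage medium | |
CN111651674B (zh) | Bidirectional search method, apparatus and electronic device | |
KR20210089340A (ko) | Method and apparatus for classifying text in a document | |
CN114743012B (zh) | Text recognition method and apparatus | |
CN111160445B (zh) | Method and apparatus for calculating the similarity of bid documents | |
CN113986950A (zh) | SQL statement processing method, apparatus, device and storage medium | |
CN116029280A (zh) | Method, apparatus, computing device and storage medium for extracting key information from documents | |
CN109753646B (zh) | Article attribute recognition method and electronic device | |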
Rexha et al. | Towards Authorship Attribution for Bibliometrics using Stylometric Features. | |
US11347928B2 (en) | Detecting and processing sections spanning processed document partitions | |
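CN113935387A (zh) | Method and apparatus for determining text similarity, and computer readable storage medium | |
CN112307235B (zh) | Method, apparatus and electronic device for naming front-end page elements | |
KR20210146832A (ko) | Apparatus and method for extracting topic keywords | |
WO2021056740A1 (zh) | Language model construction method, system, computer device and readable storage medium | |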
Selivanov et al. | Package ‘text2vec’ |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/07/2020) |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12-02-2021) |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17921404 Country of ref document: EP Kind code of ref document: A1 |
122 | Ep: pct application non-entry in european phase | Ref document number: 17921404 Country of ref document: EP Kind code of ref document: A1 |