CN110673836A - Code completion method and device, computing equipment and storage medium

Info

Publication number: CN110673836A
Application number: CN201910779909.4A
Authority: CN
Inventors: 鲁志强
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2019-08-22
Filing date: 2019-08-22
Publication date: 2020-01-10
Anticipated expiration: 2039-08-22
Also published as: CN110673836B

Abstract

The present specification provides a code completion method and apparatus, a computing device, and a storage medium. The code completion method includes: acquiring n-gram sequences calculated from a code base in units of code keywords, wherein the n-gram sequences include grammar sequences larger than unary (that is, bigram or higher); in the code statement where the user cursor is located, taking the position of the user cursor as the starting position and taking out forward the prefix words whose number equals the order of a grammar sequence; using these prefix words to search the grammar sequences larger than unary for grammars prefixed by the prefix words; and, if a grammar prefixed by the prefix words is found, taking the word that follows the prefix in the grammar as a code completion candidate corresponding to the n-gram. Because code completion candidates are predicted probabilistically on the basis of n-grams, the code that can be completed may include code with various meanings, and even constants, which reduces the amount of code a programmer has to type as far as possible and improves code development efficiency.

Description

Code completion method and device, computing equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a code completion method and apparatus, a computing device, and a storage medium.

Background

A code editor is a tool used by programmers to edit code.

In order to reduce the amount of code typed by a programmer and improve the efficiency of code development, a code completion function may be provided in a code editor. Currently, code completion is implemented by displaying a list of available member functions according to the program context while a programmer edits code in the code editor, and automatically completing the code after the programmer selects a member from the list. However, the code that a programmer needs to enter goes far beyond the member functions that can be determined from the program context, and includes code with other meanings, constants, and the like. As the volume of code under development grows, so does the amount of code a programmer has to type, which hinders the efficiency of code development.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide a code completion method, apparatus, computing device and storage medium, so as to solve technical defects in the prior art.

According to a first aspect of embodiments herein, there is provided a code completion method, including: acquiring an n-gram sequence calculated from a code base according to a code keyword, wherein the n-gram sequence comprises a gram sequence larger than a unary; in a code statement where a user cursor is located, taking the position of the user cursor as a starting position, and taking out prefix words with the number of orders of the grammar sequence larger than the unary forward; searching the grammar with the prefix words as prefixes from the grammar sequence which is larger than the unary by using the prefix words with the number of orders of the grammar sequence which is larger than the unary; and if the grammar taking the prefix word as the prefix is found, taking a subsequent word of the prefix in the grammar as a code completion candidate corresponding to the n-element grammar.

Optionally, the method further comprises: determining the ranking position, among all code completion candidates, of the word that follows the prefix in the grammar according to the number of times the grammar occurs in the code base.

Optionally, the n-gram sequence comprises a unary grammar sequence and a grammar sequence larger than unary; the method further comprises: if, using the prefix words whose number equals the order of a grammar sequence larger than unary, no grammar prefixed by the prefix words is found in any of the grammar sequences larger than unary, taking the words in the unary grammar sequence as the code completion candidates corresponding to the n-gram.

Optionally, taking out forward, in the code statement where the user cursor is located and with the position of the user cursor as the starting position, the prefix words whose number equals the order of a grammar sequence larger than unary includes: taking the highest order among the grammar sequences larger than unary as the current order; and, in the code statement where the user cursor is located, taking the position of the user cursor as the starting position and taking out forward the prefix words whose number equals the order of the current-order grammar sequence. Searching the grammar sequences larger than unary for grammars prefixed by the prefix words includes: using the prefix words whose number equals the current order to search the current-order grammar sequence for grammars prefixed by the prefix words. Taking the word that follows the prefix in the grammar as a code completion candidate corresponding to the n-gram, if a grammar prefixed by the prefix words is found, includes: if a grammar prefixed by the prefix words is found in the current-order grammar sequence, taking the word that follows the prefix in each such grammar as a code completion candidate corresponding to the n-gram; if no grammar prefixed by the prefix words is found in the current-order grammar sequence and the next order below the current order is not the unary grammar sequence, updating the current order to that next order in the n-gram sequence and returning to the step of taking out forward, in the code statement where the user cursor is located and with the position of the user cursor as the starting position, the prefix words whose number equals the order of the current-order grammar sequence. Taking the words in the unary grammar sequence as the code completion candidates corresponding to the n-gram, if no grammar prefixed by the prefix words is found in any of the grammar sequences larger than unary, includes: if, using the prefix words whose number equals the order of the current-order grammar sequence, no grammar prefixed by the prefix words is found in the current-order grammar sequence, and the next order below the current-order grammar sequence is the unary grammar sequence, taking the word set of the unary grammar sequence as the code completion candidates corresponding to the n-gram.

Optionally, the method further comprises: determining a context state of a position of the user cursor; searching out a code completion candidate item corresponding to the context state; and merging the code completion candidate item corresponding to the n-element grammar with the code completion candidate item corresponding to the context state to obtain a code completion candidate item complete set.

Optionally, in the complete set of code completion candidates, the ranking weight of the code completion candidate corresponding to the context state is zero, in the code completion candidate corresponding to the n-gram, the ranking weight of a word found from a grammar sequence greater than a unary is the number of times that the grammar in which the word is found appears in the code base, and the ranking weight of a word found from a unary grammar sequence is the number of times that the word appears in the code base.

According to a second aspect of embodiments herein, there is provided a code completion apparatus including: a grammar acquisition module configured to acquire n-gram sequences calculated from a code base in units of code keywords, wherein the n-gram sequences include grammar sequences larger than unary; a prefix acquisition module configured to, in the code statement where the user cursor is located, take the position of the user cursor as the starting position and take out forward the prefix words whose number equals the order of a grammar sequence larger than unary; a grammar searching module configured to use these prefix words to search the grammar sequences larger than unary for grammars prefixed by the prefix words; and a code completion module configured to, if a grammar prefixed by the prefix words is found, take the word that follows the prefix in the grammar as a code completion candidate corresponding to the n-gram.

Optionally, the apparatus further comprises: a number ranking module configured to determine a ranking position of a word succeeding the prefix in the grammar among all the code completion candidates according to the number of times the grammar appears in the code library.

Optionally, the n-gram sequence comprises a unary grammar sequence and a grammar sequence larger than unary; the apparatus further comprises: a unary searching module configured to, if, using the prefix words whose number equals the order of a grammar sequence larger than unary, no grammar prefixed by the prefix words is found in any of the grammar sequences larger than unary, take the words in the unary grammar sequence as the code completion candidates corresponding to the n-gram.

Optionally, the prefix acquisition module includes: a high-order start submodule configured to take the highest order among the grammar sequences larger than unary as the current order; and a prefix acquisition submodule configured to, in the code statement where the user cursor is located, take the position of the user cursor as the starting position and take out forward the prefix words whose number equals the order of the current-order grammar sequence. The grammar searching module is configured to use the prefix words whose number equals the current order to search the current-order grammar sequence for grammars prefixed by the prefix words. The code completion module includes: a current grammar code completion submodule configured to, if a grammar prefixed by the prefix words is found in the current-order grammar sequence, take the word that follows the prefix in each such grammar as a code completion candidate corresponding to the n-gram; and an order updating submodule configured to, if no grammar prefixed by the prefix words is found in the current-order grammar sequence and the next order below the current order is not the unary grammar sequence, update the current order to that next order in the n-gram sequence and re-trigger the prefix acquisition submodule. The unary searching module is configured to, if, using the prefix words whose number equals the order of the current-order grammar sequence, no grammar prefixed by the prefix words is found in the current-order grammar sequence, and the next order below the current-order grammar sequence is the unary grammar sequence, take the word set of the unary grammar sequence as the code completion candidates corresponding to the n-gram.

Optionally, the apparatus further comprises: a context state determination module configured to determine the context state of the position where the user cursor is located; a code completion candidate searching module configured to find the code completion candidates corresponding to the context state; and a merging module configured to merge the code completion candidates corresponding to the n-gram with the code completion candidates corresponding to the context state to obtain the complete set of code completion candidates.

According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the code completion method when executing the instructions.

According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the code completion method.

In the embodiments of the present specification, n-gram sequences computed from a code base in units of code keywords are obtained; from the code statement where the user cursor is located, the prefix words before the cursor are taken out, their number being equal to the order of a grammar sequence; these prefix words are used to search the grammar sequences larger than unary for grammars prefixed by them; and, if such a grammar is found, the word that follows the prefix in the grammar is taken as a code completion candidate corresponding to the n-gram. Code completion is thus based on the n-grams of the code in the code base. Because the n-gram model is a probabilistic grammar built on a Markov model, the embodiments of this specification predict code completion candidates probabilistically, and the code that can be completed may include member functions of the code in the code base, code with various other meanings, and even constants, which greatly reduces the amount of code a programmer has to type and improves code development efficiency.

Drawings

FIG. 1 is a block diagram of a computing device provided by embodiments of the present description;

FIG. 2 is a flow chart of a method for code completion provided by an embodiment of the present specification;

FIG. 3 is a flow chart of a method of code completion provided by another embodiment of the present description;

FIG. 4 is a flow chart of a method of code completion provided by yet another embodiment of the present description;

FIG. 5 is a flow chart of a method of code completion provided by yet another embodiment of the present description;

FIG. 5a is a diagram illustrating the state flow of a finite state machine according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of a code completion apparatus provided by an embodiment of the present specification;

FIG. 7 is a block diagram of a code completion apparatus according to another embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used in one or more embodiments herein to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish one kind of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of ...", "when ...", or "in response to a determination", depending on the context.

First, the noun terms to which one or more embodiments of the present specification relate are explained.

n-gram: n words that occur consecutively in a text. When n is 1, 2, and 3, an n-gram is also called a unigram, a bigram, and a trigram, respectively.

n-gram sequence: a sequence comprising a plurality of n-grams. For example, when n is 2, the bigram sequence includes a plurality of bigrams. The n-gram model is a probabilistic language model based on an (n-1)-order Markov chain, so the order of an n-gram sequence is n-1; for example, the order of the bigram sequence is one.
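The (n-1)-order Markov assumption mentioned above can be written out explicitly. The following is the standard textbook formulation of an n-gram language model with a maximum-likelihood estimate, given here only as background; the symbols $w_i$, $m$ and $c(\cdot)$ (the number of occurrences in the code base) are notation introduced for this illustration and do not appear in the specification.

```latex
% Each word is assumed to depend only on the n-1 words that precede it.
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

% Maximum-likelihood estimate from occurrence counts c(.) in the code base.
P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
  = \frac{c(w_{i-n+1}, \dots, w_{i-1}, w_i)}{c(w_{i-n+1}, \dots, w_{i-1})}
```

For a fixed prefix the denominator is constant, so ordering candidates by the raw count of the full n-gram gives the same ranking as ordering them by this estimated probability.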

In the present specification, a code completion method, apparatus, computing device, and storage medium are provided, and detailed descriptions are made one by one in the following embodiments.

FIG. 1 shows a block diagram of a computing device 100, according to an embodiment of the present description. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.

Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 100 and other components not shown in FIG. 1 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.

Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.

The processor 120 may perform the steps of the code completion method shown in fig. 2. Fig. 2 shows a flowchart of a code completion method according to an embodiment of the present specification, including steps 210 to 240.

Step 210: acquiring n-gram sequences calculated from a code base in units of code keywords, wherein the n-gram sequences include grammar sequences larger than unary.

For example, the n-gram sequences described in the embodiments of the present specification may include unary grammar sequences, binary grammar sequences, and ternary grammar sequences. Depending on the requirements of the implementation environment, higher-order n-grams may be used to obtain more accurate predictions. The sparsity problem of n-grams can be addressed with an n-gram smoothing algorithm.

The embodiments of this specification can be applied to code completion in an SQL code editor, in which case the code base is an SQL code base. SQL is a special-purpose programming language used to manage data in relational database management systems, or to perform stream processing in relational data stream management systems. Specifically, known SQL code bases may be collected, and a unigram sequence U-Gram, a bigram sequence B-Gram, and a trigram sequence T-Gram may be calculated over the collected SQL code, taking the words in the SQL code, such as keywords, variables, and the like, as units. When code completion is required, the pre-calculated U-Gram, B-Gram, and T-Gram are acquired.

For example, assume that the known code library has SQL code as follows:

select name from table2 where id = 100

The obtained n-gram sequences are as follows:

U-GRAM sequence: ['select', 'name', 'from', 'table2', 'where', 'id', '100']

B-GRAM sequence: [('select', 'name'), ('name', 'from'), ('from', 'table2'), ('table2', 'where'), ('where', 'id'), ('id', '100')]

T-GRAM sequence: [('select', 'name', 'from'), ('name', 'from', 'table2'), ('from', 'table2', 'where'), ('table2', 'where', 'id'), ('where', 'id', '100')]
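The pre-calculation of the U-Gram, B-Gram and T-Gram sequences can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `tokenize` and `build_ngram_counts` are hypothetical, and the tokenizer assumes that a word is any run of word characters, so operators such as "=" are dropped, as in the U-GRAM example above.

```python
import re
from collections import Counter


def tokenize(statement):
    # Assumption: a word is a run of letters, digits or underscores,
    # so punctuation and operators such as '=' are not kept as words.
    return re.findall(r"\w+", statement)


def build_ngram_counts(statements, max_n=3):
    """Return {n: Counter mapping each n-word tuple to its occurrence count}."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for statement in statements:
        words = tokenize(statement)
        for n in range(1, max_n + 1):
            # Slide a window of n words over the statement.
            for i in range(len(words) - n + 1):
                counts[n][tuple(words[i:i + n])] += 1
    return counts


counts = build_ngram_counts(["select name from table2 where id = 100"])
print(list(counts[1]))  # unigrams, in order of first appearance
print(list(counts[2]))  # bigrams
print(list(counts[3]))  # trigrams
```

Keeping the occurrence count next to each n-gram is a design choice that lets the same structure serve both lookup and frequency-based ranking later on.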

Step 220: in the code statement where the user cursor is located, taking the position of the user cursor as the starting position and taking out forward the prefix words whose number equals the order of a grammar sequence larger than unary.

For example, the SQL statement S where the user cursor is located may first be identified, and the prefix word sequence SW = (w1, w2, …, wn) before the user cursor taken out of statement S. Then, according to the order of each grammar sequence, that number of prefix words is taken from the end of the prefix word sequence. For example, for the trigram sequence, whose order is two, the prefix words taken are (w(n-1), wn); for the bigram sequence, whose order is one, the prefix word taken is (wn).

More specifically, for example, assuming that the statement S is "select from table2 where", the word sequence SW is ['select', 'from', 'table2', 'where']. For the trigram sequence, the prefix words that need to be taken from statement S are ['table2', 'where']. For the bigram sequence, the prefix word that needs to be taken from statement S is ['where'].
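Taking the prefix words amounts to slicing the end of the word sequence before the cursor. The sketch below uses a hypothetical helper `take_prefix`; returning `None` when fewer words are available than the order requires is an assumption made for this illustration, since the specification does not say how that case is handled.

```python
def take_prefix(words_before_cursor, order):
    """Return the last `order` words before the cursor as a tuple.

    Returns None when there are fewer than `order` words (an assumption).
    """
    if order <= 0 or len(words_before_cursor) < order:
        return None
    return tuple(words_before_cursor[-order:])


sw = ["select", "from", "table2", "where"]
trigram_prefix = take_prefix(sw, 2)  # order of the trigram sequence is two
bigram_prefix = take_prefix(sw, 1)   # order of the bigram sequence is one
print(trigram_prefix)  # ('table2', 'where')
print(bigram_prefix)   # ('where',)
```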

Step 230: using the prefix words whose number equals the order of a grammar sequence larger than unary to search the grammar sequences larger than unary for grammars prefixed by the prefix words.

For example, in the trigram sequence T-Gram, the prefix words (w(n-1), wn) may be used to look up grammars prefixed by (w(n-1), wn). In the bigram sequence B-Gram, the prefix word (wn) is used to look up grammars prefixed by (wn).

In the embodiments of the present specification, the way in which grammars prefixed by the prefix words are searched for in the grammar sequences larger than unary is not limited. For example, all grammar sequences larger than unary may be searched concurrently, so that grammars matching the prefix can be found in the grammar sequence of every order. However, because this approach involves a large amount of computation and is inefficient, the embodiments of the present specification may instead search order by order, starting from the highest order: as soon as a grammar matching the prefix is found in the grammar sequence of some order, the words that follow the prefix in the matching grammars of that order are taken as code completion candidates, and the search stops.

Step 240: if a grammar prefixed by the prefix words is found, taking the word that follows the prefix in the grammar as a code completion candidate corresponding to the n-gram.

For example, if grammars prefixed by (w(n-1), wn) are found in the trigram sequence T-Gram, all the words that follow (w(n-1), wn) in those grammars are taken as code completion candidates.

In an embodiment of this specification, the ranking position, among all the code completion candidates, of the word that follows the prefix in a grammar may also be determined according to the number of times the found grammar occurs in the code base. In this way, the code completion candidates offered to the programmer are presented in priority order, and the programmer can pick the code to be completed more easily and quickly.
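Steps 230 and 240, together with the frequency-based ranking just described, can be sketched as a single lookup. The function name `successors` and the occurrence counts in `t_gram` are invented for this illustration; a linear scan is used for clarity, whereas a real editor would more likely index the n-grams by prefix.

```python
from collections import Counter


def successors(ngram_counts, prefix):
    """Return the words that follow `prefix`, most frequent n-gram first.

    ngram_counts maps n-word tuples to occurrence counts in the code base.
    """
    k = len(prefix)
    found = Counter()
    for gram, count in ngram_counts.items():
        # A match is an n-gram one word longer than the prefix that starts with it.
        if len(gram) == k + 1 and gram[:k] == prefix:
            found[gram[-1]] += count
    return [word for word, _ in found.most_common()]


# Hypothetical trigram counts.
t_gram = Counter({
    ("table2", "where", "id"): 5,
    ("table2", "where", "dt"): 9,
    ("from", "table2", "where"): 14,
})
print(successors(t_gram, ("table2", "where")))  # ['dt', 'id']
```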

As can be seen, in the embodiments of the present specification, n-gram sequences computed from a code base in units of code keywords are obtained; from the code statement where the user cursor is located, the prefix words before the cursor are taken out, their number being equal to the order of a grammar sequence; these prefix words are used to search the grammar sequences larger than unary for grammars prefixed by them; and, if such a grammar is found, the word that follows the prefix in the grammar is taken as a code completion candidate corresponding to the n-gram. Code completion is thus based on the n-grams of the code in the code base. Because the n-gram model is a probabilistic grammar built on a Markov model, the embodiments of this specification predict code completion candidates probabilistically, and the code that can be completed may include member functions of the code in the code base, code with various other meanings, and even constants, which greatly reduces the amount of code a programmer has to type and improves code development efficiency.

For example, suppose code completion is performed on the statement S "select * from ribbon where dt =" where the user cursor is located, with the cursor immediately after "=". Assuming that the code base contains many statements like "select id from table_xxx where dt = '${yyyymmdd}'", searching for code completion candidates based on the n-grams obtained from the code base can yield a constant such as "'${yyyymmdd}'" as a code completion candidate.

For example, the core function of a one-stop data engineering platform is data development, in which SQL code editing is the main activity. Performing SQL code completion accurately and intelligently with the code completion method of the embodiments of this specification can improve the data development experience.

In an embodiment of this specification, in order to achieve code completion to a greater extent, and considering that the grammar sequences larger than unary may contain no grammar prefixed by the corresponding prefix words, the n-gram sequence described in this specification may include a unary grammar sequence and grammar sequences larger than unary. If, using the prefix words whose number equals the order of a grammar sequence larger than unary, no grammar prefixed by the prefix words is found in any of the grammar sequences larger than unary, the words in the unary grammar sequence are taken as the code completion candidates corresponding to the n-gram. In addition, the ranking positions of these words among all the code completion candidates can be determined according to the number of times each word in the unary grammar sequence occurs in the code base.

In order to improve search efficiency, embodiments of the present specification may search order by order, starting from the highest order. As soon as a grammar matching the prefix is found in the grammar sequence of some order, all the words that follow the prefix in the matching grammars of that order are taken as code completion candidates and the search stops. If the order-by-order search reaches the final unary grammar sequence, the words in the unary grammar sequence are taken as the code completion candidates corresponding to the n-gram.

In order to make this embodiment easier to understand, the following describes in detail the embodiment with reference to a flowchart of a code completion method according to another embodiment of the present specification shown in fig. 3. As shown in fig. 3, steps 310 to 350 are included.

Step 310: acquiring n-gram sequences calculated from a code base in units of code keywords, wherein the n-gram sequences include a unary grammar sequence and grammar sequences larger than unary.

Step 311: taking the highest order among the grammar sequences larger than unary as the current order.

Step 320: in the code statement where the user cursor is located, taking the position of the user cursor as the starting position and taking out forward the prefix words whose number equals the order of the current-order grammar sequence.

Step 330: using the prefix words whose number equals the current order to search the current-order grammar sequence for grammars prefixed by the prefix words.

Step 331: judging whether a grammar prefixed by the prefix words has been found in the current-order grammar sequence.

Step 340: if a grammar prefixed by the prefix words is found in the current-order grammar sequence, taking the word that follows the prefix in each such grammar as a code completion candidate corresponding to the n-gram, and determining the ranking position of that word among all the code completion candidates according to the number of times the grammar occurs in the code base.

Step 350: if no grammar prefixed by the prefix words is found in the current-order grammar sequence, judging whether the next order below the current order in the n-gram sequence is the unary grammar sequence.

For example, assuming that the obtained n-gram sequences are a quaternary grammar sequence, a ternary grammar sequence, and a unary grammar sequence, and the current order is that of the ternary grammar sequence, the next order below the ternary grammar sequence is the unary grammar sequence. As another example, assuming that the obtained n-gram sequences are a ternary grammar sequence, a binary grammar sequence, and a unary grammar sequence, and the current order is that of the ternary grammar sequence, the next order below the ternary grammar sequence is the binary grammar sequence.

Step 351: if it is not the unary grammar sequence, updating the current order to the next order below the current order in the n-gram sequence, and returning to step 320.

Step 352: if it is the unary grammar sequence, taking the word set of the unary grammar sequence as the code completion candidates corresponding to the n-gram.

In this embodiment, the ranking positions of the words in all the code completion candidates may also be determined according to the number of times the words in the unary grammar sequence appear in the code base.
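The flow of steps 310 to 352 can be sketched as a back-off loop from the highest available order down to the unary sequence. The function name `complete` and the counts in `counts_by_n` are hypothetical. Skipping an order when fewer words precede the cursor than that order requires is an assumption of this sketch, not something the specification states.

```python
from collections import Counter


def complete(counts_by_n, words_before_cursor):
    """Back off from the highest-order n-gram sequence to the unigrams.

    counts_by_n maps n to a Counter of n-word tuples with occurrence counts.
    """
    # Steps 311/351: visit the sequences larger than unary from highest to lowest.
    for n in sorted((n for n in counts_by_n if n > 1), reverse=True):
        order = n - 1
        if len(words_before_cursor) < order:
            continue  # assumption: not enough words for this order, try the next
        prefix = tuple(words_before_cursor[-order:])  # step 320
        found = Counter()
        for gram, count in counts_by_n[n].items():    # step 330
            if gram[:order] == prefix:
                found[gram[-1]] += count
        if found:                                     # steps 331/340
            return [word for word, _ in found.most_common()]
    # Step 352: nothing matched, fall back to all unigram words ranked by count.
    unigrams = counts_by_n.get(1, Counter())
    return [gram[0] for gram, _ in unigrams.most_common()]


# Hypothetical counts for illustration only.
counts_by_n = {
    1: Counter({("select",): 3, ("from",): 2}),
    2: Counter({("where", "id"): 4, ("where", "dt"): 1}),
    3: Counter({("table2", "where", "dt"): 2}),
}
print(complete(counts_by_n, ["select", "from", "table2", "where"]))  # ['dt']
print(complete(counts_by_n, ["from", "t9", "where"]))                # ['id', 'dt']
print(complete(counts_by_n, ["foo"]))                                # ['select', 'from']
```

The three calls exercise a trigram match, a back-off to the bigram sequence, and the final unigram fallback, respectively.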

In another embodiment of the present specification, n-gram code completion is combined with context-state code completion. To make this embodiment easier to understand, it is described in detail below with reference to the flowchart of a code completion method according to another embodiment of the present specification shown in fig. 4. As shown in fig. 4, the method includes steps 401 to 450.

Step 401: determining a context state of a location of the user cursor.

It should be noted that the embodiments of the present specification do not limit the method for determining the context state; for example, the next state may be determined according to the current state and each token in the statement where the user cursor is located. Specifically, a finite state machine may define the transition relationships between the states, so a finite state machine state-transition method may be adopted to determine the context state at the position of the user cursor.

Step 402: find the code completion candidates corresponding to the context state.

For example, the context states may be predefined, and the code completion candidates corresponding to each context state may be preset. After the context state at the position of the user cursor is determined, the corresponding code completion candidates can be looked up in the predefined correspondence.

Step 410: acquire n-gram sequences calculated from a code base according to code keywords, wherein the n-gram sequences include at least a grammar sequence larger than unary.

Step 420: in the code statement where the user cursor is located, taking the position of the user cursor as the starting position, take out forward a number of prefix words determined by the order of the grammar sequence larger than unary.

Step 430: using these prefix words, search the grammar sequence larger than unary for grammars prefixed by the prefix words.

Step 440: if a grammar prefixed by the prefix words is found, take the word that follows the prefix in the grammar as a code completion candidate corresponding to the n-gram.

Step 450: merge the code completion candidates corresponding to the n-gram with the code completion candidates corresponding to the context state to obtain the complete set of code completion candidates.

This embodiment can therefore complete code according to the context, giving meaningful completions for all common SQL statement types, and can also complete code by probabilistic prediction using the n-gram, which makes up for the shortcomings of completing code from the context alone.

In another embodiment of the present specification, in order to give the code completion candidates a certain priority, in the complete set of code completion candidates the ranking weight of a code completion candidate corresponding to the context state is zero. Among the code completion candidates corresponding to the n-gram, the ranking weight of a word found from a grammar sequence larger than unary is the number of times the grammar containing the word appears in the code base, and the ranking weight of a word found from the unary grammar sequence is the number of times the word appears in the code base. With this embodiment, the code completion candidates can be sorted by ranking weight, so that a programmer can determine the code to be completed more easily and quickly.

In order to make the embodiments of the present disclosure more comprehensible, a detailed description is given below of one possible implementation of the combination of some of the above-described embodiments.

FIG. 5 shows a flow diagram of a code completion method according to yet another embodiment of the present description. The method is described taking as an example the case where the n-gram sequences comprise a unary grammar sequence, a bigram sequence, and a trigram sequence and code completion is combined with the context; it comprises steps 501 to 570.

Step 501: extract the statement S where the user cursor is located.

For example, assume that the code segment is as follows, with the user cursor at the end of the code; the extracted statement S where the user cursor is located is then "select * from table2 where".

select id from table1;

select * from table2 where

Step 502: determining a context state of a location of the user cursor.

For example, the set SC of all possible context states may be predefined, and the context state C at the position of the user cursor may be determined by a finite state machine state-transition method. Fig. 5a is a schematic diagram of the finite state machine state transitions, in which the predefined context states transition in the directions indicated by the arrows.

To determine the context state C at the position of the user cursor in the statement S, as shown in fig. 5a, start from the state "START". Take the first token in S, i.e., "SELECT": the state switches from "START" to "SELECT". Take the next token in S, i.e., "*": the state is unchanged. Take the next token in S, i.e., "FROM": the state switches from "SELECT" to "FROM". Take the next token in S, i.e., "TABLE2": the state switches from "FROM" to "TABLE_idle". Take the next token in S, i.e., "WHERE": the state switches from "TABLE_idle" to "WHERE". The user cursor position has now been reached, so the context state C at the position of the user cursor is "WHERE". In this embodiment, determining the next state from the current state and the next token is the state-transition method of the finite state machine.
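The walkthrough above can be reproduced with a small transition table. The sketch below is illustrative only: it covers just the states named in the walkthrough, fig. 5a may define more states and transitions, and the function name `context_state` and the rule that any token following FROM is treated as a table name are assumptions made for the example.

```python
# Hypothetical transition table: (current state, upper-cased token) -> next state.
# Only the transitions named in the walkthrough are listed.
TRANSITIONS = {
    ("START", "SELECT"): "SELECT",
    ("SELECT", "FROM"): "FROM",
    ("TABLE_idle", "WHERE"): "WHERE",
}

def context_state(tokens):
    """Walk the tokens before the cursor and return the resulting state."""
    state = "START"
    for token in tokens:
        key = (state, token.upper())
        if key in TRANSITIONS:
            state = TRANSITIONS[key]
        elif state == "FROM":
            # Assumption: any other token right after FROM is a table name.
            state = "TABLE_idle"
        # Otherwise the state is unchanged, e.g. "*" after SELECT.
    return state

print(context_state("select * from table2 where".split()))  # prints WHERE
```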

Step 503: find the code completion candidates corresponding to the context state, add them to a code completion candidate set CS (1), and set the ranking weight of all the code completion candidates in CS (1) to 0.

For example, for the predefined context states, the corresponding code completion candidates may be preset respectively. After the context state is determined, the code completion candidates corresponding to the determined context state can be found. For example, for the "START" state, the preset code completion candidates may include keywords such as "SELECT", "INSERT", and "UPDATE"; for the "WHERE" state, the preset code completion candidates may include physical table fields, functions, and the like. For the statement S in the above example, the code completion candidates corresponding to the context state "WHERE" include SQL functions (SUM, MAX, MIN, AVG, etc.) and physical table fields, i.e., the fields in table2 (assumed to be id and name). Therefore, CS (1) determined according to the context state of the statement S is [ 'SUM', 'MAX', 'MIN', 'AVG', 'id', 'name' ].
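One way to hold such a preset correspondence is a plain lookup table. The following sketch is an assumption-laden illustration rather than the scheme of this specification: the function name `context_candidates` and the way table fields are passed in are hypothetical, and only the two states discussed above are filled in.

```python
SQL_FUNCTIONS = ["SUM", "MAX", "MIN", "AVG"]

def context_candidates(state, table_fields):
    """Return CS (1) as a dict mapping candidate -> ranking weight."""
    # Hypothetical preset correspondence between states and candidates.
    preset = {
        "START": ["SELECT", "INSERT", "UPDATE"],
        "WHERE": SQL_FUNCTIONS + list(table_fields),
    }
    # Every context-state candidate starts with ranking weight 0 (step 503).
    return {word: 0 for word in preset.get(state, [])}

cs1 = context_candidates("WHERE", ["id", "name"])
print(list(cs1))  # ['SUM', 'MAX', 'MIN', 'AVG', 'id', 'name']
```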

Step 510: acquire the unary grammar sequence U-Gram, the bigram sequence B-Gram, and the trigram sequence T-Gram, which are calculated from a code base according to code keywords.

For example, assume that the known code base contains the following SQL code:

select name from table2 where id = 100

The obtained n-gram sequences are as follows:

U-GRAM sequence: [ 'select', 'name', 'from', 'table2', 'where', 'id', '100' ]
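How such sequences might be counted from a code base can be sketched as follows. This is a minimal illustration under assumptions: the function name `build_ngrams` and the counter layout are hypothetical, and each statement is passed in already split into words (here the seven words of the U-GRAM sequence above), because the tokenization rule is not spelled out here.

```python
from collections import Counter

def build_ngrams(token_lists, max_order=3):
    """Count every n-gram of order 1..max_order in pre-tokenized statements."""
    grams = {order: Counter() for order in range(1, max_order + 1)}
    for words in token_lists:
        for order in grams:
            # Slide a window of `order` words over the statement.
            for i in range(len(words) - order + 1):
                grams[order][tuple(words[i:i + order])] += 1
    return grams

statement = ["select", "name", "from", "table2", "where", "id", "100"]
grams = build_ngrams([statement])
print(grams[3][("table2", "where", "id")])  # 1
```

Keeping the counts alongside the grammars means the number of times each grammar appears in the code base is available later as a ranking weight.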

Step 520: extract the prefix word sequence SW = (w1, w2, …, wn) before the user cursor in the statement S.

For example, the statement S is "select * from table2 where", and the word sequence SW is [ 'select', '*', 'from', 'table2', 'where' ].

Step 530: when the SW length is greater than 2, search the ternary grammar sequence T-Gram for ternary grammars prefixed by (w(n-1), wn). If any are found, add the ending words of all the ternary grammars in T-Gram prefixed by (w(n-1), wn) to a code completion candidate set CS (2), count the number of times each found ternary grammar appears in the code base, and take the counted number as the ranking weight of the corresponding ending word.

For example, the SW length is 5. The T-GRAM sequence is searched for the ternary grammars STG with the prefix [ 'table2', 'where' ], and [ 'table2', 'where', 'id' ] is found. Its ending word is "id", and this ternary grammar is counted as appearing once in the code base, so CS (2) is [ 'id' ], where the ranking weight of "id" is 1.

Step 540: when the SW length is greater than 1 and no code completion candidate has been found in the ternary grammar sequence, search the binary grammar sequence B-Gram for binary grammars prefixed by (wn). If any are found, add the ending words of all the binary grammars in B-Gram prefixed by (wn) to the code completion candidate set CS (2), count the number of times each found binary grammar appears in the code base, and take the counted number as the ranking weight of the corresponding ending word.

Step 550: when no code completion candidate has been found in the binary grammar sequence, add all the words in the unary grammar sequence U-Gram to the code completion candidate set CS (2), count the number of times each unary grammar appears in the code base, and take the counted number as the ranking weight of the corresponding word.

Step 560: for each entry Wi in CS (2), check whether Wi is in CS (1). If Wi is not in CS (1), add Wi to CS (1); if Wi is in CS (1), set the ranking weight of the same entry in CS (1) to the ranking weight of Wi.

For example, checking "id" in CS (2) shows that it is already in CS (1), so the ranking weight of the code completion candidate "id" in CS (1) is set to 1, and the ranking weights of the remaining code completion candidates in CS (1) stay at 0.

Step 570: sort all the code completion candidates in CS (1) according to their ranking weights to obtain the final complete set of code completion candidates CS.

For example, the code completion candidates in CS (1) are sorted according to their ranking weights, and the final sorted complete set of code completion candidates CS is [ 'id', 'SUM', 'MAX', 'MIN', 'AVG', 'name' ].
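Steps 560 and 570 amount to a dictionary merge followed by a stable sort. The sketch below is one possible reading under assumptions: the function name `merge_and_sort` and the representation of CS (1) and CS (2) as dictionaries from candidate to ranking weight are hypothetical, and candidates with equal weight are assumed to keep their order in CS (1), which matches the example result above.

```python
def merge_and_sort(cs1, cs2):
    """cs1, cs2: dicts mapping candidate -> ranking weight."""
    merged = dict(cs1)
    for word, weight in cs2.items():
        # A new word is appended; an existing word takes the n-gram weight.
        merged[word] = weight
    # sorted() is stable, so candidates with equal weight keep their CS (1) order.
    return sorted(merged, key=lambda word: merged[word], reverse=True)

cs1 = {"SUM": 0, "MAX": 0, "MIN": 0, "AVG": 0, "id": 0, "name": 0}
cs2 = {"id": 1}
print(merge_and_sort(cs1, cs2))  # ['id', 'SUM', 'MAX', 'MIN', 'AVG', 'name']
```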

Corresponding to the above method embodiments, the present specification further provides a code completion apparatus embodiment, and fig. 6 shows a block diagram of the code completion apparatus according to an embodiment of the present specification. As shown in fig. 6, the apparatus includes: a grammar acquisition module 610, a prefix obtaining module 620, a syntax lookup module 630, and a code completion module 640.

The grammar acquisition module 610 may be configured to acquire n-gram sequences calculated from a code library according to code keywords, wherein the n-gram sequences include grammar sequences larger than a unary.

The prefix obtaining module 620 may be configured to, in the code statement where the user cursor is located, taking the user cursor position as the starting position, take out forward a number of prefix words determined by the order of the grammar sequence larger than unary.

The syntax lookup module 630 may be configured to use these prefix words to search the grammar sequence larger than unary for grammars prefixed by the prefix words.

The code completion module 640 may be configured to, if a grammar with the prefix word as a prefix is found, take a subsequent word of the prefix in the grammar as a code completion candidate corresponding to the n-gram.

As can be seen, in the embodiments of the present specification, n-gram sequences collected from a code base according to code keywords are obtained; in the code statement where the user cursor is located, a number of prefix words determined by the order of the grammar sequence are taken out from before the user cursor; these prefix words are used to search the grammar sequences larger than unary for grammars prefixed by the prefix words; and if such a grammar is found, the word that follows the prefix in the grammar is taken as a code completion candidate corresponding to the n-gram, so that code completion is performed with n-grams built from all the code in the code base. Because the n-gram model is a probabilistic grammar built on the Markov model, the embodiments of this specification predict code completion candidates probabilistically, and the code that can be completed may include member functions of all the code in the code base, code with various meanings, and even constants, so that the amount of code a programmer has to type is greatly reduced and code development efficiency is improved. For example, code completion is performed on the statement S "select * from ribbon where dt =" where the user cursor is located, with the user cursor after the "=". Assuming that the code base contains many statements such as "select id from table_xxx where dt = '${yyyymmdd}'", searching for code completion candidates with the n-grams obtained from the code base can yield a constant such as "'${yyyymmdd}'".
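The constant completion in this example can be illustrated with a toy code base. Everything concrete in the sketch below is made up for the illustration: the three statements, the table names, the date literal, the whitespace tokenization, and the function name `complete` are assumptions, not data or rules from this specification.

```python
from collections import Counter

# Hypothetical code base; whitespace tokenization is an assumption.
code_base = [
    "select id from table_a where dt = '${yyyymmdd}'",
    "select id from table_b where dt = '${yyyymmdd}'",
    "select id from table_c where dt = '2019-08-22'",
]

# Count every ternary grammar in the code base.
trigrams = Counter()
for statement in code_base:
    words = statement.split()
    for i in range(len(words) - 2):
        trigrams[tuple(words[i:i + 3])] += 1

def complete(statement):
    """Rank the words that follow the last two words before the cursor."""
    prefix = tuple(statement.split()[-2:])
    found = {gram[-1]: n for gram, n in trigrams.items() if gram[:2] == prefix}
    return sorted(found, key=found.get, reverse=True)

print(complete("select * from ribbon where dt ="))
# ["'${yyyymmdd}'", "'2019-08-22'"]
```

Because the constant appears after "dt =" more often than any other word, it is ranked first even though it is neither a keyword nor a table field.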

Fig. 7 is a block diagram of a code completion apparatus according to another embodiment of the present specification. As shown in fig. 7, the apparatus further includes a grammar number ordering module 650, which may be configured to determine the ranking position, among all the code completion candidates, of the word that follows the prefix in a found grammar according to the number of times the grammar appears in the code base. With this embodiment, the code completion candidates provided to the programmer have a certain priority, and the programmer can determine the code to be completed more easily and quickly.

In an embodiment of this specification, in order to achieve code completion to a greater extent, and considering that a grammar prefixed by the prefix words may not be found in any grammar sequence larger than unary, the n-gram sequences described in this specification may include a unary grammar sequence and grammar sequences larger than unary. In this embodiment, as shown in fig. 7, the apparatus further includes a unary search module 660, which may be configured to, if no grammar prefixed by the prefix words is found in any of the grammar sequences larger than unary using the prefix words taken out for those sequences, take the words in the unary grammar sequence as the code completion candidates corresponding to the n-gram.

As shown in fig. 7, in this embodiment, the prefix obtaining module 620 may include a high-order starting sub-module 621 and a prefix obtaining sub-module 622. The high-order starting sub-module 621 may be configured to take the highest order among the grammar sequences larger than unary as the current order. The prefix obtaining sub-module 622 may be configured to, in the code statement where the user cursor is located, taking the user cursor position as the starting position, take out forward a number of prefix words determined by the order of the current-order grammar sequence. The syntax lookup module 630 may be configured to use these prefix words to search the current-order grammar sequence for grammars prefixed by the prefix words. The code completion module 640 may include a current grammar code completion sub-module 641 and an order updating sub-module 642. The current grammar code completion sub-module 641 may be configured to, if a grammar prefixed by the prefix words is found in the current-order grammar sequence, take the word that follows the prefix in each such grammar as a code completion candidate corresponding to the n-gram. The order updating sub-module 642 may be configured to, if no grammar prefixed by the prefix words is found in the current-order grammar sequence and the next order after the current order is not the unary grammar sequence, update the current order to the next order after the current order in the n-gram sequence and re-trigger the prefix obtaining sub-module 622. The unary search module 660 may be configured to, if no grammar prefixed by the prefix words is found in the current-order grammar sequence and the next order after the current order is the unary grammar sequence, take the set of words in the unary grammar sequence as the code completion candidates corresponding to the n-gram.

In yet another embodiment of the present specification, n-gram code completion is combined with context state code completion. As shown in fig. 7, in this embodiment, the apparatus may further include a context state determination module 670, a code completion candidate lookup module 671, and a merging module 672. The context state determination module 670 may be configured to determine the context state at the position of the user cursor. The code completion candidate lookup module 671 may be configured to find the code completion candidates corresponding to the context state. The merging module 672 may be configured to merge the code completion candidates corresponding to the n-gram with the code completion candidates corresponding to the context state to obtain the complete set of code completion candidates. This embodiment can complete code according to the context, giving meaningful completions for all common SQL statement types, and can also complete code by probabilistic prediction using the n-gram, which makes up for the shortcomings of completing code from the context alone.

In another embodiment of this specification, in order to give the code completion candidates a certain priority, in the complete set of code completion candidates the ranking weight of a code completion candidate corresponding to the context state is zero. Among the code completion candidates corresponding to the n-gram, the ranking weight of a word found from a grammar sequence larger than unary is the number of times the grammar containing the word appears in the code base, and the ranking weight of a word found from the unary grammar sequence is the number of times the word appears in the code base.

An embodiment of the present specification further provides a computing device, which includes a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the code completion method when executing the instructions.

An embodiment of the present specification further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the code completion method described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the code completion method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the code completion method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are all described as a series of action combinations, but those skilled in the art should understand that the present specification is not limited by the described action sequences, because some steps can be performed in other sequences or simultaneously according to the present specification. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for this description.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and its practical application. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of code completion, comprising:

acquiring an n-gram sequence calculated from a code base according to a code keyword, wherein the n-gram sequence comprises a gram sequence larger than a unary;

in a code statement where a user cursor is located, taking the position of the user cursor as an initial position, and taking out prefix words with the number of orders of the grammar sequence larger than the unary forward;

searching the grammar with the prefix words as prefixes from the grammar sequence which is larger than the unary by using the prefix words with the number of orders of the grammar sequence which is larger than the unary;

and if the grammar taking the prefix word as the prefix is found, taking a subsequent word of the prefix in the grammar as a code completion candidate corresponding to the n-element grammar.

2. The method of claim 1, further comprising:

and determining the sorting positions of the subsequent words of the prefix in the grammar in all code completion candidates according to the times of appearance of the grammar in the code library.

3. The method of claim 1, wherein the n-gram sequences include unary grammar sequences and grammar sequences greater than unary; the method further comprises the following steps:

and if, using the prefix words with the number of orders of the grammar sequence larger than the unary, no grammar taking the prefix words as prefixes is found in any of the grammar sequences larger than the unary, taking the words in the unary grammar sequence as code completion candidates corresponding to the n-gram.

4. The method according to claim 3, wherein the taking forward the prefix words with the number of orders greater than the number of the unary grammar sequences in the code sentence where the user cursor is located, with the user cursor position as a starting position, comprises:

taking the highest order in the grammar sequence which is greater than the unary as the current order;

in a code statement where a user cursor is located, taking the position of the user cursor as an initial position, and taking out prefix words with the number of orders of the current-order grammar sequence forward;

the searching, using the prefix words with the number of orders of the syntax sequence greater than the unary, the syntax with the prefix words as prefixes from the syntax sequence greater than the unary includes:

searching grammar taking the prefix words as prefixes from the grammar sequence of the current order by using the prefix words with the number of the current order;

if the grammar with the prefix word as the prefix is found, taking a subsequent word of the prefix in the grammar as a code completion candidate corresponding to the n-gram comprises the following steps:

if the grammar taking the prefix word as the prefix is found from the grammar sequence of the current order, taking the subsequent word of the prefix in the grammar taking the prefix word as the prefix as a code completion candidate corresponding to the n-element grammar;

if no grammar with the prefix words as prefixes is found in the grammar sequence of the current order and the next order of the current order is not the unary grammar sequence, updating the order of the current order to the next order of the current order in the n-element grammar sequence, re-entering the code sentence where the user cursor is located, and taking out the prefix words with the number of orders of the grammar sequence of the current order forward by taking the position of the user cursor as an initial position;

if the prefix words with the number of orders greater than the unary are used and the grammar with the prefix words as prefixes is not found from all the unary-greater grammar sequences, taking the words in the unary grammar sequence as the code completion candidates corresponding to the n-gram includes:

and if, using the prefix words with the number of the orders of the current-order grammar sequence, no grammar taking the prefix words as prefixes is found in the current-order grammar sequence, and the next order of the current-order grammar sequence is the unary grammar sequence, taking the word set in the unary grammar sequence as the code completion candidates corresponding to the n-gram.

5. The method according to any one of claims 1-4, further comprising:

determining a context state of a position of the user cursor;

finding out a code completion candidate item corresponding to the context state;

and merging the code completion candidate item corresponding to the n-element grammar with the code completion candidate item corresponding to the context state to obtain a code completion candidate item complete set.

6. The method of claim 5, wherein in the complete set of code completion candidates, the ranking weight of a code completion candidate corresponding to the context state is zero; and among the code completion candidates corresponding to the n-gram, the ranking weight of a word found from a grammar sequence larger than unary is the number of times the grammar containing the word appears in the code base, and the ranking weight of a word found from the unary grammar sequence is the number of times the word appears in the code base.

7. A code completion apparatus, comprising:

a grammar acquisition module configured to acquire n-gram sequences calculated from a code library according to code keywords, wherein the n-gram sequences include grammar sequences larger than a unary;

a prefix obtaining module configured to take out prefix words with the number of orders greater than the unary grammar sequence forward in a code sentence where a user cursor is located by taking the user cursor position as an initial position;

a grammar searching module configured to search a grammar prefixed by the prefixed word from the grammar sequence larger than the unary by using the prefixed word with the number of orders of the grammar sequence larger than the unary;

and the code completion module is configured to take a subsequent word of the prefix in the grammar as a code completion candidate corresponding to the n-gram if the grammar taking the prefix word as the prefix is found.

8. The apparatus of claim 7, further comprising:

a number ranking module configured to determine a ranking position of a word succeeding the prefix in the grammar among all the code completion candidates according to the number of times the grammar appears in the codebase.

9. The apparatus of claim 7, wherein the n-gram sequences comprise a unary grammar sequence and a grammar sequence greater than unary; the device further comprises:

a unary search module configured to, if, using the prefix words with the number of orders of the grammar sequence larger than the unary, no grammar taking the prefix words as prefixes is found in any of the grammar sequences larger than the unary, take the words in the unary grammar sequence as the code completion candidates corresponding to the n-gram.

10. The apparatus of claim 9, wherein the prefix obtaining module comprises:

a high-order start submodule configured to take a highest order in the syntax sequence larger than the unary as a current order;

a prefix obtaining submodule configured to take out prefix words of the number of orders of the current-order grammar sequence forward in a code sentence where a user cursor is located by taking the user cursor position as an initial position;

the grammar searching module is configured to search a grammar with the prefix words as prefixes from the grammar sequence of the current order by using the prefix words with the number of the current order;

the code completion module comprises:

a current grammar code completion sub-module configured to, if a grammar with the prefix word as a prefix is found from a current-order grammar sequence, take a subsequent word of the prefix in each grammar with the prefix word as a code completion candidate corresponding to the n-gram;

the order updating sub-module is configured to update the order of the current order to the next order of the current order in the n-gram sequence and re-trigger the prefix obtaining sub-module to execute if no grammar taking the prefix word as the prefix is found in the grammar sequence of the current order and the next order of the current order is not the unary grammar sequence;

the unary search module is configured to, if a grammar with the prefix words as prefixes is not found in the grammar sequence of the current order by using the prefix words with the number of orders of the grammar sequence of the current order, and the next order of the grammar sequence of the current order is the unary grammar sequence, take the word set in the unary grammar sequence as the code completion candidate corresponding to the n-gram.

11. The apparatus according to any one of claims 7-10, further comprising:

a context state determination module configured to determine a context state of a location of the user cursor;

a code completion candidate search module configured to search for a code completion candidate corresponding to the context state;

and the merging module is configured to merge the code completion candidate item corresponding to the n-gram with the code completion candidate item corresponding to the context state to obtain a code completion candidate item complete set.

12. The apparatus of claim 11, wherein in the complete set of code completion candidates, the ranking weight of a code completion candidate corresponding to the context state is zero; and among the code completion candidates corresponding to the n-gram, the ranking weight of a word found from a grammar sequence larger than unary is the number of times the grammar containing the word appears in the code base, and the ranking weight of a word found from the unary grammar sequence is the number of times the word appears in the code base.

13. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-6 when executing the instructions.

14. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 6.