CN110673836B - Code completion method, apparatus, computing device and storage medium - Google Patents

Code completion method, apparatus, computing device and storage medium

Info

Publication number
CN110673836B
Authority
CN
China
Prior art keywords
grammar
prefix
sequence
code
gram
Prior art date
Legal status
Active
Application number
CN201910779909.4A
Other languages
Chinese (zh)
Other versions
CN110673836A (en)
Inventor
鲁志强
Current Assignee
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201910779909.4A priority Critical patent/CN110673836B/en
Publication of CN110673836A publication Critical patent/CN110673836A/en
Application granted granted Critical
Publication of CN110673836B publication Critical patent/CN110673836B/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 — Arrangements for software engineering
    • G06F 8/30 — Creation or generation of source code

Abstract

The specification provides a code completion method, apparatus, computing device, and storage medium. The code completion method includes: obtaining n-gram sequences calculated from a code library according to code keywords, the n-gram sequences including sequences of grams larger than unigrams; in the code statement where the user's cursor is located, starting from the cursor position and working backwards, extracting as many prefix words as the order of such a gram sequence; using those prefix words to search the greater-than-unigram sequences for grams that have the prefix words as their prefix; and, if such a gram is found, taking the word that follows the prefix in the gram as an n-gram-based code completion candidate. Because the candidates are predicted probabilistically from the n-grams, the completions can cover code of many kinds, even constants, which greatly reduces the amount of code a programmer must type and improves code development efficiency.

Description

Code completion method, apparatus, computing device and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a code completion method, apparatus, computing device, and storage medium.
Background
A code editor is a tool used by programmers to edit code.
To reduce the amount of code a programmer must type and to improve code development efficiency, a code editor may provide a code completion function. Currently, code completion works as follows: while the programmer edits code, the editor displays a list of available member functions according to the program context, and the code is completed automatically after the programmer selects an item from the list. However, the code a programmer needs to type includes far more than the member functions that can be determined from the program context; it also includes code with other meanings, constants, and so on. As the volume of code under development grows, so does the amount of code the programmer must type, which limits code development efficiency.
Disclosure of Invention
In view of the foregoing, embodiments of the present application provide a code completion method, apparatus, computing device, and storage medium to overcome the technical drawbacks of the prior art.
According to a first aspect of the embodiments of the present application, a code completion method is provided, including: obtaining n-gram sequences calculated from a code library according to code keywords, the n-gram sequences including sequences of grams larger than unigrams; in the code statement where the user's cursor is located, starting from the cursor position and working backwards, extracting as many prefix words as the order of the greater-than-unigram sequence; using those prefix words to search the greater-than-unigram sequence for grams that have the prefix words as their prefix; and, if such a gram is found, taking the word that follows the prefix in the gram as an n-gram-based code completion candidate.
Optionally, the method further includes: determining the ranking position of the word that follows the prefix, among all code completion candidates, according to the number of times the gram occurs in the code library.
Optionally, the n-gram sequences include a unigram sequence and greater-than-unigram sequences; the method further includes: if, using the prefix words, no gram with the prefix words as its prefix is found in any of the greater-than-unigram sequences, taking the words in the unigram sequence as the n-gram-based code completion candidates.
Optionally, extracting, in the code statement where the user's cursor is located and starting from the cursor position, as many prefix words as the order of the greater-than-unigram sequence includes: taking the highest order among the greater-than-unigram sequences as the current order; and, in the code statement where the user's cursor is located, starting from the cursor position and working backwards, extracting as many prefix words as the current order. Searching the greater-than-unigram sequence for grams with the prefix words as their prefix includes: using the prefix words of the current order to search the gram sequence of the current order for grams with the prefix words as their prefix. If such a gram is found, taking the word that follows the prefix as an n-gram-based code completion candidate includes: if grams with the prefix words as their prefix are found in the gram sequence of the current order, taking the word that follows the prefix in each such gram as an n-gram-based code completion candidate; and, if no such gram is found in the gram sequence of the current order and the next order down is not the unigram sequence, updating the current order to the next order down in the n-gram sequences and returning to the step of extracting, in the code statement where the user's cursor is located and starting from the cursor position, as many prefix words as the current order. If no gram with the prefix words as its prefix is found in any greater-than-unigram sequence, taking the words in the unigram sequence as the n-gram-based code completion candidates includes: if, using the prefix words of the current order, no gram with the prefix words as its prefix is found in the gram sequence of the current order and the next order down is the unigram sequence, taking the set of words in the unigram sequence as the n-gram-based code completion candidates.
Optionally, the method further includes: determining the context state at the user's cursor position; finding the code completion candidates corresponding to that context state; and merging the n-gram-based code completion candidates with the context-state candidates to obtain a complete set of code completion candidates.
Optionally, in the complete set of code completion candidates, the ranking weight of a candidate derived from the context state is zero; among the n-gram-based candidates, the ranking weight of a word found in a greater-than-unigram sequence is the number of times the gram containing it occurs in the code library, and the ranking weight of a word found in the unigram sequence is the number of times that word occurs in the code library.
According to a second aspect of the embodiments of the present application, a code completion apparatus is provided, including: a gram acquisition module configured to obtain n-gram sequences calculated from a code library according to code keywords, the n-gram sequences including greater-than-unigram sequences; a prefix acquisition module configured to, in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the order of the greater-than-unigram sequence; a gram lookup module configured to use those prefix words to search the greater-than-unigram sequence for grams that have the prefix words as their prefix; and a code completion module configured to, if such a gram is found, take the word that follows the prefix in the gram as an n-gram-based code completion candidate.
Optionally, the apparatus further includes: an occurrence-count ranking module configured to determine the ranking position of the word that follows the prefix, among all code completion candidates, according to the number of times the gram occurs in the code library.
Optionally, the n-gram sequences include a unigram sequence and greater-than-unigram sequences; the apparatus further includes: a unigram lookup module configured to, if no gram with the prefix words as its prefix is found in any greater-than-unigram sequence, take the words in the unigram sequence as the n-gram-based code completion candidates.
Optionally, the prefix acquisition module includes: a highest-order start sub-module configured to take the highest order among the greater-than-unigram sequences as the current order; and a prefix acquisition sub-module configured to, in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the current order. The gram lookup module is configured to use the prefix words of the current order to search the gram sequence of the current order for grams with the prefix words as their prefix. The code completion module includes: a current-order completion sub-module configured to, if grams with the prefix words as their prefix are found in the gram sequence of the current order, take the words that follow the prefix in all such grams as n-gram-based code completion candidates; and an order update sub-module configured to, if no such gram is found in the gram sequence of the current order and the next order down is not the unigram sequence, update the current order to the next order down in the n-gram sequences and re-trigger the prefix acquisition sub-module. The unigram lookup module is configured to, if, using the prefix words of the current order, no gram with the prefix words as its prefix is found in the gram sequence of the current order and the next order down is the unigram sequence, take the set of words in the unigram sequence as the n-gram-based code completion candidates.
Optionally, the apparatus further includes: a context-state determination module configured to determine the context state at the user's cursor position; a code completion candidate lookup module configured to find the code completion candidates corresponding to that context state; and a merging module configured to merge the n-gram-based candidates with the context-state candidates to obtain a complete set of code completion candidates.
Optionally, in the complete set of code completion candidates, the ranking weight of a candidate derived from the context state is zero; among the n-gram-based candidates, the ranking weight of a word found in a greater-than-unigram sequence is the number of times the gram containing it occurs in the code library, and the ranking weight of a word found in the unigram sequence is the number of times that word occurs in the code library.
According to a third aspect of the embodiments of the present application, a computing device is provided, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor implements the steps of the code completion method when executing the instructions.
According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided that stores computer instructions which, when executed by a processor, implement the steps of the code completion method.
In the embodiments of the present application, n-gram sequences collected from a code library according to code keywords are obtained; in the code statement where the user's cursor is located, as many prefix words as the order of the gram sequence are taken from immediately before the cursor; those prefix words are used to search the greater-than-unigram sequences for grams with the prefix words as their prefix; and, if such a gram is found, the word that follows the prefix in the gram is taken as an n-gram-based code completion candidate. Code completion is thus based on n-grams over all of the code in the code library. Because the n-gram model is a probabilistic language model built on a Markov model, the completion candidates are predicted probabilistically, and the completions can cover the member functions of all code in the library, code of other kinds, and even constants, which greatly reduces the amount of code a programmer must type and improves code development efficiency.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flowchart of a code completion method provided by an embodiment of the present application;
FIG. 3 is a flowchart of a code completion method provided by another embodiment of the present application;
FIG. 4 is a flowchart of a code completion method provided by a further embodiment of the present application;
FIG. 5 is a flowchart of a code completion method provided by yet another embodiment of the present application;
FIG. 5a is a schematic diagram of finite-state-machine state transitions according to an embodiment of the present application;
FIG. 6 is a block diagram of a code completion apparatus provided by an embodiment of the present application;
FIG. 7 is a block diagram of a code completion apparatus provided by another embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application may, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific embodiments disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, terms related to one or more embodiments of the present application will be explained.
n-gram: n words that appear consecutively in text; for example, when n is 1, 2, or 3, the n-gram is also called a unigram, a bigram, or a trigram, respectively.
An n-gram sequence is a sequence that includes a plurality of n-grams. For example, when n is 2, the bigram sequence includes a plurality of bigrams. The n-gram model is a probabilistic language model based on an (n-1)-order Markov chain, so the order of an n-gram sequence is n-1; for example, the order of a bigram sequence is one.
The present application provides a code completion method, apparatus, computing device, and storage medium, which are described in detail one by one in the following embodiments.
FIG. 1 illustrates a block diagram of a computing device 100, according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. Processor 120 is coupled to memory 110 via bus 130 and database 150 is used to store data.
Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 140 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 100, as well as other components not shown in FIG. 1, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 1 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may perform the steps of the code completion method shown in FIG. 2. FIG. 2 shows a flowchart of a code completion method according to an embodiment of the present application, including steps 210 to 240.
Step 210: obtain n-gram sequences calculated from a code library according to code keywords, where the n-gram sequences include greater-than-unigram sequences.
For example, the n-gram sequences described in the embodiments of the present application may include a unigram sequence, a bigram sequence, and a trigram sequence. Higher-order n-grams may be used to obtain more accurate predictions, depending on the requirements of the implementation environment, and the sparsity problem of n-grams can be alleviated with an n-gram smoothing algorithm.
The method can be applied to code completion in an SQL code editor, in which case the code library is an SQL code library. SQL is a special-purpose programming language used to manage relational database management systems or to perform stream processing in relational stream data management systems. Specifically, a known SQL code library can be collected and, taking the words in the SQL code (keywords, variables, and so on) as units, all unigram sequences U-Gram, bigram sequences B-Gram, and trigram sequences T-Gram are calculated over the collected library. When code completion is needed, the pre-calculated U-Gram, B-Gram, and T-Gram are obtained.
For example, assume the known code library contains the following SQL code:
select name from table2 where id = 100
The n-gram sequences obtained are as follows:
U-Gram sequence: ['select', 'name', 'from', 'table2', 'where', 'id', '=', '100']
B-Gram sequence: [('select', 'name'), ('name', 'from'), ('from', 'table2'), ('table2', 'where'), ('where', 'id'), ('id', '='), ('=', '100')]
T-Gram sequence: [('select', 'name', 'from'), ('name', 'from', 'table2'), ('from', 'table2', 'where'), ('table2', 'where', 'id'), ('where', 'id', '='), ('id', '=', '100')]
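As an illustration only (not code from the patent), the following minimal Python sketch shows how such gram sequences could be counted over a code library; the whitespace tokenizer and the function name compute_gram_sequences are assumptions, since a real implementation would use an SQL lexer.

from collections import Counter

def compute_gram_sequences(code_lines, max_n=3):
    """Count unigram/bigram/trigram occurrences over a code library.
    Returns a dict mapping n -> Counter of n-word tuples and their counts."""
    grams = {n: Counter() for n in range(1, max_n + 1)}
    for line in code_lines:
        tokens = line.split()  # simple whitespace tokenization (assumption)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                grams[n][tuple(tokens[i:i + n])] += 1
    return grams

# Usage with the example statement above:
library = ["select name from table2 where id = 100"]
grams = compute_gram_sequences(library)
print(list(grams[2]))  # bigrams such as ('select', 'name'), ('name', 'from'), ...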
Step 220: in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the order of the greater-than-unigram sequence.
For example, the SQL statement S in which the user's cursor is located is first isolated, and the prefix word sequence SW = (w1, w2, …, wn) preceding the cursor is taken from statement S. From this sequence, as many prefix words are extracted as the order of the gram sequence: for a trigram sequence, whose order is two, the extracted prefix words are (w(n-1), wn); for a bigram sequence, whose order is one, the extracted prefix word is (wn).
More specifically, assume statement S is "select * from table2 where"; the word sequence SW is then ['select', '*', 'from', 'table2', 'where']. For the trigram sequence, the prefix words to extract from statement S are ['table2', 'where']; for the bigram sequence, the prefix word to extract from statement S is ['where'].
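For illustration (names here are assumptions, not the patent's own code), the prefix extraction could be sketched as follows, under the same whitespace-tokenization assumption as the sketch above:

def extract_prefix(statement, order):
    """Take the last `order` tokens before the cursor as the prefix words.
    `statement` is the text of the code statement up to the cursor position."""
    tokens = statement.split()
    return tuple(tokens[-order:]) if len(tokens) >= order else None

s = "select * from table2 where"
print(extract_prefix(s, 2))  # ('table2', 'where') -> prefix for the trigram sequence
print(extract_prefix(s, 1))  # ('where',)          -> prefix for the bigram sequence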
Step 230: using the prefix words, search the greater-than-unigram sequence for grams that have the prefix words as their prefix.
For example, the prefix words (w(n-1), wn) may be used to search the trigram sequence T-Gram for trigrams prefixed with (w(n-1), wn), and the prefix word (wn) may be used to search the bigram sequence B-Gram for bigrams prefixed with (wn).
The embodiments of the present application do not limit how the greater-than-unigram sequences are searched for grams prefixed with the prefix words. For example, all greater-than-unigram sequences could be searched concurrently, so that every gram matching the prefix is found in every order of sequence. Because that approach is computationally expensive and inefficient, however, the embodiments of the present application may instead search order by order, starting from the highest order: as soon as grams matching the prefix are found in the sequence of some order, the words that follow the prefix in all matching grams of that order are taken as code completion candidates and the search stops. This approach examines far fewer grams and is more efficient.
Step 240: if a gram with the prefix words as its prefix is found, take the word that follows the prefix in the gram as an n-gram-based code completion candidate.
For example, if trigrams prefixed with (w(n-1), wn) are found in the trigram sequence T-Gram, the words that follow the prefix (w(n-1), wn) in all of those trigrams are taken as code completion candidates.
In an embodiment of the present application, the ranking position of the word that follows the prefix, among all code completion candidates, may further be determined according to the number of times the matching gram occurs in the code library. In this way the code completion candidates offered to the programmer are prioritized, so the programmer can identify the code to be completed more easily and quickly. A sketch of such a lookup and ranking is given below.
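Continuing the sketch above (still an illustrative assumption rather than the patent's reference implementation), candidates can be looked up in one gram sequence and ranked by occurrence count as follows:

def candidates_from_order(grams, n, prefix):
    """Find completion candidates among the n-grams of size `n`.
    `grams` is the dict returned by compute_gram_sequences; `prefix` has
    length n - 1. Candidates are ranked by how often the matching gram
    occurs in the code library."""
    matches = {}
    for gram, count in grams[n].items():
        if gram[:-1] == prefix:  # the gram has the prefix words as its prefix
            matches[gram[-1]] = matches.get(gram[-1], 0) + count
    # Sort the following words by occurrence count, most frequent first.
    return sorted(matches.items(), key=lambda kv: kv[1], reverse=True)

print(candidates_from_order(grams, 3, ('table2', 'where')))  # e.g. [('id', 1)]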
As can be seen, in the embodiments of the present application, n-gram sequences collected from a code library according to code keywords are obtained; as many prefix words as the order of the gram sequence are taken from immediately before the user's cursor in the code statement where the cursor is located; those prefix words are used to search the greater-than-unigram sequences for grams with the prefix words as their prefix; and, if such a gram is found, the word that follows the prefix in the gram is taken as an n-gram-based code completion candidate, so that code completion is based on n-grams over all of the code in the code library. Because the n-gram model is a probabilistic language model built on a Markov model, the embodiments of the present application predict completion candidates probabilistically, and the completions can cover the member functions of all code in the library, code of other kinds, and even constants, greatly reducing the amount of code a programmer must type and improving code development efficiency. For example, consider completing the statement S "select * from tablea where dt =", with the user's cursor after "=". If the code library contains many statements such as "select id from table_xxx where dt = '${yyyymmdd}'", then searching the n-grams obtained from the code library yields a constant such as '${yyyymmdd}' as a code completion candidate.
For example, the core function of a one-stop data engineering platform is data research and development, and SQL code editing is the main mode of such work; accurate and intelligent SQL code completion performed according to the code completion method of the embodiments of the present application can therefore improve the data research and development experience.
In an embodiment of the present application, considering that no gram with the prefix words as its prefix may be found in the greater-than-unigram sequences, and in order to complete code in more cases, the n-gram sequences may include a unigram sequence as well as greater-than-unigram sequences. If, using the prefix words, no gram with the prefix words as its prefix is found in any of the greater-than-unigram sequences, the words in the unigram sequence are taken as the n-gram-based code completion candidates. In addition, the ranking position of a unigram word among all code completion candidates can be determined according to the number of times the word occurs in the code library.
To improve search efficiency, the embodiments of the present application can search order by order, starting from the highest order: as soon as grams matching the prefix are found in the sequence of some order, the words that follow the prefix in all matching grams of that order are taken as code completion candidates and the search stops; if the search backs off all the way to the unigram sequence, the words in the unigram sequence are taken as the n-gram-based code completion candidates.
To make this embodiment easier to understand, it is described in detail below with reference to the flowchart of a code completion method according to another embodiment of the present application shown in FIG. 3. As shown in FIG. 3, the method includes steps 310 through 350.
Step 310: obtain n-gram sequences calculated from a code library according to code keywords, where the n-gram sequences include a unigram sequence and greater-than-unigram sequences.
Step 311: take the highest order among the greater-than-unigram sequences as the current order.
Step 320: in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the current order.
Step 330: using the prefix words of the current order, search the gram sequence of the current order for grams that have the prefix words as their prefix.
Step 331: determine whether a gram with the prefix words as its prefix has been found in the gram sequence of the current order.
Step 340: if grams with the prefix words as their prefix are found in the gram sequence of the current order, take the words that follow the prefix in all matching grams as n-gram-based code completion candidates, and determine the ranking position of each such word among all candidates according to the number of times its gram occurs in the code library.
Step 350: if no gram with the prefix words as its prefix is found in the gram sequence of the current order, determine whether the next order down in the n-gram sequences is the unigram sequence.
For example, if the obtained n-gram sequences are a four-gram sequence, a trigram sequence, and a unigram sequence, and the current order is that of the trigram sequence, then the next sequence down from the trigram sequence is the unigram sequence. As another example, if the obtained n-gram sequences are a trigram sequence, a bigram sequence, and a unigram sequence, and the current order is that of the trigram sequence, then the next sequence down from the trigram sequence is the bigram sequence.
Step 351: if it is not the unigram sequence, update the current order to the next order down in the n-gram sequences and return to step 320.
Step 352: if it is the unigram sequence, take the set of words in the unigram sequence as the n-gram-based code completion candidates.
In this embodiment, the ranking position of each such word among all code completion candidates may also be determined according to the number of times the word in the unigram sequence occurs in the code library. A sketch of this back-off flow is given below.
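The back-off flow of FIG. 3 could then be expressed as in the following sketch (again an assumption built on the helpers above, not the patent's own code):

def backoff_candidates(grams, statement, max_n=3):
    """Try the highest-order gram sequence first (step 311), back off one
    order at a time (step 351), and finally fall back to the unigram
    sequence ranked by word frequency (step 352)."""
    tokens = statement.split()
    for n in range(max_n, 1, -1):      # n-grams of the current order
        order = n - 1                  # order of an n-gram sequence is n - 1
        if len(tokens) < order:
            continue
        prefix = tuple(tokens[-order:])            # steps 320 and 330
        found = candidates_from_order(grams, n, prefix)
        if found:                                  # step 340
            return found
    # Step 352: no match in any greater-than-unigram sequence.
    return sorted(((g[0], c) for g, c in grams[1].items()),
                  key=lambda kv: kv[1], reverse=True)

print(backoff_candidates(grams, "select * from table2 where"))  # [('id', 1)]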
In yet another embodiment of the present application, n-gram code completion is combined with context-state code completion. To make this embodiment easier to understand, it is described in detail below with reference to the flowchart of a code completion method according to yet another embodiment of the present application shown in FIG. 4. As shown in FIG. 4, the method includes steps 401 to 450.
Step 401: determine the context state at the user's cursor position.
It should be noted that the embodiments of the present application do not limit how the context state is determined; the next state may be determined from the current state and each token in the statement where the user's cursor is located. Specifically, for example, a finite state machine may define the transition relationships between states, so a finite-state-machine state-transition method may be used to determine the context state at the user's cursor position.
Step 402: find the code completion candidates corresponding to the context state.
For example, the context states may be predefined, and corresponding code completion candidates may be preset for each state. After the context state at the user's cursor position is determined, the code completion candidates corresponding to that state can be looked up in the predefined correspondence, as in the sketch below.
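A minimal sketch of such a predefined correspondence follows; the state names and candidate lists are assumptions drawn from the SQL example later in this description, not an exhaustive mapping:

# Hypothetical mapping from context state to preset completion candidates.
CONTEXT_CANDIDATES = {
    "START": ["SELECT", "INSERT", "UPDATE"],
    "WHERE": ["SUM", "MAX", "MIN", "AVG"],  # plus physical table fields, supplied at runtime
}

def context_candidates(state, table_fields=()):
    """Look up the candidates preset for a context state (step 402)."""
    return CONTEXT_CANDIDATES.get(state, []) + list(table_fields)

print(context_candidates("WHERE", table_fields=["id", "name"]))
# ['SUM', 'MAX', 'MIN', 'AVG', 'id', 'name']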
Step 410: obtain n-gram sequences calculated from a code library according to code keywords, where the n-gram sequences include at least greater-than-unigram sequences.
Step 420: in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the order of the greater-than-unigram sequence.
Step 430: using the prefix words, search the greater-than-unigram sequence for grams that have the prefix words as their prefix.
Step 440: if a gram with the prefix words as its prefix is found, take the word that follows the prefix in the gram as an n-gram-based code completion candidate.
Step 450: merge the n-gram-based code completion candidates with the context-state candidates to obtain the complete set of code completion candidates.
This implementation can therefore both complete code according to the context, giving meaningful completions for all common SQL statement types, and complete code from the perspective of probabilistic prediction using n-grams, making up for the limitations of context-only completion.
In still another embodiment of the present application, in order to give the code completion candidates a priority order, the ranking weight of a candidate derived from the context state is zero in the complete candidate set; among the n-gram-based candidates, the ranking weight of a word found in a greater-than-unigram sequence is the number of times the gram containing it occurs in the code library, and the ranking weight of a word found in the unigram sequence is the number of times that word occurs in the code library. With this embodiment the code completion candidates can be ranked by ranking weight, so the programmer can identify the code to be completed easily and quickly.
In order to make the embodiments of the present application easier to understand, a possible implementation manner in which some of the embodiments described above are combined is described in detail below.
FIG. 5 shows a flowchart of a code completion method according to a further embodiment of the present application. Taking n-gram sequences consisting of a unigram sequence, a bigram sequence, and a trigram sequence as an example, the method describes code completion combined with the context, and includes steps 501 to 570.
Step 501: cut out the statement S in which the user's cursor is located.
For example, assume the code segment is as follows, with the user's cursor at the end of the code; the statement S in which the cursor is located is then cut out as "select * from table2 where".
select id from table1;
select * from table2 where
Step 502: determine the context state at the user's cursor position.
For example, all possible context states SC may be predefined, and the context state C at the user's cursor position may be determined by finite-state-machine state transitions. A schematic diagram of the state transitions is shown in FIG. 5a; the predefined context states flow in the direction of the arrows.
To determine the context state of the statement S where the user's cursor is located, as shown in FIG. 5a, start from the state "START". The first token of S is "select", so the state switches from "START" to "SELECT"; the next token of S is "*", and the state is unchanged; the next token is "from", and the state switches from "SELECT" to "FROM"; the next token is "table2", and the state switches from "FROM" to "TABLE_IDENT"; the next token is "where", and the state switches from "TABLE_IDENT" to "WHERE". At this point the cursor position is reached, so the context state C at the user's cursor position is "WHERE". In this embodiment, the next state is determined from the current state and the next token, i.e., by the transition method of the finite state machine.
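A compact sketch of such a state machine follows; the transition table covers only the states named in this walkthrough and is an assumption, not the full machine of FIG. 5a:

# Hypothetical transition table: (current state, token class) -> next state.
TRANSITIONS = {
    ("START", "select"): "SELECT",
    ("SELECT", "from"): "FROM",
    ("FROM", "ident"): "TABLE_IDENT",
    ("TABLE_IDENT", "where"): "WHERE",
}

def context_state(statement):
    """Walk the tokens of the statement and return the state at the cursor."""
    state = "START"
    for token in statement.split():
        kind = token.lower() if token.lower() in ("select", "from", "where") else "ident"
        # Tokens with no outgoing transition (e.g. '*') leave the state unchanged.
        state = TRANSITIONS.get((state, kind), state)
    return state

print(context_state("select * from table2 where"))  # WHERE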
Step 503: find the code completion candidates corresponding to the context state, add them to the code completion candidate set CS(1), and set the ranking weight of every candidate in CS(1) to 0.
For example, corresponding code completion candidates may be preset for each predefined context state, and once the context state is determined, the candidates corresponding to it can be found. For instance, candidates such as the keywords "SELECT", "INSERT", and "UPDATE" may be preset for the "START" state, and candidates including physical table fields, functions, and so on may be preset for the "WHERE" state. For the statement S above, the candidates found for the context state "WHERE" include the SQL functions (SUM, MAX, MIN, AVG, etc.) and the physical table fields, i.e., the fields of table2 (assumed to be id and name). The CS(1) determined from the context state of statement S is therefore ['SUM', 'MAX', 'MIN', 'AVG', 'id', 'name'].
Step 510: obtain the unigram sequence U-Gram, bigram sequence B-Gram, and trigram sequence T-Gram calculated from the code library according to code keywords.
For example, assume the known code library contains the following SQL code:
select name from table2 where id = 100
The n-gram sequences obtained are as follows:
U-Gram sequence: ['select', 'name', 'from', 'table2', 'where', 'id', '=', '100']
B-Gram sequence: [('select', 'name'), ('name', 'from'), ('from', 'table2'), ('table2', 'where'), ('where', 'id'), ('id', '='), ('=', '100')]
T-Gram sequence: [('select', 'name', 'from'), ('name', 'from', 'table2'), ('from', 'table2', 'where'), ('table2', 'where', 'id'), ('where', 'id', '='), ('id', '=', '100')]
Step 520: take from statement S the prefix word sequence SW = (w1, w2, …, wn) preceding the user's cursor.
For example, if statement S is "select * from table2 where", then the word sequence SW is ['select', '*', 'from', 'table2', 'where'].
Step 530: when the length of SW is greater than 2, search the trigram sequence T-Gram for trigrams prefixed with (w(n-1), wn). If any are found, add all ending words of the trigrams prefixed with (w(n-1), wn) in T-Gram to the code completion candidate set CS(2), count the number of times each matching trigram occurs in the code library, and use the count as the ranking weight of the corresponding ending word.
For example, the length of SW is 5, so the trigram STG prefixed with ['table2', 'where'] can be found in the T-Gram sequence; its ending word is "id". If the ending word "id" is counted as occurring once, CS(2) is ['id'], where the ranking weight of "id" is 1.
Step 540: when the length of SW is greater than 1 and no code completion candidate was found in the trigram sequence, search the bigram sequence B-Gram for bigrams prefixed with (wn). If any are found, add all ending words of the bigrams prefixed with (wn) in B-Gram to the code completion candidate set CS(2), count the number of times each matching bigram occurs in the code library, and use the count as the ranking weight of the corresponding ending word.
Step 550: when no code completion candidate was found in the bigram sequence either, add all words of the unigram sequence U-Gram to the code completion candidate set CS(2), count the number of times each unigram occurs in the code library, and use the count as the ranking weight of the corresponding word.
Step 560: for each word Wi in CS(2), check whether Wi is in CS(1). If Wi is not in CS(1), add Wi to CS(1); if Wi is in CS(1), set the ranking weight of the matching item in CS(1) to that of Wi.
For example, checking the "id" in CS(2) shows that it is in CS(1), so the ranking weight of the candidate "id" in CS(1) is set to 1, while the ranking weights of the remaining candidates in CS(1) remain 0.
Step 570: sort all code completion candidates in CS(1) by ranking weight to obtain the final complete code completion candidate set CS.
For example, ranking the candidates in CS(1) by ranking weight yields the final complete candidate set CS = ['id', 'SUM', 'MAX', 'MIN', 'AVG', 'name'].
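Putting the pieces together, the merge-and-rank of steps 560 and 570 could be sketched as follows (an illustration under the same assumptions as the sketches above):

def merge_and_rank(context_items, ngram_items):
    """Merge context-state candidates (ranking weight 0) with n-gram
    candidates (weighted by occurrence count) and sort by weight."""
    weights = {word: 0 for word in context_items}   # CS(1), weight 0
    for word, count in ngram_items:                 # CS(2)
        weights[word] = count                       # add, or overwrite the weight
    return [w for w, _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True)]

cs1 = ["SUM", "MAX", "MIN", "AVG", "id", "name"]
cs2 = [("id", 1)]
print(merge_and_rank(cs1, cs2))  # ['id', 'SUM', 'MAX', 'MIN', 'AVG', 'name']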
Corresponding to the above method embodiments, the present application further provides embodiments of a code completion apparatus. FIG. 6 shows a block diagram of a code completion apparatus according to an embodiment of the present application. As shown in FIG. 6, the apparatus includes: a gram acquisition module 610, a prefix acquisition module 620, a gram lookup module 630, and a code completion module 640.
The gram acquisition module 610 may be configured to obtain n-gram sequences calculated from a code library according to code keywords, where the n-gram sequences include greater-than-unigram sequences.
The prefix acquisition module 620 may be configured to, in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the order of the greater-than-unigram sequence.
The gram lookup module 630 may be configured to use those prefix words to search the greater-than-unigram sequence for grams that have the prefix words as their prefix.
The code completion module 640 may be configured to, if a gram with the prefix words as its prefix is found, take the word that follows the prefix in the gram as an n-gram-based code completion candidate.
As can be seen, in the embodiments of the present application, n-gram sequences collected from a code library according to code keywords are obtained; as many prefix words as the order of the gram sequence are taken from immediately before the user's cursor in the code statement where the cursor is located; those prefix words are used to search the greater-than-unigram sequences for grams with the prefix words as their prefix; and, if such a gram is found, the word that follows the prefix in the gram is taken as an n-gram-based code completion candidate, so that code completion is based on n-grams over all of the code in the code library. Because the n-gram model is a probabilistic language model built on a Markov model, the embodiments of the present application predict completion candidates probabilistically, and the completions can cover the member functions of all code in the library, code of other kinds, and even constants, greatly reducing the amount of code a programmer must type and improving code development efficiency. For example, consider completing the statement S "select * from tablea where dt =", with the user's cursor after "=". If the code library contains many statements such as "select id from table_xxx where dt = '${yyyymmdd}'", then searching the n-grams obtained from the code library yields a constant such as '${yyyymmdd}' as a code completion candidate.
FIG. 7 shows a block diagram of a code completion apparatus according to another embodiment of the present application. As shown in FIG. 7, the apparatus further includes an occurrence-count ranking module 650, which may be configured to determine the ranking position of the word that follows the prefix, among all code completion candidates, according to the number of times the matching gram occurs in the code library. In this way the code completion candidates offered to the programmer are prioritized, so the programmer can identify the code to be completed more easily and quickly.
In an embodiment of the present application, considering that no gram with the prefix words as its prefix may be found in the greater-than-unigram sequences, and in order to complete code in more cases, the n-gram sequences may include a unigram sequence as well as greater-than-unigram sequences. In this embodiment, as shown in FIG. 7, the apparatus further includes a unigram lookup module 660, which may be configured to, if no gram with the prefix words as its prefix is found in any greater-than-unigram sequence, take the words in the unigram sequence as the n-gram-based code completion candidates.
To improve search efficiency, the embodiments of the present application can search order by order, starting from the highest order: as soon as grams matching the prefix are found in the sequence of some order, the words that follow the prefix in all matching grams of that order are taken as code completion candidates and the search stops; if the search backs off all the way to the unigram sequence, the words in the unigram sequence are taken as the n-gram-based code completion candidates.
As shown in FIG. 7, in this embodiment the prefix acquisition module 620 may include: a highest-order start sub-module 621, which may be configured to take the highest order among the greater-than-unigram sequences as the current order; and a prefix acquisition sub-module 622, which may be configured to, in the code statement where the user's cursor is located, start from the cursor position and work backwards to extract as many prefix words as the current order. The gram lookup module 630 may be configured to use the prefix words of the current order to search the gram sequence of the current order for grams with the prefix words as their prefix. The code completion module 640 may include: a current-order completion sub-module 641, which may be configured to, if grams with the prefix words as their prefix are found in the gram sequence of the current order, take the words that follow the prefix in all such grams as n-gram-based code completion candidates; and an order update sub-module 642, which may be configured to, if no such gram is found in the gram sequence of the current order and the next order down is not the unigram sequence, update the current order to the next order down in the n-gram sequences and re-trigger the prefix acquisition sub-module 622. The unigram lookup module 660 may be configured to, if, using the prefix words of the current order, no gram with the prefix words as its prefix is found in the gram sequence of the current order and the next order down is the unigram sequence, take the set of words in the unigram sequence as the n-gram-based code completion candidates.
In yet another embodiment of the present application, n-gram code completion is combined with context-state code completion. As shown in FIG. 7, in this embodiment the apparatus may further include: a context-state determination module 670, which may be configured to determine the context state at the user's cursor position; a code completion candidate lookup module 671, which may be configured to find the code completion candidates corresponding to that context state; and a merging module 672, which may be configured to merge the n-gram-based candidates with the context-state candidates to obtain the complete set of code completion candidates. This implementation can both complete code according to the context, giving meaningful completions for all common SQL statement types, and complete code from the perspective of probabilistic prediction using n-grams, making up for the limitations of context-only completion.
In still another embodiment of the present application, in order to give the code completion candidates a priority order, the ranking weight of a candidate derived from the context state is zero in the complete candidate set; among the n-gram-based candidates, the ranking weight of a word found in a greater-than-unigram sequence is the number of times the gram containing it occurs in the code library, and the ranking weight of a word found in the unigram sequence is the number of times that word occurs in the code library.
An embodiment of the present application further provides a computing device including a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor implements the steps of the code completion method when executing the instructions.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the code completion method described above.
The above is an exemplary solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the code completion method belong to the same concept; for details of the storage medium solution not described here, reference may be made to the description of the code completion method.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content encompassed by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The above-disclosed preferred embodiments of the present application are provided only as an aid to the elucidation of the present application. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This application is to be limited only by the claims and the full scope and equivalents thereof.

Claims (14)

1. A code complement method, comprising:
acquiring an n-gram sequence calculated from a code library according to code keywords, wherein the n-gram sequence comprises a grammar sequence greater than unary;
in the code statement where the user cursor is located, taking the user cursor position as a starting position and taking out, forward, as many prefix words as the order of the grammar sequence greater than unary, wherein the order of an n-gram sequence is n-1;
searching, by using the prefix words, the grammar sequence greater than unary for a grammar prefixed by the prefix words;
if a grammar prefixed by the prefix words is found, taking the word that follows the prefix in the grammar as a code complement candidate corresponding to the n-gram.
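By way of illustration only, the following sketch shows one possible realization of the steps recited in claim 1. The names build_ngram_table and complete, the tokenized corpus, and the trigram setting n=3 are assumptions of this sketch and do not appear in the claims.

```python
from collections import defaultdict

def build_ngram_table(token_lines, n=3):
    """Count every n-gram of code keywords appearing in the code library."""
    table = defaultdict(int)            # maps an n-tuple of tokens to its occurrence count
    for tokens in token_lines:
        for i in range(len(tokens) - n + 1):
            table[tuple(tokens[i:i + n])] += 1
    return table

def complete(table, tokens_before_cursor, n=3):
    """Take the n-1 words before the cursor as the prefix and return the word
    that follows that prefix in every grammar (n-gram) prefixed by it."""
    order = n - 1                       # order of the n-gram sequence
    if len(tokens_before_cursor) < order:
        return []
    prefix = tuple(tokens_before_cursor[-order:])
    return [gram[-1] for gram in table if gram[:-1] == prefix]

# usage: a trigram table built from two tokenized code statements
corpus = [["int", "count", "=", "0", ";"],
          ["int", "count", "=", "list", ".", "size", "(", ")", ";"]]
table = build_ngram_table(corpus, n=3)
print(complete(table, ["int", "count"]))    # -> ['=']
```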
2. The method according to claim 1, wherein the method further comprises:
determining the ordering position, among all code complement candidates, of the word that follows the prefix in the grammar according to the number of times the grammar appears in the code library.
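As a toy illustration of claim 2 only (the function name and counts are hypothetical), the candidates could be ordered by the occurrence count of the grammar each follow-up word was taken from:

```python
def rank_by_grammar_count(candidates):
    """candidates: list of (follow-up word, occurrence count of its grammar in the code library)."""
    return [word for word, _ in sorted(candidates, key=lambda c: -c[1])]

print(rank_by_grammar_count([("size", 12), ("length", 4), ("clear", 9)]))   # -> ['size', 'clear', 'length']
```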
3. The method of claim 1, wherein the n-gram sequence comprises a unary grammar sequence and grammar sequences greater than unary; the method further comprises:
if, by using the prefix words whose number equals the order of a grammar sequence greater than unary, no grammar prefixed by the prefix words is found in any grammar sequence greater than unary, taking the words in the unary grammar sequence as code complement candidates corresponding to the n-gram.
4. The method of claim 3, wherein the taking out, forward, with the user cursor position as a starting position in the code statement where the user cursor is located, of as many prefix words as the order of the grammar sequence greater than unary comprises:
taking the highest order among the grammar sequences greater than unary as the current order;
in the code statement where the user cursor is located, taking the user cursor position as a starting position and taking out, forward, as many prefix words as the current order;
the searching, by using the prefix words, of the grammar sequence greater than unary for a grammar prefixed by the prefix words comprises:
searching, by using the prefix words whose number equals the current order, the grammar sequence of the current order for a grammar prefixed by the prefix words;
the taking, if a grammar prefixed by the prefix words is found, of the word that follows the prefix in the grammar as a code complement candidate corresponding to the n-gram comprises:
if a grammar prefixed by the prefix words is found in the grammar sequence of the current order, taking the word that follows the prefix in each grammar prefixed by the prefix words as a code complement candidate corresponding to the n-gram;
if no grammar prefixed by the prefix words is found in the grammar sequence of the current order and the next order below the current order is not the unary grammar sequence, updating the current order to the next order below it in the n-gram sequence and re-entering the step of taking out, forward, with the user cursor position as a starting position in the code statement where the user cursor is located, as many prefix words as the current order;
the taking, if no grammar prefixed by the prefix words is found in any grammar sequence greater than unary, of the words in the unary grammar sequence as code complement candidates corresponding to the n-gram comprises:
if, by using the prefix words whose number equals the current order, no grammar prefixed by the prefix words is found in the grammar sequence of the current order and the next order below the current order is the unary grammar sequence, taking the word set in the unary grammar sequence as the code complement candidates corresponding to the n-gram.
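As an aid to understanding claims 3 and 4 only, a minimal sketch of the order-by-order backoff is given below. It assumes the grammar sequences are stored as dictionaries keyed by order (prefix length), and the example counts are invented; neither is a limitation of the claims.

```python
def complete_with_backoff(tables_by_order, unigram_counts, tokens_before_cursor):
    """Start at the highest order, back off one order at a time, and fall back
    to the unary grammar sequence if no higher-order grammar matches."""
    for order in sorted(tables_by_order, reverse=True):      # highest current order first
        if len(tokens_before_cursor) < order:
            continue
        prefix = tuple(tokens_before_cursor[-order:])         # as many prefix words as the order
        candidates = [(gram[-1], count)
                      for gram, count in tables_by_order[order].items()
                      if gram[:-1] == prefix]
        if candidates:
            # order the follow-up words by how often their grammar appears in the code library
            return [word for word, _ in sorted(candidates, key=lambda c: -c[1])]
    # no grammar greater than unary matched: return the whole unary word set
    return [word for word, _ in sorted(unigram_counts.items(), key=lambda c: -c[1])]

# usage with hypothetical counts: order 2 holds trigram entries, order 1 holds bigram entries
tables = {2: {("count", "=", "0"): 3}, 1: {("=", "0"): 5, ("=", "list"): 2}}
print(complete_with_backoff(tables, {"int": 7, "count": 4}, ["x", "="]))   # -> ['0', 'list']
```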
5. The method according to any one of claims 1-4, further comprising:
determining the context state of the position of the user cursor;
finding code complement candidates corresponding to the context state;
and merging the code complement candidates corresponding to the n-gram with the code complement candidates corresponding to the context state to obtain a complete set of code complement candidates.
6. The method of claim 5, wherein the ranking weight of a code complement candidate corresponding to the context state is zero; among the code complement candidates corresponding to the n-gram, the ranking weight of a word found from a grammar sequence greater than unary is the number of times the grammar in which the word was found appears in the code library, and the ranking weight of a word found from the unary grammar sequence is the number of times the word appears in the code library.
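Purely as an illustration of the weighting recited in claims 5 and 6, the sketch below merges the two candidate sources. The decision to rank larger weights first and to break ties alphabetically, like the example words, is an assumption of the sketch rather than a requirement of the claims.

```python
def merge_candidates(ngram_candidates, context_candidates):
    """Merge n-gram candidates (weighted by occurrence count in the code library)
    with context-state candidates (ranking weight fixed at zero)."""
    weights = {}
    for word, count in ngram_candidates:        # words found from the grammar sequences
        weights[word] = max(weights.get(word, 0), count)
    for word in context_candidates:             # e.g. member functions from the program context
        weights.setdefault(word, 0)             # ranking weight of zero, as recited in claim 6
    # assumption: larger weights rank first; ties broken alphabetically
    return sorted(weights, key=lambda w: (-weights[w], w))

print(merge_candidates([("size", 12), ("length", 4)], ["size", "isEmpty"]))
# -> ['size', 'length', 'isEmpty']
```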
7. A code complement apparatus, comprising:
a grammar acquisition module configured to acquire an n-gram sequence calculated from a code library according to code keywords, wherein the n-gram sequence comprises a grammar sequence greater than unary;
a prefix acquisition module configured to take the user cursor position as a starting position in the code statement where the user cursor is located and take out, forward, as many prefix words as the order of the grammar sequence greater than unary, wherein the order of an n-gram sequence is n-1;
a grammar searching module configured to search, by using the prefix words, the grammar sequence greater than unary for a grammar prefixed by the prefix words;
and a code complement module configured to take, if a grammar prefixed by the prefix words is found, the word that follows the prefix in the grammar as a code complement candidate corresponding to the n-gram.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a count ordering module configured to determine the ordering position, among all code complement candidates, of the word that follows the prefix in the grammar according to the number of times the grammar appears in the code library.
9. The apparatus of claim 7, wherein the n-gram sequence comprises a unary grammar sequence and grammar sequences greater than unary; the apparatus further comprises:
a unary searching module configured to take, if no grammar prefixed by the prefix words is found in any grammar sequence greater than unary by using the prefix words, the words in the unary grammar sequence as code complement candidates corresponding to the n-gram.
10. The apparatus of claim 9, wherein the prefix acquisition module comprises:
a high-order initiation sub-module configured to take the highest order among the grammar sequences greater than unary as the current order;
a prefix acquisition sub-module configured to take out, forward, with the user cursor position as a starting position in the code statement where the user cursor is located, as many prefix words as the current order;
the grammar searching module is configured to search, by using the prefix words whose number equals the current order, the grammar sequence of the current order for a grammar prefixed by the prefix words;
the code complement module comprises:
a current-order code complement sub-module configured to take, if a grammar prefixed by the prefix words is found in the grammar sequence of the current order, the word that follows the prefix in each grammar prefixed by the prefix words as a code complement candidate corresponding to the n-gram;
an order updating sub-module configured to update the current order to the next order below it in the n-gram sequence and re-trigger the prefix acquisition sub-module if no grammar prefixed by the prefix words is found in the grammar sequence of the current order and the next order below the current order is not the unary grammar sequence;
the unary searching module is configured to take the word set in the unary grammar sequence as the code complement candidates corresponding to the n-gram if, by using the prefix words whose number equals the current order, no grammar prefixed by the prefix words is found in the grammar sequence of the current order and the next order below the current order is the unary grammar sequence.
11. The apparatus according to any one of claims 7-10, wherein the apparatus further comprises:
a context state determining module configured to determine the context state at the position of the user cursor;
a code complement candidate searching module configured to find code complement candidates corresponding to the context state;
and a merging module configured to merge the code complement candidates corresponding to the n-gram with the code complement candidates corresponding to the context state to obtain a complete set of code complement candidates.
12. The apparatus of claim 11, wherein the ranking weight of a code complement candidate corresponding to the context state is zero; among the code complement candidates corresponding to the n-gram, the ranking weight of a word found from a grammar sequence greater than unary is the number of times the grammar in which the word was found appears in the code library, and the ranking weight of a word found from the unary grammar sequence is the number of times the word appears in the code library.
13. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any of claims 1-6.
14. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 6.
CN201910779909.4A 2019-08-22 2019-08-22 Code complement method, device, computing equipment and storage medium Active CN110673836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910779909.4A CN110673836B (en) 2019-08-22 2019-08-22 Code complement method, device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110673836A (en) 2020-01-10
CN110673836B (en) 2023-05-23

Family

ID=69075524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910779909.4A Active CN110673836B (en) 2019-08-22 2019-08-22 Code complement method, device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110673836B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116406459A (en) * 2020-11-02 2023-07-07 华为云计算技术有限公司 Code processing method, device, equipment and medium
CN113821198B (en) * 2021-09-14 2023-10-24 中南大学 Code complement method, system, storage medium and computer program product
CN116088934B (en) * 2023-04-10 2023-08-29 荣耀终端有限公司 Software development workload determination method and server

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159473A (en) * 2015-08-17 2015-12-16 百度在线网络技术(北京)有限公司 Language model calculation processing method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6523171B1 (en) * 1998-12-29 2003-02-18 International Business Machines Corporation Enhanced source code translator from procedural programming language (PPL) to an object oriented programming language (OOPL)
US7269548B2 (en) * 2002-07-03 2007-09-11 Research In Motion Ltd System and method of creating and using compact linguistic data
US8898182B2 (en) * 2011-04-27 2014-11-25 International Business Machines Corporation Methods and arrangements for providing effective interactive query suggestions without query logs
US10275483B2 (en) * 2014-05-30 2019-04-30 Apple Inc. N-gram tokenization
US9785630B2 (en) * 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
CN106873956B (en) * 2016-06-17 2020-09-11 阿里巴巴集团控股有限公司 Code completion method and device based on continuous keywords
US11573989B2 (en) * 2017-02-24 2023-02-07 Microsoft Technology Licensing, Llc Corpus specific generative query completion assistant


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: Fourth Floor, P.O. Box 847, Capital Building, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant