US20210256122A1 - System and method for detecting source code anomalies - Google Patents
- Publication number
- US20210256122A1 (application US16/793,189)
- Authority
- US
- United States
- Prior art keywords
- source code
- style
- style features
- predefined
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/72—Code refactoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Definitions
- the present disclosure relates generally to establishing and maintaining source code. More particularly, in certain embodiments, the present disclosure is related to a system and method for detecting source code anomalies.
- Source code is programming code presented in a human-readable programming language (e.g., as opposed to binary machine code).
- a given program or computing task may be implemented using source code.
- Specialized training and knowledge of a source code's programming language is generally required to both understand the function(s) of a given piece of source code and to create new code using the source code as a starting point to perform a desired task.
- In an embodiment, a system includes a source code repository configured to store source code entries created by a plurality of users. Each source code entry includes instructions in a programming language for performing a computing task.
- a style repository is configured to store a style profile for each of the plurality of users. Each style profile includes predefined style features associated with formatting characteristics of the stored source code entries for a corresponding user.
- a source code analyzer is communicatively coupled to the source code repository and the style repository.
- a processor of the source code analyzer receives, from a first user, a first source code which includes instructions in the programming language for performing a first computing task.
- First style features of the first source code are determined. The first style features include characteristics of a format of the first source code.
- the processor determines whether the first style features correspond to first predefined style features indicated by a first style profile associated with the first user. In response to determining that this is the case, the source code is stored in the source code repository. In response to determining that this is not the case, storage of the first source code in the source code repository is prevented.
- In yet another embodiment, a system includes a source code repository which stores source code entries, which include instructions in a programming language for performing computing tasks.
- a code generator receives, from a user, an input which includes a request in a natural language to perform a first computing task. Keywords are identified in the input. The keywords include a variable-associated keyword and a function-related keyword. Based on the identified keywords, code-line entries are determined which, when executed in an ordered combination, achieve the first computing task. The code-line entries include a variable-declaration entry, a function-definition entry, and a function-call entry. Based on the variable-associated keyword, one or more variables appearing in the source code repository are selected for declaration in order to perform the first computing task.
- Based on the function-related keyword, one or more functions appearing in the source code repository are selected for definition and calling in order to perform the first computing task.
- a custom code is generated, in the programming language, which includes a declaration of the determined variables at the variable-declaration entry, a definition of the determined function(s) at the function-definition entry, and a call to the determined function(s) using the declared variables at the function-call entry.
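- The ordered combination of a variable-declaration entry, a function-definition entry, and a function-call entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `STORED_VARIABLES` and `STORED_FUNCTIONS` dictionaries are hypothetical stand-ins for the source code repository, and keyword identification is reduced to simple word lookup.

```python
# Hypothetical stand-ins for variables and functions that appear in the
# source code repository (names and values are illustrative assumptions).
STORED_VARIABLES = {"revenue": "100.0", "cost": "60.0"}
STORED_FUNCTIONS = {
    "profit": "def profit(revenue, cost):\n    return revenue - cost",
}


def generate_custom_code(request: str) -> str:
    """Assemble declaration, definition, and call entries, in that order."""
    # Identify keywords: lowercase words, punctuation stripped, order kept.
    words = list(dict.fromkeys(w.strip(".,?!").lower() for w in request.split()))
    variables = [w for w in words if w in STORED_VARIABLES]   # variable-associated
    functions = [w for w in words if w in STORED_FUNCTIONS]   # function-related

    declarations = [f"{name} = {STORED_VARIABLES[name]}" for name in variables]
    definitions = [STORED_FUNCTIONS[name] for name in functions]
    # Each call uses the declared variables as its arguments.
    args = ", ".join(variables)
    calls = [f"{name}_result = {name}({args})" for name in functions]
    return "\n".join(declarations + definitions + calls) + "\n"


if __name__ == "__main__":
    print(generate_custom_code("Compute profit from revenue and cost."))
```

- Passing the declared variables positionally to every identified function is a simplification; a fuller version would map each function's parameters to the matching declared variables.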
- This disclosure encompasses the recognition of previously unidentified problems associated with previous technology used to maintain collections of source code and adapting this source code to generate code to perform a desired task or function. For instance, previous approaches to storing source code generally relied on programmers to manually annotate code with comments and save the code in a fashion that allowed future use. However, different users tend to have different approaches to writing code in the programming language and formatting the code, resulting in source code entries that may be of limited use to others (i.e., because the purpose of the code is difficult or impossible to decipher). Using previous technology, multiple copies of the same or similar source code may be stored that perform the same function, resulting in inefficient use of computing resources.
- Previous technology also fails to detect and correct problematic source code (e.g., whether the code is incorrectly formatted for a given task, includes inefficient protocols, or is intentionally malicious).
- Certain embodiments of the systems, methods, and devices of this disclosure provide unique solutions to these newly recognized problems described above and other technical problems by facilitating the reliable storage of source code and the efficient generation of new, customized code.
- the disclosed system provides several technical advantages which include 1) automatic detection and correction of any anomalies in the source code prior to its storage for future use; 2) determination of natural language descriptions of source code (e.g., of “stories”), which can be easily interpreted even without specialized knowledge and training in a programming language; 3) the efficient and reliable generation of new source code for a custom task and with a user-specific style; and 4) the provision of candidate source code to a user's query to perform a given task and/or for source code related to a given entity or group with which the user is affiliated.
- the system described in this disclosure may improve the function of computer systems used to store source code for future use and generate new source code.
- the system may also or alternatively reduce or eliminate practical and technical barriers to repurposing existing source code to perform new functions or tasks.
- the system described in this disclosure may particularly be integrated into a practical application for storing source code used to perform calculations using a first set of variables and/or functions, and automatically repurposing this code to perform the same or similar calculations using a second set of user-identified variables and/or functions without manually modifying the underlying programming language in the source code (i.e., without writing any code in a specialized programming language).
- Certain embodiments of this disclosure are related to a source code analyzer which determines whether newly provided code is appropriate for storage and future use as source code. For instance, style features can be extracted from the source code and used to identify anomalies in order to detect unapproved or malicious source code.
- the source code analyzer may generate a repository of natural language descriptions of source code, or “stories,” which may include specialized badges, or tags, which link portions of the descriptions (and/or the associated lines of the corresponding source code) to particular formulas, business units and the like. Examples of such embodiments are described below with respect to FIGS. 1-5 .
- Certain embodiments of this disclosure are related to a custom code generator which uses natural language inputs (e.g., commands) and/or other queries from a user to generate custom code. Generated code can be automatically customized according to the user input and adjusted to match a user's predetermined coding style (e.g., number and length of comments, spacing and indentation format, and the like). Examples of such embodiments are described below with respect to FIGS. 1 and 6-7 .
- FIG. 1 is a schematic diagram of an example system for source code maintenance and generation, according to an illustrative embodiment of this disclosure;
- FIG. 2 is a flow diagram illustrating an example operation of the style analyzer of the system illustrated in FIG. 1 ;
- FIG. 3 is a flow diagram illustrating the determination of code anomalies;
- FIG. 4 is a flowchart of a method for operating the story generator of the system illustrated in FIG. 1 ;
- FIG. 5 illustrates example source code and example results generated at various steps of the method of FIG. 4 ;
- FIG. 6 is a flowchart illustrating an example method of operating the custom code generator of the system illustrated in FIG. 1 ;
- FIG. 7 illustrates examples of various elements associated with steps of the method of FIG. 6 ;
- FIG. 8 is a diagram of an example device configured to implement the system of FIG. 1 .
- this disclosure facilitates the efficient maintenance of a source code repository and, optionally, a story repository, which stores natural-language descriptions of stored source code (e.g., as described with respect to FIGS. 1-5 ).
- this disclosure includes a custom code generator which facilitates the generation of customized code in an efficient and user-friendly manner (e.g., as described with respect to FIGS. 1 and 6-7 ).
- a natural language corresponds to an established language (e.g., English) used for human-to-human communication.
- a programming language refers to a formalized text-based language which includes instructions for implementing functions and/or tasks using a computer. Examples of programming languages include C, C++, C#, Python, Java, HTML, and the like. These programming languages are provided for example only. This disclosure contemplates the use of any programming language.
- FIG. 1 is a schematic diagram of an example system 100 for source code maintenance and generation.
- the system 100 includes user devices 102 a,b , a source code analyzer 106 , a story repository 116 , a source code repository 122 , a style repository 126 , and a custom code generator 130 .
- the source code analyzer 106 of system 100 is generally configured to receive source code 108 from a particular user 104 a,b and detect any possible anomalies in the source code 108 before the source code 108 is stored in the source code repository 122 (e.g., as stored source code 124 ).
- the style analyzer 114 may detect anomalies associated with style features in the source code 108 , and, if an anomaly is detected, the source code 108 may be corrected prior to its storage in the source code repository 122 . Further examples of the implementation of the source code analyzer 106 are described below and with respect to FIGS. 2-5 .
- the custom code generator 130 is generally configured to receive a user input 132 , which includes instructions for performing desired computing tasks in a natural language, and generate a corresponding custom code 140 in an appropriate programming language for implementing the task. Further examples of the implementation of the custom code generator 130 are described below and with respect to FIGS. 6 and 7 .
- User devices 102 a,b are generally any computing devices operable to receive user input associated with source code 108 and communicate the source code 108 to the source code analyzer 106 .
- a user device 102 a,b may include an appropriate interface and input device for inputting a source code 108 .
- Source code 108 includes instructions in a programming language for performing a computing task (e.g., a calculation).
- source code 108 may include comments which are written in a natural language and provide context or a brief description of the purpose of certain lines or sections of the code 108 .
- User devices 102 a,b may also be operable to provide a user input 132 and/or user query 134 to the custom code generator 130 .
- each of the user devices 102 a,b may be a computer or a mobile device.
- device 102 a is associated with a first user 104 a
- user device 102 b is associated with a second user 104 b.
- whether source code 108 is provided to the source code analyzer 106 by the first computing device 102 a associated with the first user 104 a or the second computing device 102 b associated with the second user 104 b may determine how the source code 108 is analyzed and subsequently stored in the source code repository 122 (or prevented from being stored in the source code repository 122 ).
- whether user input 132 and/or query 134 is provided to the code generator 130 by the first computing device 102 a associated with a first user 104 a or the second computing device 102 b associated with a second user 104 b may determine how custom code 140 is generated (e.g., in an appropriate user-specific fashion).
- Devices 102 a,b may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below.
- Source code analyzer 106 may be any computing device, or collection of computing devices, configured to receive source code 108 from user devices 102 a,b and analyze the source code 108 .
- the source code analyzer 106 may be configured to review received source code 108 , detect any anomalies in the source code, and correct the anomalies when possible/appropriate.
- the source code analyzer 106 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below.
- the source code analyzer 106 may be implemented on a user device 102 a,b (e.g., using appropriate instructions stored in a memory of the device 102 a,b and executed by a processor of the device 102 a,b ).
- the source code analyzer 106 may be implemented using a separate device, or a collection of computing devices (e.g., configured as a server).
- the source code analyzer 106 may include a story generator 110 and a style analyzer 114 .
- the story generator 110 generally determines, for the source code 108 , a corresponding story 112 .
- the story 112 is a natural language description of instructions included in the source code 108 .
- the source code analyzer 106 may store the story 112 in the story repository 116 (e.g., as one of the stories 118 ).
- the generated story 112 is generally stored such that it is associated with the source code 108 . This allows the story 112 to be reviewed at a later time by a user 104 a,b and allows the user 104 a,b to identify the corresponding source code 108 .
- the story 112 may be determined in a first language (e.g., English) and subsequently translated to a more appropriate language for a given user 104 a,b (e.g., a preferred language for the user 104 a,b ).
- Example implementation of the story generator 110 is described in greater detail below with respect to FIGS. 4-5 .
- the style analyzer 114 generally determines style features of the source code 108 and determines, based at least in part on these style features, whether to store the source code 108 in the source code repository 122 , modify the source code 108 prior to its storage, or whether to prevent storage of the source code 108 .
- the style analyzer 114 may detect anomalies in style features of the source code 108 (e.g., irregular use of comments, spaces, and/or punctuation in the source code and/or in the comments, or changes to the language used in the comments or to the variable naming conventions).
- the style analyzer 114 may automatically edit the format of the source code 108 to correct the anomalies prior to storing the edited source code 108 in the source code repository 122 (e.g., as an entry of stored source code 124 ). If the anomalies are severe, the style analyzer 114 may prevent storage of the source code 108 . In some cases, rather than permanently preventing the storage of source code 108 with detected anomaly(ies), the style analyzer 114 may flag the source code 108 for human review, and the source code 108 may be prevented from being stored at least until results of such a review are received.
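- The three outcomes described above (store, correct and then store, or hold for human review) can be sketched as a small decision function. This is a hedged illustration: the disclosure does not define how severity is measured, so the use of an anomaly count and the `severe_threshold` parameter are assumptions made only for this example.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    STORE = "store"
    CORRECT_THEN_STORE = "correct_then_store"
    FLAG_FOR_REVIEW = "flag_for_review"


def triage(anomaly_count: int, severe_threshold: int = 3) -> Decision:
    """Map a number of detected anomalies to an outcome.

    Treating the anomaly count as "severity" is an assumption of this sketch.
    """
    if anomaly_count == 0:
        return Decision.STORE
    if anomaly_count < severe_threshold:
        return Decision.CORRECT_THEN_STORE
    return Decision.FLAG_FOR_REVIEW


def may_store(decision: Decision, review_approved: Optional[bool] = None) -> bool:
    """Flagged code is held until a review result explicitly approves it."""
    if decision is Decision.FLAG_FOR_REVIEW:
        return review_approved is True
    return True
```

- With no review result (`review_approved=None`), flagged code stays out of the repository, which mirrors the "at least until results of such a review are received" behavior.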
- Examples of detected anomalies include a length of indentations in the source code 108 that is outside of a predefined range, location of gap lines (i.e., empty lines of code) in the source code 108 not conforming to predefined conventions, a frequency of gap lines in the source code 108 that is outside of a predefined range, a frequency and/or location of punctuation in the source code 108 that does not conform to predefined conventions, a number of spaces following variables or other text in a line of the source code 108 that is outside a predefined range, and the like.
- the style analyzer 114 may determine and store style profiles 128 a,b for corresponding users 104 a,b in the style repository 126 .
- Style profiles 128 a,b generally store the predefined style features that have been determined for the corresponding users 104 a,b (e.g., based on previous code prepared by these users 104 a,b ).
- the style profiles 128 a,b may be used to aid in detecting anomalous source code 108 (e.g., if source code 108 received from a given user 104 a,b does not include style features which correspond to those of that user's style profile 128 a,b ) and to generate custom code using the custom code generator 130 (described further below).
- Example implementation of the style analyzer 114 is described in greater detail below with respect to FIGS. 2-3 .
- the story repository 116 is generally a data store, or database, configured to store stories 118 (e.g., natural-language descriptions of the source code 124 stored in the source code repository 122 ).
- stories 118 may include the story 112 generated for the source code 108 along with descriptions of other source code 124 previously received by the source code analyzer 106 , as described briefly above and in greater detail below with respect to FIGS. 4-5 .
- Each entry of source code 124 may have a corresponding story 118 in the story repository 116 .
- the story repository 116 may also store summaries 120 of the stories (e.g., more succinct versions of the stories 118 ).
- This disclosure contemplates story repository 116 storing information (e.g., stories 118 and/or summaries 120 ) arranged in any appropriate format.
- the story repository 116 may be stored in memory of a dedicated device and/or in a memory of one or more of the user devices 102 a,b , source code analyzer 106 , and custom code generator 130 .
- the story repository 116 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below.
- the story repository 116 may provide further insights for improving the efficiency associated with storing source code 124 in the source code repository 122 .
- source code 124 with the same or similar stories 118 or summaries 120 may be associated with one another.
- Such related source code may be flagged for review to identify differences in the source code 124 and/or determine a preferred entry of source code 124 to use in the future.
- a preferred code 124 may be retained in the source code repository 122 , while a non-preferred entry of source code 124 with the same or a similar story 118 (e.g., less efficient code for performing the same task) may be discarded.
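- Associating source code entries that share a story or summary can be sketched as a grouping step. This is a minimal sketch under assumptions: entries are represented as a hypothetical mapping from an entry identifier to its summary text, and only summaries that are identical after case and whitespace normalization are grouped. Detecting merely "similar" stories would need a similarity measure that is not shown here.

```python
from collections import defaultdict


def normalize(summary: str) -> str:
    """Lowercase and collapse whitespace so trivially different text matches."""
    return " ".join(summary.lower().split())


def group_by_summary(summaries: dict[str, str]) -> list[list[str]]:
    """Return groups of entry ids whose normalized summaries are identical.

    Only groups with more than one entry are returned, since those are the
    candidates to flag for review.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for entry_id, summary in summaries.items():
        groups[normalize(summary)].append(entry_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

- Each returned group could then be reviewed to pick a preferred entry and discard the others.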
- the source code repository 122 is generally a data store, or database, configured to store source code 124 .
- Source code 124 may include the source code 108 as received or as-edited by the source code analyzer 106 , as described briefly above and in greater detail below with respect to FIGS. 2-3 .
- the source code repository 122 also stores previously received source code 124 .
- This disclosure contemplates source code repository 122 storing information (e.g., source code 124 ) arranged in any appropriate format.
- the source code repository 122 may be stored in memory of a dedicated device and/or in a memory of one or more of the user devices 102 a,b , source code analyzer 106 , and the custom code generator 130 .
- the source code repository 122 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below.
- the style repository 126 is generally a data store, or database, configured to store style profiles 128 a,b for users 104 a,b .
- the style repository 126 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below.
- Each style profile 128 a,b is generally associated with a corresponding user 104 a,b and reflects the formatting conventions commonly used by the users 104 a,b when writing in the programming language used to prepare source code 108 .
- each style profile 128 a,b generally includes predefined style features associated with how the users 104 a,b prepare (or are expected to prepare) source code 108 .
- the style profiles 128 a,b may store user-specific features such as the length and/or frequency of indentations in the source code 108 by the corresponding user 104 a,b when writing in the programming language, location of gap lines in source code 108 , frequency of gap lines in source code 108 generated by the corresponding user 104 a,b when writing in the programming language, the frequency and/or location of punctuation (e.g., colons, semicolons) and/or use of capitalization in comments in the source code 108 by the corresponding user 104 a,b when writing in the programming language, the frequency and/or location of comments (e.g., before functions, after variable declarations), and a number of gaps (i.e., empty lines) following a line in the source code 108 prepared by the corresponding user 104 a,b when writing in the programming language.
- the style profiles 128 a,b may also include threshold ranges by which a style feature can differ from a predefined style feature for the user 104 a,b before an anomaly is detected.
- the implementation of style profiles 128 a,b is described in greater detail with respect to FIG. 3 below.
- the style profiles 128 a,b may be determined using a number of source code entries (e.g., stored as entries 124 in the source code repository 122 ) prepared by the users 104 a,b over a period of time (e.g., weeks or months). For instance, the style profiles 128 a,b may be determined using a set of heuristics and/or using an appropriate method of machine learning.
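- One simple heuristic for deriving a profile with threshold ranges from several prior entries is to average each feature and allow a fixed relative tolerance around the average. The sketch below assumes each entry has already been reduced to a dictionary of numeric style features; the feature names and the `tolerance` parameter are illustrative assumptions, not values from the disclosure.

```python
from statistics import mean


def build_style_profile(
    feature_sets: list[dict[str, float]], tolerance: float = 0.25
) -> dict[str, tuple[float, float]]:
    """Return an allowed (low, high) range per feature.

    The range is the mean of the feature across entries, plus or minus a
    relative tolerance (an assumed heuristic).
    """
    if not feature_sets:
        raise ValueError("at least one prior entry is needed")
    profile = {}
    for name in feature_sets[0]:
        center = mean(fs[name] for fs in feature_sets)
        spread = abs(center) * tolerance
        profile[name] = (center - spread, center + spread)
    return profile
```

- A machine-learned profile could replace this heuristic without changing the shape of the result, as long as it still yields a range or score per feature.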
- This disclosure contemplates style repository 126 storing information (e.g., style profiles 128 a,b ) arranged in any appropriate format.
- the style repository 126 may be stored in memory of a dedicated device and/or in a memory of one or more of the user devices 102 a,b , source code analyzer 106 , and custom code generator 130 .
- the custom code generator 130 is generally configured to receive a user input 132 , which includes text in a natural language (e.g., English or any other appropriate language for the users 104 a,b ), and generate corresponding custom code 140 .
- the user input 132 may include a description of a computing task a user 104 a,b desires the custom code 140 to perform.
- the code writer 136 may use information in the story repository 116 and/or the source code repository 122 to identify and modify, as needed, portions of the stored source code 124 to generate custom code 140 .
- the code writer 136 may identify keywords in the user input 132 that are linked with portions of stories 118 and provide the corresponding source code 124 to the user 104 a,b .
- the code writer 136 may use the source code 124 that corresponds to this portion of the story 118 in order to write the custom code 140 .
- the custom code generator 130 may also include a style modifier 138 , which is generally configured to edit (e.g., or “fix”) the style of code generated by the code writer 136 such that custom code 140 has a style that is aligned with the user's style profile 128 a,b .
- the style modifier 138 generally employs the style profiles 128 a,b to perform such modifications.
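- As one concrete example of a style modification, generated code can be re-indented to the indentation width recorded in a user's profile. This is a sketch only: it assumes space-based indentation and that the current and target widths are already known, and the function name `reindent` is hypothetical.

```python
def reindent(code: str, current_width: int, target_width: int) -> str:
    """Rewrite leading spaces so each indent level uses target_width spaces.

    Assumes indentation uses spaces only. Any leftover spaces beyond a whole
    number of levels (e.g., continuation alignment) are preserved as-is.
    """
    if current_width <= 0 or target_width < 0:
        raise ValueError("widths must be positive")
    out = []
    for line in code.splitlines():
        stripped = line.lstrip(" ")
        leading = len(line) - len(stripped)
        level, remainder = divmod(leading, current_width)
        out.append(" " * (level * target_width + remainder) + stripped)
    return "\n".join(out)
```

- Other profile features, such as gap-line placement or comment frequency, would need their own transformations.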
- the custom code generator 130 facilitates the efficient and reliable repurposing of stored source code 124 , which may be associated with a first task or function (e.g., for performing calculations using a first set of variables and/or functions associated with a first entity or business unit), into a custom code 140 , which is configured for a different task or function (e.g., for performing calculations using a second set of variables and/or functions associated with a second entity or business unit) without requiring any technical or programming expertise from the user 104 a,b who provided the natural-language input 132 .
- An example operation of the custom code generator 130 is described in greater detail below with respect to FIG. 6 .
- the custom code generator 130 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below.
- a user 104 a,b provides source code 108 to the source code analyzer 106 for storage in the source code repository 122 .
- the style analyzer 114 determines whether the source code 108 meets certain criteria for storing the source code in the source code repository 122 . For instance, the style analyzer may determine whether style features of the source code 108 correspond to the expected style features indicated by the user's style profile 128 a,b . An example of this is described with respect to FIG. 3 below. If the style features are not within an expected range, the code 108 may be edited so that the style of the code 108 is brought into accordance with the user's style profile 128 a,b before the code 108 is stored in the source code repository 122 .
- code 108 may be flagged for further review and storage of the code 108 may be prevented at least for a period of time (e.g., at least until results of administrator review are received indicating the code 108 is approved for storage).
- the source code analyzer 106 may also or alternatively determine a natural language description, or story 112 , for the source code 108 .
- the story 112 may be stored in the story repository 116 for future use, for example, by the custom code generator 130 . Further examples of the operation of the source code analyzer 106 are described below with respect to FIGS. 2-5 .
- a natural-language user input 132 is provided by a user 104 a,b to the custom code generator 130 .
- the code writer 136 may use stories 118 from the story repository 116 and source code 124 from the source code repository 122 to generate custom code 140 , based on the user input 132 . For instance, keywords identified in the user input 132 may be matched to those of the stories 118 . Source code 124 associated with the matching stories 118 may be appropriately combined to generate the custom code 140 .
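- Matching keywords from the user input against stored stories can be sketched as a word-overlap ranking. This is a simplified illustration: the stop-word list, the story identifiers, and the overlap score are assumptions, and a real keyword extractor would likely be more sophisticated.

```python
# Small illustrative stop-word list (an assumption of this sketch).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "for", "from", "by"}


def keywords(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation removed, minus stop words."""
    words = {w.strip(".,;:?!").lower() for w in text.split()}
    return words - STOP_WORDS - {""}


def match_stories(user_input: str, stories: dict[str, str]) -> list[str]:
    """Return ids of stories sharing keywords with the input, best match first."""
    wanted = keywords(user_input)
    scores = {sid: len(wanted & keywords(story)) for sid, story in stories.items()}
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [sid for sid, score in ranked if score > 0]
```

- The source code associated with the top-ranked stories would then be the candidate material for assembling the custom code.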
- the style modifier 138 uses the style profiles 128 a,b to modify the style of the custom code 140 such that it matches a predefined programming style for the user 104 a,b (e.g., in accordance with style profiles 128 a,b ).
- the user input 132 may further include feedback to the custom code generator 130 , which may be used to improve performance of the code writer 136 and/or style modifier 138 .
- a user 104 a,b may further edit the custom code 140 by providing a user query 134 , which includes a search phrase or other request to identify appropriate existing source code 124 to include in the custom code 140 . Further examples of the operation of the custom code generator 130 are described below with respect to FIGS. 6 and 7 .
- FIG. 2 shows a flow diagram 200 illustrating example operation of the style analyzer 114 of the source code analyzer 106 .
- the style analyzer 114 receives previously stored code 202 a associated with user 104 a and previously stored code 202 b associated with user 104 b .
- the previously stored code 202 a,b may be received from the source code repository 122 (i.e., the code 202 a,b may be included in the stored source code 124 of FIG. 1 ).
- Stored source code 202 a may correspond to a first set of source code (e.g., instructions written in a programming language) associated with (e.g., generated by) first user 104 a of FIG.
- stored source code 202 b may correspond to a second set of source code (e.g., instructions written in a programming language) associated with (e.g., generated by) second user 104 b of FIG. 1 .
- the style analyzer 114 uses the previously stored code 202 a,b to determine style profiles 128 a,b for the users 104 a,b . As explained further below, these style profiles 128 a,b may be employed by the style analyzer 114 to evaluate new source code 204 a,b received from users 104 a,b.
- Style extraction 206 generally involves the determination of style features 210 a,b for the stored code 202 a,b associated with the users 104 a,b .
- style extraction 206 may involve determining style features 210 a,b prevalent in (e.g., commonly found in) the source code 202 a,b .
- the style features 210 a,b may include one or more of a length of indentations in the source code 202 a,b , location of gap lines (e.g., whether empty lines are left after comments, calls to functions, or the like) in the source code 202 a,b , a frequency of gap lines in the source code 202 a,b (e.g., how frequently empty lines are found), a frequency and/or location of punctuation in the source code 202 a,b (e.g., how often periods, commas, semicolons, and the like appear in the source code 202 a,b and/or whether such punctuation is commonly found in comments, calls to functions, following variables, etc.), and the like.
- the style features 210 a,b are not limited to these example features and may include any other appropriate features associated with a format or style of source code 202 a,b.
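- As a non-limiting illustration, style extraction 206 might be sketched as follows. This is a minimal sketch, not the disclosed implementation: the function name `extract_style_features`, the feature names, and the choice to average indentation over indented lines are all assumptions made for this example.

```python
import re
from statistics import mean


def extract_style_features(source: str) -> dict:
    """Compute a few simple formatting features for one piece of source code.

    The feature names and formulas here are illustrative assumptions.
    """
    lines = source.splitlines()
    # Indentation length: count of leading spaces on each indented, non-empty line.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines
               if l.strip() and l.startswith(" ")]
    # Gap lines: empty or whitespace-only lines.
    gap_count = sum(1 for l in lines if not l.strip())
    # Punctuation: periods, commas, and semicolons anywhere in the code.
    punct_count = len(re.findall(r"[.,;]", source))
    total = len(lines)
    return {
        "indent_length": mean(indents) if indents else 0,
        "gap_line_frequency": gap_count / total if total else 0.0,
        "punctuation_per_line": punct_count / total if total else 0.0,
    }


sample = "def f(x):\n    return x, 1\n\nf(2)\n"
features = extract_style_features(sample)
```

For the four-line sample, one of the four lines is a gap line and the only indented line uses four spaces.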
- style analyzer 114 proceeds to creation 208 of style profiles 128 a,b .
- Profile creation 208 involves associating the determined style features 210 a,b with a user identifier 212 a,b for the user 104 a,b who generated the associated stored code 202 a,b .
- the style profiles 128 a,b are generally stored in the style repository 126 , such that this information is available for future use, for example, by the style analyzer 114 and the custom code generator 130 (see FIG. 1 ).
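- Profile creation 208 might be sketched as follows, assuming each stored code entry has already been reduced to a dictionary of numeric features. The `StyleProfile` class, the in-memory `style_repository` dictionary, and the use of a simple average to represent "prevalent" features are hypothetical choices for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class StyleProfile:
    """Associates a user identifier with that user's predefined style features."""
    user_id: str
    features: dict = field(default_factory=dict)


def create_profile(user_id: str, feature_sets: list) -> StyleProfile:
    """Average each feature across a user's stored code entries (assumed non-empty)."""
    names = feature_sets[0].keys()
    averaged = {n: sum(fs[n] for fs in feature_sets) / len(feature_sets)
                for n in names}
    return StyleProfile(user_id, averaged)


# Hypothetical stand-in for the style repository: a dict keyed by user identifier.
style_repository = {}
profile = create_profile(
    "user_104a",
    [{"indent_length": 4, "gap_lines_after_call": 2},
     {"indent_length": 4, "gap_lines_after_call": 2}],
)
style_repository[profile.user_id] = profile
```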
- the style analyzer 114 may proceed with style extraction 206 , in a manner similar to that described above. For example, the style analyzer 114 may determine new style features 210 a,b for the received source code 204 a,b . The style analyzer 114 then makes a determination 214 of whether an anomaly is detected in the source code 204 a,b . The determination 214 may employ machine learning or artificial intelligence to determine whether the new code 204 a,b has a style that corresponds to that of the appropriate style profile 128 a,b and can, thus, reliably be stored in the source code repository 122 .
- a machine learning model may be trained based on the previous source code 202 a,b (and any other appropriate source code 124 associated with the style profile 128 a,b ). Also or alternatively, the determination 214 may involve one or more heuristics or rules to determine whether the new code 204 a,b has a style that corresponds to that of the appropriate style profile 128 a,b or whether an anomaly (e.g., a style anomaly) is detected.
- FIG. 3 is a diagram 300 illustrating an example of anomaly determination 214 in greater detail.
- a newly determined style feature 302 is compared to a corresponding predefined style feature 304 .
- the determined style feature 302 may be any of the example style features 210 a,b described above, or any other appropriate feature associated with the formatting of the new code 204 a,b .
- the predefined style feature 304 may be one of the style features 210 a,b for the user 104 a,b who provided the new source code 204 a,b being analyzed (see also FIGS. 1 and 2 ).
- a comparator 306 is used to compare the determined style feature 302 to the corresponding predefined style feature 304 in order to determine a feature difference 308 (e.g., an extent to which the determined feature 302 is different from the predefined style feature 304 ).
- the feature difference 308 may correspond, for example, to a value by which another value associated with the determined style feature 302 is different from a value associated with the predefined style feature 304 .
- the feature difference 308 is compared to a threshold range 310 via a second comparator 312 to determine whether the difference 308 is within a threshold range 310 .
- the threshold range 310 generally corresponds to an amount that the determined feature 302 can differ from the predefined feature 304 .
- the threshold range 310 for a given feature type may be different for each user 104 a,b (e.g., as determined by the style profiles 128 a,b ). For instance, if the determined style feature 302 indicates that the new code 204 a,b of FIG.
- the threshold range 310 may be a range from negative one to positive one (i.e., indicating that an expected number of gap lines for the user 104 a,b (e.g., as indicated by the user's style profile 128 a,b ) may be the value associated with the predefined style feature 304 plus or minus one).
- the feature difference 308 of two gap lines is not within the threshold range 310 , and, therefore, the feature 302 fails to correspond to the user's style profile 128 a,b , resulting in an anomaly determination 314 that is positive.
- the comparator 312 determines that the feature 302 has a negative anomaly determination 314 (i.e., an anomaly is not detected for the feature 302 ).
- a negative anomaly determination 314 generally indicates that the feature 302 is in agreement with the user's style profile 128 a,b , and an anomaly is not detected at determination 214 of FIG. 2 .
- the anomaly determination 314 is positive, indicating that the feature 302 is not in agreement with the user's style profile 128 a,b , and an anomaly is detected at determination 214 of FIG. 2 .
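- The two-comparator arrangement of FIG. 3 might be sketched as follows. The function names are assumptions for this example; the numeric case mirrors the gap-line example above, in which a feature difference of two falls outside a threshold range of negative one to positive one.

```python
def feature_difference(determined: float, predefined: float) -> float:
    """First comparator: how far the determined feature is from the predefined one."""
    return determined - predefined


def anomaly_determination(difference: float, threshold_range: tuple) -> bool:
    """Second comparator: True (positive) when the difference is outside the range."""
    low, high = threshold_range
    return not (low <= difference <= high)


# Expected two gap lines for this user, but four were found in the new code.
difference = feature_difference(determined=4, predefined=2)
positive = anomaly_determination(difference, threshold_range=(-1, 1))
```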
- a plurality of features 302 for a given entry of new code 204 a,b are evaluated according to the process illustrated in FIG. 3 .
- at least a minimum number of features 302 must be within the threshold as determined by comparator 312 in order for an anomaly not to be detected at determination 214 of FIG. 2 .
- at least 80% of the features 302 may need to have a negative anomaly determination 314 in order for an anomaly not to be detected at determination 214 of FIG. 2 . If fewer than the minimum number of features 302 has a negative anomaly determination 314 , an anomaly is detected at step 214 of FIG. 2 .
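- Aggregating the per-feature determinations might be sketched as follows. The function name and the treatment of an empty feature list (no anomaly) are assumptions for this example; the 80% figure is the example minimum given above.

```python
def anomaly_detected(per_feature_positive: list, min_fraction_ok: float = 0.8) -> bool:
    """Return True when too few features have a negative anomaly determination.

    per_feature_positive holds one bool per feature: True means that feature
    had a positive (anomalous) determination.
    """
    if not per_feature_positive:
        return False  # assumption: nothing to evaluate means no anomaly
    negatives = sum(1 for positive in per_feature_positive if not positive)
    return negatives / len(per_feature_positive) < min_fraction_ok


four_of_five_ok = anomaly_detected([False, False, False, False, True])
three_of_five_ok = anomaly_detected([False, False, False, True, True])
```

With four of five features in agreement, the 80% minimum is met and no anomaly is detected; with three of five, an anomaly is detected.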
- the style analyzer 114 proceeds to storage 216 of the code 204 a,b .
- the code 204 a,b is generally stored in the source code repository 122 of FIG. 1 , for example, such that the new code 204 a,b is subsequently available to aid in the generation of custom code 140 by the custom code generator 130 , as described in greater detail below.
- the style analyzer 114 may provide an alert 218 indicating review of the code 204 a,b is needed. For instance, having been determined to be anomalous, the code 204 a,b may be provided to an administrator for review. The administrator may determine whether the code 204 a,b is acceptable (e.g., whether anomalies in the code 204 a,b are associated with malicious intent (not acceptable) or whether detected anomalies are associated with error or some other non-malicious intent). The results 220 of this review may be used to determine whether the style analyzer 114 should proceed to prevention 222 of storage of the source code 204 a,b or to editing 224 the source code 204 a,b .
- the determination 214 may provide further instructions for determining if the code 204 a,b is acceptable at 220 for storage 226 after being edited 224 or if the style analyzer 114 should prevent 222 storage of the code 204 a,b.
- the style analyzer 114 may automatically edit (e.g., “fix”) 224 the source code 204 a,b .
- the code 204 a,b may be edited such that the feature difference 308 is brought back within the threshold range 310 .
- the style analyzer 114 may modify 224 the code 204 a,b such that two gap lines are added after a function call.
- the style analyzer 114 then stores 226 the edited code 204 a,b in the source code repository 122 (e.g., as an entry of the source code 124 of FIG. 1 ).
- the above-described edits 224 to the code 204 a,b may be performed following a positive determination 214 of a style anomaly (e.g., in response to determining that a determined feature difference 308 of FIG. 3 is outside the corresponding threshold range 310 ).
- the style analyzer 114 prevents storage 222 of the first source code in the source code repository 122 .
- the style analyzer 114 may determine that prevention 222 of code storage is appropriate based on one or more of the number of style feature differences 308 of FIG. 3 that are not within the corresponding threshold ranges 310 , the extent to which one or more of the feature differences 308 depart from the corresponding acceptable threshold ranges 310 , and the like.
- At least two determined style features 302 of a given code 204 a,b must fail the comparison performed by comparator 312 of FIG. 3 in order for the style analyzer 114 to automatically prevent storage 222 of the code 204 a,b .
- both the number of gap lines following a call to a function and the length of comments may have to be outside a predefined range in order for a positive anomaly determination 314 to be made.
- a feature difference 308 of FIG. 3 must be outside the corresponding threshold range 310 by a minimum amount in order to proceed to prevention 222 of storage of the code 204 a,b .
- if the feature difference 308 between a determined feature 302 and a predefined feature 304 is four (e.g., if a code 204 a,b included six gap lines following a call to a function rather than the expected two gap lines for that user 104 a,b ), and the threshold range is from negative one to one, this example feature difference 308 of four would be outside the threshold range 310 by a minimum amount of three.
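- The decision to proceed to prevention 222 might be sketched as follows. This is only one possible reading: the function names, the way the excess is measured (distance beyond the nearer edge of the range), the inclusive comparison against the minimum amount, and the combination of the two criteria with a logical "or" are all assumptions for this example.

```python
def excess_outside_range(difference: float, threshold_range: tuple) -> float:
    """Distance by which a difference falls beyond the nearer edge of the range."""
    low, high = threshold_range
    if difference > high:
        return difference - high
    if difference < low:
        return low - difference
    return 0


def should_prevent_storage(differences, ranges, min_failures=2, min_excess=3) -> bool:
    """Prevent storage when enough features fail, or one fails by a wide margin."""
    excesses = [excess_outside_range(d, r) for d, r in zip(differences, ranges)]
    failures = sum(1 for e in excesses if e > 0)
    return failures >= min_failures or any(e >= min_excess for e in excesses)


# Six gap lines found where two were expected: difference of four, range -1 to 1.
wide_margin = should_prevent_storage([4], [(-1, 1)])
small_single_failure = should_prevent_storage([2], [(-1, 1)])
```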
- the style analyzer 114 may detect entries of source code 124 which have been intentionally altered (e.g., maliciously altered) and stored in the source code repository 122 . For instance, the style analyzer 114 may intermittently check the stored source code 124 and identify inconsistencies or changes in the source code 124 over time. For instance, if a given entry of the stored source code 124 has no anomalies or fewer than a threshold number of anomalies (see FIG. 3 and corresponding description above) at a first time stamp and an increase in anomalies is detected at a second time stamp after the first time stamp, the style analyzer 114 may flag this entry of source code 124 for further review.
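- The intermittent check might be sketched as follows, assuming anomaly counts for an entry have been recorded at successive time stamps. The function name and the data layout (a list of time stamp and count pairs) are hypothetical.

```python
def flag_for_review(anomaly_history: list, threshold: int) -> bool:
    """Flag an entry whose anomaly count rises after being below the threshold.

    anomaly_history is a list of (time_stamp, anomaly_count) pairs.
    """
    history = sorted(anomaly_history)  # order by time stamp
    for (_, earlier), (_, later) in zip(history, history[1:]):
        if earlier < threshold and later > earlier:
            return True
    return False


tampered = flag_for_review([(1, 0), (2, 3)], threshold=1)
unchanged = flag_for_review([(1, 0), (2, 0)], threshold=1)
```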
- the style analyzer 114 may change a permission flag on this entry of the source code 124 to prevent use of the code until it has passed further review. For instance, an altered permission of this entry of source code 124 may prevent the source code 124 from being used by the custom code generator 130 (described in greater detail below). This may provide further improvements to the security and reliability of the stored source code 124 .
- the style analyzer 114 may search for personal information that is included in the stored source code 124 . For instance, the style analyzer 114 may search for and flag any personal user information (e.g., user names, addresses, account numbers). This information may be automatically removed if not necessary for implementation of the code 124 . Also or alternatively, this information may be automatically anonymized to prevent its compromise. This may provide further improved data security to the source code analyzer 106 of FIG. 1 .
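- Automatic anonymization might be sketched as follows. The account-number format (a run of eight to twelve digits) and the replacement token are purely hypothetical; an actual deployment would use whatever patterns are appropriate for the personal information at issue.

```python
import re

# Hypothetical account-number format: a standalone run of 8 to 12 digits.
ACCOUNT_PATTERN = re.compile(r"\b\d{8,12}\b")


def anonymize(source: str) -> str:
    """Replace anything matching the assumed account-number format."""
    return ACCOUNT_PATTERN.sub("<REDACTED>", source)


cleaned = anonymize('account = "123456789"')
```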
- the style analyzer 114 may search for keywords associated with known problems in the source code 124 . For instance, the style analyzer 114 may search predefined words and/or phrases such as “to do,” “fix me,” “please fix,” and the like. An administrator may identify such terms commonly used by users 104 a,b to identify that a portion of code 124 is not complete or requires attention. These terms may be searched for, and any stored code 124 containing these terms may be flagged for further review and/or correction. In some embodiments, the style analyzer 114 may detect unused and/or redundant objects or functions in stored source code 124 . These unused and/or redundant items may be automatically removed from the source code 124 , thereby making both the source code repository 122 and the stored source code 124 more efficient.
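- The keyword search might be sketched as follows. The pattern covers only the three example phrases given above, and the function name is an assumption; an administrator-defined list would replace the hard-coded pattern in practice.

```python
import re

# Matches "to do"/"todo", "fix me"/"fixme", and "please fix", ignoring case.
FLAG_PATTERN = re.compile(r"\b(to\s?do|fix\s?me|please fix)\b", re.IGNORECASE)


def find_flagged_lines(source: str) -> list:
    """Return (line_number, line) pairs for lines containing a flagged phrase."""
    return [(number, line)
            for number, line in enumerate(source.splitlines(), start=1)
            if FLAG_PATTERN.search(line)]


code = "x = 1  # TODO: check\ny = 2\n# please fix this\n"
flagged = find_flagged_lines(code)
```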
- FIG. 4 is a flowchart of an example method 400 of story generation.
- the story generator 110 may implement method 400 to generate the story 112 of FIG. 1 .
- the method 400 generally facilitates the determination of a corresponding description, in a natural language, of the instructions included in the source code 108 for performing a task or function and the subsequent storage of this natural-language description, or story 112 , in the story repository 116 .
- Method 400 may begin at step 402 where source code 108 is received by the story generator 110 .
- a user 104 a,b may provide the source code 108 to the source code analyzer 106 , as described above with respect to FIG. 1 .
- the story generator 110 determines, for each line of the source code 108 , a badge associated with a programming task.
- a badge may be associated with a description of the programming function associated with the line of the source code 108 , or the information included in the line of the source code 108 .
- FIG. 5 shows an example code portion 502 , which may be included in source code 108 .
- Each line of code portion 502 has a corresponding line description 504 .
- the comment at the top of code portion 502 has a corresponding line description 504 of “Headline,” while the second comment in the code portion 502 has a corresponding line description 504 of “Comment Line.”
- the story generator 110 determines these descriptions 504 and uses them to determine a corresponding intelligent badge 508 for each line of the source code 108 .
- functions appearing in the source code 108 are replaced with predefined text which describes the functions.
- an equal sign when used to define a variable value in the source code 108 , may be replaced with the text “is assigned as.”
- the equal sign may be replaced with a phrase such as “is calculated as,” “is computed as,” or the like. This facilitates the transformation of otherwise abstract functions and arithmetic symbols into readily interpretable natural language.
- the intelligent badges 508 are illustrated in bold and italic font.
- At step 408 , the story generator 110 replaces variable names with predefined variable text.
- FIG. 5 illustrates the results 516 of replacing variables 510 , 512 , 514 with corresponding text 518 , 520 , 522 at step 408 .
- step 408 of FIG. 4 may involve replacing the “var_asset” variable 510 and “fee_rate” variable 512 in the code portion 502 with corresponding text descriptions of “variable asset” 518 and “fee rate” 520 , as shown in the progression from results 506 to results 516 in FIG. 5 .
- results 506 is transformed into “result is computed as variable asset multiplied to fee rate” in results 516 of step 408 .
- the story generator 110 removes the badges to generate a natural language story 112 for the original source code 108 .
- FIG. 5 illustrates the results 524 of step 410 .
- Results 524 are an example of a story 112 , or a portion of a story 112 .
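- The symbol and variable substitutions of method 400 might be sketched as follows. The lookup tables, the function name, and the input line `result = var_asset * fee_rate` are assumptions for this example (FIG. 5 is not reproduced here), and the badge handling is omitted for brevity.

```python
# Predefined text for symbols and variables (illustrative assumptions).
SYMBOL_TEXT = {"=": "is computed as", "*": "multiplied to"}
VARIABLE_TEXT = {"var_asset": "variable asset", "fee_rate": "fee rate"}


def line_to_story(line: str) -> str:
    """Translate one whitespace-separated line of code into natural language."""
    tokens = line.split()
    # Replace arithmetic symbols with predefined text.
    words = [SYMBOL_TEXT.get(token, token) for token in tokens]
    # Replace variable names with predefined variable text.
    words = [VARIABLE_TEXT.get(word, word) for word in words]
    return " ".join(words)


story_line = line_to_story("result = var_asset * fee_rate")
```

The output matches the example phrase given above for results 516.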
- the badges 508 are retained in the story (e.g., such that the results 516 are included in the story 112 ). In such cases, all of the results 524 (i.e., rather than only line 526 ) may be retained as the summaries 120 .
- Retaining the badges 508 in the story 112 may be beneficial for operation of the custom code generator 130 , because the badges 508 can be used to more effectively associate stories 118 to keywords in the user input 132 and find appropriate stored source code 124 as a starting point for generating custom code 140 , as described in greater detail below.
- the story generator 110 stores the resulting story 112 in the story repository 116 .
- the results 524 (e.g., the story 112 ) may include a summary portion 526 , which may be stored as one of the summaries 120 of FIG. 1 .
- the summary portion 526 generally provides a high level and readily searchable overview of the function of the source code portion 502 .
- the custom code generator 130 facilitates the reliable and user-friendly generation of custom code 140 based on natural language input 132 .
- the custom code 140 may include instructions written in any appropriate programming language for performing one or more user-desired tasks or functions.
- the user input 132 generally involves little or no previous knowledge from the users 104 a,b of the programming language of the custom code 140 .
- a user query 134 may be received by the custom code generator 130 and used to identify stories 118 which are related to the query 134 . If a user selects one of the identified stories 118 , the stored source code 124 that is associated with the selected story 118 may be provided to the user 104 a,b . This may further facilitate the efficient generation of custom code 140 for performing desired computing tasks or functions.
- FIG. 6 is a flowchart of an example method 600 of generating custom code 140 using the custom code generator 130 of FIG. 1 .
- the method 600 may be performed by the custom code generator 130 using the code writer 136 and/or style modifier 138 .
- the method 600 may begin at step 602 where a natural-language user input 132 is received by the custom code generator 130 .
- the input 132 generally includes a description of a computing task or function which a user 104 a,b wishes to perform.
- the input 132 may also include an indication of a programming language in which to generate the custom code 140 .
- the custom code generator 130 may use any appropriate natural language processing algorithm to process the user input 132 , split the input 132 into subsections (e.g., split paragraphs into sentences or portions of sentences), and/or tag keywords in the input 132 .
- FIG. 7 illustrates a portion 702 of a natural language user input 132 .
- This example input portion 702 includes certain tagged keywords and phrases 704 , 706 , 708 , and 710 , which are used by the custom code generator 130 to generate custom code 140 using method 600 .
- the custom code generator 130 may determine code-line entries to include in the custom code 140 , based on the received natural-language input 132 . For instance, words, phrases, or combinations of both included in the user input 132 may be used to determine code-line entries which should be included in the custom code 140 .
- FIG. 7 illustrates example code-line entries 712 to include in a custom code 140 generated based on input portion 702 .
- the code-line entries 712 include a headline entry 714 , a variables declaration entry 716 , a function definition entry 718 , and a function call entry 720 .
- the custom code generator may include a headline entry 714 in custom code 140 such that an initial comment line is provided that describes the use and/or operation of the custom code 140 .
- the custom code generator 130 may determine that variable declarations 716 should be included based on the identification of keywords 706 and 708 (i.e., “fees” and “variable assets”) in the input portion 702 .
- keywords 706 and 708 may be associated with predefined variables by the custom code generator 130 .
- the custom code generator 130 may determine that function definition 718 should be included based on the identification of keywords 704 and 706 (i.e., “calculate” and “fees”).
- Verbs, such as “calculate,” appearing in the input portion 702 may be associated with functions used to perform actions associated with the verbs (i.e., calculations in this example).
- the custom code generator 130 may determine that a function-call entry 720 should be included in order to execute the defined function for the declared variables.
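- The determination of code-line entries from tagged keywords might be sketched as follows. The keyword sets, the entry labels, the function name, and the sample input sentence are assumptions for this example; simple substring matching stands in for whatever natural language processing algorithm is actually used.

```python
# Illustrative keyword sets based on the examples above.
VARIABLE_KEYWORDS = {"fees", "variable assets"}
FUNCTION_KEYWORDS = {"calculate"}


def determine_entries(user_input: str) -> list:
    """Return the ordered code-line entries suggested by keywords in the input."""
    text = user_input.lower()
    entries = ["headline"]  # an initial comment line is always included
    if any(keyword in text for keyword in VARIABLE_KEYWORDS):
        entries.append("variable_declaration")
    if any(keyword in text for keyword in FUNCTION_KEYWORDS):
        # A defined function also needs a call to execute it.
        entries += ["function_definition", "function_call"]
    return entries


entries = determine_entries("Calculate fees on variable assets for my group")
```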
- an intelligent badge is determined for each code-line entry determined from the user input 132 .
- Examples of intelligent badges 508 are illustrated in FIG. 5 .
- FIG. 7 also illustrates example badges included in each code-line entry 714 , 716 , 718 , 720 .
- Badges may be used, for example, to more efficiently locate related stories 118 in the story repository 116 .
- variable-related words or phrases are identified in the user input 132 and used to determine appropriate variables and variable values to use in the custom code 140 being generated.
- the custom code generator 130 may access information stored in the story repository 116 , the source code repository 122 , and/or the style repository 126 to determine appropriate variable names and values to include in the custom code 140 .
- the “variable asset” keyword 708 may be associated with a “var_asset” variable 722 .
- the custom code generator 130 may further determine a variable value 724 of ten for the “var_asset” variable 722 .
- the custom code generator 130 may determine a calculation 726 associated with the “fee” keyword 706 . This calculation 726 includes a further “fee rate” variable 728 , which has an associated variable value 730 of fifteen.
- the values 724 and 730 may be determined based on the user 104 a,b who provided the user input portion 702 .
- the tagged “my group” phrase 710 of input portion 702 may be used to associate the variables 722 and 728 with the appropriate values 724 and 730 for the user 104 a,b or the user's group (e.g., an entity or business group with which the user 104 a,b is associated).
- the custom code generator 130 determines functions to provide in place of function-related text identified in the user input 132 .
- the custom code generator 130 may identify certain words, phrases, or combinations of these in the user input 132 which are related to an established function (e.g., a function employed in any of the stored source code 124 ).
- FIG. 7 illustrates a determined calculation 726 associated with the input portion 702 .
- the resulting custom code portion 732 (described further with respect to step 612 below) may include function-definition code 738 associated with the determined calculation 726 .
- custom code 140 is generated based on the determined function(s), variable(s), and badge(s) of steps 606 , 608 , and 610 .
- An example of a determined code portion 732 is illustrated in FIG. 7 .
- the code portion 732 includes a headline portion 734 , a variable-declaration portion 736 , the function-definition portion 738 , and a function-call portion 740 .
- the headline portion 734 is generally a summary of the operation or use of the code portion 732 .
- the variable-declaration portion 736 defines the values of variables to include in the code portion 732 .
- the function-definition portion 738 defines calculations to include in the code portion 732 (i.e., the calculation indicated by the input portion 702 ).
- the function-call portion 740 generally includes code for calling the defined function 738 using the declared variables 736 .
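- Assembling a code portion from a headline, variable declarations, a function definition, and a function call might be sketched as follows. The function names and the multiplication used for the calculation are assumptions for this example; the variable names and the values of ten and fifteen are taken from the description above.

```python
def generate_code(headline: str, variables: dict, func_name: str, expression: str) -> str:
    """Build a small Python code portion from its four kinds of code-line entry."""
    params = ", ".join(variables)
    lines = [f"# {headline}"]                                          # headline portion
    lines += [f"{name} = {value}" for name, value in variables.items()]  # declarations
    lines += [f"def {func_name}({params}):",                           # function definition
              f"    return {expression}"]
    lines.append(f"result = {func_name}({params})")                    # function call
    return "\n".join(lines)


code = generate_code(
    "Calculate fees",
    {"var_asset": 10, "fee_rate": 15},
    "calculate_fee",
    "var_asset * fee_rate",  # assumed form of the calculation
)
print(code)
```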
- the custom code generator 130 may determine whether the style of the custom code 140 being generated should be edited (or “fixed”) to correspond to an appropriate style for the user 104 a,b who provided the user input 132 and/or to the group or entity with which the user 104 a,b is affiliated (e.g., the entity associated with the tagged “my group” keyword 710 of the input portion 702 ). For instance, the custom code generator 130 (e.g., the style modifier 138 ) may compare style features of the code 140 generated at step 612 to predefined style features for the user 104 a,b (e.g., from the user's style profile 128 a,b ).
- step 614 may involve the approach described above with respect to FIG. 3 .
- the custom code generator 130 proceeds to step 616 to adjust the code 140 .
- a negative anomaly determination 314 is made (i.e., when style features 302 of the custom code 140 correspond to predefined features 304 )
- the custom code generator may proceed to step 618 without adjusting the custom code 140 .
- the custom code generator 130 edits the custom code 140 generated at step 612 .
- the code 140 may be “fixed” such that the format or style of the code 140 is in accordance with the style profile 128 a,b of the user 104 a,b who provided the user input 132 received at step 602 .
- the style is generally fixed by modifying the code 140 such that the style features are aligned with the user's predefined style features (e.g., as indicated by the user's style profile 128 a,b ).
- An example of such an adjustment is described above with respect to element 224 of FIG. 2 .
- FIG. 7 illustrates an example fixed code portion 742 where the code 732 has been modified to include style features 744 and 746 , which bring the style of code portion 742 into accordance with the expected style of the user 104 a,b who provided the user input portion 702 .
- Fixed code portion 742 includes additional gap lines 744 and an additional comment line 746 not found in the code portion 732 generated at step 612 .
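- Adding gap lines so that generated code matches a user's expected style might be sketched as follows. The function name, the regular expression used to recognize a function call, and the default of two gap lines are assumptions for this example; only calls followed by further code are padded.

```python
import re

# Rough, illustrative test for a line that is a function call,
# optionally assigned to a variable (e.g., "total = compute(a, b)").
CALL_PATTERN = re.compile(r"^\s*(\w+\s*=\s*)?[\w.]+\(.*\)\s*$")


def fix_gap_lines(source: str, expected_gaps: int = 2) -> str:
    """Ensure each function call is followed by the expected number of gap lines."""
    lines = source.splitlines()
    fixed = []
    i = 0
    while i < len(lines):
        line = lines[i]
        fixed.append(line)
        i += 1
        if CALL_PATTERN.match(line):
            # Skip whatever gap lines are already present.
            while i < len(lines) and not lines[i].strip():
                i += 1
            # Pad only when more code follows the call.
            if i < len(lines):
                fixed.extend([""] * expected_gaps)
    return "\n".join(fixed)


fixed_code = fix_gap_lines("total = compute(a, b)\nprint(total)")
```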
- Modifying or “fixing” code at step 616 may provide further improvements to the performance and reliability of the custom code 140 generated by the custom code generator 130 , for example, by facilitating the generation of custom code 140 that is not only appropriate for performing certain desired tasks but also that meets quality standards associated with the style, format, and presentation of the custom code 140 (i.e., such that the custom code 140 is readable to appropriately trained programmers and can be trusted for use in future applications). Accordingly, custom code 140 may be particularly appropriate for storage in the source code repository 122 as an entry of the stored source code 124 , such that the code 140 can be used in the future and repurposed, as needed, using the custom code generator 130 .
- the custom code generator 130 may determine whether a user query 134 is received.
- a user query 134 generally corresponds to a request from the user 104 a,b to identify and view or use an entry of stored source code 124 .
- a user query 134 may include a natural-language question or search phrase for locating associated source code 124 . If a user query 134 is not received at step 618 , the custom code generator 130 provides, at step 626 , the generated code 140 to the user 104 a,b who provided the user input 132 . The user 104 a,b may then use the custom code 140 as desired.
- the custom code generator 130 may proceed to step 620 to identify one or more related stories 118 in the story repository 116 . For instance, the custom code generator 130 may identify stories 118 with similar text to that of the user query 134 . This identification may be performed using any appropriate text-based search algorithm. For instance, keywords may be identified in the query 134 , and stories 118 which include the same or associated keywords may be identified and presented to the user 104 a,b .
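- A simple keyword-overlap search over stored stories might be sketched as follows. This stands in for "any appropriate text-based search algorithm"; the function name, the dictionary layout of the story repository, and the sample stories are assumptions for this example.

```python
def find_related_stories(query: str, stories: dict) -> list:
    """Return story identifiers ranked by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = []
    for story_id, text in stories.items():
        overlap = query_words & set(text.lower().split())
        if overlap:
            scored.append((len(overlap), story_id))
    # Most shared words first; ties broken by identifier for a stable order.
    return [story_id for _, story_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


stories = {
    "s1": "result is computed as variable asset multiplied to fee rate",
    "s2": "report is printed to the screen",
}
related = find_related_stories("compute fee rate", stories)
```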
- the custom code generator 130 determines whether a user selection of one or more of the presented stories 118 is received. If a user selection is not received at step 622 , the custom code generator 130 generally proceeds to step 626 . However, if a user selection is received at step 622 , the custom code generator 130 proceeds to step 624 .
- the custom code generator 130 may append the source code 124 corresponding to the selected story(ies) 118 to the custom source code 140 and/or provide the source code 124 corresponding to the selected story(ies) 118 to the user 104 a,b who provided the user query 134 .
- the custom code generator 130 may provide suggestions for preferred source code 124 to include in the custom code 140 . For instance, if a user query 134 involves a request to locate source code 124 associated with two functions being performed in series, the custom code generator 130 may suggest a single entry of source code 124 which performs both functions in series as a preferred option compared to providing two separate entries of source code 124 , which each perform only one of the desired functions.
- the custom code generator 130 may instead only provide a preferred third entry of source code 124 that performs the first and second tasks sequentially.
- the custom code generator 130 may identify existing source code 124 for performing a desired task on a first set of variables (e.g., associated with a user input 132 and/or query 134 ) and repurpose this source code 124 to perform the same desired task (e.g., calculations) using a second set of variables which were identified in the user input 132 and/or query 134 .
- the code generator 130 may receive a query 134 comprising a request to perform a computing task using a first set of variables.
- the custom code generator 130 may then identify (e.g., based on keywords identified in the query 134 ) a story 118 stored in the story repository 116 , that is related to performing the computing task.
- the identified story 118 may be presented to the user 104 a,b . If the user 104 a,b selects the story 118 , the source code 124 corresponding to the story may be determined. If the source code 124 performs the desired task using a different set of variables, the source code 124 may be edited to replace the different set of variables with the set of variables indicated in the user query 134 .
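- Replacing one set of variables with another in existing source code might be sketched as follows. The function name and the replacement variable names are hypothetical; whole-word matching is used so that a variable name appearing inside a longer identifier is left untouched.

```python
import re


def repurpose(source: str, mapping: dict) -> str:
    """Replace each whole-word variable name in source according to mapping."""
    if not mapping:
        return source
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\b(" + names + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], source)


repurposed = repurpose(
    "result = var_asset * fee_rate",
    {"var_asset": "fixed_asset", "fee_rate": "tax_rate"},
)
```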
- the custom code 140 (e.g., as optionally modified at step 624 ) is provided to the user 104 a,b .
- the user 104 a,b may then use the custom code 140 as appropriate.
- FIG. 8 is an embodiment of a device 800 configured to implement the query generation system 100 .
- the device 800 comprises a processor 802 , a memory 804 , and a network interface 806 .
- the device 800 may be configured as shown or in any other suitable configuration.
- the device 800 may be and/or may be used to implement computing devices 102 a,b , source code analyzer 106 , story repository 116 , source code repository 122 , style repository 126 , and custom code generator 130 of FIG. 1 .
- the processor 802 comprises one or more processors operably coupled to the memory 804 .
- the processor 802 is any electronic circuitry including, but not limited to, state machines, one or more central processing unit (CPU) chips, logic units, cores (e.g., a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or digital signal processors (DSPs).
- the processor 802 may be a programmable logic device, a microcontroller, a microprocessor, or any suitable combination of the preceding.
- the processor 802 is communicatively coupled to and in signal communication with the memory 804 and the network interface 806 .
- the one or more processors are configured to process data and may be implemented in hardware or software.
- the processor 802 may be 8-bit, 16-bit, 32-bit, 64-bit or of any other suitable architecture.
- the processor 802 may include an arithmetic logic unit (ALU) for performing arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that fetches instructions from memory and executes them by directing the coordinated operations of the ALU, registers and other components.
- the one or more processors are configured to implement various instructions.
- the one or more processors are configured to execute instructions to implement the function disclosed herein, such as some or all of methods 400 and 600 .
- the function described herein is implemented using logic units, FPGAs, ASICs, DSPs, or any other suitable hardware or electronic circuitry.
- the memory 804 is operable to store source code 108 , 124 , stories 118 , summaries 120 , style profiles 128 a,b , and any other data, instructions, logic, rules, or code operable to execute the function described herein.
- the memory 804 comprises one or more disks, tape drives, or solid-state drives, and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
- the memory 804 may be volatile or non-volatile and may comprise read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), dynamic random-access memory (DRAM), and static random-access memory (SRAM).
- the network interface 806 is configured to enable wired and/or wireless communications.
- the network interface 806 is configured to communicate data between the device 800 and other network devices, systems, or domain(s).
- the network interface 806 may comprise a WIFI interface, a local area network (LAN) interface, a wide area network (WAN) interface, a modem, a switch, or a router.
- the processor 802 is configured to send and receive data using the network interface 806 .
- the network interface 806 may be configured to use any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
Abstract
Description
- The present disclosure relates generally to establishing and maintaining source code. More particularly, in certain embodiments, the present disclosure is related to a system and method for detecting source code anomalies.
- Source code is programming code presented in a human-readable programming language (e.g., as opposed to binary machine code). A given program, or computing task, may be implemented using source code. Specialized training and knowledge of a source code's programming language is generally required to both understand the function(s) of a given piece of source code and to create new code using the source code as a starting point to perform a desired task.
- In an embodiment, a system includes a source code repository configured to store source code entries created by a plurality of users. Each source code entry includes instructions in a programming language for performing a computing task. A style repository is configured to store a style profile for each of the plurality of users. Each style profile includes predefined style features associated with formatting characteristics of the stored source code entries for a corresponding user. A source code analyzer is communicatively coupled to the source code repository and the style repository. A processor of the source code analyzer receives, from a first user, a first source code which includes instructions in the programming language for performing a first computing task. First style features of the first source code are determined. The first style features include characteristics of a format of the first source code. The processor determines whether the first style features correspond to first predefined style features indicated by a first style profile associated with the first user. In response to determining that this is the case, the source code is stored in the source code repository. In response to determining that this is not the case, storage of the first source code in the source code repository is prevented.
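The store-or-prevent decision in this embodiment can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the `StyleProfile` layout, the single `indent_width` feature, the per-feature tolerance, and the dictionary-based repository are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StyleProfile:
    """Hypothetical profile: expected feature values and allowed deviations."""
    expected: dict[str, float]
    tolerance: dict[str, float]


def indentation_width(code: str) -> float:
    """Mean number of leading spaces on indented, non-empty lines."""
    widths = [
        len(line) - len(line.lstrip(" "))
        for line in code.splitlines()
        if line.strip() and line.startswith(" ")
    ]
    return sum(widths) / len(widths) if widths else 0.0


def submit(user: str, code: str,
           profiles: dict[str, StyleProfile],
           repository: dict[str, list[str]]) -> bool:
    """Store `code` only if its style features match the user's profile."""
    profile = profiles[user]
    # Only one illustrative feature is extracted here.
    observed = {"indent_width": indentation_width(code)}
    for name, value in observed.items():
        if abs(value - profile.expected[name]) > profile.tolerance[name]:
            return False  # features do not correspond: storage is prevented
    repository.setdefault(user, []).append(code)
    return True
```

With a profile that expects four-space indentation and tolerates a deviation of one space, a four-space submission would be stored and an eight-space submission would be rejected.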
- In yet another embodiment, a system includes a source code repository which stores source code entries, which include instructions in a programming language for performing computing tasks. A code generator receives, from a user, an input which includes a request in a natural language to perform a first computing task. Keywords are identified in the input. The keywords include a variable-associated keyword and a function-related keyword. Based on the identified keywords, code-line entries are determined which, when executed in an ordered combination, achieve the first computing task. The code-line entries include a variable-declaration entry, a function-definition entry, and a function-call entry. Based on the variable-associated keyword, one or more variables appearing in the source code repository are determined to declare in order to perform the first computing task. Based on the function-associated keyword, one or more functions appearing in the source code repository are determined to define and call to perform the first computing task. A custom code is generated, in the programming language, which includes a declaration of the determined variables at the variable-declaration entry, a definition of the determined function(s) at the function-definition entry, and a call to the determined function(s) using the declared variables at the function-call entry.
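The ordering of code-line entries described in this embodiment (variable declarations, then function definitions, then function calls) can be sketched as below. The lookup tables `KNOWN_VARIABLES` and `KNOWN_FUNCTIONS`, the whitespace-based keyword identification, and the `result` name are hypothetical stand-ins for the source code repository and keyword analysis; the disclosure does not specify them.

```python
# Hypothetical repository contents, invented for illustration only.
KNOWN_VARIABLES = {
    "principal": "principal = 1000.0",
    "rate": "rate = 0.05",
}
KNOWN_FUNCTIONS = {
    # name -> (definition text, parameter names)
    "interest": (
        "def interest(principal, rate):\n    return principal * rate",
        ["principal", "rate"],
    ),
}


def generate_custom_code(request: str) -> str:
    """Assemble declaration, definition, and call entries from keywords."""
    words = request.lower().replace(",", " ").split()
    # Keep first occurrences only, in the order they appear in the request.
    variables = list(dict.fromkeys(w for w in words if w in KNOWN_VARIABLES))
    functions = list(dict.fromkeys(w for w in words if w in KNOWN_FUNCTIONS))

    lines = []
    for name in variables:                      # variable-declaration entry
        lines.append(KNOWN_VARIABLES[name])
    for name in functions:                      # function-definition entry
        lines.append(KNOWN_FUNCTIONS[name][0])
    for name in functions:                      # function-call entry
        params = KNOWN_FUNCTIONS[name][1]
        missing = [p for p in params if p not in variables]
        if missing:
            raise ValueError(f"no declared variable for: {missing}")
        lines.append(f"result = {name}({', '.join(params)})")
    return "\n".join(lines)
```

A request such as "Compute interest from principal and rate" would yield two declarations, one definition, and one call, in that order.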
- This disclosure encompasses the recognition of previously unidentified problems associated with previous technology used to maintain collections of source code and adapting this source code to generate code to perform a desired task or function. For instance, previous approaches to storing source code generally relied on programmers to manually annotate code with comments and save the code in a fashion that allowed future use. However, different users tend to have different approaches to writing code in the programming language and formatting the code, resulting in source code entries that may be of limited use to others (i.e., because the purpose of the code is difficult or impossible to decipher). Using previous technology, multiple copies of the same or similar source code may be stored that perform the same function, resulting in inefficient use of computing resources. Moreover, using previous technology, a preferred source code (e.g., a most efficient source code, or a source code with a particular style or format) may not be used as a starting point for generating new programming code because there was previously no means for identifying this preferred source code. Previous technology also fails to detect and correct problematic source code (e.g., whether the code is incorrectly formatted for a given task, includes inefficient protocols, or is intentionally malicious).
- Certain embodiments of the systems, methods, and device of this disclosure provide unique solutions to these newly recognized problems described above and other technical problems by facilitating the reliable storage of source code and the efficient generation of new, customized code. For example, the disclosed system provides several technical advantages which include 1) automatic detection and correction of any anomalies in the source code prior to its storage for future use; 2) determination of natural language descriptions of source code (e.g., of “stories”), which can be easily interpreted even without specialized knowledge and training in a programming language; 3) the efficient and reliable generation of new source code for a custom task and with a user-specific style; and 4) the provision of candidate source code to a user's query to perform a given task and/or for source code related to a given entity or group with which the user is affiliated.
- As such, the system described in this disclosure may improve the function of computer systems used to store source code for future use and generate new source code. The system may also or alternatively reduce or eliminate practical and technical barriers to repurposing existing source code to perform new functions or tasks. The system described in this disclosure may particularly be integrated into a practical application for storing source code used to perform calculations using a first set of variables and/or functions, and automatically repurposing this code to perform the same or similar calculations using a second set of user-identified variables and/or functions without manually modifying the underlying programming language in the source code (i.e., without writing any code in a specialized programming language).
- Certain embodiments of this disclosure are related to a source code analyzer which determines whether newly provided code is appropriate for storage and future use as source code. For instance, style features can be extracted from the source code and used to identify anomalies in order to detect unapproved or malicious source code. The source code analyzer may generate a repository of natural language descriptions of source code, or “stories,” which may include specialized badges, or tags, which link portions of the descriptions (and/or the associated lines of the corresponding source code) to particular formulas, business units and the like. Examples of such embodiments are described below with respect to
FIGS. 1-5 . - Certain embodiments of this disclosure are related to a custom code generator which uses natural language inputs (e.g., commands) and/or other queries from a user to generate custom code. Generated code can be automatically customized according to the user input and adjusted to match a user's predetermined coding style (e.g., number and length of comments, spacing and indentation format, and the like). Examples of such embodiments are described below with respect to
FIGS. 1 and 6-7 . - Certain embodiments of this disclosure may include some, all, or none of these advantages. These advantages and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
- For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
-
FIG. 1 is a schematic diagram of an example system for source code maintenance and generation, according to an illustrative embodiment of this disclosure; -
FIG. 2 is a flow diagram illustrating an example operation of the style analyzer of the system illustrated in FIG. 1; -
FIG. 3 is a flow diagram illustrating the determination of code anomalies; -
FIG. 4 is a flowchart of a method for operating the story generator of the system illustrated in FIG. 1; -
FIG. 5 illustrates example source code and example results generated at various steps of the method of FIG. 4; -
FIG. 6 is a flowchart illustrating an example method of operating the custom code generator of the system illustrated in FIG. 1; -
FIG. 7 illustrates examples of various elements associated with steps of the method of FIG. 6; and -
FIG. 8 is a diagram of an example device configured to implement the system of FIG. 1. - As described above, prior to this disclosure, there was a lack of tools for reliably maintaining records of established source code and effectively leveraging such records of source code to create new code to perform a desired task or function. Using previous technology, the generation of new programming code based on stored source code is technically challenging and inefficient. A user generally requires specialized knowledge of the particular programming language used to write the code. Even with this knowledge, significant time can be expended attempting to understand and successfully repurpose existing source code. In many cases, in an effort to avoid these challenges, a programmer may create entirely new code, effectively wasting the existing source code (and the associated technical resources used to store and maintain a record of source code).
- Various embodiments of this disclosure may solve these and/or other technical problems associated with previous technology. For instance, in certain embodiments, this disclosure facilitates the efficient maintenance of a source code repository and, optionally, a story repository, which stores natural-language descriptions of stored source code (e.g., as described with respect to
FIGS. 1-5). In certain embodiments, this disclosure includes a custom code generator which facilitates the generation of customized code in an efficient and user-friendly manner (e.g., as described with respect to FIGS. 1 and 6-7). - As used in this disclosure, a natural language corresponds to an established language (e.g., English) used for human-to-human communication. As used in this disclosure, a programming language refers to a formalized text-based language which includes instructions for implementing functions and/or tasks using a computer. Examples of programming languages include C, C++, C#, Python, JAVA, HTML, and the like. These programming languages are provided for example only. This disclosure contemplates the use of any programming language.
-
FIG. 1 is a schematic diagram of an example system 100 for source code maintenance and generation. The system 100 includes user devices 102 a,b, a source code analyzer 106, a story repository 116, a source code repository 122, a style repository 126, and a custom code generator 130. The source code analyzer 106 of system 100 is generally configured to receive source code 108 from a particular user 104 a,b and detect any possible anomalies in the source code 108 before the source code 108 is stored in the source code repository 122 (e.g., as stored source code 124). For example, the style analyzer 114 may detect anomalies associated with style features in the source code 108, and, if an anomaly is detected, the source code 108 may be corrected prior to its storage in the source code repository 122. Further examples of the implementation of the source code analyzer 106 are described below and with respect to FIGS. 2-5. The custom code generator 130 is generally configured to receive a user input 132, which includes instructions for performing desired computing tasks in a natural language, and generate a corresponding custom code 140 in an appropriate programming language for implementing the task. Further examples of the implementation of the custom code generator 130 are described below and with respect to FIGS. 6 and 7. -
User devices 102 a,b are generally any computing devices operable to receive user input associated with source code 108 and communicate the source code 108 to the source code analyzer 106. For instance, a user device 102 a,b may include an appropriate interface and input device for inputting a source code 108. Source code 108 includes instructions in a programming language for performing a computing task (e.g., a calculation). In addition to instructions in a programming language, source code 108 may include comments which are written in a natural language and provide context or a brief description of the purpose of certain lines or sections of the code 108. User devices 102 a,b may also be operable to provide a user input 132 and/or user query 134 to the custom code generator 130. For example, each of the user devices 102 a,b may be a computer or a mobile device. In the illustrative example of FIG. 1, device 102 a is associated with a first user 104 a, while user device 102 b is associated with a second user 104 b. - As described in greater detail below, whether
source code 108 is provided to the source code analyzer 106 by the first computing device 102 a associated with the first user 104 a or the second computing device 102 b associated with the second user 104 b may determine how the source code 108 is analyzed and subsequently stored in the source code repository 122 (or prevented from being stored in the source code repository 122). As also described in greater detail below, whether user input 132 and/or query 134 is provided to the code generator 130 by the first computing device 102 a associated with a first user 104 a or the second computing device 102 b associated with a second user 104 b may determine how custom code 140 is generated (e.g., in an appropriate user-specific fashion). Devices 102 a,b may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below. -
Source code analyzer 106 may be any computing device, or collection of computing devices, configured to receive source code 108 from user devices 102 a,b and analyze the source code 108. The source code analyzer 106 may be configured to review received source code 108, detect any anomalies in the source code, and correct the anomalies when possible/appropriate. The source code analyzer 106 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below. In some embodiments, the source code analyzer 106 may be implemented on a user device 102 a,b (e.g., using appropriate instructions stored in a memory of the device 102 a,b and executed by a processor of the device 102 a,b). In other embodiments, the source code analyzer 106 may be implemented using a separate device, or a collection of computing devices (e.g., configured as a server). - As illustrated in
FIG. 1, the source code analyzer 106 may include a story generator 110 and a style analyzer 114. The story generator 110 generally determines, for the source code 108, a corresponding story 112. The story 112 is a natural language description of instructions included in the source code 108. The source code analyzer 106 may store the story 112 in the story repository 116 (e.g., as one of the stories 118). The generated story 112 is generally stored such that it is associated with the source code 108. This allows the story 112 to be reviewed at a later time by a user 104 a,b and allows the user 104 a,b to identify the corresponding source code 108. In some embodiments, the story 112 may be determined in a first language (e.g., English) and subsequently translated to a more appropriate language for a given user 104 a,b (e.g., a preferred language for the user 104 a,b). Example implementation of the story generator 110 is described in greater detail below with respect to FIGS. 4-5. - The
style analyzer 114 generally determines style features of the source code 108 and determines, based at least in part on these style features, whether to store the source code 108 in the source code repository 122, modify the source code 108 prior to its storage, or prevent storage of the source code 108. For instance, the style analyzer 114 may detect anomalies in style features of the source code 108 (e.g., irregular use of comments, spaces, and/or punctuation in the source code and/or in the comments, or changes to language in the comments or to the variable naming conventions). The style analyzer 114 may automatically edit the format of the source code 108 to correct the anomalies prior to storing the edited source code 108 in the source code repository 122 (e.g., as an entry of stored source code 124). If the anomalies are severe, the style analyzer 114 may prevent storage of the source code 108. In some cases, rather than permanently preventing the storage of source code 108 with detected anomalies, the style analyzer 114 may flag the source code 108 for human review, and the source code 108 may be prevented from being stored at least until results of such a review are received. Examples of detected anomalies include a length of indentations in the source code 108 that is outside of a predefined range, location of gap lines (i.e., empty lines of code) in the source code 108 not conforming to predefined conventions, a frequency of gap lines in the source code 108 that is outside of a predefined range, a frequency and/or location of punctuation in the source code 108 that does not conform to predefined conventions, a number of spaces following variables or other text in a line of the source code 108 that is outside a predefined range, and the like. - In order to facilitate these and other functionalities of the
style analyzer 114, the style analyzer 114 may determine and store style profiles 128 a,b for corresponding users 104 a,b in the style repository 126. Style profiles 128 a,b generally store the predefined style features that have been determined for the corresponding users 104 a,b (e.g., based on previous code prepared by these users 104 a,b). The style profiles 128 a,b may be used to aid in detecting anomalous source code 108 (e.g., if source code 108 received from a given user 104 a,b does not include style features which correspond to those of that user's style profile 128 a,b) and to generate custom code using the custom code generator 130 (described further below). Example implementation of the style analyzer 114 is described in greater detail below with respect to FIGS. 2-3. - The
story repository 116 is generally a data store, or database, configured to store stories 118 (e.g., natural-language descriptions of the source code 124 stored in the source code repository 122). Stories 118 may include the story 112 generated for the source code 108 along with descriptions of other source code 124 previously received by the source code analyzer 106, as described briefly above and in greater detail below with respect to FIGS. 4-5. Each entry of source code 124 may have a corresponding story 118 in the story repository 116. The story repository 116 may also store summaries 120 of the stories (e.g., more succinct versions of the stories 118). This disclosure contemplates story repository 116 storing information (e.g., stories 118 and/or summaries 120) arranged in any appropriate format. The story repository 116 may be stored in memory of a dedicated device and/or in a memory of one or more of the user devices 102 a,b, source code analyzer 106, and custom code generator 130. The story repository 116 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below. - The
story repository 116 may provide further insights for improving the efficiency associated with storing source code 124 in the source code repository 122. For instance, in some embodiments, source code 124 with the same or similar stories 118, or summaries 120, may be associated with one another. Such related source code may be flagged for review to identify differences in the source code 124 and/or determine a preferred entry of source code 124 to use in the future. For instance, a preferred code 124 may be retained in the source code repository 122, while a non-preferred entry of source code 124 with the same or a similar story 118 (e.g., less efficient code for performing the same task) may be discarded. - The
source code repository 122 is generally a data store, or database, configured to store source code 124. Source code 124 may include the source code 108 as received or as edited by the source code analyzer 106, as described briefly above and in greater detail below with respect to FIGS. 2-3. The source code repository 122 also stores previously received source code 124. This disclosure contemplates source code repository 122 storing information (e.g., source code 124) arranged in any appropriate format. The source code repository 122 may be stored in memory of a dedicated device and/or in a memory of one or more of the user devices 102 a,b, source code analyzer 106, and the custom code generator 130. The source code repository 122 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below. - The
style repository 126 is generally a data store, or database, configured to store style profiles 128 a,b for users 104 a,b. The style repository 126 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below. Each style profile 128 a,b is generally associated with a corresponding user 104 a,b and reflects the formatting conventions commonly used by that user 104 a,b when writing in the programming language used to prepare source code 108. As such, each style profile 128 a,b generally includes predefined style features associated with how the users 104 a,b prepare (or are expected to prepare) source code 108. As a non-limiting example, the style profiles 128 a,b may store user-specific features such as the length and/or frequency of indentations in the source code 108 by the corresponding user 104 a,b when writing in the programming language, location of gap lines in source code 108, frequency of gap lines in source code 108 generated by the corresponding user 104 a,b when writing in the programming language, the frequency and/or location of punctuation (e.g., colons, semicolons) and/or use of capitalization in comments in the source code 108 by the corresponding user 104 a,b when writing in the programming language, the frequency and/or location of comments (e.g., before functions, after variable declarations), and a number of gaps (i.e., empty lines) following a line in the source code 108 prepared by the corresponding user 104 a,b when writing in the programming language. The style profiles 128 a,b may also include threshold ranges by which a style feature can differ from a predefined style feature for the user 104 a,b before an anomaly is detected. The implementation of style profiles 128 a,b is described in greater detail with respect to FIG. 3 below. - The style profiles 128 a,b may be determined using a number of source code entries (e.g., stored as
entries 124 in the source code repository 122) prepared by the users 104 a,b over a period of time (e.g., weeks or months). For instance, the style profiles 128 a,b may be determined using a set of heuristics and/or using an appropriate machine learning method. This disclosure contemplates style repository 126 storing information (e.g., style profiles 128 a,b) arranged in any appropriate format. The style repository 126 may be stored in memory of a dedicated device and/or in a memory of one or more of the user devices 102 a,b, source code analyzer 106, and custom code generator 130. - The custom code generator 130 is generally configured to receive a
user input 132, which includes text in a natural language (e.g., English or any other appropriate language for the users 104 a,b), and generate corresponding custom code 140. For instance, the user input 132 may include a description of a computing task a user 104 a,b desires the custom code 140 to perform. The code writer 136 may use information in the story repository 116 and/or the source code repository 122 to identify and modify, as needed, portions of the stored source code 124 to generate custom code 140. For example, the code writer 136 may identify keywords in the user input 132 that are linked with portions of stories 118 and provide the corresponding source code 124 to the user 104 a,b. As another example, if a portion of the user input 132 is the same as, or similar to, a portion of a story 118, the code writer 136 may use the source code 124 that corresponds to this portion of the story 118 in order to write the custom code 140. - The custom code generator 130 may also include a
style modifier 138, which is generally configured to edit (or “fix”) the style of code generated by the code writer 136 such that custom code 140 has a style that is aligned with the user's style profile 128 a,b. The style modifier 138 generally employs the style profiles 128 a,b to perform such modifications. In some embodiments, the custom code generator 130 facilitates the efficient and reliable repurposing of stored source code 124, which may be associated with a first task or function (e.g., for performing calculations using a first set of variables and/or functions associated with a first entity or business unit), into a custom code 140, which is configured for a different task or function (e.g., for performing calculations using a second set of variables and/or functions associated with a second entity or business unit) without requiring any technical or programming expertise from the user 104 a,b who provided the natural-language input 132. An example operation of the custom code generator is described in greater detail below with respect to FIG. 6. The custom code generator 130 may be implemented using the hardware, memory, and interface of device 800 described with respect to FIG. 8 below. - In an example operation of the
system 100, a user 104 a,b provides source code 108 to the source code analyzer 106 for storage in the source code repository 122. The style analyzer 114 determines whether the source code 108 meets certain criteria for storing the source code in the source code repository 122. For instance, the style analyzer may determine whether style features of the source code 108 correspond to the expected style features indicated by the user's style profile 128 a,b. An example of this is described with respect to FIG. 3 below. If the style features are not within an expected range, the code 108 may be edited so that the style of the code 108 is brought into accordance with the user's style profile 128 a,b before the code 108 is stored in the source code repository 122. In some cases, code 108 may be flagged for further review and storage of the code 108 may be prevented at least for a period of time (e.g., at least until results of administrator review are received indicating the code 108 is approved for storage). In some cases, the source code analyzer 106 may also or alternatively determine a natural language description, or story 112, for the source code 108. The story 112 may be stored in the story repository 116 for future use, for example, by the custom code generator 130. Further examples of the operation of the source code analyzer 106 are described below with respect to FIGS. 2-5. - In another example operation of the
system 100, a natural-language user input 132 is provided by a user 104 a,b to the custom code generator 130. The code writer 136 may use stories 118 from the story repository 116 and source code 124 from the source code repository 122 to generate custom code 140, based on the user input 132. For instance, keywords identified in the user input 132 may be matched to those of the stories 118. Source code 124 associated with the matching stories 118 may be appropriately combined to generate the custom code 140. In some cases, the style modifier 138 uses the style profiles 128 a,b to modify the style of the custom code 140 such that it matches a predefined programming style for the user 104 a,b (e.g., in accordance with style profiles 128 a,b). In some cases, the user input 132 may further include feedback to the custom code generator 130, which may be used to improve performance of the code writer 136 and/or style modifier 138. In some cases, a user 104 a,b may further edit the custom code 140 by providing a user query 134, which includes a search phrase or other request to identify appropriate existing source code 124 to include in the custom code 140. Further examples of the operation of the custom code generator 130 are described below with respect to FIGS. 6 and 7. -
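The matching of keywords in the user input 132 to stories 118 might look like the sketch below. The stop-word list, the overlap-count scoring, and the identifier-keyed dictionaries are assumptions for illustration; the disclosure does not state how matches are scored or how stories are linked to source code entries.

```python
import re

# Illustrative stop words only; a real system would likely use a fuller list.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "for", "in", "with"}


def keywords(text: str) -> set[str]:
    """Lower-cased word tokens with stop words removed."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def match_source(user_input: str,
                 stories: dict[str, str],
                 source_code: dict[str, str]) -> list[str]:
    """Return source code whose story shares keywords with the input,
    ordered from most to fewest shared keywords."""
    wanted = keywords(user_input)
    scored = []
    for entry_id, story in stories.items():
        overlap = len(wanted & keywords(story))
        if overlap > 0:
            scored.append((overlap, entry_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [source_code[entry_id] for _, entry_id in scored]
```

The returned entries are candidates only; combining them into custom code 140 and restyling the result are separate steps.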
FIG. 2 shows a flow diagram 200 illustrating example operation of the style analyzer 114 of the source code analyzer 106. In this illustrative example, the style analyzer 114 receives previously stored code 202 a associated with user 104 a and previously stored code 202 b associated with user 104 b. The previously stored code 202 a,b may be received from the source code repository (i.e., the code 202 a,b may be included in the stored source code 124 of FIG. 1). Stored source code 202 a may correspond to a first set of source code (e.g., instructions written in a programming language) associated with (e.g., generated by) first user 104 a of FIG. 1, and stored source code 202 b may correspond to a second set of source code (e.g., instructions written in a programming language) associated with (e.g., generated by) second user 104 b of FIG. 1. The style analyzer 114 uses the previously stored code 202 a,b to determine style profiles 128 a,b for the users 104 a,b. As explained further below, these style profiles 128 a,b may be employed by the style analyzer 114 to evaluate new source code 204 a,b received from users 104 a,b. - Following receipt of the stored
code 202 a,b, style extraction 206 is performed. Style extraction 206 generally involves the determination of style features 210 a,b for the stored code 202 a,b associated with the users 104 a,b. For example, style extraction 206 may involve determining style features 210 a,b prevalent in (e.g., commonly found in) the source code 202 a,b. As an example, the style features 210 a,b may include one or more of a length of indentations in the source code 202 a,b, location of gap lines (e.g., whether empty lines are left after comments, calls to functions, or the like) in the source code 202 a,b, a frequency of gap lines (e.g., how frequently empty lines are found) in the source code 202 a,b, a frequency and/or location of punctuation in the source code 202 a,b (e.g., how often periods, commas, semicolons, and the like appear in the source code 202 a,b and/or whether such punctuation is commonly found in comments, calls to functions, following variables, etc.), and the like. The style features 210 a,b are not limited to these example features and may include any other appropriate features associated with a format or style of source code 202 a,b. - Following
style extraction 206, the style analyzer 114 proceeds to creation 208 of style profiles 128 a,b. Profile creation 208 involves associating the determined style features 210 a,b with a user identifier 212 a,b for the user 104 a,b who generated the associated stored code 202 a,b. The style profiles 128 a,b are generally stored in the style repository 126, such that this information is available for future use, for example, by the style analyzer 114 and the custom code generator 130 (see FIG. 1). - When
new source code 204 a,b is received by the style analyzer 114, the style analyzer 114 may proceed with style extraction 206, as described above. For example, the style analyzer 114 may determine new style features 210 a,b for the received source code 204 a,b. The style analyzer 114 then makes a determination 214 of whether an anomaly is detected in the source code 204 a,b. The determination 214 may employ machine learning or artificial intelligence to determine whether the new code 204 a,b has a style that corresponds to that of the appropriate style profile 128 a,b and can, thus, reliably be stored in the source code repository 122. For example, a machine learning model may be trained based on the previous source code 202 a,b (and any other appropriate source code 124 associated with the style profile 128 a,b). Also or alternatively, determination 214 may involve one or more heuristics or rules to determine whether the new code 204 a,b has a style that corresponds to that of the appropriate style profile 128 a,b or whether an anomaly (e.g., a style anomaly) is detected. -
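Style extraction 206 and profile creation 208 can be illustrated with a heuristic sketch. The three features computed here, their names, and the choice to average them across a user's stored entries are assumptions; the disclosure leaves the exact feature definitions and the profile format open.

```python
from statistics import mean


def extract_style_features(code: str) -> dict[str, float]:
    """Compute a few formatting features of one source code entry."""
    lines = code.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    # Leading-space counts of indented, non-empty lines.
    indents = [len(ln) - len(ln.lstrip(" "))
               for ln in non_empty if ln.startswith(" ")]
    punctuation = sum(ch in ".,;:" for ch in code)
    return {
        "indent_length": float(mean(indents)) if indents else 0.0,
        "gap_line_frequency":
            (len(lines) - len(non_empty)) / len(lines) if lines else 0.0,
        "punctuation_per_line":
            punctuation / len(non_empty) if non_empty else 0.0,
    }


def build_profile(user_id: str, stored_entries: list[str]) -> dict:
    """Associate averaged features with a user identifier."""
    if not stored_entries:
        raise ValueError("at least one stored entry is required")
    per_entry = [extract_style_features(entry) for entry in stored_entries]
    features = {name: mean(f[name] for f in per_entry)
                for name in per_entry[0]}
    return {"user_id": user_id, "features": features}
```

The same `extract_style_features` function could be applied to newly received code so that its features are directly comparable with the stored profile.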
FIG. 3 is a diagram 300 illustrating an example of anomaly determination 214 in greater detail. As shown in FIG. 3, a newly determined style feature 302 is compared to a corresponding predefined style feature 304. The determined style feature 302 may be any of the example style features 210 a,b described above, or any other appropriate feature associated with the formatting of the new code 204 a,b. The predefined style feature 304 may be one of the style features 210 a,b for the user 104 a,b who provided the new source code 204 a,b being analyzed (see also FIGS. 1 and 2). - A
comparator 306 is used to compare the determined style feature 302 to the corresponding predefined style feature 304 in order to determine a feature difference 308 (e.g., an extent to which the determined feature 302 is different from the predefined style feature 304). The feature difference 308 may correspond, for example, to the amount by which a value associated with the determined style feature 302 differs from a value associated with the predefined style feature 304. For instance, if the determined feature 302 indicates that the new code 204 a,b includes zero gap lines (i.e., empty lines of the code 204 a,b) after a call to a function and the predefined style feature 304 indicates that the user 104 a,b who provided the new code 204 a,b typically includes two gap lines after a call to a function, the comparator 306 may determine a difference 308 with a value of two (i.e., 2 expected gap lines − 0 observed gap lines = 2 gap lines). - The
feature difference 308 is compared to a threshold range 310 via a second comparator 312 to determine whether the difference 308 is within the threshold range 310. The threshold range 310 generally corresponds to an amount by which the determined feature 302 can differ from the predefined feature 304. The threshold range 310 for a given feature type may be different for each user 104 a,b (e.g., as determined by the style profiles 128 a,b). For instance, if the determined style feature 302 indicates that the new code 204 a,b of FIG. 2 does not include any empty lines (i.e., “gap lines”) after a called function, and the corresponding predefined feature 304 indicates that the user 104 a,b usually includes two gap lines after each call to a function, the calculated difference 308 is two (i.e., 2 gap lines − 0 gap lines = 2 gap lines), as described in the example above. In this example, the threshold range 310 may be a range from negative one to positive one (i.e., indicating that an expected number of gap lines for the user 104 a,b (e.g., as indicated by the user's style profile 128 a,b) may be the value associated with the predefined style feature 304 plus or minus one). In this example, the feature difference 308 of two gap lines is not within the threshold range 310, and, therefore, the feature 302 fails to correspond to the user's style profile 128 a,b, resulting in an anomaly determination 314 that is positive. - If the
feature difference 308 is within the threshold range 310, the comparator 312 generally determines that the feature 302 has a negative anomaly determination 314 (i.e., an anomaly is not detected for the feature 302). A negative anomaly determination 314 generally indicates that the feature 302 is in agreement with the user's style profile 128 a,b, and an anomaly is not detected at determination 214 of FIG. 2. However, if the feature difference 308 is not within the threshold range 310, the anomaly determination 314 is positive, indicating that the feature 302 is not in agreement with the user's style profile 128 a,b, and an anomaly is detected at determination 214 of FIG. 2. In some embodiments, a plurality of features 302 for a given entry of new code 204 a,b are evaluated according to the process illustrated in FIG. 3. In such cases, at least a minimum number of features 302 must be within the threshold range 310, as determined by comparator 312, in order for an anomaly not to be detected at determination 214 of FIG. 2. For example, at least 80% of the features 302 may need to have a negative anomaly determination 314 in order for an anomaly not to be detected at determination 214 of FIG. 2. If fewer than the minimum number of features 302 have a negative anomaly determination 314, an anomaly is detected at determination 214 of FIG. 2. - Referring again to
FIG. 2, if an anomaly is not detected at determination 214, the style analyzer 114 proceeds to storage 216 of the code 204 a,b. The code 204 a,b is generally stored in the source code repository 122 of FIG. 1, for example, such that the new code 204 a,b is subsequently available to aid in the generation of custom code 140 by the custom code generator 130, as described in greater detail below. - In some embodiments, if an anomaly is detected at
determination 214, the style analyzer 114 may provide an alert 218 indicating that review of the code 204 a,b is needed. For instance, having been determined to be anomalous, the code 204 a,b may be provided to an administrator for review. The administrator may determine whether the code 204 a,b is acceptable (e.g., whether anomalies in the code 204 a,b are associated with malicious intent (not acceptable) or whether detected anomalies are associated with error or some other non-malicious intent). The results 220 of this review may be used to determine whether the style analyzer 114 should proceed to prevention 222 of storage of the source code 204 a,b or to editing 224 the source code 204 a,b. In other embodiments (e.g., if an alert 218 is not provided), the determination 214 may provide further instructions for determining if the code 204 a,b is acceptable at 220 for storage 226 after being edited 224 or if the style analyzer 114 should prevent 222 storage of the code 204 a,b. - If an anomaly was detected at
determination 214 and the code is acceptable at 220, thestyle analyzer 114 may automatically edit (e.g., “fix”) 224 thesource code 204 a,b. For example, referring to the example ofFIG. 3 , if thedetermined feature difference 308 is outside of thethreshold range 310, thecode 204 a,b may be edited such that thefeature difference 308 is brought back within thethreshold range 310. For example, in the context of the example described above with respect toFIG. 3 , if the determined feature indicated the presence of zero gap lines after a function call and the correspondingpredefined feature 304 indicated two gap lines should follow a function call, thestyle analyzer 114 may modify 224 thecode 204 a,b such that two gap lines are added after a function call. Thestyle analyzer 114 then stores 226 the editedcode 204 a,b in the source code repository 122 (e.g., as an entry of thesource code 124 ofFIG. 1 ). In some embodiments (e.g., where an alert 218 is not provided), the above-describededits 224 to thecode 204 a,b may be performed following apositive determination 214 of a style anomaly (e.g., in response to determining that adetermined feature difference 308 ofFIG. 3 is outside the corresponding threshold range 310). - If an anomaly was detected at
determination 214 and the code is not acceptable at 220 (e.g., in response to determining that a feature difference 308 of FIG. 3 is not within the threshold range 310 indicated by the user's style profile 128 a,b), the style analyzer 114 prevents storage 222 of the source code 204 a,b in the source code repository 122. In some embodiments (e.g., where an alert 218 is not provided), the style analyzer 114 may determine that prevention 222 of code storage is appropriate based on one or more of the number of style feature differences 308 of FIG. 3 that are not within the corresponding threshold ranges 310, the extent to which one or more of the feature differences 308 depart from the corresponding acceptable threshold ranges 310, and the like. For instance, in some cases, at least two determined style features 302 of a given code 204 a,b must fail the comparison performed by comparator 312 of FIG. 3 in order for the style analyzer 114 to automatically prevent storage 222 of the code 204 a,b. For example, both the number of gap lines following a call to a function and the length of comments may have to be outside a predefined range in order for a positive anomaly determination 314 to be made. In some cases, a feature difference 308 of FIG. 3 must be outside the corresponding threshold range 310 by a minimum amount in order to proceed to prevention 222 of storage of the code 204 a,b. For example, if the difference between a determined feature 302 and a predefined feature 304 is four (e.g., if a code 204 a,b included six gap lines following a call to a function rather than the expected two gap lines for that user 104 a,b), and the threshold range is from negative one to one, this example feature difference 308 of four would be outside the threshold range 310 by greater than a minimum amount of three. - In some embodiments, the
style analyzer 114 may detect entries ofsource code 124 which have been intentionally altered (e.g., maliciously altered) and stored in thesource code repository 122. For instance, thestyle analyzer 114 may intermittently check the storedsource code 124 and identify inconsistencies or changes in thesource code 124 over time. For instance if a given entry of the storedsource code 124 has no or less than a threshold number of anomalies (seeFIG. 3 and corresponding description above) at a first time stamp and an increase in anomalies is detected at a second time stamp after the first time stamp, thestyle analyzer 114 may flag this entry ofsource code 124 for further review. Thestyle analyzer 114 may change a permission flag on this entry of thesource code 124 to prevent use of the code until it has passed further review. For instance, an altered permission of this entry ofsource code 124 may prevent thesource code 124 from being used by the custom code generator 130 (described in greater detail below). This may provide further improvements to the security and reliability of the storedsource code 124. - In some cases, the
style analyzer 114 may search for personal information that is included in the storedsource code 124. For instance, thestyle analyzer 114 may search for and flag any personal user information (e.g., user names, addresses, account numbers). This information may be automatically removed if not necessary for implementation of thecode 124. Also or alternatively, this information may be automatically anonymized to prevent its compromise. This may provide further improved data security to thesource code analyzer 106 ofFIG. 1 . - In some embodiments, the
style analyzer 114 may search for keywords associated with known problems in the source code 124. For instance, the style analyzer 114 may search for predefined words and/or phrases such as “to do,” “fix me,” “please fix,” and the like. An administrator may identify such terms commonly used by users 104 a,b to indicate that a portion of code 124 is not complete or requires attention. These terms may be searched for, and any stored code 124 containing these terms may be flagged for further review and/or correction. In some embodiments, the style analyzer 114 may detect unused and/or redundant objects or functions in stored source code 124. These unused and/or redundant items may be automatically removed from the source code 124, thereby making both the source code repository 122 and the stored source code 124 more efficient. - 
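The feature comparison of FIG. 3 described earlier can be sketched as follows. This is a hedged illustration only: the function names and the dictionary inputs are hypothetical, the sign convention (predefined value minus determined value) follows the "2 expected − 0 observed" example, and the default range of negative one to positive one and the 80% pass fraction are the example values given above rather than required settings.

```python
def feature_difference(predefined: float, determined: float) -> float:
    """Comparator 306: predefined (expected) value minus determined (observed) value."""
    return predefined - determined


def feature_is_anomalous(
    predefined: float,
    determined: float,
    threshold_range: tuple[float, float] = (-1.0, 1.0),
) -> bool:
    """Comparator 312: positive determination when the difference is outside the range."""
    low, high = threshold_range
    difference = feature_difference(predefined, determined)
    return not (low <= difference <= high)


def code_is_anomalous(
    determined: dict[str, float],
    profile: dict[str, float],
    ranges: dict[str, tuple[float, float]],
    min_pass_fraction: float = 0.8,
) -> bool:
    """Flag the code when too few features agree with the user's profile."""
    if not profile:
        raise ValueError("profile must contain at least one feature")
    passed = sum(
        1
        for name, expected in profile.items()
        if not feature_is_anomalous(expected, determined[name], ranges[name])
    )
    return passed / len(profile) < min_pass_fraction
```

With an expected value of two gap lines and an observed value of zero, `feature_difference(2, 0)` is 2, which lies outside the range from negative one to positive one, so that single feature receives a positive determination. Whether the code as a whole is flagged then depends on how many other features pass.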
FIG. 4 is a flowchart of an example method 400 of story generation. The story generator 110 may implement method 400 to generate the story 112 of FIG. 1. The method 400 generally facilitates the determination of a corresponding description, in a natural language, of the instructions included in the source code 108 for performing a task or function and the subsequent storage of this natural-language description, or story 112, in the story repository 116. - 
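As a rough illustration of how a line of source code might be turned into a natural-language description, the sketch below applies simple text replacement to assignment statements. The rule tables, the phrase wording, and the `describe_line` function are hypothetical and handle only comments and plain assignments; this is not presented as the story generator 110 itself.

```python
# Hypothetical replacement tables; real embodiments may use different text.
OPERATOR_TEXT = {"*": "multiplied by", "+": "added to", "-": "minus", "/": "divided by"}
VARIABLE_TEXT = {"var_asset": "variable asset", "fee_rate": "fee rate"}


def describe_line(line: str) -> str:
    """Return a natural-language description of a comment or simple assignment."""
    stripped = line.strip()
    if stripped.startswith("#"):
        # Comment lines are kept as plain text without the comment marker.
        return stripped.lstrip("# ").strip()
    if "=" not in stripped:
        return stripped

    target, expression = (part.strip() for part in stripped.split("=", 1))
    # An arithmetic right-hand side is "calculated"; a plain value is "assigned".
    is_arithmetic = any(op in expression for op in OPERATOR_TEXT)
    verb = "is calculated as" if is_arithmetic else "is assigned as"

    # Replace arithmetic symbols with words, then variable names with descriptions.
    for op, text in OPERATOR_TEXT.items():
        expression = expression.replace(op, f" {text} ")
    words = [VARIABLE_TEXT.get(token, token) for token in expression.split()]
    target = VARIABLE_TEXT.get(target, target)
    return f"{target} {verb} {' '.join(words)}"
```

For example, `describe_line("out = x * y")` yields "out is calculated as x multiplied by y", and `describe_line("var_asset = 10")` yields "variable asset is assigned as 10".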
Method 400 may begin at step 402 where source code 108 is received by the story generator 110. For instance, a user 104 a,b may provide the source code 108 to the source code analyzer 106, as described above with respect to FIG. 1. At step 404, the story generator 110 determines, for each line of the source code 108, a badge associated with a programming task. For example, a badge may be associated with a description of the programming function associated with the line of the source code 108, or the information included in the line of the source code 108. - For illustrative purposes,
FIG. 5 shows anexample code portion 502, which may be included insource code 108. Each line ofcode portion 502 has acorresponding line description 504. For example, the comment at the top ofcode portion 502 has acorresponding line description 504 of “Headline,” while the second comment in thecode portion 502 has acorresponding line description 504 of “Comment Line.” Atstep 404 ofFIG. 4 , thestory generator 110 determines thesedescriptions 504 and uses them to determine a correspondingintelligent badge 508 for each line of thesource code 108. - At
step 406, functions appearing in the source code 108 are replaced with predefined text which describes the functions. For instance, an equal sign, when used to define a variable value in the source code 108, may be replaced with the text “is assigned as.” When an equal sign is used as part of an arithmetic function (e.g., “out=x*y” in the example of FIG. 5), the equal sign may be replaced with a phrase such as “is calculated as,” “is computed as,” or the like. This facilitates the transformation of otherwise abstract functions and arithmetic symbols into readily interpretable natural language. FIG. 5 illustrates example results 506 of steps 404 and 406 of method 400, after the story generator 110 has determined intelligent badges 508 and replaced functions with corresponding text (e.g., “out=x*y” from code portion 502 is replaced with “out is calculated as x multiplied to y” in results 506). The intelligent badges 508 are illustrated in bold and italic font. - Referring again to
FIG. 4, at step 408, the story generator 110 replaces variable names with predefined variable text. As an example, FIG. 5 illustrates the results 516 of replacing variables 510 and 512 with corresponding text 518 and 520 at step 408. For instance, step 408 of FIG. 4 may involve replacing the “var_asset” variable 510 and “fee_rate” variable 512 in the code portion 502 with corresponding text descriptions of “variable asset” 518 and “fee rate” 520, as shown in the progression from results 506 to results 516 in FIG. 5. As another example, step 408 of FIG. 4 may involve embedding function definitions inside the results of a called function and replacing variable names with descriptions of the variables. For instance, as illustrated in FIG. 5, “result=fee_calc(var_asset, fee_rate)” in results 506 is transformed into “result is computed as variable asset multiplied to fee rate” in results 516 of step 408. - At
step 410, the story generator 110 removes the badges to generate a natural language story 112 for the original source code 108. FIG. 5 illustrates the results 524 of step 410. Results 524 are an example of a story 112, or a portion of a story 112. In some cases, the badges 508 are retained in the story (e.g., such that the results 516 are included in the story 112). In such cases, all of the results 524 (i.e., rather than only line 526) may be retained as the summaries 120. Retaining the badges 508 in the story 112 may be beneficial for operation of the custom code generator 130, because the badges 508 can be used to more effectively associate stories 118 to keywords in the user input 132 and find appropriate stored source code 124 as a starting point for generating custom code 140, as described in greater detail below. - At
step 412, thestory generator 110 stores the resultingstory 112 in thestory repository 116. As illustrated inFIG. 5 , in some cases, the results 524 (e.g., the story 112) may include asummary portion 526, which may be stored as one of thesummaries 120 ofFIG. 1 . Thesummary portion 526 generally provides a high level and readily searchable overview of the function of thesource code portion 502. - As described above with respect to
FIG. 1, the custom code generator 130 facilitates the reliable and user-friendly generation of custom code 140 based on natural language input 132. The custom code 140 may include instructions written in any appropriate programming language for performing one or more user-desired tasks or functions. The user input 132 generally involves little or no previous knowledge from the users 104 a,b of the programming language of the custom code 140. In some cases, a user query 134 may be received by the custom code generator 130 and used to identify stories 118 which are related to the query 134. If a user selects one of the identified stories 118, the stored source code 124 that is associated with the selected story 118 may be provided to the user 104 a,b. This may further facilitate the efficient generation of custom code 140 for performing desired computing tasks or functions. - 
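One simple way to identify stories 118 related to a user query 134 is keyword overlap, sketched below. The disclosure leaves the search algorithm open, so the tokenizer, the stop-word list, the scoring, and the mapping of story identifiers to story text are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list; a real deployment would likely use a fuller one.
STOP_WORDS = {"a", "an", "the", "for", "of", "to", "is", "as", "and"}


def extract_keywords(text: str) -> set[str]:
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return set(re.findall(r"[a-z_]+", text.lower())) - STOP_WORDS


def find_related_stories(query: str, stories: dict[str, str], top_n: int = 3) -> list[str]:
    """Rank story identifiers by the number of keywords shared with the query."""
    query_keywords = extract_keywords(query)
    scored = [
        (len(query_keywords & extract_keywords(text)), story_id)
        for story_id, text in stories.items()
    ]
    # Highest overlap first; ties are broken by identifier for stable output.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [story_id for score, story_id in ranked if score > 0][:top_n]
```

The identifiers returned here could then be used to look up the stored source code 124 associated with whichever story the user selects.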
FIG. 6 is a flowchart of an example method 600 of generating custom code 140 using the custom code generator 130 of FIG. 1. The method 600 may be performed by the custom code generator 130 using the code writer 136 and/or style modifier 138. The method 600 may begin at step 602 where a natural-language user input 132 is received by the custom code generator 130. The input 132 generally includes a description of a computing task or function which a user 104 a,b wishes to perform. The input 132 may also include an indication of a programming language in which to generate the custom code 140. The custom code generator 130 may use any appropriate natural language processing algorithm to process the user input 132, split the input 132 into subsections (e.g., split paragraphs into sentences or portions of sentences), and/or tag keywords in the input 132. FIG. 7 illustrates a portion 702 of a natural language user input 132. This example input portion 702 includes certain tagged keywords and phrases 704, 706, 708, and 710, which are used to generate custom code 140 using method 600. - Referring again to
FIG. 6 , atstep 604, the custom code generator 130 may determine code-line entries to include in thecustom code 140, based on the received natural-language input 132. For instance, words, phrases, or combinations of both included in theuser input 132 may be used to determine code-line entries which should be included in thecustom code 140.FIG. 7 illustrates example code-line entries 712 to include in acustom code 140 generated based oninput portion 702. The code-line entries 712 include aheadline entry 714, avariables declaration entry 716, afunction definition entry 718, and afunction call entry 720. - For example, the custom code generator may include a
headline entry 714 in custom code 140 such that an initial comment line is provided that describes the use and/or operation of the custom code 140. The custom code generator 130 may determine that variable declarations 716 should be included based on the identification of keywords 706 and 708 (i.e., “fees” and “variable assets”) in the input portion 702. Such keywords 706 and 708 may be associated with variables. The custom code generator 130 may determine that a function definition 718 should be included based on the identification of keywords 704 and 706 (i.e., “calculate” and “fees”). Verbs, such as “calculate,” appearing in the input portion 702 may be associated with functions used to perform actions associated with the verbs (i.e., calculations in this example). The custom code generator 130 may determine that a function-call entry 720 should be included in order to execute the defined function for the declared variables. - Referring again to
FIG. 6, at step 606, an intelligent badge is determined for each code-line entry determined from the user input 132. Examples of intelligent badges 508 are illustrated in FIG. 5. FIG. 7 also illustrates example badges included in each code-line entry 714, 716, 718, and 720. Such badges may be used to identify related stories 118 in the story repository 116. At step 608, variable-related words or phrases are identified in the user input 132 and used to determine appropriate variables and variable values to use in the custom code 140 being generated. For instance, the custom code generator 130 may access information stored in the story repository 116, the source code repository 122, and/or the style repository 126 to determine appropriate variable names and values to include in the custom code 140. For instance, as illustrated in the example of FIG. 7, the “variable asset” keyword 708 may be associated with a “var_asset” variable 722. The custom code generator 130 may further determine a variable value 724 of ten for the “var_asset” variable 722. The custom code generator 130 may determine a calculation 726 associated with the “fee” keyword 706. This calculation 726 includes a further “fee rate” variable 728, which has an associated variable value 730 of fifteen. The values 724 and 730 may be determined based on the user 104 a,b who provided the user input portion 702. For example, the tagged “my group” phrase 710 of input portion 702 may be used to associate the variables 722 and 728 with appropriate values 724 and 730 for the user 104 a,b or the user's group (e.g., an entity or business group with which the user 104 a,b is associated). - Referring to
FIG. 6, at step 610, the custom code generator 130 determines functions to provide in place of function-related text identified in the user input 132. For instance, the custom code generator 130 may identify certain words, phrases, or combinations of these in the user input 132 which are related to an established function (e.g., a function employed in any of the stored source code 124). As a non-limiting example, FIG. 7 illustrates a determined calculation 726 associated with the input portion 702. The resulting custom code portion 732 (described further with respect to step 612 below) may include function-definition code 738 associated with the determined calculation 726. - Referring to
FIG. 6, at step 612, custom code 140 is generated based on the determined function(s), variable(s), and badge(s) of steps 606, 608, and 610. An example of a determined code portion 732 is illustrated in FIG. 7. As shown in the example of FIG. 7, the code portion 732 includes a headline portion 734, a variable-declaration portion 736, the function-definition portion 738, and a function-call portion 740. The headline portion 734 is generally a summary of the operation or use of the code portion 732. The variable-declaration portion 736 defines the values of variables to include in the code portion 732. The function-definition portion 738, as described above with respect to step 610, defines calculations to include in the code portion 732 (i.e., the calculation indicated by the input portion 702). The function-call portion 740 generally includes code for calling the defined function 738 using the declared variables 736. - Referring to
FIG. 6 , atstep 614, the custom code generator 130 may determine whether the style of thecustom code 140 being generated should be edited (or “fixed”) to correspond to an appropriate style for theuser 104 a,b who provided theuser input 132 and/or to the group or entity with which theuser 104 a,b is affiliated (e.g., the entity associated with the tagged “my group”keyword 710 of the input portion 702). For instance, the custom code generator 130 (e.g., the style modifier 138) may compare style features of thecode 140 generated atstep 612 to predefined style features for theuser 104 a,b (e.g., from the user'sstyle profile 128 a,b). In some embodiments,step 614 may involve the approach described above with respect toFIG. 3 . In such embodiments, if apositive anomaly determination 314 is made (i.e., when style features 302 of thecustom code 140 do not correspond to predefined features 304), the custom code generator 130 proceeds to step 616 to adjust thecode 140. Otherwise, if anegative anomaly determination 314 is made (i.e., when style features 302 of thecustom code 140 correspond to predefined features 304), the custom code generator may proceed to step 618 without adjusting thecustom code 140. - At
step 616, the custom code generator 130 (e.g., the style modifier 138 of the custom code generator 130) edits the custom code 140 generated at step 612. The code 140 may be “fixed” such that the format or style of the code 140 is in accordance with the style profile 128 a,b of the user 104 a,b who provided the user input 132 received at step 602. The style is generally fixed by modifying the code 140 such that the style features are aligned with the user's predefined style features (e.g., as indicated by the user's style profile 128 a,b). An example of such an adjustment is described above with respect to element 224 of FIG. 2. As a further example, FIG. 7 illustrates an example fixed code portion 742 where the code 732 has been modified to include style features 744 and 746, which bring the style of code portion 742 into accordance with the expected style of the user 104 a,b who provided the user input portion 702. Fixed code portion 742 includes additional gap lines 744 and an additional comment line 746 not found in the code portion 732 generated at step 612. - Modifying or “fixing” code at
step 616 may provide further improvements to the performance and reliability of thecustom code 140 generated by the custom code generator 130, for example, by facilitating the generation ofcustom code 140 that is not only appropriate for performing certain desired tasks but also that meets quality standards associated with the style, format, and presentation of the custom code 140 (i.e., such that thecustom code 140 is readable to appropriately trained programmers and can be trusted for use in future applications). Accordingly,custom code 140 may be particularly appropriate for storage in thesource code repository 122 as an entry of the storedsource code 124, such that thecode 140 can be used in the future and repurposed, as needed, using the custom code generator 130. - At
step 618, the custom code generator 130 may determine whether auser query 134 is received. As described above, auser query 134 generally corresponds to a request from theuser 104 a,b to identify and view or use an entry of storedsource code 124. For instance, auser query 134 may include a natural-language question or search phrase for locating associatedsource code 124. If auser query 134 is not received atstep 618, the custom code generator 130 provides, atstep 626, the generatedcode 140 to theuser 104 a,b who provided theuser input 132. Theuser 104 a,b may then use thecustom code 140 as desired. - If a user query is received at
step 618, the custom code generator 130 may proceed to step 620 to identify one or more related stories 118 in the story repository 116. For instance, the custom code generator 130 may identify stories 118 with similar text to that of the user query 134. This identification may be performed using any appropriate text-based search algorithm. For instance, keywords may be identified in the query 134, and stories 118 which include the same or associated keywords may be identified and presented to the user 104 a,b. At step 622, the custom code generator 130 determines whether a user selection of one or more of the presented stories 118 is received. If a user selection is not received at step 622, the custom code generator 130 generally proceeds to step 626. However, if a user selection is received at step 622, the custom code generator 130 proceeds to step 624. - At
step 624, the custom code generator 130 may append the source code 124 corresponding to the selected story(ies) 118 to the custom source code 140 and/or provide the source code 124 corresponding to the selected story(ies) 118 to the user 104 a,b who provided the user query 134. In some embodiments, the custom code generator 130 may provide suggestions for preferred source code 124 to include in the custom code 140. For instance, if a user query 134 involves a request to locate source code 124 associated with two functions being performed in series, the custom code generator 130 may suggest a single entry of source code 124 which performs both functions in series as a preferred option compared to providing two separate entries of source code 124, which each perform only one of the desired functions. For instance, rather than providing a first entry of source code 124 for performing a first task and a second entry of source code 124 for performing a second task, the custom code generator 130 may instead only provide a preferred third entry of source code 124 that performs the first and second tasks sequentially. - In some embodiments, the custom code generator 130 may identify existing
source code 124 for performing a desired task on a first set of variables (e.g., associated with a user input 132 and/or query 134) and repurpose this source code 124 to perform the same desired task (e.g., calculations) using a second set of variables which were identified in the user input 132 and/or query 134. As an example, the code generator 130 may receive a query 134 comprising a request to perform a computing task using a first set of variables. The custom code generator 130 may then identify (e.g., based on keywords identified in the query 134) a story 118, stored in the story repository 116, that is related to performing the computing task. The identified story 118 may be presented to the user 104 a,b. If the user 104 a,b selects the story 118, the source code 124 corresponding to the story may be determined. If the source code 124 performs the desired task using a different set of variables, the source code 124 may be edited to replace the different set of variables with the set of variables indicated in the user query 134. - At
step 626, the custom code 140 (e.g., as optionally modified at step 624) is provided to the user 104 a,b. The user 104 a,b may then use the custom code 140 as appropriate. - 
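The code generation and style adjustment of method 600 can be sketched, under stated assumptions, as assembling the four code-line entries (headline, variable declarations, function definition, and function call) and then rendering them with the number of gap lines a user's style profile expects. The function names and the `gap_lines` parameter are hypothetical, and the product calculation mirrors the fee example (variable asset multiplied by fee rate) rather than a general code writer.

```python
def generate_code_sections(
    headline: str, variables: dict[str, float], function_name: str
) -> list[list[str]]:
    """Build headline, variable-declaration, function-definition, and function-call lines."""
    parameters = ", ".join(variables)
    product = " * ".join(variables)  # assumed calculation: product of the variables
    return [
        [f"# {headline}"],
        [f"{name} = {value}" for name, value in variables.items()],
        [f"def {function_name}({parameters}):", f"    return {product}"],
        [f"result = {function_name}({parameters})"],
    ]


def render_with_style(sections: list[list[str]], gap_lines: int = 0) -> str:
    """Join the sections, leaving `gap_lines` empty lines between consecutive sections."""
    separator = "\n" * (gap_lines + 1)
    return separator.join("\n".join(section) for section in sections)


sections = generate_code_sections(
    "Calculate fees for variable assets",
    {"var_asset": 10, "fee_rate": 15},
    "fee_calc",
)
# A user whose profile expects one gap line between sections:
custom_code = render_with_style(sections, gap_lines=1)
```

Keeping generation and rendering separate means the same generated sections can be re-rendered for a different user simply by changing the style parameter.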
FIG. 8 is an embodiment of adevice 800 configured to implement thequery generation system 100. Thedevice 800 comprises aprocessor 802, amemory 804, and anetwork interface 806. Thedevice 800 may be configured as shown or in any other suitable configuration. Thedevice 800 may be and/or may be used to implementcomputing devices 102 a,b,source code analyzer 106,story repository 116,source code repository 122,style repository 126, and custom code generator 130 ofFIG. 1 . - The
processor 802 comprises one or more processors operably coupled to the memory 804. The processor 802 is any electronic circuitry including, but not limited to, state machines, one or more central processing unit (CPU) chips, logic units, cores (e.g. a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or digital signal processors (DSPs). The processor 802 may be a programmable logic device, a microcontroller, a microprocessor, or any suitable combination of the preceding. The processor 802 is communicatively coupled to and in signal communication with the memory 804 and the network interface 806. The one or more processors are configured to process data and may be implemented in hardware or software. For example, the processor 802 may be 8-bit, 16-bit, 32-bit, 64-bit or of any other suitable architecture. The processor 802 may include an arithmetic logic unit (ALU) for performing arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that fetches instructions from memory and executes them by directing the coordinated operations of the ALU, registers and other components. The one or more processors are configured to implement various instructions. For example, the one or more processors are configured to execute instructions to implement the functions disclosed herein, such as some or all of methods 400 and 600. - The
memory 804 is operable to store source code, stories 118, summaries 120, style profiles 128 a,b, and any other data, instructions, logic, rules, or code operable to execute the functions described herein. The memory 804 comprises one or more disks, tape drives, or solid-state drives, and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 804 may be volatile or non-volatile and may comprise read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), dynamic random-access memory (DRAM), and static random-access memory (SRAM). - The
network interface 806 is configured to enable wired and/or wireless communications. Thenetwork interface 806 is configured to communicate data between thedevice 800 and other network devices, systems, or domain(s). For example, thenetwork interface 806 may comprise a WIFI interface, a local area network (LAN) interface, a wide area network (WAN) interface, a modem, a switch, or a router. Theprocessor 802 is configured to send and receive data using thenetwork interface 806. Thenetwork interface 806 may be configured to use any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art. - While several embodiments have been provided in this disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of this disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
- In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of this disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
- To aid the Patent Office, and any readers of any patent issued on this application in interpreting the claims appended hereto, applicants note that they do not intend any of the appended claims to invoke 35 U.S.C. § 112(f) as it exists on the date of filing hereof unless the words “means for” or “step for” are explicitly used in the particular claim.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/793,189 US11250128B2 (en) | 2020-02-18 | 2020-02-18 | System and method for detecting source code anomalies |
US17/644,669 US11657151B2 (en) | 2020-02-18 | 2021-12-16 | System and method for detecting source code anomalies |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/793,189 US11250128B2 (en) | 2020-02-18 | 2020-02-18 | System and method for detecting source code anomalies |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/644,669 Continuation US11657151B2 (en) | 2020-02-18 | 2021-12-16 | System and method for detecting source code anomalies |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210256122A1 (en) | 2021-08-19 |
US11250128B2 (en) | 2022-02-15 |
Family
ID=77272868
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/793,189 Active 2040-08-15 US11250128B2 (en) | 2020-02-18 | 2020-02-18 | System and method for detecting source code anomalies |
US17/644,669 Active US11657151B2 (en) | 2020-02-18 | 2021-12-16 | System and method for detecting source code anomalies |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/644,669 Active US11657151B2 (en) | 2020-02-18 | 2021-12-16 | System and method for detecting source code anomalies |
Country Status (1)
Country | Link |
---|---|
US (2) | US11250128B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11481488B2 (en) * | 2020-04-23 | 2022-10-25 | Red Hat, Inc. | Automated security algorithm identification for software distributions |
US11657232B2 (en) | 2020-02-18 | 2023-05-23 | Bank Of America Corporation | Source code compiler using natural language input |
US11657151B2 (en) | 2020-02-18 | 2023-05-23 | Bank Of America Corporation | System and method for detecting source code anomalies |
Family Cites Families (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH083815B2 (en) | 1985-10-25 | 1996-01-17 | 株式会社日立製作所 | Natural language co-occurrence relation dictionary maintenance method |
JPH02301869A (en) | 1989-05-17 | 1990-12-13 | Hitachi Ltd | Method for maintaining and supporting natural language processing system |
US6760695B1 (en) | 1992-08-31 | 2004-07-06 | Logovista Corporation | Automated natural language processing |
US5649200A (en) | 1993-01-08 | 1997-07-15 | Atria Software, Inc. | Dynamic rule-based version control system |
US5878386A (en) | 1996-06-28 | 1999-03-02 | Microsoft Corporation | Natural language parser with dictionary-based part-of-speech probabilities |
US5933822A (en) | 1997-07-22 | 1999-08-03 | Microsoft Corporation | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision |
US6434524B1 (en) | 1998-09-09 | 2002-08-13 | One Voice Technologies, Inc. | Object interactive user interface using speech recognition and natural language processing |
JP2001100781A (en) | 1999-09-30 | 2001-04-13 | Sony Corp | Method and device for voice processing and recording medium |
US7810069B2 (en) * | 1999-10-05 | 2010-10-05 | Borland Software Corporation | Methods and systems for relating data structures and object-oriented elements for distributed computing |
US6757893B1 (en) | 1999-12-17 | 2004-06-29 | Canon Kabushiki Kaisha | Version control system for software code |
US7092871B2 (en) | 2000-07-20 | 2006-08-15 | Microsoft Corporation | Tokenizer for a natural language processing system |
US7010479B2 (en) | 2000-07-26 | 2006-03-07 | Oki Electric Industry Co., Ltd. | Apparatus and method for natural language processing |
US6604110B1 (en) * | 2000-08-31 | 2003-08-05 | Ascential Software, Inc. | Automated software code generation from a metadata-based repository |
US6594823B1 (en) | 2000-09-13 | 2003-07-15 | Microsoft Corporation | Method and system for representing a high-level programming language data structure in a mark-up language |
US7127712B1 (en) | 2001-02-14 | 2006-10-24 | Oracle International Corporation | System and method for providing a java code release infrastructure with granular code patching |
US7685562B2 (en) * | 2001-09-28 | 2010-03-23 | Siebel Systems, Inc. | Method and code generator for integrating different enterprise business applications |
US7346897B2 (en) | 2002-11-20 | 2008-03-18 | Purenative Software Corporation | System for translating programming languages |
US8332828B2 (en) | 2002-11-20 | 2012-12-11 | Purenative Software Corporation | System for translating diverse programming languages |
US8219801B2 (en) * | 2003-03-10 | 2012-07-10 | International Business Machines Corporation | Method of authenticating digitally encoded products without private key sharing |
US7707566B2 (en) | 2003-06-26 | 2010-04-27 | Microsoft Corporation | Software development infrastructure |
US7454744B2 (en) | 2003-07-03 | 2008-11-18 | International Business Machines Corporation | Private source code commenting |
US7568109B2 (en) | 2003-09-11 | 2009-07-28 | Ipx, Inc. | System for software source code comparison |
JP2008533544A (en) | 2004-09-20 | 2008-08-21 | コダーズ,インコーポレイテッド | Method and system for operating a source code search engine |
US20070299825A1 (en) | 2004-09-20 | 2007-12-27 | Koders, Inc. | Source Code Search Engine |
US7653893B2 (en) | 2005-03-04 | 2010-01-26 | Microsoft Corporation | Methods and apparatus for implementing checkin policies in source code control systems |
US20060271920A1 (en) | 2005-05-24 | 2006-11-30 | Wael Abouelsaadat | Multilingual compiler system and method |
JP2007034813A (en) | 2005-07-28 | 2007-02-08 | National Institute Of Advanced Industrial & Technology | Software manual generation system in two or more natural languages |
US7765097B1 (en) | 2006-03-20 | 2010-07-27 | Intuit Inc. | Automatic code generation via natural language processing |
US8375361B2 (en) | 2007-05-29 | 2013-02-12 | International Business Machines Corporation | Identifying changes in source code |
US8527262B2 (en) | 2007-06-22 | 2013-09-03 | International Business Machines Corporation | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
US8495100B2 (en) | 2007-11-15 | 2013-07-23 | International Business Machines Corporation | Semantic version control system for source code |
US9672019B2 (en) * | 2008-11-24 | 2017-06-06 | Intel Corporation | Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads |
US8635204B1 (en) * | 2010-07-30 | 2014-01-21 | Accenture Global Services Limited | Mining application repositories |
US8683430B2 (en) | 2011-01-07 | 2014-03-25 | International Business Machines Corporation | Synchronizing development code and deployed executable versioning within distributed systems |
US8660836B2 (en) | 2011-03-28 | 2014-02-25 | International Business Machines Corporation | Optimization of natural language processing system based on conditional output quality at risk |
US8856725B1 (en) | 2011-08-23 | 2014-10-07 | Amazon Technologies, Inc. | Automated source code and development personnel reputation system |
US8689060B2 (en) * | 2011-11-15 | 2014-04-01 | Sap Ag | Process model error correction |
CN104025077B (en) | 2011-12-28 | 2017-10-27 | 英特尔公司 | The real-time natural language processing of data flow |
US9323923B2 (en) * | 2012-06-19 | 2016-04-26 | Deja Vu Security, Llc | Code repository intrusion detection |
US9280322B2 (en) | 2012-09-27 | 2016-03-08 | Intel Corporation | Generating source code |
US8973142B2 (en) * | 2013-07-02 | 2015-03-03 | Imperva, Inc. | Compromised insider honey pots using reverse honey tokens |
US9176729B2 (en) | 2013-10-04 | 2015-11-03 | Avaya Inc. | System and method for prioritizing and remediating defect risk in source code |
US9928040B2 (en) | 2013-11-12 | 2018-03-27 | Microsoft Technology Licensing, Llc | Source code generation, completion, checking, correction |
US10726831B2 (en) | 2014-05-20 | 2020-07-28 | Amazon Technologies, Inc. | Context interpretation in natural language processing using previous dialog acts |
US20160162457A1 (en) | 2014-12-09 | 2016-06-09 | Idibon, Inc. | Optimization techniques for artificial intelligence |
US9785777B2 (en) * | 2014-12-19 | 2017-10-10 | International Business Machines Corporation | Static analysis based on abstract program representations |
US9946785B2 (en) | 2015-03-23 | 2018-04-17 | International Business Machines Corporation | Searching code based on learned programming construct patterns and NLP similarity |
US9531745B1 (en) * | 2015-11-20 | 2016-12-27 | International Business Machines Corporation | Crowd-sourced security analysis |
US9766868B2 (en) | 2016-01-29 | 2017-09-19 | International Business Machines Corporation | Dynamic source code generation |
US10621314B2 (en) * | 2016-08-01 | 2020-04-14 | Palantir Technologies Inc. | Secure deployment of a software package |
US10048945B1 (en) | 2017-05-25 | 2018-08-14 | Devfactory Fz-Llc | Library suggestion engine |
US10474455B2 (en) * | 2017-09-08 | 2019-11-12 | Devfactory Fz-Llc | Automating identification of code snippets for library suggestion models |
US10732966B2 (en) * | 2017-09-08 | 2020-08-04 | Devfactory Innovations Fz-Llc | Library model addition |
US10911337B1 (en) * | 2018-10-10 | 2021-02-02 | Benjamin Thaddeus De Kosnik | Network activity monitoring service |
US11250128B2 (en) * | 2020-02-18 | 2022-02-15 | Bank Of America Corporation | System and method for detecting source code anomalies |
US11176329B2 (en) | 2020-02-18 | 2021-11-16 | Bank Of America Corporation | Source code compiler using natural language input |
Also Published As
Publication number | Publication date |
---|---|
US20220108011A1 (en) | 2022-04-07 |
US11250128B2 (en) | 2022-02-15 |
US11657151B2 (en) | 2023-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11657232B2 (en) | Source code compiler using natural language input | |
US11657151B2 (en) | System and method for detecting source code anomalies | |
CN111460787B (en) | Topic extraction method, topic extraction device, terminal equipment and storage medium | |
US11080597B2 (en) | Crowdsourced learning engine for semantic analysis of webpages | |
US7689578B2 (en) | Dealing with annotation versioning through multiple versioning policies and management thereof | |
US9659055B2 (en) | Structured searching of dynamic structured document corpuses | |
KR20150042877A (en) | Managing record format information | |
US20190391992A1 (en) | Methods and systems for performing a model driven domain specific search | |
CN111831803A (en) | Sensitive information detection method and device and storage medium | |
US8656267B2 (en) | Method of approximate document generation | |
US10782942B1 (en) | Rapid onboarding of data from diverse data sources into standardized objects with parser and unit test generation | |
Zou et al. | SCVD: A new semantics-based approach for cloned vulnerable code detection | |
CN112733517B (en) | Method for checking requirement template conformity, electronic equipment and storage medium | |
Nielandt et al. | Predicate enrichment of aligned XPaths for wrapper induction | |
CN114936366A (en) | Malicious software family tag correction method and device based on hybrid analysis | |
Stein Dani et al. | Supporting event log extraction based on matching | |
CN111339272A (en) | Code defect report retrieval method and device | |
US11822907B2 (en) | Reusable code management for improved deployment of application code | |
US11704096B2 (en) | Monitoring application code usage for improved implementation of reusable code | |
CN116414445B (en) | Homology detection method and system based on source code watermark | |
WO2023078742A1 (en) | Automated maintenance of an object repository file for automation testing | |
US10268674B2 (en) | Linguistic intelligence using language validator | |
CN116127016A (en) | Business process compliance checking method, terminal and platform based on natural language processing | |
CN116541071A (en) | Application programming interface migration method based on prompt learning | |
CN117371034A (en) | Function-oriented operating system kernel automatic security assessment method and system |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: BANK OF AMERICA CORPORATION, NORTH CAROLINA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: VITHIYANATHAN, VIDHYA; REEL/FRAME: 051844/0412; Effective date: 20200207 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: BANK OF AMERICA CORPORATION, NORTH CAROLINA; Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE LAST NAME OF INVENTOR/ASSIGNOR TO VAITHIYANATHAN PREVIOUSLY RECORDED ON REEL 051844 FRAME 0412. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT; ASSIGNOR: VAITHIYANATHAN, VIDHYA; REEL/FRAME: 057632/0326; Effective date: 20200207 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |