CN116126731A - Code standardization method based on generation type pre-training - Google Patents
- Publication number
- CN116126731A (application number CN202310229771.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3628—Software debugging of optimised code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Debugging And Monitoring (AREA)
- Stored Programmes (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention provides a code standardization method based on generative pre-training. A generative pre-trained text-processing model is pre-trained on open-source code and in-house (owned) code, and then fine-tuned on code that conforms to the programming specifications. The resulting model can check input code for conformance with those specifications and propose modifications. This improves the compliance of application code with programming specifications and development standards, and overcomes the after-the-fact, passive nature of static code scanning.
Description
Technical Field
The invention relates to the field of code auditing, and in particular to a code standardization method based on generative pre-training.
Background
During the development of application systems and software products, programming specifications are usually formulated to constrain developers' programming style and implementation approach. A unified style makes code easier to maintain and extend and improves project delivery quality. For example, the publicly available Alibaba Java Development Manual governs Java development through five parts (programming conventions, exception and logging conventions, MySQL conventions, engineering conventions, and security conventions), helping development teams work more efficiently, fault-tolerantly, and collaboratively, improving code quality, and reducing project maintenance costs.
In addition, when software development is tied to a specific business domain, system standardization and consistency are generally improved through conventions such as a unified data dictionary, unified message format, unified log format, and unified byte code. Compliance with programming specifications and development standards, together with efficient verification of that compliance, is therefore indispensable.
Apart from manual code review, existing solutions mainly rely on static scanning tools such as Checkstyle and PMD to check code conformance. Without running the code, a static scanning tool analyzes the program using techniques such as lexical analysis, syntax analysis, control-flow analysis, and data-flow analysis, and verifies whether the code meets criteria for standardization, security, reliability, and maintainability.
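To make this baseline concrete, the sketch below shows the kind of check a static scanner performs: it parses source text into a syntax tree without executing it and flags rule violations. It is not how Checkstyle or PMD are implemented. It uses Python's standard `ast` module, and both the snake_case naming rule and the `scan_function_names` helper are hypothetical examples chosen for illustration.

```python
import ast
import re
from typing import List, Tuple

# Hypothetical rule for illustration: function names must be snake_case.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def scan_function_names(source: str) -> List[Tuple[int, str]]:
    """Parse the source without running it and report non-snake_case function names."""
    tree = ast.parse(source)  # lexical and syntax analysis only; no code is executed
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                findings.append((node.lineno, f"function name '{node.name}' is not snake_case"))
    return sorted(findings)


sample = """
def getUserName(user):
    return user.name

def save_user(user):
    pass
"""

for line, message in scan_function_names(sample):
    print(f"line {line}: {message}")
```

The scanner only reports the violation; it does not propose a replacement name, which is the gap the method described here aims to fill.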
Static code scanning discovers compliance risks in code in a standardized, automated way, lets developers concentrate on analyzing and resolving design defects, and quickly locates hidden errors and defects in the code, thereby safeguarding program delivery quality.
However, static code scanning inspects code only after it has been written: it checks the code against the standard specifications, flags programs that do not conform or carry latent risks, and gives compliance recommendations, which makes it an after-the-fact check relative to the coding stage.
Moreover, for most of the problems it detects, static code scanning cannot suggest a fix, so developers must modify the code themselves.
Disclosure of Invention
In view of the above problems, the present invention provides a code standardization method based on generative pre-training that overcomes them, or at least partially solves them.
According to one aspect of the present invention, a code standardization method based on generative pre-training is provided, the method comprising:
code pre-training;
fine-tuning on specification-compliant code;
code specification checking and modification suggestions.
Optionally, the code pre-training specifically comprises:
deploying a GPT model engine and pre-training the GPT model on open-source code and owned code.
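Pre-training a GPT model means training it to predict the next token of a text stream. The sketch below shows how a code corpus can be cut into (input, target) pairs in which the target is the input shifted one token to the right. It assumes a crude regex tokenizer in place of a learned subword vocabulary, and two hypothetical in-memory lists stand in for the open-source and owned code corpora; the helper names `tokenize` and `make_causal_lm_examples` are illustrative and not part of the patent.

```python
import re
from typing import List, Tuple


def tokenize(source: str) -> List[str]:
    """Rough code tokenizer: identifiers, integer literals, and single punctuation characters.
    A real GPT model would use a learned subword (e.g. BPE) vocabulary instead."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def make_causal_lm_examples(tokens: List[str], block_size: int) -> List[Tuple[List[str], List[str]]]:
    """Slice a token stream into non-overlapping (input, target) windows.
    Each target is its input shifted by one token: the next-token prediction objective."""
    examples = []
    for start in range(0, len(tokens) - block_size, block_size):
        window = tokens[start : start + block_size + 1]
        examples.append((window[:-1], window[1:]))
    return examples


# Hypothetical corpora standing in for "open-source code" and "owned code".
open_source_files = ["int add(int a, int b) { return a + b; }"]
owned_files = ["log.info(msg);"]

corpus_tokens: List[str] = []
for text in open_source_files + owned_files:
    corpus_tokens += tokenize(text) + ["<eos>"]  # mark file boundaries

examples = make_causal_lm_examples(corpus_tokens, block_size=4)
print(examples[0])
```

Windows here do not overlap (the stride equals `block_size`); overlapping windows are a common alternative that yields more examples from a small corpus.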
Optionally, the fine-tuning on specification-compliant code specifically comprises:
after the GPT model has been pre-trained, fine-tuning it on code that has passed manual review or static-scanning-tool inspection and conforms to the programming specifications, so that the model generalizes over the textual features of specification-compliant code.
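The fine-tuning set is defined by a filter: only code that passed manual review or a static scan is kept. A minimal sketch of that selection step follows, assuming each sample carries a review flag and a static-scan issue count; the `CodeSample` fields and helper names are hypothetical. The selected samples would then be used for further training of the pre-trained model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodeSample:
    path: str
    source: str
    passed_manual_review: bool         # hypothetical flag, e.g. set when a code review is approved
    static_scan_issues: Optional[int]  # findings reported by a static scanner; None if never scanned


def is_specification_compliant(sample: CodeSample) -> bool:
    """Keep code that passed manual review, or that was scanned and produced no findings."""
    return sample.passed_manual_review or sample.static_scan_issues == 0


def select_fine_tuning_set(samples: List[CodeSample]) -> List[CodeSample]:
    """Return only the samples eligible for specification-code fine-tuning."""
    return [s for s in samples if is_specification_compliant(s)]


samples = [
    CodeSample("Order.java", "class Order {}", passed_manual_review=True, static_scan_issues=None),
    CodeSample("Util.java", "class Util {}", passed_manual_review=False, static_scan_issues=0),
    CodeSample("Legacy.java", "class legacy {}", passed_manual_review=False, static_scan_issues=3),
    CodeSample("Draft.java", "class Draft {}", passed_manual_review=False, static_scan_issues=None),
]
print([s.path for s in select_fine_tuning_set(samples)])  # ['Order.java', 'Util.java']
```

The filter follows the "manual review or static scan" wording, so a reviewed file is kept even if a scan found issues; a stricter variant could require both checks to pass.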
Optionally, the code specification checking and modification suggestions specifically comprise:
during coding, a developer submits a completed piece of code or a method/function to the GPT model as a query; the model returns a list of the problems it found, generates and matches a repair suggestion for each problem, and presents the problem list to the developer as the output.
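One way to wire this checking step is sketched below, under several assumptions the patent does not state: the fine-tuned model is reached through a generic `generate(prompt) -> str` callable, it is prompted to answer in JSON with `rule`, `line`, `message`, and `suggestion` keys, and a hypothetical `KNOWN_FIXES` table supplies a matched suggestion whenever the model's generated one is empty. `fake_model` is a canned stand-in for the deployed GPT engine, and the rule identifiers are invented for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical table of known fixes, keyed by rule identifier.
KNOWN_FIXES: Dict[str, str] = {
    "naming.method": "Rename the method to lowerCamelCase.",
    "logging.sysout": "Replace System.out.println with the project logger.",
}

# Hypothetical prompt; the patent does not specify a prompt format.
PROMPT_TEMPLATE = (
    "Check the following code against the programming specification.\n"
    "Answer with a JSON list of objects with keys rule, line, message, suggestion.\n\n"
    "{code}"
)


def check_code(code: str, generate: Callable[[str], str]) -> List[dict]:
    """Ask the model to review `code`; return its problem list with one repair suggestion per problem."""
    problems = json.loads(generate(PROMPT_TEMPLATE.format(code=code)))
    report = []
    for p in problems:
        # Prefer the suggestion the model generated; otherwise match one from KNOWN_FIXES.
        suggestion = p.get("suggestion") or KNOWN_FIXES.get(p["rule"], "No suggestion available.")
        report.append({"line": p["line"], "rule": p["rule"], "message": p["message"], "suggestion": suggestion})
    return report


def fake_model(prompt: str) -> str:
    """Canned stand-in for the deployed GPT engine."""
    return json.dumps([
        {"rule": "naming.method", "line": 1, "message": "Method name GetUser is not lowerCamelCase.", "suggestion": ""},
        {"rule": "logging.sysout", "line": 2, "message": "System.out.println used for logging.", "suggestion": "Use log.info(...) instead."},
    ])


java_method = "public User GetUser(long id) {\n    System.out.println(id);\n    return repo.find(id);\n}"
for item in check_code(java_method, fake_model):
    print(f"line {item['line']} [{item['rule']}]: {item['message']} -> {item['suggestion']}")
```

In practice the model's reply would need validation before parsing, since a generative model is not guaranteed to return well-formed JSON.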
The code standardization method based on generative pre-training provided by the invention uses a generative pre-trained text-processing model that is pre-trained on open-source code and owned code and then fine-tuned on code that conforms to the programming specifications. The model can therefore check input code for conformance with those specifications and propose modifications, which improves the compliance of application code with programming specifications and development standards and overcomes the after-the-fact, passive nature of static code scanning.
The foregoing is only an overview of the technical solution of the present invention. Specific embodiments are given below so that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention become more readily apparent.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of code pre-training according to an embodiment of the present invention;
FIG. 2 is a flowchart of fine-tuning on specification-compliant code according to an embodiment of the present invention;
FIG. 3 is a flowchart of code specification checking and modification suggestions according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprising" and "having", and any variations thereof, in the description, embodiments, claims, and drawings of the invention are intended to cover a non-exclusive inclusion, for example of a series of steps or elements.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings and the examples.
In this technical solution, a GPT model engine is deployed and trained on open-source code and in-house code, and the model is then fine-tuned on in-house code that conforms to the programming specifications. The resulting model can offer development and bug-fix suggestions, check code for compliance with programming specifications and development standards, and propose repairs.
As shown in FIG. 1, code pre-training: a GPT model engine (Transformer) is deployed, and the GPT model is pre-trained on open-source code and owned code.
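What makes a GPT-style Transformer generative is causal self-attention: when computing the representation at position i, the model may attend only to positions 0 through i. The sketch below applies that mask and a row-wise softmax to precomputed attention scores for a single head, in plain Python. The score values are arbitrary, and the sketch illustrates the general mechanism rather than the engine used in the patent.

```python
import math
from typing import List


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Mask future positions (j > i) and softmax each row of raw attention scores.
    Position i therefore attends only to positions 0..i, so the model can be trained
    on next-token prediction and later generate text left to right."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]               # keep positions 0..i only
        m = max(visible)                           # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))  # masked slots get weight 0
    return weights


scores = [[0.0, 5.0, 5.0], [1.0, 1.0, 9.0], [0.0, 0.0, 0.0]]
for row in causal_attention_weights(scores):
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.5, 0.5, 0.0]
# [0.333, 0.333, 0.333]
```

Even though the first two rows contain large scores for later positions, those positions receive zero weight, which is exactly what the causal mask enforces.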
As shown in FIG. 2, fine-tuning on specification-compliant code (Fine-Tune): after pre-training, the GPT model is fine-tuned on code that has passed manual review (code review) or static-scanning-tool inspection and conforms to the programming specifications, so that the model generalizes over the textual features of specification-compliant code.
As shown in FIG. 3, code specification checking and modification suggestions: during coding, a developer submits a completed piece of code or a method/function to the GPT model as a query; the model returns a list of the problems it found, generates and matches a repair suggestion for each problem, and presents the problem list to the developer as the output.
The invention uses a generative pre-trained text-processing model, pre-trained on open-source code and owned code and fine-tuned on code that conforms to the programming specifications, so that the model can check input code for conformance and propose modifications. This improves the compliance of application code with programming specifications and development standards and overcomes the after-the-fact, passive nature of static code scanning.
The beneficial effects are as follows. By combining generative pre-training with fine-tuning, the model learns to generalize over the textual features of specification-compliant code.
The model produces a list of the problems found during code inspection and gives a repair suggestion for each problem, further improving development efficiency.
The foregoing detailed description has been presented for purposes of illustration and description. It should be understood that the invention is not limited to the particular embodiments disclosed; all modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention are intended to be covered.
Claims (4)
1. A code standardization method based on generative pre-training, the method comprising:
code pre-training;
fine-tuning on specification-compliant code;
code specification checking and modification suggestions.
2. The code standardization method based on generative pre-training according to claim 1, wherein the code pre-training specifically comprises:
deploying a GPT model engine and pre-training the GPT model on open-source code and owned code.
3. The code standardization method based on generative pre-training according to claim 1, wherein the fine-tuning on specification-compliant code specifically comprises:
after the GPT model has been pre-trained, fine-tuning it on code that has passed manual review or static-scanning-tool inspection and conforms to the programming specifications, so that the model generalizes over the textual features of specification-compliant code.
4. The code standardization method based on generative pre-training according to claim 1, wherein the code specification checking and modification suggestions specifically comprise:
during coding, a developer submits a completed piece of code or a method/function to the GPT model as a query; the model returns a list of the problems it found, generates and matches a repair suggestion for each problem, and presents the problem list to the developer as the output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310229771.7A (CN116126731A) | 2023-03-10 | 2023-03-10 | Code standardization method based on generation type pre-training |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310229771.7A (CN116126731A) | 2023-03-10 | 2023-03-10 | Code standardization method based on generation type pre-training |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116126731A | 2023-05-16 |
Family
ID=86297601
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202310229771.7A (CN116126731A) | Pending | 2023-03-10 | 2023-03-10 | Code standardization method based on generation type pre-training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116126731A |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116611074A * | 2023-07-17 | 2023-08-18 | Beijing Qihoo Technology Co., Ltd. (北京奇虎科技有限公司) | Security information auditing method, device, storage medium and apparatus |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |