CN116126731A - Code standardization method based on generation type pre-training - Google Patents

Info

Publication number
CN116126731A
Authority
CN
China
Prior art keywords
code
model
training
method based
codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310229771.7A
Other languages
Chinese (zh)
Inventor
刘梦雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beiyin Financial Technology Co ltd
Original Assignee
Beiyin Financial Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beiyin Financial Technology Co ltd filed Critical Beiyin Financial Technology Co ltd
Priority to CN202310229771.7A priority Critical patent/CN116126731A/en
Publication of CN116126731A publication Critical patent/CN116126731A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/362Software debugging
    • G06F11/3628Software debugging of optimised code
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The code standardization method based on generative pre-training provided by the invention uses a generative pre-trained text processing model. The model is pre-trained on open source code and in-house code and then fine-tuned on code that conforms to a programming specification, so that it can check input code against the specification and give modification suggestions. This improves the compliance of application code with programming specifications and development standards and overcomes the after-the-fact, passive nature of static code scanning.

Description

Code standardization method based on generation type pre-training
Technical Field
The invention relates to the field of code auditing, and in particular to a code standardization method based on generative pre-training.
Background
During the development of application systems and software products, programming specifications are usually formulated to constrain developers' programming style and implementation approach. A unified style makes maintenance and extension easier and improves project delivery quality. For example, the publicly available "Alibaba Java Development Manual" is divided into five parts (programming conventions, exception logging, MySQL conventions, engineering conventions, and security conventions) that constrain development in the Java programming language. It helps development teams work more efficiently, tolerate faults better, and collaborate more smoothly, which improves code quality and reduces project maintenance costs.
In addition, when software development is combined with a concrete business domain, the standardization and consistency of the system are generally improved through specifications such as a unified data dictionary, a unified message format, a unified log format, and a unified byte code. Compliance with programming and development specifications and standards, and efficient verification of that compliance, are therefore indispensable.
Besides manual code auditing, existing schemes mainly use static scanning tools such as Checkstyle and PMD to check code against specifications. Without running the code, a static scanning tool scans the program through techniques such as lexical analysis, syntax analysis, control flow analysis, and data analysis, and verifies whether the code meets criteria such as standardization, security, reliability, and maintainability.
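To make the idea of checking code without running it concrete, the following is a minimal sketch and not how Checkstyle or PMD are implemented. It assumes a single hypothetical rule (function names must be snake_case) and uses Python's standard `ast` module to parse the source; the names `find_naming_violations`, `SNAKE_CASE`, and `SAMPLE` are illustrative only.

```python
import ast
import re

# Hypothetical rule for illustration: function names must be snake_case.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")


def find_naming_violations(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for every function whose name breaks the rule."""
    tree = ast.parse(source)  # syntax analysis only; the code is never executed
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                violations.append((node.lineno, node.name))
    return sorted(violations)


SAMPLE = '''
def computeTotal(items):
    return sum(items)

def compute_average(items):
    return sum(items) / len(items)
'''

if __name__ == "__main__":
    for line, name in find_naming_violations(SAMPLE):
        print(f"line {line}: function name '{name}' is not snake_case")
```

Note that a check of this kind reports where a rule is broken but does not say how to fix it, which is the gap the following paragraphs describe.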
Static code scanning discovers compliance risks in code in a standardized, automated way, lets developers concentrate on analyzing and solving code design defects, and quickly locates hidden errors and defects, thereby safeguarding program delivery quality.
However, static code scanning checks code that has already been written against the standards and flags programs that violate them or carry hidden problems. It is therefore a post-hoc inspection rather than a source of compliance recommendations during the coding stage.
Static code scanning also cannot give modification suggestions for most of the problems it detects, so developers must work out the code changes themselves.
Disclosure of Invention
In view of the above problems, the present invention provides a code standardization method based on generative pre-training that overcomes or at least partially solves them.
According to one aspect of the present invention, there is provided a code standardization method based on generative pre-training, the method comprising:
code pre-training;
fine-tuning on specification-compliant code;
code specification checking and modification suggestions.
Optionally, the code pre-training specifically includes:
deploying a GPT model engine and pre-training the GPT model with open source code and in-house code.
Optionally, the fine-tuning on specification-compliant code specifically includes:
after the GPT model is pre-trained, fine-tuning the model with code that has passed manual review or static scanning tool inspection and conforms to the programming specification, so that the model generalizes over the textual features of specification-compliant code.
Optionally, the code specification checking and modification suggestions specifically include:
during coding, a developer can input completed code or a method function into the GPT model as a question; the model returns a list of the problems it found, generates a matching repair suggestion for each, and provides the result to the developer as output.
The code standardization method based on generative pre-training provided by the invention uses a generative pre-trained text processing model. The model is pre-trained on open source code and in-house code and then fine-tuned on code that conforms to a programming specification, so that it can check input code against the specification and give modification suggestions. This improves the compliance of application code with programming specifications and development standards and overcomes the after-the-fact, passive nature of static code scanning.
The foregoing is only an overview of the technical solution of the present invention. Specific embodiments are set out below so that the technical means of the invention can be understood more clearly and implemented according to this description, and so that the above and other objects, features, and advantages of the invention become more apparent.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of code pre-training provided in an embodiment of the present invention;
FIG. 2 is a flowchart of fine-tuning on specification-compliant code provided in an embodiment of the present invention;
FIG. 3 is a flowchart of code specification checking and modification suggestions provided in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprising" and "having", and any variations thereof, in the description, claims, and drawings of the invention are intended to cover a non-exclusive inclusion, for example of a series of steps or elements.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings and the examples.
In this technical scheme, a GPT model engine is deployed and trained on open source code and in-house code, and the model is then fine-tuned with in-house code that conforms to the programming specification. The resulting model can provide development and bug repair suggestions, check code for compliance with programming specifications and development standards, and give repair suggestions.
As shown in FIG. 1, code pre-training: a GPT model engine (Transformer) is deployed, and the GPT model is pre-trained with open source code and in-house code.
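The patent does not describe how the pre-training data is prepared, so the following is only a sketch under stated assumptions. It merges open source and in-house snippets into one corpus and turns each snippet into (context, next token) pairs, which is the next-token prediction form that GPT-style models are generally trained on. The regex tokenizer, the deduplication step, and all function names are hypothetical choices made for illustration; a real system would use a learned subword tokenizer and a neural network trainer.

```python
import re

# Assumed toy tokenizer: identifiers/numbers, or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(code: str) -> list[str]:
    """Split source code into coarse tokens (illustrative only)."""
    return TOKEN_PATTERN.findall(code)


def build_corpus(open_source: list[str], in_house: list[str]) -> list[str]:
    """Merge both code sources, dropping exact duplicates and keeping order."""
    seen = set()
    corpus = []
    for snippet in open_source + in_house:
        if snippet not in seen:
            seen.add(snippet)
            corpus.append(snippet)
    return corpus


def next_token_examples(
    tokens: list[str], context_size: int
) -> list[tuple[list[str], str]]:
    """Pair each token with the (at most context_size) tokens preceding it."""
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        examples.append((context, tokens[i]))
    return examples


if __name__ == "__main__":
    corpus = build_corpus(["int x = 1;"], ["int x = 1;", "return x;"])
    for snippet in corpus:
        for context, target in next_token_examples(tokenize(snippet), 2):
            print(context, "->", target)
```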
As shown in FIG. 2, fine-tuning on specification-compliant code (Fine-Tune): after the GPT model is pre-trained, the model is fine-tuned with code that has passed manual review (code review) or static scanning tool inspection and conforms to the programming specification, so that the model generalizes over the textual features of specification-compliant code.
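The selection rule in this step (code that passed manual review or a static scanning tool) can be sketched as a simple filter over candidate samples. This is an illustrative sketch only: the `CodeSample` record, its field names, and `select_fine_tuning_samples` are hypothetical, and the patent does not say how review or scan results are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSample:
    """Hypothetical record describing one candidate code sample."""
    path: str
    source: str
    passed_code_review: bool   # approved in manual code review
    passed_static_scan: bool   # no violations reported by a static scanner


def select_fine_tuning_samples(samples: list[CodeSample]) -> list[CodeSample]:
    """Keep samples confirmed compliant by review or by static scanning."""
    return [
        s for s in samples
        if s.passed_code_review or s.passed_static_scan
    ]


if __name__ == "__main__":
    candidates = [
        CodeSample("A.java", "class A {}", True, False),
        CodeSample("B.java", "class b {}", False, False),
        CodeSample("C.java", "class C {}", False, True),
    ]
    for sample in select_fine_tuning_samples(candidates):
        print(sample.path)
```

Only the selected samples would be passed to the fine-tuning run, so the model sees examples of compliant code rather than the whole in-house code base.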
As shown in FIG. 3, code specification checking and modification suggestions: during coding, a developer can input completed code or a method function into the GPT model as a question; the model returns a list of the problems it found, generates a matching repair suggestion for each, and provides the result to the developer as output.
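One way to picture this interaction is a small wrapper that sends the code to the model and parses the reply into a list of problems paired with suggestions. The sketch below is an assumption-laden illustration, not the patented implementation: the prompt wording, the JSON reply format, the `Finding` record, and `fake_model` (a stand-in for the fine-tuned GPT model) are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    """One detected problem paired with its repair suggestion."""
    line: int
    problem: str
    suggestion: str


def build_prompt(code: str) -> str:
    """Wrap the developer's code in a question for the model (assumed wording)."""
    return (
        "Check the following code against the programming specification. "
        "Reply with a JSON list of objects with the keys "
        "line, problem, and suggestion.\n\n" + code
    )


def check_code(code: str, model: Callable[[str], str]) -> list[Finding]:
    """Query the model and parse its reply; malformed replies yield no findings."""
    raw = model(build_prompt(code))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    findings = []
    required = {"line", "problem", "suggestion"}
    for item in items:
        if isinstance(item, dict) and required <= item.keys():
            findings.append(
                Finding(int(item["line"]), str(item["problem"]), str(item["suggestion"]))
            )
    return findings


def fake_model(prompt: str) -> str:
    """Stand-in for the fine-tuned model; always returns one canned finding."""
    return json.dumps([
        {"line": 1,
         "problem": "Constant name is not upper case",
         "suggestion": "Rename maxSize to MAX_SIZE"},
    ])


if __name__ == "__main__":
    for f in check_code("static final int maxSize = 10;", fake_model):
        print(f"line {f.line}: {f.problem} -> {f.suggestion}")
```

Passing the model in as a callable keeps the parsing logic testable without a deployed model.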
The invention uses a generative pre-trained text processing model that is pre-trained on open source code and in-house code and then fine-tuned on code that conforms to the programming specification. The model can therefore check input code against the specification and give modification suggestions, which improves the compliance of application code with programming specifications and development standards and overcomes the after-the-fact, passive nature of static code scanning.
Beneficial effects: through generative pre-training followed by fine-tuning, the model generalizes over the textual features of specification-compliant code.
The model gives a list of the problems found by the code check and a repair suggestion for each problematic piece of code, which further improves development efficiency.
The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications, equivalents, alternatives, and improvements within the spirit and principles of the invention.

Claims (4)

1. A code standardization method based on generative pre-training, the method comprising:
code pre-training;
fine-tuning on specification-compliant code;
code specification checking and modification suggestions.
2. The code standardization method based on generative pre-training according to claim 1, wherein the code pre-training specifically comprises:
deploying a GPT model engine and pre-training the GPT model with open source code and in-house code.
3. The code standardization method based on generative pre-training according to claim 1, wherein the fine-tuning on specification-compliant code specifically comprises:
after the GPT model is pre-trained, fine-tuning the model with code that has passed manual review or static scanning tool inspection and conforms to the programming specification, so that the model generalizes over the textual features of specification-compliant code.
4. The code standardization method based on generative pre-training according to claim 1, wherein the code specification checking and modification suggestions specifically comprise:
during coding, a developer can input completed code or a method function into the GPT model as a question; the model returns a list of the problems it found, generates a matching repair suggestion for each, and provides the result to the developer as output.
CN202310229771.7A 2023-03-10 2023-03-10 Code standardization method based on generation type pre-training Pending CN116126731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310229771.7A CN116126731A (en) 2023-03-10 2023-03-10 Code standardization method based on generation type pre-training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310229771.7A CN116126731A (en) 2023-03-10 2023-03-10 Code standardization method based on generation type pre-training

Publications (1)

Publication Number Publication Date
CN116126731A true CN116126731A (en) 2023-05-16

Family

ID=86297601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310229771.7A Pending CN116126731A (en) 2023-03-10 2023-03-10 Code standardization method based on generation type pre-training

Country Status (1)

Country Link
CN (1) CN116126731A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611074A (en) * 2023-07-17 2023-08-18 北京奇虎科技有限公司 Security information auditing method, device, storage medium and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination