CN116126731A

CN116126731A - Code standardization method based on generative pre-training

Info

Publication number: CN116126731A
Application number: CN202310229771.7A
Authority: CN
Inventors: 刘梦雯
Original assignee: Beiyin Financial Technology Co ltd
Current assignee: Beiyin Financial Technology Co ltd
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2023-05-16

Abstract

The invention provides a code standardization method based on generative pre-training. The method adopts a generative pre-trained text processing model, pre-trains the model on open-source code and in-house code, and fine-tunes it on code that conforms to a programming specification. The model can then check input code for conformance and suggest modifications, which improves the compliance of application code with programming specifications and development standards and overcomes the post-hoc, passive nature of static code scanning.

Description

Code standardization method based on generative pre-training

Technical Field

The invention relates to the field of code auditing, and in particular to a code standardization method based on generative pre-training.

Background

When application systems and software products are developed, programming specifications are usually drawn up to constrain developers' programming style and implementation approach. A unified style makes the code easier to maintain and extend and improves project delivery quality. For example, the published "Alibaba Java Development Manual" is divided into five parts (programming conventions, exception logging, MySQL conventions, project conventions, and security conventions) that constrain development in the Java language, helping development teams work more efficiently, with better fault tolerance and collaboration, improving code quality, and reducing project maintenance cost.

In addition, when software development is tied to a specific business domain, the standardization and consistency of the system are usually improved through shared specifications such as a unified data dictionary, a unified message format, a unified log format, and unified byte code. Compliance with programming specifications and development standards, and efficient verification of that compliance, are therefore indispensable.

Apart from manual code auditing, existing schemes mainly use static scanning tools such as Checkstyle and PMD to check code for conformance. A static scanning tool examines program code without running it, using techniques such as lexical analysis, syntax analysis, control-flow analysis, and data-flow analysis, and verifies whether the code meets criteria such as conformance to standards, security, reliability, and maintainability.
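As an illustration of checking code without running it, the following minimal Python sketch parses source text into a syntax tree and flags function names that break a naming rule. The rule (snake_case function names) and the names `SNAKE_CASE` and `find_naming_violations` are assumptions chosen for illustration; they are not taken from Checkstyle or PMD.

```python
import ast
import re

# Hypothetical rule for illustration: function names must be snake_case.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def find_naming_violations(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) pairs that break the rule.

    The source is only parsed into a syntax tree; it is never executed.
    """
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                violations.append((node.lineno, node.name))
    return sorted(violations)


sample = "def getUser():\n    return 1\n\ndef save_user():\n    return 2\n"
print(find_naming_violations(sample))  # prints [(1, 'getUser')]
```

A check of this kind reports where a rule is broken but does not propose a corrected name.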

Static code scanning discovers compliance risks in code in a standardized, automated way. It helps developers concentrate on analyzing and resolving code design defects and quickly locate hidden errors and defects, which safeguards delivery quality.

However, static code scanning checks code that has already been written against the applicable standards and flags programs that violate them or carry latent problems. It is therefore a post-hoc inspection and cannot recommend compliant practices during the coding stage.

Moreover, for most of the problems it detects, static code scanning cannot suggest a modification; the developer must work out the code change themselves.

Disclosure of Invention

In view of the above problems, the present invention provides a code standardization method based on generative pre-training that overcomes, or at least partially solves, those problems.

According to one aspect of the present invention, there is provided a code standardization method based on generative pre-training, the method comprising:

code pre-training;

fine-tuning on specification-compliant code;

code specification checking and modification suggestions.

Optionally, the code pre-training specifically includes:

deploying a GPT model engine and pre-training the GPT model on open-source code and in-house code.

Optionally, the fine-tuning on specification-compliant code specifically includes:

after the GPT model has been pre-trained, fine-tuning the model on code that has passed manual review or static scanning tool inspection and conforms to the programming specification, so that the model generalizes over the textual features of specification-compliant code.

Optionally, the code specification checking and modification suggestions specifically include:

during coding, a developer can submit a completed piece of code or a method as the query to the GPT model; the model returns a list of the problems it found, generates a matching repair suggestion for each, and presents the result to the developer as output.

The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a flowchart of code pre-training provided in an embodiment of the present invention;

FIG. 2 is a flowchart of fine-tuning on specification-compliant code provided in an embodiment of the present invention;

FIG. 3 is a flowchart of code specification checking and modification suggestions provided in an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The terms "comprising" and "having", and any variations thereof, as used in the description of the embodiments of the invention, the claims, and the drawings, are intended to cover a non-exclusive inclusion, for example of a series of steps or elements.

The technical solution of the invention is described in further detail below with reference to the accompanying drawings and embodiments.

In this technical solution, a GPT model engine is deployed and trained on open-source code and in-house code, and the model is then fine-tuned on in-house code that conforms to the programming specification. The resulting model can offer development and bug-fix suggestions, check code for compliance with programming specifications and development standards, and give repair suggestions.

As shown in FIG. 1, code pre-training: a GPT model engine (Transformer) is deployed, and the GPT model is pre-trained on open-source code and in-house code.
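The text does not say how the pre-training data is prepared. As one possible reading, the sketch below merges the two code sources into a single corpus and builds next-token prediction examples, the usual training objective for GPT-style models. The function names, the exact-duplicate filtering, and the whitespace tokenization are all assumptions of this sketch rather than steps named in the patent.

```python
def build_corpus(open_source_files, in_house_files):
    """Merge both code sources into one corpus, dropping exact duplicates.

    De-duplication is an assumption of this sketch, not a step named in the text.
    """
    seen = set()
    corpus = []
    for text in list(open_source_files) + list(in_house_files):
        if text not in seen:
            seen.add(text)
            corpus.append(text)
    return corpus


def next_token_examples(tokens, context_size):
    """Build (context, target) pairs: predict each token from the ones before it."""
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        examples.append((context, tokens[i]))
    return examples


corpus = build_corpus(["int x = 1 ;"], ["int x = 1 ;", "return x ;"])
# Whitespace splitting stands in for a real tokenizer (assumption).
pairs = next_token_examples(corpus[0].split(), context_size=2)
print(len(corpus), pairs[-1])
```

A real pipeline would use a subword tokenizer and feed these pairs to a Transformer training loop; both are outside the scope of this sketch.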

As shown in FIG. 2, fine-tuning on specification-compliant code: after the GPT model has been pre-trained, it is fine-tuned on code that has passed manual review (code review) or static scanning tool inspection and conforms to the programming specification, so that the model generalizes over the textual features of specification-compliant code.
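The selection of fine-tuning data can be sketched as a simple filter that keeps only code which has passed manual review or a static scan. The `CodeSample` record and its field names are hypothetical; the patent does not specify how review or scan results are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSample:
    """Hypothetical record describing one candidate training file."""
    path: str
    source: str
    passed_review: bool        # approved in manual code review
    passed_static_scan: bool   # no findings from the static scanning tool


def select_fine_tuning_samples(samples):
    """Keep samples that passed manual review or static scanning."""
    return [s for s in samples if s.passed_review or s.passed_static_scan]


candidates = [
    CodeSample("A.java", "class A {}", passed_review=True, passed_static_scan=False),
    CodeSample("B.java", "class b {}", passed_review=False, passed_static_scan=False),
    CodeSample("C.java", "class C {}", passed_review=False, passed_static_scan=True),
]
selected = select_fine_tuning_samples(candidates)
print([s.path for s in selected])  # prints ['A.java', 'C.java']
```

Requiring both checks instead of either one would give a smaller but stricter fine-tuning set; the text only requires one of the two.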

As shown in FIG. 3, code specification checking and modification suggestions: during coding, a developer can submit a completed piece of code or a method as the query to the GPT model; the model returns a list of the problems it found, generates a matching repair suggestion for each, and presents the result to the developer as output.
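The checking step can be sketched as a prompt, a model call, and a parser that turns the model's text into a problem list with repair suggestions. Everything named here is an assumption for illustration: the prompt wording, the `PROBLEM: ... | FIX: ...` output format, the `Finding` type, and `fake_generate`, which stands in for the fine-tuned model so that the example runs without one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """One detected problem paired with its repair suggestion."""
    problem: str
    suggestion: str


def build_prompt(code: str) -> str:
    # Hypothetical prompt wording; the patent does not specify one.
    return (
        "Check the following code against the programming specification. "
        "List each problem as 'PROBLEM: ... | FIX: ...', one per line.\n\n" + code
    )


def check_code(code: str, generate: Callable[[str], str]) -> list[Finding]:
    """Send the code to the model and parse its answer into findings."""
    findings = []
    for line in generate(build_prompt(code)).splitlines():
        if line.startswith("PROBLEM:") and "| FIX:" in line:
            problem, fix = line[len("PROBLEM:"):].split("| FIX:", 1)
            findings.append(Finding(problem.strip(), fix.strip()))
    return findings


def fake_generate(prompt: str) -> str:
    # Stand-in for the fine-tuned GPT model (assumption for this sketch).
    return (
        "PROBLEM: method name 'GetUser' is not lowerCamelCase | "
        "FIX: rename it to 'getUser'\n"
        "No further problems."
    )


for finding in check_code("public User GetUser() { return user; }", fake_generate):
    print(finding.problem, "->", finding.suggestion)
```

Passing the model in as a callable keeps the parsing logic testable on its own; lines that do not match the assumed format are ignored rather than reported.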

The invention adopts a generative pre-trained text processing model, pre-trains it on open-source code and in-house code, and fine-tunes it on code that conforms to the programming specification. The model can then check input code for conformance and suggest modifications, which improves the compliance of application code with programming specifications and development standards and overcomes the post-hoc, passive nature of static code scanning.

The beneficial effects are as follows: through generative pre-training followed by fine-tuning, the model generalizes over the textual features of specification-compliant code.

The model produces a list of the problems found during code inspection and gives a repair suggestion for each problematic piece of code, which further improves development efficiency.

The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications, equivalents, alternatives, and improvements within the spirit and principles of the invention.

Claims

1. A code standardization method based on generative pre-training, the method comprising:

code pre-training;

fine-tuning on specification-compliant code;

code specification checking and modification suggestions.

2. The code standardization method based on generative pre-training according to claim 1, characterized in that the code pre-training specifically comprises:

3. The code standardization method based on generative pre-training according to claim 1, characterized in that the fine-tuning on specification-compliant code specifically comprises:

4. The code standardization method based on generative pre-training according to claim 1, characterized in that the code specification checking and modification suggestions specifically comprise: