CN116991990A - Program development assisting method, storage medium and device based on AIGC - Google Patents

Program development assisting method, storage medium and device based on AIGC

Info

Publication number
CN116991990A
CN116991990A (application CN202310814622.7A)
Authority
CN
China
Prior art keywords
aigc
word
model
prompt
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310814622.7A
Other languages
Chinese (zh)
Inventor
柳琰峰
阳成文
周斌
王志伟
宋荣康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shizhuang Information Technology Co ltd
Original Assignee
Shanghai Shizhuang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shizhuang Information Technology Co ltd filed Critical Shanghai Shizhuang Information Technology Co ltd
Priority to CN202310814622.7A priority Critical patent/CN116991990A/en
Publication of CN116991990A publication Critical patent/CN116991990A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code

Abstract

The application provides a program development assisting method, a storage medium and equipment based on AIGC. Collecting prompt words input by a user in a pre-trained AIGC model; matching the prompting words with a preset sensitive information base, and confirming whether the prompting words are compliant; and when confirming the compliance of the prompt word, controlling the AIGC model to generate information matched with the prompt word. The program development assisting method based on the AIGC can help program developers reduce repetitive work based on the AIGC model, simplify the program development work and simultaneously avoid leakage of sensitive information in the interaction process with the AIGC model.

Description

Program development assisting method, storage medium and device based on AIGC
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a program development assisting method, a storage medium and equipment based on AIGC.
Background
With the recent popularity of ChatGPT, various forms of AIGC (AI-Generated Content) have emerged. Traditionally, program developers have relied on search engines to help solve problems encountered during program development, but answers found this way often fail to match the current requirement, so problems cannot be solved quickly and accurately.
By contrast, with AIGC a program developer can input keywords that precisely match the current requirement and scenario, so that problems can be solved quickly.
However, current AIGC-based program implementations carry a risk of content leakage: many systems only log the interaction between the user and the AIGC model and offer post-hoc auditing, without intercepting sensitive information in real time, so sensitive information can still be leaked.
Disclosure of Invention
The application provides an AIGC-based program development assisting method, a storage medium and equipment, which are used for assisting program development based on AIGC and avoiding sensitive information leakage in the interaction process with an AIGC model.
In a first aspect, an embodiment of the present application provides an AIGC-based program development assistance method, including: collecting prompt words input by a user in a pre-trained AIGC model; matching the prompting words with a preset sensitive information base, and confirming whether the prompting words are compliant; and when confirming the compliance of the prompt word, controlling the AIGC model to generate information matched with the prompt word.
In one implementation manner of the first aspect, training the AIGC model is further included; training the AIGC model includes: acquiring a prompting word data training set; adopting an attention mechanism to distribute weights to the cue word training data in the cue word data training set to form weight tag training data; and inputting the weight tag training data into a network model for training, and obtaining the AIGC model capable of generating information matched with the prompt word.
In an implementation manner of the first aspect, the network model is a generative adversarial network model, an autoencoder network model, a diffusion model, or a Transformer neural network model.
In one implementation manner of the first aspect, training the AIGC model further includes: prompting word fine tuning training; the prompt word fine tuning training comprises the following steps: acquiring the prompting word data in the prompting word data training set; converting the prompt word data into prompt words containing empty slots based on a preset template library; inputting the prompt word and answer data set containing the empty slots into a network model for training, and obtaining the AIGC model capable of searching answer data filling the empty slots from the answer data set.
In an implementation manner of the first aspect, the controlling the AIGC model to generate information matching the hint word includes: converting the prompt word into a prompt word containing empty slots based on a preset template library; the AIGC model searches answer data matched with the empty slots from the answer data set; mapping the answer data to the corresponding empty slots to form optimized prompt words; and controlling the AIGC model to generate information matched with the optimized prompt word.
In an implementation manner of the first aspect, the matching the prompting word with a preset sensitive information base, and determining whether the prompting word is compliant includes: inputting the prompt word into a pre-trained large language model, and carrying out text recognition on the prompt word by the large language model to obtain suspected sensitive words in the prompt word; matching the suspected sensitive word with a sensitive word in a sensitive word bank, and acquiring a corresponding sensitive word in the sensitive word bank and a classification label corresponding to the sensitive word when the sensitive word matched with the suspected sensitive word exists in the sensitive word; and determining whether the suspected sensitive words in the prompting words are compliant or not based on the suspected sensitive words, the acquired sensitive words in the sensitive word bank, the classification labels and a pre-configured compliance strategy, and outputting a compliance auditing result.
In one implementation manner of the first aspect, any one or more of the following is further included: detecting program codes input by a user, and when detecting that the codes are abnormal, carrying out alarm prompt and searching to obtain optimization suggestions; detecting an SQL database corresponding to the program code, and when detecting that the SQL database is abnormal, carrying out alarm prompt and searching to obtain optimization suggestions; and generating a test program for executing the corresponding test function based on the test data input by the user.
In an implementation manner of the first aspect, the method further includes: and collecting log information in the interaction process of the user and the AIGC model in real time.
In a second aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the AIGC-based program development assistance method of any one of the first aspects of the present application.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory storing a computer program; and the processor is in communication connection with the memory and executes the program development assisting method based on AIGC according to any one of the first aspect of the application when the computer program is called.
The program development assisting method based on the AIGC can help program developers reduce repetitive work based on the AIGC model, simplify the program development work and simultaneously avoid leakage of sensitive information in the interaction process with the AIGC model.
Drawings
Fig. 1 is a schematic diagram of an application principle of an AIGC-based program development assistance method according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating an AIGC-based program development assistance method according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of training an AIGC model in an AIGC-based program development assistance method according to an embodiment of the application.
Fig. 4 is a schematic diagram illustrating a principle of using an attention mechanism to assign weights to the training data of the cue word in the training set of the training words in the AIGC-based program development assistance method according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of a prompt word fine tuning training in an AIGC-based program development assistance method according to an embodiment of the application.
Fig. 6 is a schematic flow chart of a program development assistance method based on AIGC according to an embodiment of the present application for controlling an AIGC model to generate information matching a hint word.
Fig. 7 is a schematic diagram illustrating an implementation application of an AIGC-based program development assistance method according to an embodiment of the application.
Fig. 8 is a schematic diagram illustrating an implementation principle of an AIGC-based program development assistance method according to an embodiment of the present application.
Fig. 9 is a schematic diagram illustrating an implementation process of an AIGC-based program development assistance method according to an embodiment of the application.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the application.
Description of element reference numerals
100. Electronic equipment
101. Memory device
102. Processor and method for controlling the same
103. Display device
S100 to S300 steps
S310 to S340 steps
Steps S401 to S403
S410 to S430 steps
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be implemented or applied through other different embodiments, and the details in this description may be modified or varied in various ways without departing from the spirit and scope of the application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict.
Program code generated by the prior art has low availability and cannot be used accurately, resulting in a poor user experience; the desired answer cannot be obtained from brief keywords, and a company's sensitive information can easily be exposed directly, creating an information security risk.
The embodiment of the application provides an AIGC-based program development assisting method which is used for assisting program development based on AIGC and avoiding sensitive information leakage in the interaction process with an AIGC model.
Fig. 1 is a schematic diagram of an application principle of an AIGC-based program development assistance method according to an embodiment of the present application. As shown in fig. 1, in the AIGC-based program development assistance method of this embodiment, a sensitive word library is built in advance, and the user is assisted in program development through an AIGC model, helping program developers reduce repetitive work and simplify development. During AIGC-assisted development, keywords are input into the AIGC model, the information the user exchanges with the model is collected through a Kafka tool, and a Flink tool provides second-level matching of suspected sensitive words against the sensitive word library. Whether the suspected sensitive words in the prompt are compliant is then determined; if not, risk information is prompted to the user, otherwise the AIGC model is invoked to generate information matching the prompt. This embodiment can ensure the security of user-input data in real time, restrict the return of sensitive information, and avoid information leakage.
The following describes in detail the technical schemes of the AIGC-based program development assistance method, storage medium, and device according to embodiments of the present application with reference to figs. 1 to 10, which those skilled in the art can implement without creative effort.
Fig. 2 is a flowchart illustrating a program development assistance method based on an AIGC in an embodiment of the present application. As shown in fig. 2, the AIGC-based program development assistance method provided by the embodiment of the present application includes the following steps S100 to S300.
Step S100, collecting prompt words input by a user in a pre-trained AIGC model;
step S200, matching the prompting words with a preset sensitive information base, and confirming whether the prompting words are compliant or not;
and step S300, when confirming the compliance of the prompt word, controlling the AIGC model to generate information matched with the prompt word.
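Steps S100 to S300 can be sketched as a minimal pipeline. This is an illustrative stand-in, not the patented implementation: the example library entries and the stub generator are assumptions.

```python
import re

# Hypothetical example entries; the real sensitive information base is
# maintained separately (see step S200 below).
SENSITIVE_LIBRARY = {"internal_api_key", "orders_table"}

def collect_prompt(raw: str) -> str:
    """S100: collect and normalize the prompt word input by the user."""
    return raw.strip()

def is_compliant(prompt: str) -> bool:
    """S200: match the prompt against the sensitive-information library."""
    tokens = set(re.findall(r"\w+", prompt.lower()))
    return tokens.isdisjoint(SENSITIVE_LIBRARY)

def generate(prompt: str) -> str:
    """S300: only invoke the AIGC model when the prompt is compliant."""
    if not is_compliant(prompt):
        return "Risk warning: prompt contains sensitive information."
    return f"AIGC answer for: {prompt}"  # stand-in for the real model call

print(generate(collect_prompt("write a quicksort in Python")))
print(generate(collect_prompt("dump the orders_table schema")))
```

The key design point is that the compliance check sits in front of the model call, so a non-compliant prompt never reaches the AIGC model.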
Steps S100 to S300 of the AIGC-based program development assistance method of the present embodiment are specifically described below.
Step S100, collecting prompt words input by a user in a pre-trained AIGC model.
The embodiment assists program development based on the AIGC model, can help program developers to reduce repetitive work and simplify program development work.
In this embodiment, the large model is encapsulated so that only the prompt-word input entry is exposed to the user.
In this embodiment, a prompt word specification input by a user is predetermined:
1) Prompt word format: directly issuing instructions or using a question-answer mode.
2) Prompt word element:
2-1) instruction: a particular task or instruction that the model is intended to perform.
2-2) context: containing external information or additional context information, a language model responds better.
2-3) input data: content or questions entered by the user.
2-4) output prompts: the type or format of the output is specified.
3) Prompt style:
3-1) precise instruction: the more accurate the instruction, the more detailed the requirement, and the more satisfactory the answer obtained.
3-2) role hint: in this application is set as follows: if you are a software development engineer, xxxx.
3-3) zero sample hint: directly raise the problem and do not raise the front problem.
3-4) Single sample hint: in addressing the problem, an example is given first, and the model will be understood from the given example.
3-5) few samples cues: with respect to single sample cues, several examples are given before a problem is posed.
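The four prompt elements above (instruction, context, input data, output format) can be assembled into a single prompt string. The field names and example values below are illustrative assumptions, not the patent's template:

```python
# Assemble the four prompt elements described above into one prompt.
# Field names and contents are hypothetical examples.
def build_prompt(instruction: str, context: str, input_data: str, output_format: str) -> str:
    parts = [
        f"Instruction: {instruction}",
        f"Context: {context}",
        f"Input: {input_data}",
        f"Output format: {output_format}",
    ]
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Suppose you are a software development engineer.",  # role prompt
    context="The project uses Python 3.11.",
    input_data="Write a sorting algorithm with low time complexity.",
    output_format="Return only the code.",
)
print(prompt)
```

A few-sample prompt would simply prepend several worked examples before the `Input:` line.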
In one implementation of this embodiment, training the AIGC model is further included. And (5) returning information and content meeting the user requirements by training an AIGC model. Fig. 3 is a schematic flow chart of training an AIGC model in an AIGC-based program development assistance method according to an embodiment of the application. As shown in fig. 3, training the AIGC model includes:
step S410, acquiring a prompt word data training set;
and step S420, adopting an attention mechanism to distribute weights to the cue word training data in the cue word data training set to form weight tag training data.
Fig. 4 is a schematic diagram illustrating the principle of using an attention mechanism to assign weights to the prompt-word training data in the training set in the AIGC-based program development assistance method according to an embodiment of the present application. The attention mechanism assigns different weights to input data according to its importance, and its parallelizability allows training on larger data sets, which has accelerated the development of pre-trained large models such as GPT (Generative Pre-trained Transformer) and enables tasks such as translation between languages. The body of the attention mechanism comprises an Encoder and a Decoder, which respectively encode the source language and convert the encoded information into target-language text.
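The weight assignment described above can be illustrated with a minimal scaled dot-product attention in pure Python. The toy 2-dimensional vectors are assumptions for illustration; this is not the patented model:

```python
import math

# Minimal scaled dot-product attention: each query assigns softmax weights
# to the inputs, so more relevant (higher-scoring) tokens contribute more
# to the output. Toy vectors chosen for illustration only.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Dot-product scores, scaled by sqrt(d) as in the standard formulation
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention([1.0, 0.0], keys, values)
print(out)  # the first value vector dominates, since the query aligns with key 1
```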
Step S430, inputting the weight tag training data into a network model for training, and obtaining the AIGC model capable of generating information matched with the prompt word.
In this embodiment, the network model employs, but is not limited to, a generative adversarial network (Generative Adversarial Networks, GAN) model, an autoencoder network model, a diffusion model, or a Transformer neural network model.
In one implementation of this embodiment, training the AIGC model further includes: and (5) fine tuning training of the prompt words. Through the fine adjustment training of the prompt words, the accuracy of recognition of the prompt words can be improved, and the communication cost of the user in the interaction process with the AI is greatly simplified. Fig. 5 is a schematic flow chart of a prompt word fine tuning training in an AIGC-based program development assistance method according to an embodiment of the application. The prompt word fine tuning training comprises the following steps:
step S401, acquiring the prompting word data in the prompting word data training set;
step S402, converting the prompt word data into prompt words containing empty slots based on a preset template library;
step S403, inputting the prompt word and answer data set containing the empty slot into a network model for training, and obtaining the AIGC model capable of searching the answer data filling the empty slot from the answer data set.
Specifically, one implementation manner of performing the model fine tuning training in this embodiment is as follows:
1) Template design: a preset template library is formed by manually or automatically designing templates. The input X (e.g., "optimize ranking algorithm") is converted into X′ (e.g., "I am a software development engineer, write me a sorting algorithm with ____ time complexity"). Typically X′ contains empty slots, and the label y is deduced by letting the trained network model fill them. The preset template library is designed flexibly as needed, and a suitable template is selected according to the downstream task and the pre-trained language model.
2) Answer data search: after X′ is obtained through the template, the trained network model searches the answer data set for the answer data best suited to fill the empty slot, for example by computing a matching-degree score and filling the corresponding slot with the highest-scoring answer.
3) Answer mapping: after the slot's filling value is obtained through the answer search, for some tasks the slot value is the final result, while for others it must be converted and mapped to the final output label y (e.g., the sorting algorithm with low time complexity). Through this model fine-tuning, the embodiment converts the prompt words input by the user into prompt words that the AIGC model can more easily understand and match to a result, effectively improving the accuracy and efficiency of AIGC prompting.
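The three fine-tuning steps above can be sketched as follows. The template string, candidate answer set, and word-overlap scoring function are illustrative assumptions standing in for the trained model's matching score:

```python
# Sketch of template fill-in: convert input X into slotted prompt X',
# score candidate answers, and map the best answer into the slot.
TEMPLATE = "I am a software development engineer, write me a sorting algorithm with ____ time complexity."

ANSWER_SET = ["low", "arbitrary", "unknown"]  # hypothetical answer data set

def score(answer: str, context: str) -> int:
    # Hypothetical matching score: count how often the answer appears in
    # the original input X, standing in for the model's matching degree.
    return sum(1 for w in context.split() if w == answer)

def fill_slot(template: str, context: str, answers) -> str:
    best = max(answers, key=lambda a: score(a, context))
    return template.replace("____", best)

# X = "optimize ranking algorithm low latency" -> optimized prompt X'
optimized = fill_slot(TEMPLATE, "optimize ranking algorithm low latency", ANSWER_SET)
print(optimized)
```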
Step S200, matching the prompting words with a preset sensitive information base, and confirming whether the prompting words are compliant or not. By only safely auditing the prompt words and confirming whether the prompt words are compliant, the leakage of sensitive information in the process of interacting with the AI in the process of program development can be avoided.
In an implementation manner of this embodiment, the matching the prompting word with a preset sensitive information base, and determining whether the prompting word is compliant includes:
1) Input the prompt word into a pre-trained large language model, which performs text recognition on the prompt word to obtain suspected sensitive words in it. A large language model (LLM, Large Language Model) is a deep learning model trained on large amounts of text data; it can generate natural language text or understand its meaning, is a natural language processing model with large-scale parameters and a complex structure built on deep learning technology, can handle various natural language tasks such as text classification, question answering, and dialogue, and is an important path toward artificial intelligence.
In this embodiment, before the prompt word is input to the pre-trained large language model, the method further includes preprocessing a text, performing coding processing on the preprocessed training set to form a coded text, and then inputting the coded text to the large language model for text recognition to obtain a suspected sensitive word in the prompt word.
In one possible implementation, preprocessing the text includes, but is not limited to, removing punctuation marks, stop words, and other irrelevant information, and performing stemming or lemmatization to reduce noise and normalize the text.
In one possible implementation, the preprocessed training set is encoded to form encoded text, i.e., the preprocessed prompt words are converted into an input encoding acceptable to the model, including but not limited to word or subword segmentation of the prompt words and mapping them into a vector representation. The encoding methods employed include, but are not limited to, word embeddings (word embedding) such as Word2Vec or GloVe, and subword embeddings (subword embeddings) such as those used by BERT or FastText.
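A minimal preprocessing pipeline for the prompt text might look like the following. The stop-word list and the crude suffix-stripping stemmer are toy assumptions, not the patent's implementation:

```python
import re

# Illustrative preprocessing: strip punctuation, drop stop words, then apply
# a crude suffix-stripping stemmer. All rules here are toy assumptions.
STOP_WORDS = {"the", "a", "an", "of", "to", "and"}

def stem(word: str) -> str:
    # Naive stemmer: strip a common suffix if enough of the word remains
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text: str):
    tokens = re.findall(r"[a-z]+", text.lower())  # drops punctuation
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Optimizing the sorting of large tables!"))
```

A real system would replace the toy stemmer with a library stemmer or lemmatizer, and feed the resulting tokens to the embedding step described above.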
In one implementation, further comprising training the large language model; training the large language model includes:
1) A training set containing sensitive words is obtained.
The sources of the sensitive words in the training set include, but are not limited to, any one or more of: sensitive words from historical audits (such as community feeds, search, and columns), sensitive word libraries (constructed by manual word expansion, machine-learning model generation, and the like), and sensitive words input by users together with their variants.
2) And adding a bypass matrix comprising a dimension reduction matrix and a dimension increase matrix into the original open-source large language model, training the open-source large language model by adopting the training set, and fine-tuning and optimizing the bypass matrix.
The large language model is a generative language model whose main aim is to generate natural-language responses related to the input, so it has better semantic-understanding capability. In this embodiment, the original open-source large language model includes, but is not limited to, large language models such as ChatGLM and StableVicuna. The large language model in this embodiment is an open-source large language model, in which the code is open source, the data set is open source, and use is licensed.
In the training stage, training an open source large language model by using a training set with labels and fine tuning and optimizing the bypass matrix. And then, overlapping the training output of training the open-source large language model with the optimized output of fine tuning and optimizing the bypass matrix, and outputting the overlapped training output.
Parameter fine-tuning is performed on the ChatGLM-6B large language model based on LoRA in the Huggingface peft library, using the content-security sensitive word library and the historically audited data set. In the LoRA implementation, the matrix parameters of the large language model are frozen, and a dimension-reduction matrix and a dimension-increase matrix are selected in their place; only these two matrices are updated when the model is trained.
In one possible implementation, the dimension-reduction matrix is initialized with a random gaussian distribution and the dimension-increase matrix is initialized with an all-zero matrix.
In one possible implementation, the optimization parameters in the bypass matrix include any one or more combination of loading pre-training model weights, adding training data, and adjusting the super parameters of the model.
During fine-tuning, the learning rate, number of training iterations, and so on can also be adjusted. After fine-tuning is completed, the performance of the large language model can be evaluated and optimized: its performance on domain-specific tasks is measured with evaluation metrics, and if it performs poorly, it can be further improved by adjusting training parameters, increasing the data set size, or applying further fine-tuning.
3) And overlapping the training output of training the open-source large language model with the optimized output of fine-tuning and optimizing the bypass matrix, and outputting the overlapped training output.
The specific principle of training the large language model in this embodiment is as follows:
1) A bypass matrix is added beside the original large language model, the bypass matrix comprises a dimension reduction matrix and a dimension increase matrix, and the dimension reduction operation and the dimension increase operation are carried out through the dimension reduction matrix and the dimension increase matrix, so that the so-called intrinsic rank is simulated.
2) The parameters of the open source large language model are fixed and unchanged during training, and only the dimension-reducing matrix and the dimension-increasing matrix are trained, namely, the optimizer only optimizes the parameters of the right path;
3) The input and output dimensions of the original large language model are unchanged, the original large language model and the bypass matrix share the input training set, and the output of the original large language model and the output of the bypass matrix are overlapped during output;
4) The dimension-reduction matrix is initialized with a random Gaussian distribution, and the dimension-increase matrix is initialized as an all-zero matrix. Zero-initializing the dimension-increase matrix makes the bypass output close to 0 early in training, so the superposed output comes essentially from the original large language model, i.e., the result computed with its original parameters, and the starting point of model optimization therefore coincides with the original large model.
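The four points above can be demonstrated numerically. This toy sketch (plain lists instead of tensors, identity frozen weight, illustrative dimensions) shows why the zero-initialized up-projection makes the combined output equal the frozen model's output at the start of training:

```python
import random

# Toy bypass-matrix sketch: frozen weight W, down-projection A (Gaussian
# init), up-projection B (all-zero init). At initialization the bypass
# contributes nothing, so forward(x) == W @ x exactly.
random.seed(0)
d, r = 4, 2  # model dimension and low rank (illustrative values)

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]      # down-proj
B = [[0.0 for _ in range(r)] for _ in range(d)]                     # up-proj

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def forward(x):
    base = matvec(W, x)                # frozen original path
    bypass = matvec(B, matvec(A, x))   # trainable low-rank path
    return [b + p for b, p in zip(base, bypass)]  # outputs superposed

x = [1.0, 2.0, 3.0, 4.0]
print(forward(x))  # equals W @ x, since B is all zeros at initialization
```

During training only A and B would receive gradient updates, while W stays frozen.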
In this embodiment, the large language model is trained on sensitive words from historical audits, the sensitive word library, sensitive words input by users, and their variants, so that it performs deep learning and semantic understanding on the prompt words and can recognize sensitive words expressed as variants or metaphors, feeding the recognized sensitive words back into the library. The sensitive word library can also be regularly updated and maintained according to actual conditions and user feedback, so that its data is expanded in real time and newly emerging sensitive words are handled.
Through the trained large language model, deep learning and semantic understanding can be carried out on the prompt words, and the sensitive words expressed by variants and metaphors can be accurately identified. Inputting the encoded prompt words into a large language model for semantic analysis and classification, and recognizing the prompt words by the trained large language model to obtain suspected sensitive words in the prompt words.
2) Matching the suspected sensitive words with sensitive words in a sensitive word stock, and acquiring corresponding sensitive words in the sensitive word stock and classification labels corresponding to the sensitive words when the sensitive words matched with the suspected sensitive words exist in the sensitive words.
In this embodiment, a sensitive word library containing various sensitive words is constructed in advance; it can be maintained and updated by professionals or specialized institutions. The library should contain various types of sensitive entries, for example sensitive terms, company core code, and database table names.
In this embodiment, after part-of-speech and semantic analysis of the prompt words by the large language model, the prompt words are matched against the sensitive word library, which improves efficiency. The library contains various types of sensitive words and their corresponding classification labels.
In this embodiment, the suspected sensitive words in the prompt words are matched with the keywords in the sensitive word stock.
And matching keywords in the sensitive word stock through a character string matching algorithm to obtain a matching result and a part-of-speech tagging result.
The prompt words are matched against keywords in the sensitive word library using a string-matching algorithm, such as the KMP algorithm. Each token of the prompt is traversed and compared with the keywords one by one; if a matching keyword is found, the prompt is judged to contain a sensitive word.
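The KMP-based keyword matching described above can be sketched as follows; this is a generic illustration of the algorithm, with function names of my own choosing, not code from the patent.

```python
def build_failure(pattern: str) -> list:
    """Longest-proper-prefix (failure) table for the KMP algorithm."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next-shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_contains(text: str, pattern: str) -> bool:
    """Return True if pattern occurs in text, scanning text once (KMP)."""
    if not pattern:
        return True
    fail = build_failure(pattern)
    k = 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return True
    return False

def find_sensitive(prompt: str, lexicon: list) -> list:
    """Return the sensitive-lexicon entries that appear in the prompt."""
    return [w for w in lexicon if kmp_contains(prompt, w)]
```

A prompt is judged to contain a sensitive word whenever `find_sensitive` returns a non-empty list.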
3) And determining whether the suspected sensitive words in the prompting words are compliant or not based on the suspected sensitive words, the acquired sensitive words in the sensitive word bank, the classification labels and a pre-configured compliance strategy, and outputting a compliance auditing result.
Specifically, in this embodiment, the preconfigured compliance policies include any one or two of the following combinations:
1) Audit rules formed based on, but not limited to, the number of sensitive word matches, the weight of the sensitive word, the threshold, the context; the auditing rules may be regular expressions, pattern matching rules, etc.
2) An audit model constructed based on, but not limited to, any one or more machine learning algorithms of a decision tree, a random forest, a support vector machine, a neural network.
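As one hedged illustration of such an audit model, a decision tree can be fit on simple per-prompt features; the feature choice and training data below are invented for demonstration and assume scikit-learn is available.

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative features per prompt: [match_count, max_severity_weight, confidence]
X_train = [
    [0, 0.0, 0.1],   # clean prompt
    [1, 0.3, 0.4],   # mild match, compliant in context
    [2, 0.9, 0.8],   # multiple severe matches
    [3, 1.0, 0.9],
]
y_train = [0, 0, 1, 1]  # 0 = compliant, 1 = violating

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

def audit_with_model(match_count, max_weight, confidence):
    """Classify a prompt's sensitive-word evidence as compliant (0) or not (1)."""
    return int(clf.predict([[match_count, max_weight, confidence]])[0])
```

A random forest, SVM, or neural network could be swapped in with the same feature interface.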
Wherein the weighting factors of the auditing rules include, but are not limited to, any one or more of the following combinations:
1) Severity and weight of sensitive words: different weights and processing strategies are given for different sensitive words. Certain sensitive words may pose a greater threat to platform security and user experience, requiring more stringent handling measures.
2) Context analysis and context understanding: the auditing decision needs to consider the context information and the context of the prompt words provided by the natural language processing module so as to avoid erroneous judgment of normal prompt words. And comprehensively judging the prompt words according to the semantic relation and emotion analysis of the context.
3) Threshold setting: for some metrics, such as the number of matches or confidence scores of the sensitive words, a threshold is set to determine whether the sensitive words belong to offending content. According to the requirements and the risk bearing capacity of the user, the threshold value can be adjusted to balance the problems of false alarm and missing alarm.
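The weight-and-threshold rule in the factors above can be sketched as a simple scoring function; the default weight and threshold values here are illustrative assumptions, not values from the patent.

```python
def audit(matches, weights, threshold=1.0):
    """Weighted threshold audit rule (illustrative sketch).

    matches:   list of matched sensitive words for one prompt
    weights:   dict mapping sensitive word -> severity weight
    threshold: tunable cutoff balancing false alarms vs. missed alarms
    Returns True if the prompt is judged non-compliant.
    """
    # Unknown words get a default mid-level weight of 0.5
    score = sum(weights.get(w, 0.5) for w in matches)
    return score >= threshold
```

Raising `threshold` reduces false alarms at the cost of more missed alarms, which is the trade-off the text describes adjusting per the user's risk tolerance.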
In this embodiment, according to the matching result and the classification information of the sensitive word, whether the prompt word is illegal or not is determined. Different audit levels and processing measures such as warning, deletion, blocking, etc. may also be provided.
In this embodiment, a sensitive information base (e.g., containing sensitive terms, company core code, and database table names) needs to be created in advance. For example, the Kafka tool collects the information exchanged between the user and the AIGC model, and the Flink tool provides second-level matching of suspected sensitive words against the sensitive word library. Whether the suspected sensitive words in the prompt are compliant is then determined: if not, risk information is prompted to the user; if the check passes, the AIGC model is invoked and generates information matching the prompt words. This embodiment can ensure the security of user-input data in real time, limit the return of sensitive information, and avoid information leakage.
And step S300, when confirming the compliance of the prompt word, controlling the AIGC model to generate information matched with the prompt word.
Fig. 6 is a schematic flow chart of a program development assistance method based on AIGC according to an embodiment of the present application for controlling an AIGC model to generate information matching a hint word. In this embodiment, the controlling the AIGC model to generate the information matching the hint word includes:
step S310, converting the prompt word into a prompt word containing empty slots based on a preset template library;
step S320, the AIGC model searches answer data matched with the empty slots from the answer data set;
step S330, mapping the answer data to the corresponding empty slots to form optimized prompt words; and controlling the AIGC model to generate information matched with the optimized prompt word.
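Steps S310 to S330 can be sketched as template-driven slot filling. The template library, answer data set, and slot names below are hypothetical examples I introduce for illustration; they are not from the patent.

```python
import re

TEMPLATES = {
    # Hypothetical template library: raw-prompt pattern -> template with empty slots
    r"write a (\w+) function named (\w+)":
        "Write a {language} function named {name} that {behavior}",
}

ANSWERS = {
    # Hypothetical answer data set keyed by slot name
    "behavior": "returns the sum of its two arguments",
}

def to_slotted_prompt(prompt):
    """Step S310: convert a raw prompt into a template containing empty slots."""
    for pattern, template in TEMPLATES.items():
        m = re.search(pattern, prompt)
        if m:
            # Slots already determined by the user's prompt
            return template, {"language": m.group(1), "name": m.group(2)}
    return prompt, {}

def fill_slots(template, known, answers):
    """Steps S320/S330: look up answer data for the remaining empty slots
    and map it into the template, forming the optimized prompt."""
    slots = {s: known.get(s) or answers.get(s, "")
             for s in re.findall(r"{(\w+)}", template)}
    return template.format(**slots)
```

The resulting optimized prompt would then be passed to the AIGC model for generation.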
Therefore, the matching information can be quickly generated according to the prompt words input by the user through the trained AIGC model, thereby effectively helping program developers to reduce repetitive work, improving working efficiency and simplifying program development work.
In addition, in one implementation of the present embodiment, any one or more of the following is further included:
1) Detecting program code input by a user, and when a code anomaly is detected, issuing an alarm prompt and searching for optimization suggestions. Specifically, coding specifications are configured in advance; places where the user's code deviates from the specifications, and bugs that may arise there, are detected in real time, and optimization suggestions are given.
2) And detecting the SQL database corresponding to the program codes, and when detecting that the SQL database is abnormal, carrying out alarm prompt and searching to obtain optimization suggestions.
3) And generating a test program for executing the corresponding test function based on the test data input by the user. Wherein the test data includes, but is not limited to, test variables, test methods, and the like.
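Item 3) above — generating a test program from user-supplied test data — can be sketched as rendering a unit-test skeleton; the function name and data shape are illustrative assumptions.

```python
def generate_test(func_name, cases):
    """Render a unittest skeleton from user-supplied test data (illustrative).

    func_name: name of the function under test
    cases:     list of (args_tuple, expected) pairs supplied by the user
    """
    lines = [
        "import unittest",
        "",
        f"class Test_{func_name}(unittest.TestCase):",
    ]
    for i, (args, expected) in enumerate(cases):
        call = f"{func_name}{args!r}"  # e.g. add(1, 2)
        lines.append(f"    def test_case_{i}(self):")
        lines.append(f"        self.assertEqual({call}, {expected!r})")
    return "\n".join(lines)
```

In an AIGC setting the body of each test would instead be produced by the model; this sketch only shows the data-to-test mapping.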
In one implementation manner of this embodiment, the method further includes: and collecting log information in the interaction process of the user and the AIGC model in real time. The embodiment provides a post audit function through log records, and ensures that sensitive information is not leaked.
Fig. 7 is a schematic diagram illustrating an implementation application of an AIGC-based program development assistance method according to an embodiment of the application. It should be noted that the AIGC-based program development assistance method may be applied to various types of hardware devices on the client side. The hardware device is, for example, a controller, specifically an ARM (Advanced RISC Machines) controller, an FPGA (Field Programmable Gate Array) controller, an SoC (System on Chip) controller, a DSP (Digital Signal Processing) controller, or an MCU (Microcontroller Unit) controller. The hardware device may also be, for example, a computer including memory, a memory controller, one or more processing units (CPUs), peripheral interfaces, RF circuitry, audio circuitry, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports; the computer includes, but is not limited to, a personal computer such as a desktop computer, a notebook computer, a tablet computer, a smart phone, a smart television, or a personal digital assistant (Personal Digital Assistant, PDA for short). In other embodiments, the hardware device may also be a server, where the server may be deployed on one or more physical servers according to factors such as function and load, or may be formed by a distributed or centralized server cluster, which is not limited in this embodiment.
In one embodiment, the AIGC-based program development assistance method may display a Graphical User Interface (GUI) in which the AIGC-based program development assistance data associated therewith is presented at the electronic terminal of the client.
In an embodiment, the electronic terminal may be, for example, a fixed terminal, such as a server, desktop, or the like; and may also be a mobile terminal such as a notebook, smart phone or tablet computer.
In an embodiment, the electronic terminal may be implemented in an offline or online state in the presentation of program development assistance method data of the AIGC.
In an example, the electronic terminal may not have Internet access and is provided with a client APP. The client may log in to the client APP with pre-registered account information; the client APP authenticates it, and after authentication passes, provides the AIGC-based program development assistance data related to the account information. If the AIGC-based program development assistance method data is updated, the client can update via an offline data packet, which may be delivered, for example, through an update service; or the offline data packet may be provided for online download, so that the client, after downloading it through a terminal with Internet access, updates the offline system; or the update may be performed with a mobile hard disk or USB flash drive carrying the offline data packet.
Optionally, the electronic terminal installs client software, and the client software may generate a Graphical User Interface (GUI); in addition, the electronic terminal may also be provided with a Browser for displaying the client service graphical interface. Based on the B/S architecture, the hardware and software requirements on the client's electronic terminal can be greatly reduced: the terminal does not need dedicated client software, only a web browser, which can greatly improve the client's user experience.
In the embodiment of the application, the graphical user interface is accessed through a web page browsed by a browser loaded on a user terminal, wherein the user terminal comprises a PC; or the graphical user interface is accessed through an interface provided by integrated service platform software loaded on the user terminal, where the user terminal comprises a mobile terminal and a PC, the mobile terminal comprises a smart phone or a tablet computer, and the integrated service platform software comprises WeChat and/or Alipay.
It should be noted that the graphical user interfaces displayed by the electronic terminals corresponding to different types of electronic terminals may also be different.
Specifically, in some embodiments, when the electronic terminal is a PC, it may access a particular web page through a loaded browser (including but not limited to IE, Chrome, 360, QQ, Sogou, Baidu, Maxthon, UC, Firefox, Cheetah, 2345, and Opera browsers), access a predetermined URL with the web page as an interface, and display a graphical user interface in that web page.
In yet other embodiments, when the electronic terminal is a mobile terminal (e.g., a smart phone or tablet computer), it may access the graphical user interface through a web page or applet in integrated platform software such as WeChat or Alipay, or through an IDE plug-in.
An interface for entering the graphical user interface is provided in the WeChat applet; the user can add the applet by scanning a QR code or searching for it, and then operate (e.g., click) the applet to enter the graphical user interface.
This embodiment realizes the specific functions of the AIGC-based program development assistance method at the business layer, including code prompting, code anomaly alarms, SQL optimization suggestions, unit test writing, detecting whether prompt words input by a user contain sensitive information, shielding sensitive information, recording logs for security audits, AIGC model content generation, and the like. At the service layer, logs are monitored, the AIGC model is trained, and tools such as Flink, ODPS, Kafka, and DataWorks are provided to implement the method; for example, the data involved in the method is stored in MySQL and NebulaGraph databases.
Fig. 8 is a schematic diagram illustrating the implementation principle of the AIGC-based program development assistance method of this embodiment, and Fig. 9 is a schematic diagram showing its implementation process. As shown in figs. 8 and 9, in a practical application, an application program implementing the method is produced by programming, and the user employs it as an AIGC program development assistant: the user selects the application program and inputs a prompt word; the application program then verifies the compliance of the prompt content in real time, determining whether suspected sensitive words in the user's prompt are compliant. If not, risk information is prompted to the user; if the check passes, the AIGC model is invoked, performs prompt-word feature matching, and generates information matching the prompt word, and the content generated by the AIGC model is displayed to the user.
The protection scope of the program development assistance method based on the AIGC according to the embodiment of the present application is not limited to the sequence of steps listed in the embodiment, and all the schemes implemented by adding or removing steps and replacing steps according to the prior art made by the principles of the present application are included in the protection scope of the present application.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the AIGC-based program development assistance method provided by any embodiment of the application.
Any combination of one or more storage media may be employed in embodiments of the present application. The storage medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The embodiment of the application also provides an electronic device. Fig. 10 is a schematic structural diagram of an electronic device 100 according to an embodiment of the application. In some embodiments, the electronic device may be a mobile phone, tablet, wearable device, in-vehicle device, augmented reality (AR)/virtual reality (VR) device, notebook, Ultra-Mobile Personal Computer (UMPC), netbook, personal digital assistant (Personal Digital Assistant, PDA), or other terminal device. The embodiment of the application does not limit the specific application scenario of the AIGC-based program development assistance method.
As shown in fig. 10, an electronic device 100 provided in an embodiment of the present application includes a memory 101 and a processor 102.
The memory 101 is for storing a computer program; preferably, the memory 101 includes: various media capable of storing program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
In particular, memory 101 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) and/or cache memory. Electronic device 100 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. Memory 101 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the application.
The processor 102 is connected to the memory 101 for executing a computer program stored in the memory 101, so that the electronic device 100 executes the AIGC-based program development assistance method provided in any one of the embodiments of the present application.
Alternatively, the processor 102 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
Optionally, the electronic device 100 in this embodiment may further include a display 103. A display 103 is communicatively coupled to the memory 101 and the processor 102 for displaying a related GUI interactive interface for an AIGC-based program development assistance method.
In summary, the program development assisting method based on the AIGC provided by the application can help program developers reduce repetitive work based on the AIGC model, simplify the program development work, and avoid leakage of sensitive information in the interaction process with the AIGC model. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles of the present application and its effectiveness, and are not intended to limit the application. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations that can be made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein are intended to be covered by the claims.

Claims (10)

1. An AIGC-based program development assistance method, comprising:
collecting prompt words input by a user in a pre-trained AIGC model;
matching the prompting words with a preset sensitive information base, and confirming whether the prompting words are compliant;
and when confirming the compliance of the prompt word, controlling the AIGC model to generate information matched with the prompt word.
2. The AIGC-based program development assistance method of claim 1, further comprising training the AIGC model; training the AIGC model includes:
acquiring a prompting word data training set;
adopting an attention mechanism to assign weights to the prompt word training data in the prompt word data training set to form weight-tag training data;
and inputting the weight tag training data into a network model for training, and obtaining the AIGC model capable of generating information matched with the prompt word.
3. The AIGC-based program development assistance method of claim 2, wherein the network model employs a generative adversarial network model, a variational autoencoder network model, a diffusion model, or a Transformer neural network model.
4. The AIGC-based program development assistance method of claim 2, wherein training the AIGC model further comprises: prompting word fine tuning training; the prompt word fine tuning training comprises the following steps:
acquiring the prompting word data in the prompting word data training set;
converting the prompt word data into prompt words containing empty slots based on a preset template library;
inputting the prompt word and answer data set containing the empty slots into a network model for training, and obtaining the AIGC model capable of searching answer data filling the empty slots from the answer data set.
5. The AIGC-based program development assistance method of claim 4, wherein the controlling the AIGC model to generate information matching the hint word includes:
converting the prompt word into a prompt word containing empty slots based on a preset template library;
the AIGC model searches answer data matched with the empty slots from the answer data set;
mapping the answer data to the corresponding empty slots to form optimized prompt words;
and controlling the AIGC model to generate information matched with the optimized prompt word.
6. The AIGC-based program development assistance method of claim 1, wherein the matching the hint word with a preset sensitive information base, determining whether the hint word is compliant includes:
inputting the prompt word into a pre-trained large language model, and carrying out text recognition on the prompt word by the large language model to obtain suspected sensitive words in the prompt word;
matching the suspected sensitive word against sensitive words in a sensitive word library, and, when a sensitive word matching the suspected sensitive word exists in the library, acquiring the corresponding sensitive word and its classification label;
and determining whether the suspected sensitive words in the prompting words are compliant or not based on the suspected sensitive words, the acquired sensitive words in the sensitive word bank, the classification labels and a pre-configured compliance strategy, and outputting a compliance auditing result.
7. The AIGC-based program development assistance method of claim 1, further comprising any one or more of:
detecting program codes input by a user, and when detecting that the codes are abnormal, carrying out alarm prompt and searching to obtain optimization suggestions;
detecting an SQL database corresponding to the program code, and when detecting that the SQL database is abnormal, carrying out alarm prompt and searching to obtain optimization suggestions;
and generating a test program for executing the corresponding test function based on the test data input by the user.
8. The AIGC-based program development assistance method of claim 1, further comprising: and collecting log information in the interaction process of the user and the AIGC model in real time.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the AIGC-based program development assistance method of any one of claims 1 to 8.
10. An electronic device, the electronic device comprising:
a memory storing a computer program;
a processor, communicatively connected to the memory, which executes the AIGC-based program development assistance method according to any one of claims 1 to 8 when calling the computer program.
CN202310814622.7A 2023-07-04 2023-07-04 Program development assisting method, storage medium and device based on AIGC Pending CN116991990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310814622.7A CN116991990A (en) 2023-07-04 2023-07-04 Program development assisting method, storage medium and device based on AIGC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310814622.7A CN116991990A (en) 2023-07-04 2023-07-04 Program development assisting method, storage medium and device based on AIGC

Publications (1)

Publication Number Publication Date
CN116991990A true CN116991990A (en) 2023-11-03

Family

ID=88527515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310814622.7A Pending CN116991990A (en) 2023-07-04 2023-07-04 Program development assisting method, storage medium and device based on AIGC

Country Status (1)

Country Link
CN (1) CN116991990A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117289841A (en) * 2023-11-24 2023-12-26 浙江口碑网络技术有限公司 Interaction method and device based on large language model, storage medium and electronic equipment
CN117311683A (en) * 2023-11-24 2023-12-29 浙江口碑网络技术有限公司 Code auxiliary system, code auxiliary processing method and device and electronic equipment
CN117311683B (en) * 2023-11-24 2024-03-19 浙江口碑网络技术有限公司 Code auxiliary system, code auxiliary processing method and device and electronic equipment
CN117389998A (en) * 2023-12-13 2024-01-12 北京汉勃科技有限公司 Data storage method and device based on large model
CN117389998B (en) * 2023-12-13 2024-03-12 北京汉勃科技有限公司 Data storage method and device based on large model
CN117539438A (en) * 2024-01-05 2024-02-09 阿里云计算有限公司 Software development method
CN117574410A (en) * 2024-01-16 2024-02-20 卓世智星(天津)科技有限公司 Risk data detection method and device
CN117574410B (en) * 2024-01-16 2024-04-05 卓世智星(天津)科技有限公司 Risk data detection method and device

Similar Documents

Publication Publication Date Title
CN116991990A (en) Program development assisting method, storage medium and device based on AIGC
JP7346609B2 (en) Systems and methods for performing semantic exploration using natural language understanding (NLU) frameworks
US10360308B2 (en) Automated ontology building
JP7441186B2 (en) System and method for translating natural language sentences into database queries
CN112100354B (en) Man-machine conversation method, device, equipment and storage medium
CN113761163B (en) Deep code searching method, system and device based on code structure semantic information
US20230110829A1 (en) Learned evaluation model for grading quality of natural language generation outputs
US11704506B2 (en) Learned evaluation model for grading quality of natural language generation outputs
CN111177307A (en) Test scheme and system based on semantic understanding similarity threshold configuration
EP3195308A1 (en) Actions on digital document elements from voice
US20210149937A1 (en) Enhanced intent matching using keyword-based word mover's distance
CN116720515A (en) Sensitive word auditing method based on large language model, storage medium and electronic equipment
CN114840869A (en) Data sensitivity identification method and device based on sensitivity identification model
CN112464655A (en) Word vector representation method, device and medium combining Chinese characters and pinyin
US20220083579A1 (en) Method and system for performing summarization of text
CN116663525B (en) Document auditing method, device, equipment and storage medium
US20230334075A1 (en) Search platform for unstructured interaction summaries
WO2020242383A1 (en) Conversational diaglogue system and method
CN115168851A (en) Method and device for generating malicious file detection rule and detecting malicious file
CN114626388A (en) Intention recognition method and device, electronic equipment and storage medium
US20230281484A1 (en) Semantic-aware rule-based recommendation for process modeling
US11669681B2 (en) Automated calculation predictions with explanations
US11966704B1 (en) Techniques for verifying a veracity of machine learning outputs
JP2018055224A (en) Data generating device, method, and program
KR102363958B1 (en) Method, apparatus and program for analyzing customer perception based on double clustering

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination