CN117786082A - Generation type online evaluation method and system based on fine tuning large model - Google Patents

Generation type online evaluation method and system based on fine tuning large model

Info

Publication number
CN117786082A
Authority
CN
China
Prior art keywords
data
large model
fine
subject
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311829864.XA
Other languages
Chinese (zh)
Inventor
颜友军
崔恒香
程雯
何杰
史煜凯
魏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Future Networks Innovation Institute
Original Assignee
Jiangsu Future Networks Innovation Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Future Networks Innovation Institute filed Critical Jiangsu Future Networks Innovation Institute
Priority to CN202311829864.XA priority Critical patent/CN117786082A/en
Publication of CN117786082A publication Critical patent/CN117786082A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a generative online evaluation method and system based on a fine-tuned large model. The method first constructs an evaluation-domain large model on the basis of an open-source large model; it then constructs a data set, QuesBank, from existing discipline knowledge and question banks. On this basis, an algorithm is used to fine-tune the evaluation-domain large model, generating fine-tuned large models for specific discipline evaluation scenarios, realizing automatic question-bank generation and supporting conversational evaluation by users. Meanwhile, a conversational question-answering flow with user features is designed: a Storage module vectorizes and stores the user's context information, the context is introduced into the prompt and input to the large model, and model responses with user features are generated, realizing user-customized question-bank evaluation.

Description

Generation type online evaluation method and system based on fine tuning large model
Technical Field
The invention relates to a generative online evaluation method and system based on a fine-tuned large model.
Background
With the development of technology and the rapid spread of the mobile internet, the education industry is undergoing a profound transformation. Online education, as a new mode of education, breaks the space-time constraints of traditional education, covers all stages from early childhood to adulthood, and meets the demand for fragmented learning in the mobile-internet era.
Dialogue systems are a popular natural language processing (NLP) task with very broad application prospects in real life. With the rapid development of generative large language models (Large Language Model, LLM, also called large models) represented by ChatGPT, such models are gradually approaching human level in the Turing test, injecting new vitality into dialogue systems and bringing more possibilities to online education.
However, a general-purpose large model is trained on general knowledge content and is not optimized for discipline knowledge, so it cannot support accurate question answering on specific disciplines in online education. Moreover, training a large model is very complex, and online education spans numerous disciplines and fields; a dedicated discipline model cannot be customized for every discipline, which makes it difficult to apply a general large model to online-education evaluation.
At present, online evaluation relies on manually edited question banks. Creating such banks requires many domain experts, yields uneven quality, and easily forms industry barriers. Meanwhile, the existing online evaluation flow depends on fixed paper-assembly logic: papers are assembled from a given set of test questions, or questions are extracted and assembled according to specific rules; questions cannot be generated dynamically or practiced in a targeted way according to the weak links in a user's learning.
Disclosure of Invention
The invention aims to provide a generative online evaluation method and system based on a fine-tuned large model. A training data set is constructed from discipline knowledge and question banks; combined with a large-model fine-tuning algorithm, fine-tuned models for different disciplines in the online-education evaluation scenario are trained on the basis of the evaluation-domain large model, and are applied to users' online learning, realizing automatic question generation and conversational evaluation.
In order to achieve the above object, the solution of the present invention is:
the present invention provides a generative online evaluation method based on a fine-tuned large model, comprising the following steps:
step 1, pre-training an open-source large model with basic evaluation question banks of different disciplines to obtain an evaluation-domain large model;
step 2, collecting existing discipline knowledge and question banks, and constructing a data set for each discipline;
step 3, training the evaluation-domain large model with a large-model fine-tuning algorithm based on each discipline's data set, to obtain a fine-tuned large model for each discipline;
and step 4, selecting, by the user, the fine-tuned large model of the corresponding discipline based on the current question, and generating conversational evaluation questions.
Further, the data set in step 2 includes a training data set and a reinforcement learning data set for large model fine tuning.
Further, the data sources of the data set include:
1) Existing knowledge maps, knowledge bases and discipline knowledge sets;
2) An existing subject library;
3) A public data set and a public question bank of the Internet;
4) Question banks generated by large models such as ChatGPT.
Further, the large-model fine-tuning algorithm in step 3 is an improved AdaLoRA algorithm, and after fine-tuning with the improved AdaLoRA algorithm, reinforcement learning is performed on the evaluation-domain large model; the improved AdaLoRA algorithm is based on the AdaLoRA algorithm and uses the Gumbel-Softmax function to correct the Softmax layer.
Further, adding penalty terms for repetition and null values to the reinforcement-learned reward function.
Further, step 4 further includes:
performing Embedding conversion on the existing discipline knowledge to obtain vector representation of the existing discipline knowledge as discipline data;
performing an Embedding conversion on the historical questioning contents of the user to obtain vector representations of the historical questioning contents as historical data;
performing an Embedding conversion on the current questioning content of the user to obtain a vector representation of the current questioning content;
and splicing the discipline data and the historical data whose similarity meets the set requirement, serving respectively as the known conditions and the preceding questions of the prompt, with the vector representation of the current question content, to obtain the prompt that is input to the fine-tuned large model.
Further, step 4 further includes:
and, after cleaning the output of the fine-tuned large model by removing duplicates and invalid values, performing Embedding conversion on the cleaned result and updating the historical data with the conversion result.
Further, the method for acquiring subject data, in which the similarity with the vector representation of the current question content meets the set requirement, comprises the following steps:
calculating the first similarities between the vector representation of the user's current question content and the vector representations of the user's historical question contents, and judging whether any first similarity is greater than a set first threshold;
if a first similarity greater than the set first threshold exists, selecting the discipline data associated with the corresponding historical question content as the discipline data meeting the set requirement; otherwise, calculating the second similarities between the vector representation of the current question content and the vector representations of the items of hot discipline knowledge, and judging whether any second similarity is greater than a set second threshold;
if a second similarity greater than the set second threshold exists, the vector representation of the corresponding hot discipline knowledge is the discipline data meeting the set requirement; otherwise, the N discipline data with the greatest similarity to the vector representation of the current question content are the discipline data meeting the set requirement;
the hot discipline knowledge is discipline knowledge whose occurrence frequency in all users' historical question contents is higher than a set frequency threshold.
The invention also provides a generative online evaluation system based on a fine-tuned large model, comprising:
the training data management module, used for constructing and managing the fine-tuning data set;
the large-model fine-tuning module, used for fine-tuning the evaluation-domain large model to obtain the fine-tuned large models of the various disciplines;
and the conversational evaluation module, used for receiving the user's current question content, generating the prompt, and realizing the generation of conversational evaluation questions.
Further, the generative online evaluation system based on a fine-tuned large model further comprises:
the user feature management module, used for performing Embedding conversion on the user's historical question content to obtain vector representations of the historical question content, which are stored as historical data; and, after cleaning the output of the fine-tuned large model by removing duplicates and invalid values, performing Embedding conversion on the cleaned result and updating the historical data with the conversion result.
Compared with the prior art, the invention has notable advantages: a data set is constructed from discipline knowledge and question banks; combined with a large-model fine-tuning algorithm, fine-tuned large models for different disciplines in the online-education evaluation scenario are trained on the basis of an evaluation-domain large model and applied to users' online learning, realizing automatic question generation and conversational evaluation. With the technical scheme of the invention, massive numbers of questions can be generated from discipline knowledge and limited question banks, helping students learn knowledge and practice questions rapidly, improving online-learning efficiency and reducing question-bank costs.
Drawings
FIG. 1 is a flow chart of the generative online evaluation method based on a fine-tuned large model.
Fig. 2 is a schematic diagram of a large model fine tuning algorithm.
FIG. 3 is a flow chart of a large model fine tuning algorithm.
Fig. 4 is a conversational question-answering flow with user features.
FIG. 5 is a block diagram of the generative online evaluation system based on a fine-tuned large model.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
Aiming at the lack of question banks and the inability to perform conversational evaluation with a large model in online-education evaluation scenarios, the invention provides a generative online evaluation method and system based on a fine-tuned large model. First, an evaluation-domain large model is constructed on the basis of an open-source large model; then, a data set QuesBank is constructed from existing discipline knowledge and the original question banks; on this basis, an algorithm is used to fine-tune the evaluation-domain large model, generating fine-tuned large models for specific discipline evaluation scenarios, realizing automatic question-bank generation and supporting conversational evaluation by users. Meanwhile, a conversational question-answering flow with user features is designed: a Storage module vectorizes and stores the user's context information, the context is introduced into the prompt and input to the large model, and model responses with user features are generated, realizing user-customized question-bank evaluation.
The generative online evaluation method based on a fine-tuned large model of the invention performs full-parameter fine-tuning (Full Fine-tuning) on an open-source large model based on basic evaluation question banks to construct the evaluation-domain large model; constructs a training data set QuesBank from specific discipline knowledge and question banks; and performs large-model fine-tuning (Fine-tuning) with a model fine-tuning algorithm based on the training data set. As shown in Fig. 1, the method specifically comprises the following steps:
1) Using basic evaluation question banks of different disciplines, perform full-parameter fine-tuning on the open-source large model, constructing the evaluation-domain large model as the base model for subsequent fine-tuning on different disciplines;
2) Collecting the existing discipline knowledge and the question bank, and constructing a data set QuesBank; the data set is used as basic data for fine adjustment of the model;
3) Based on the large model in the evaluation field, a large model fine tuning algorithm is used, and a fine tuning large model of each subject in an online evaluation scene is trained on the basis of a QuesBank data set.
Specifically, basic evaluation question banks of different disciplines are used to perform full-parameter fine-tuning on an open-source base model, generating the evaluation-domain large model. The evaluation-domain large model is oriented to general evaluation scenarios, generating conversational evaluation questions, and serves as the base model for subsequent fine-tuning on specific disciplines.
Specifically, a data set QuesBank is constructed from existing discipline knowledge and question banks; it contains training data TrainData and reinforcement data RewardsData. The training data is used for subsequent large-model fine-tuning, realizing question generation for specific disciplines and supporting user evaluation on specific discipline knowledge. The reinforcement data is used for reinforcement learning during model fine-tuning, improving the accuracy of the fine-tuned model's answers.
The data sources that construct the data set include:
1) Extracting from the existing knowledge graph, knowledge base and discipline knowledge set;
2) Extracting from an existing subject library;
3) Public data sets and public question banks extracted from the Internet;
4) Question banks generated by large models such as ChatGPT.
Specifically, the evaluation-domain large model undergoes fine-tuning and reinforcement training with repetition and null-value penalties. The fine-tuned model is used to generate question banks and support conversational evaluation by users. The fine-tuning does not modify the parameter count of the original model; by updating a small number of parameters it approaches the accuracy of full-parameter tuning, so a discipline fine-tuned model can be generated rapidly with modest hardware resources. For the fine-tuned model, reinforcement learning with added repetition and null-value penalty terms is used to solve the problem that fine-tuning on small question-bank samples frequently produces repeated and null answers, improving user experience.
Specifically, when fine-tuning the evaluation-domain large model, given that the question-bank data volume is small and its distribution differs from that of the base model's training data, an improved AdaLoRA (Adaptive Low-Rank Adaptation) algorithm is used for fine-tuning, and a reinforcement training process with repetition and null-value penalties is implemented.
At fine-tuning time, oriented to the features of discipline evaluation data and on the basis of the AdaLoRA algorithm, the Softmax layer of the original model is corrected using the Gumbel-Softmax function. The algorithm principle is shown in Fig. 2, where x is the input and h is the model output; the algorithm adds left and right singular vectors P and Q and a singular-value matrix Λ alongside the original PLM, and corrects the Softmax layer. Specifically:
For a pre-training parameter matrix W^(0) ∈ R^(d1×d2), the parameter update ΔW can be represented by left and right singular vectors, and W^(1) denotes the base model after Softmax-layer correction, namely:
W = W^(0) + ΔW = W^(1) + PΛQ
where W is the fine-tuned model parameter matrix, W^(0) the pre-training parameters, W^(1) the model parameters after Softmax-layer correction, and ΔW the fine-tuning increment; P and Q are the left and right singular vectors of ΔW respectively, and Λ is the singular-value matrix of ΔW, with P ∈ R^(d1×r), Λ ∈ R^(r×r), Q ∈ R^(r×d2), and r ≪ min{d1, d2}. Because r is very small, the extra computation the algorithm adds during training is also very small.
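The low-rank update and the Gumbel-Softmax correction above can be illustrated with a minimal NumPy sketch. This is not the patent's implementation; the dimensions, random initialization, and temperature value are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Gumbel-Softmax relaxation: a sampled, temperature-controlled
    replacement for an ordinary Softmax over logits (illustrative
    stand-in for the Softmax-layer correction)."""
    rng = rng or np.random.default_rng(0)
    # Sample Gumbel(0, 1) noise and add it to the logits.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / tau
    e = np.exp(y - y.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def adalora_update(W0, P, Lam, Q):
    """W = W^(0) + P Λ Q : rank-r increment added to frozen weights."""
    return W0 + P @ Lam @ Q

d1, d2, r = 8, 6, 2                    # r << min(d1, d2)
rng = np.random.default_rng(42)
W0  = rng.normal(size=(d1, d2))        # frozen pre-trained weights W^(0)
P   = rng.normal(size=(d1, r))         # left singular vectors (trainable)
Lam = np.diag(rng.normal(size=r))      # singular-value matrix Λ (trainable)
Q   = rng.normal(size=(r, d2))         # right singular vectors (trainable)

W = adalora_update(W0, P, Lam, Q)      # fine-tuned weights
probs = gumbel_softmax(W[0], tau=0.5, rng=rng)
```

Only the (d1 + d2 + r) · r entries of P, Λ, and Q are trained, which is why the extra training cost is small when r ≪ min{d1, d2}.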
Specifically, the flow of fine tuning the large model in the evaluation field is as follows:
1) On the training data TrainData in the data set QuesBank, train with the improved AdaLoRA algorithm on the basis of the evaluation-domain large model to obtain a fine-tuned model W′.
2) On the reinforcement data RewardsData in the data set QuesBank, set a reward function with repetition and null-value penalties, wherein:
a) a repeated-value and null-value detection function is set as the penalty;
b) rewards for correct answers are set on the reinforcement data RewardsData.
3) For the fine-tuned model W′, perform reinforcement learning with the TRPO (Trust Region Policy Optimization) algorithm using the above reward function, generating the reinforced model, i.e., the fine-tuned large model.
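A reward function with repetition and null-value penalties, as described in step 2), can be sketched as follows. The penalty magnitudes and the exact-match correctness check are illustrative assumptions, not values from the patent.

```python
def reward(answer: str, reference: str,
           r_correct: float = 1.0,
           p_repeat: float = -0.5,
           p_null: float = -1.0) -> float:
    """Toy reward for reinforcement learning: rewards a correct answer,
    penalizes null output and repeated tokens (values are illustrative)."""
    if not answer.strip():
        return p_null                        # null-value penalty
    tokens = answer.split()
    score = r_correct if answer == reference else 0.0
    if len(tokens) != len(set(tokens)):      # repeated tokens detected
        score += p_repeat                    # repetition penalty
    return score
```

In a real pipeline the correctness term would come from the RewardsData annotations rather than exact string match, and the repetition check would typically operate on n-grams.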
Further, the generative online evaluation method based on a fine-tuned large model also designs a conversational question-answering flow with discipline knowledge and user-context features. Specifically, when performing conversational evaluation, the user first selects the fine-tuned large model of the corresponding discipline as needed and then asks a question. During question answering, context information with user features is collected and combined with the discipline knowledge base and the user's historical questions to generate a prompt (Prompt) carrying discipline knowledge and user-context information, which is input to the fine-tuned large model to generate the model output. In this way, more accurate question-answer data suited to the user's learning interests can be generated according to the discipline the user selects and the specific discipline knowledge the user focuses on; meanwhile, continuous context-aware question answering better meets user needs. In addition, the user's feature data is dynamically updated and persisted in the Storage component; it is loaded each time the user starts a question-answer session and updated after each exchange, realizing continuous question answering with user features. As shown in Fig. 4, specifically:
1) The collected existing discipline knowledge is converted with a general text embedding model (Text Embedding model) into vector representations, which serve as discipline data and are stored as vectors in the discipline-knowledge area of Storage;
2) Each time the user asks a question, the question content is converted with the general text embedding model into a vector representation of the historical question content, which serves as historical data and is stored as a vector in the user-feature area of Storage;
3) The user's current question content is converted with the general text embedding model; the resulting embedding is compared, using cosine similarity, with the discipline knowledge stored in the Storage component, and the discipline data whose similarity meets the set requirement is taken as the known condition in the prompt, improving the matching precision of the large model;
4) The embedding of the current question content is compared, using cosine similarity, with the user features stored in the Storage component, and the historical data whose similarity meets the set requirement (generally the several historical items with the highest similarity) is taken as the preceding questions of the prompt, giving the large model's answers contextual features;
5) The discipline data selected in step 3) as the known conditions, the historical data selected in step 4) as the preceding questions, and the embedding of the user's current question content are spliced and merged to generate the prompt (Prompts) as the input of the fine-tuned large model;
6) After the fine-tuned large model makes its prediction, besides returning the result to the user, the result is cleaned by removing duplicates and invalid values, converted with the text embedding model into vectors, and stored in the user-feature area of the Storage component.
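The retrieve-and-splice flow in steps 3) to 5) can be sketched as below. The function name, thresholds, and prompt wording are illustrative assumptions; the embedding model is assumed to be external and the stores hold (vector, text) pairs.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(q_vec, q_text, subject_store, history_store,
                 sim_threshold=0.7, k_history=2):
    """Retrieve discipline knowledge as 'known conditions' and past
    questions as 'preceding questions', then splice them with the
    user's current question to form the prompt."""
    # Step 3): discipline data whose similarity meets the set requirement.
    known = [t for v, t in subject_store if cosine(q_vec, v) >= sim_threshold]
    # Step 4): the k most similar historical questions.
    ranked = sorted(history_store, key=lambda vt: cosine(q_vec, vt[0]),
                    reverse=True)
    preceding = [t for _, t in ranked[:k_history]]
    # Step 5): splice and merge into the prompt.
    return ("Known conditions: " + "; ".join(known) + "\n"
            "Preceding questions: " + "; ".join(preceding) + "\n"
            "Current question: " + q_text)

subject_store = [(np.array([1.0, 0.0]), "fact A"),
                 (np.array([0.0, 1.0]), "fact B")]
history_store = [(np.array([1.0, 0.1]), "Q1"),
                 (np.array([0.0, 1.0]), "Q2")]
prompt = build_prompt(np.array([1.0, 0.0]), "what is A?",
                      subject_store, history_store)
```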
Specifically, both the discipline features and the user features stored in the Storage component support dynamic updates. Incremental updating of discipline features enables rapid dynamic expansion of discipline knowledge, compensating for the fact that a large model cannot be incrementally updated quickly; incremental updating of user features enables more accurate matching of user intent, avoids stale data influencing the current context, and reduces prompt length. The incremental update methods are as follows:
1) Update the Storage discipline features with incremental discipline data entered by an administrator;
2) Periodically update the Storage discipline features with the most frequently occurring question-answer data from user questions;
3) Evict user features in Storage with a least-recently-used algorithm, maintaining a fixed number of user features, namely 512 in the invention.
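The least-recently-used eviction of user features in 3) can be sketched with an `OrderedDict`. The class name and API are illustrative; only the capacity of 512 comes from the text.

```python
from collections import OrderedDict

class UserFeatureStore:
    """LRU store for user feature vectors; evicts the least-recently-used
    entry once more than `capacity` entries are held (512 in the
    invention's embodiment)."""
    def __init__(self, capacity: int = 512):
        self.capacity = capacity
        self._store = OrderedDict()

    def put(self, key, vector):
        if key in self._store:
            self._store.move_to_end(key)       # refresh recency
        self._store[key] = vector
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)    # drop least recently used

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)           # reading also refreshes recency
        return self._store[key]
```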
Furthermore, when subject data with similarity meeting the set requirement is selected, a three-level feature matching mechanism is adopted to match the subject data, so that the performance and the efficiency are further improved, and the comparison of large-scale subject field knowledge in complex subjects is supported. The specific three-level matching is as follows:
1) The comparison between the current questioning content of the user and the subject knowledge associated with each history questioning of the user is used as a first-level comparison;
2) The comparison between the current questioning content of the user and the hot spot subject knowledge with the hit frequency exceeding the set threshold is used as a second-level comparison;
3) And comparing the current questioning content of the user with all discipline knowledge to obtain a third-level comparison.
The specific processing flow of the three-level feature matching mechanism is as follows:
1) Perform the first-level comparison: search for discipline knowledge associated with a historical question whose vector representation has similarity to the vector representation of the current question content greater than a threshold θ1; if such a historical question exists, use the vector representation of the discipline knowledge associated with it as the known condition; otherwise perform the next step;
2) Perform the second-level comparison: search for hot discipline knowledge whose vector representation has similarity to the vector representation of the current question content greater than a threshold θ2; if it exists, use the vector representation of that hot discipline knowledge as the known condition; otherwise perform the next step;
3) Perform the third-level comparison: select the N discipline data with the greatest similarity to the vector representation of the current question content as the known conditions. Here, N may be preset according to actual needs.
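The three-level cascade above can be sketched as a fall-through function. The thresholds t1, t2 and the value of N are illustrative; the stores are plain lists of embedding vectors.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match(q_vec, history, hot, all_subject, t1=0.8, t2=0.7, n=3):
    """Three-level feature matching: per-user history (threshold t1),
    then hot discipline knowledge (threshold t2), then global top-N."""
    # Level 1: knowledge tied to this user's historical questions.
    level1 = [v for v in history if cosine(q_vec, v) > t1]
    if level1:
        return level1
    # Level 2: hot discipline knowledge shared across users.
    level2 = [v for v in hot if cosine(q_vec, v) > t2]
    if level2:
        return level2
    # Level 3: fall back to the N most similar items overall.
    return sorted(all_subject, key=lambda v: cosine(q_vec, v),
                  reverse=True)[:n]
```

Checking the small per-user history first keeps the common case cheap; the full scan over all discipline knowledge only runs when both earlier levels miss.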
The invention also provides a generative online evaluation system based on a fine-tuned large model, comprising a training data management module, a large-model fine-tuning module, a user feature management module and a conversational evaluation module, realizing automatic question-bank generation and conversational evaluation. The system is centered on the discipline fine-tuned model, as shown in Fig. 5.
Specifically, the training data management module is used for constructing and managing the data set:
a) Supporting the extraction and cleaning of data from a subject knowledge base, a subject question base, a network open source data set and a large model question-answer channel;
b) Supporting the storage of a training data set;
c) Reading of the training data set is supported.
Specifically, the large model fine adjustment module is used for fine adjustment of the large model in the evaluation field to obtain fine adjustment large models of various subjects:
a) Interfaces with the training data management module to read training data as the fine-tuning data source.
b) Interfaces with the training data management module to read reinforcement data as the source of reinforcement-training data.
c) Performs fine-tuning and reinforcement training on the evaluation-domain large model using the fine-tuning and reinforcement-training process provided by the invention.
d) Generates and stores the fine-tuned model.
Specifically, the user feature management module is used for managing the context information of the user and realizing user feature questions and answers:
a) And managing feature Storage, and supporting vectorization Storage of Storage components.
b) And carrying out duplication removal and null removal cleaning on the model reasoning result, and storing the model reasoning result in a Storage component.
c) And automatically injecting the user characteristic data stored in the Storage component when the user answers.
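The de-duplication and null-removal cleaning in b) can be sketched as a small order-preserving filter. The function name is illustrative; the patent does not specify the cleaning granularity, so lines of model output are assumed here.

```python
def clean_output(lines):
    """Remove duplicates (order-preserving) and null/empty values from
    model output before it is embedded and stored as user history."""
    seen, cleaned = set(), []
    for line in lines:
        text = (line or "").strip()   # treat None and whitespace as null
        if not text or text in seen:
            continue                  # drop nulls and exact duplicates
        seen.add(text)
        cleaned.append(text)
    return cleaned
```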
Specifically, the conversational evaluation module is used for supporting conversational evaluation of the user.
a) And receiving input of a user, receiving injection of user characteristics, and generating a prompt word.
b) Interfaces with the fine-tuned large model, generates model predictions, and realizes conversational evaluation.
The invention focuses on automatic question generation and conversational evaluation: it builds an evaluation-domain large model and fine-tunes it, satisfying conversational evaluation across multiple discipline fields in online-education scenarios with limited resources. Massive numbers of questions can thus be generated rapidly from discipline knowledge and limited question banks, helping students quickly learn knowledge and practice questions, improving learning efficiency and reducing question-bank costs.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the foregoing embodiments, and that the foregoing embodiments and description are merely preferred embodiments of the invention, and are not intended to limit the invention, but that various changes and modifications may be made therein without departing from the novel spirit and scope of the invention, which is defined by the appended claims.

Claims (10)

1. A generative online evaluation method based on a fine-tuned large model, characterized by comprising the following steps:
step 1, pre-training an open-source large model with basic evaluation question banks of different disciplines to obtain an evaluation-domain large model;
step 2, collecting existing discipline knowledge and question banks, and constructing a data set for each discipline;
step 3, training the evaluation-domain large model with a large-model fine-tuning algorithm based on each discipline's data set, to obtain a fine-tuned large model for each discipline;
and step 4, selecting, by the user, the fine-tuned large model of the corresponding discipline based on the current question, and generating conversational evaluation questions.
2. The method of claim 1, wherein the data set in step 2 includes a training data set and a reinforcement-learning data set for fine-tuning of the large model.
3. The method of claim 1, wherein the data sources of the data set include:
1) Existing knowledge maps, knowledge bases and discipline knowledge sets;
2) An existing subject library;
3) A public data set and a public question bank of the Internet;
4) Question banks generated by large models such as ChatGPT.
4. The method for generating online evaluation based on a fine-tuned large model according to claim 1, wherein the large-model fine-tuning algorithm in step 3 is an improved AdaLoRA algorithm, and reinforcement learning is performed on the evaluation-domain large model after fine-tuning with the improved AdaLoRA algorithm; wherein the improved AdaLoRA algorithm is based on the AdaLoRA algorithm and uses the Gumbel-Softmax function to correct the Softmax layer.
5. The method of claim 4, wherein penalty terms for repetition and null values are added to the reinforcement-learning reward function.
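A reward function with the two penalty terms of claim 5 might be shaped as follows; the penalty weights and the string-level notion of "repetition" are assumptions, since the patent does not define them:

```python
def shaped_reward(base_reward, generated_questions,
                  repetition_penalty=0.5, null_penalty=1.0):
    """Hypothetical reward shaping per claim 5: subtract a penalty for each
    repeated question and each empty/null output. Weights are assumptions."""
    seen, repeats, nulls = set(), 0, 0
    for q in generated_questions:
        text = (q or "").strip()
        if not text:
            nulls += 1        # null / empty generation
        elif text in seen:
            repeats += 1      # exact repetition of an earlier question
        else:
            seen.add(text)
    return base_reward - repetition_penalty * repeats - null_penalty * nulls

reward = shaped_reward(10.0, ["Q1", "Q1", "", "Q2"])
```

With one repeat and one empty output, the base reward of 10.0 is reduced by 0.5 + 1.0, discouraging the degenerate generations that reinforcement learning on a question generator tends to produce.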
6. The generative online evaluation method based on a fine-tuned large model according to claim 1, wherein step 4 further comprises:
performing Embedding conversion on existing subject knowledge to obtain vector representations of the subject knowledge as subject data;
performing Embedding conversion on the user's historical question contents to obtain vector representations of the historical question contents as history data;
performing Embedding conversion on the user's current question content to obtain a vector representation of the current question content;
and splicing the subject data and the history data whose similarity to the vector representation of the current question content meets the set requirement, as the known conditions and preceding questions of the prompt, with the vector representation of the current question content, to obtain the prompt input to the fine-tuned large model.
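The splicing step of claim 6 can be sketched with cosine similarity as the similarity measure and a fixed threshold; both choices, the prompt template, and all names below are assumptions:

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(current_vec, current_text, subject_items, history_items,
                 threshold=0.8):
    """Hypothetical claim-6 assembly: keep subject/history entries whose
    embedding similarity to the current question meets the threshold, then
    splice them in as known conditions and preceding questions."""
    known = [t for v, t in subject_items if cosine(v, current_vec) >= threshold]
    prior = [t for v, t in history_items if cosine(v, current_vec) >= threshold]
    return ("Known conditions: " + "; ".join(known) + "\n"
            "Preceding questions: " + "; ".join(prior) + "\n"
            "Current question: " + current_text)

# Toy 2-d embeddings standing in for real Embedding-conversion output.
prompt = build_prompt(
    [1.0, 0.0], "What is a limit?",
    subject_items=[([1.0, 0.0], "limit definition"), ([0.0, 1.0], "unrelated fact")],
    history_items=[([0.9, 0.1], "previous question on limits")])
```

Filtering before splicing keeps the prompt short: only context actually related to the current question reaches the fine-tuned model.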
7. The generative online evaluation method based on a fine-tuned large model according to claim 6, wherein step 4 further comprises:
performing data cleaning on the output of the fine-tuned large model to remove duplicates and invalid values, performing Embedding conversion on the cleaned result, and updating the history data with the conversion result.
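The cleaning step of claim 7, before re-embedding and updating the history, amounts to order-preserving deduplication plus dropping empty or null values; this minimal sketch assumes the model output arrives as a list of strings:

```python
def clean_output(lines):
    """Hypothetical claim-7 cleaning: drop invalid (empty/None) values and
    exact duplicates, preserving first-seen order, before the cleaned
    result is embedded and merged into the history data."""
    seen, cleaned = set(), []
    for line in lines:
        text = (line or "").strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

cleaned = clean_output(["Q1", "", "Q1", None, "Q2"])
```

Cleaning before the Embedding conversion avoids polluting the stored user history with duplicate or empty vectors, which would otherwise skew the similarity matching of claim 8.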
8. The generative online evaluation method based on a fine-tuned large model according to claim 6, wherein the method for acquiring the subject data whose similarity to the vector representation of the current question content meets the set requirement comprises:
calculating first similarities between the vector representation of the user's current question content and the vector representations of the user's historical question contents, and judging whether any first similarity is greater than a set first threshold;
if a first similarity greater than the set first threshold exists, selecting the subject data associated with the historical question content corresponding to that first similarity as the subject data meeting the set requirement; otherwise, calculating second similarities between the vector representation of the current question content and the vector representations of each item of hot subject knowledge, and judging whether any second similarity is greater than a set second threshold;
if a second similarity greater than the set second threshold exists, taking the vector representations of the hot subject knowledge corresponding to those second similarities as the subject data meeting the set requirement; otherwise, taking the N items of subject data most similar to the vector representation of the current question content as the subject data meeting the set requirement;
wherein hot subject knowledge is subject knowledge whose frequency of occurrence in the historical question contents of all users exceeds a set frequency threshold.
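The three-stage fallback of claim 8 can be written as a small cascade. The thresholds, the top-N size, and cosine similarity as the metric are all assumptions, as is representing each candidate as a (vector, data) pair:

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_subject_data(current_vec, history, hot_knowledge, all_subject,
                        t1=0.8, t2=0.7, n=3):
    """Hypothetical claim-8 cascade (t1, t2, n are assumed values):
    1) subject data tied to sufficiently similar historical questions;
    2) otherwise, hot subject knowledge above the second threshold;
    3) otherwise, the n most similar subject-data items overall."""
    hits = [d for v, d in history if cosine(v, current_vec) > t1]
    if hits:
        return hits
    hits = [d for v, d in hot_knowledge if cosine(v, current_vec) > t2]
    if hits:
        return hits
    ranked = sorted(all_subject, key=lambda vd: cosine(vd[0], current_vec),
                    reverse=True)
    return [d for _, d in ranked[:n]]

# History is dissimilar, so the cascade falls through to hot knowledge.
picked = select_subject_data(
    [1.0, 0.0],
    history=[([0.0, 1.0], "history-linked data")],
    hot_knowledge=[([0.95, 0.05], "hot knowledge")],
    all_subject=[([0.5, 0.5], "generic subject data")])
```

The design choice here is graceful degradation: personalized context (the user's own history) is preferred, shared popular context is the fallback, and a generic top-N retrieval guarantees the prompt is never empty.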
9. A generative online evaluation system based on a fine-tuned large model, characterized by comprising:
a training data management module for constructing and managing data sets;
a large-model fine-tuning module for fine-tuning the evaluation-domain large model to obtain fine-tuned large models for each subject;
and a conversational evaluation module for receiving the user's current question content, generating prompts, and generating conversational evaluation questions.
10. The generative online evaluation system based on a fine-tuned large model according to claim 9, further comprising:
a user feature management module for performing Embedding conversion on the user's historical question contents to obtain vector representations of the historical question contents and storing them as history data; and for performing data cleaning on the output of the fine-tuned large model to remove duplicates and invalid values, performing Embedding conversion on the cleaned result, and updating the history data with the conversion result.
CN202311829864.XA 2023-12-27 2023-12-27 Generation type online evaluation method and system based on fine tuning large model Pending CN117786082A (en)


Publications (1)

Publication Number: CN117786082A, Publication Date: 2024-03-29

Family ID: 90397810



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination