CN117290480A - Fine tuning method, system, equipment and medium based on large language model - Google Patents

Fine tuning method, system, equipment and medium based on large language model

Info

Publication number
CN117290480A
CN117290480A (application CN202311261455.4A)
Authority
CN
China
Prior art keywords
knowledge graph
fine tuning
language model
knowledge
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311261455.4A
Other languages
Chinese (zh)
Inventor
贾波
陈永志
陈欣
张思俊
曹磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cec Jiutian Intelligent Technology Co ltd
Original Assignee
Cec Jiutian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cec Jiutian Intelligent Technology Co ltd filed Critical Cec Jiutian Intelligent Technology Co ltd
Priority to CN202311261455.4A priority Critical patent/CN117290480A/en
Publication of CN117290480A publication Critical patent/CN117290480A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Manufacturing & Machinery (AREA)
  • Human Computer Interaction (AREA)
  • Animal Behavior & Ethology (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of industrial automation, and in particular to a fine-tuning method, system, equipment and medium based on a large language model. An industrial knowledge graph is first obtained from an intelligent middle platform, and knowledge-graph triplet training data are generated according to the query question to be input and the industrial knowledge graph; a fine-tuning model is then constructed from the LLM large language model; finally, the triplet training data are input into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain. By using the knowledge graph as a control modality for fine tuning and generating a high-quality LLM fine-tuning dataset from knowledge-graph triplet training data decomposed along logic chains, the content and relations of the knowledge graph can be fully exploited, the problem of imbalanced distribution of fine-tuning data is alleviated, in-context learning based on the knowledge graph is realized, various industrial knowledge graphs can serve as input to the large language model, and the reliability and certainty of the large language model's output are enhanced.

Description

Fine tuning method, system, equipment and medium based on large language model
Technical Field
The invention relates to the technical field of industrial automation, and in particular to a fine-tuning method, system, equipment and medium based on a large language model.
Background
Natural language processing (NLP) tasks have developed rapidly since the appearance of the Transformer model. ChatGPT, released by OpenAI in November 2022, reached more than 100 million active users within roughly three months, becoming the fastest-growing consumer application in history and bringing large language models (Large Language Model, LLM) into the public eye. The very large parameter scale of large language models (e.g., GPT-3 with 175 billion parameters and PaLM with 540 billion parameters) gives them emergent capabilities that earlier, smaller pre-trained language models (Pre-trained Language Model, PLM) did not possess, and they show striking ability on a range of complex tasks such as dialogue, search, question answering and reasoning.
The general language ability of large language models is impressive, but specializing an LLM for specific tasks remains difficult, and its capability in vertical businesses and professional fields is still insufficient. A large language model without fine-tuning training makes serious factual errors in fields with fuzzy boundary conditions or that require precise, subdivided professional knowledge, and the information it generates may conflict with existing sources or fail prior-knowledge traceability verification (hallucination), which severely restricts the application of large language models in the industrial field.
To adapt the general capabilities of a large language model to a specific application domain, adaptation fine-tuning combined with professional domain knowledge is usually required on top of pre-training. Existing methods for adapting a pre-trained LLM include instruction tuning and alignment tuning. Because the parameter count of an LLM is enormous, the pre-training dataset is not open, professional-field data are relatively scarce, and the model suffers from catastrophic forgetting, these two full-parameter fine-tuning methods incur huge costs in practical application and training.
To reduce the overhead of full-parameter LLM tuning while preserving the good performance of the LLM as much as possible, parameter-efficient tuning methods for large language models have been proposed, including adapter tuning, prefix tuning, prompt tuning and low-rank adaptation (LoRA).
Existing LLMs are heavily dependent on prompt words, are sensitive to changes in the prompt, and can only achieve the expected good results when the prompt words are carefully designed by hand;
existing fine-tuning methods do not incorporate graph-structured data such as knowledge graphs, and cannot efficiently use the rich, reliable and trustworthy knowledge-graph data that already exist in industry;
existing fine-tuning methods cannot solve the uneven distribution of training data caused by the small scale of local data and the scarcity of professional knowledge, so the fine-tuning effect cannot be guaranteed;
existing fine-tuning methods update with the locally available data, and newly updated domain knowledge must be trained again, otherwise hallucination still occurs at knowledge boundaries or in vertical domains;
after new knowledge is fused in, existing fine-tuning training methods generally cause the model to forget its original knowledge to some extent, leading to a loss of the LLM's general capability;
existing fine-tuning methods feed newly added knowledge into the LLM for fine tuning; the reasoning process carries uncertainty, accurate tracing cannot be achieved, and the reliability requirements of the industrial field on results cannot be met.
Disclosure of Invention
Aiming at the insufficient reliability and certainty of existing fine-tuning methods, the invention provides a fine-tuning method, system, equipment and medium based on a large language model. By using the knowledge graph as a control modality for fine tuning and generating a high-quality LLM fine-tuning dataset from knowledge-graph triplet training data decomposed along logic chains, the content and relations of the knowledge graph can be fully exploited, the problem of imbalanced distribution of fine-tuning data is alleviated, in-context learning based on the knowledge graph is realized, various industrial knowledge graphs can serve as input to the large language model, and the reliability and certainty of the large language model's output are enhanced.
The invention has the following specific implementation contents:
A fine-tuning method based on a large language model: an industrial knowledge graph is first obtained from an intelligent middle platform, knowledge-graph triplet training data are generated according to the query question to be input and the industrial knowledge graph, a fine-tuning model is then constructed from the LLM large language model, and finally the triplet training data are input into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain.
To better implement the present invention, further, the fine-tuning method based on the large language model specifically includes the following steps:
step S1: acquiring an industrial knowledge graph from the intelligent middle platform, and generating knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph;
step S2: constructing a fine-tuning training model from the LLM large language model, and calling an attention-controllable fine-tuning network to train the fine-tuning training model;
step S3: inputting the knowledge-graph triplet training data into the fine-tuning training model to obtain a traceable answer based on the knowledge-graph logic chain.
In order to better implement the present invention, further, the step S1 specifically includes the following steps:
step S11: invoking a logic-chain decomposition method to decompose the query question to obtain query sub-questions;
step S12: acquiring an industrial knowledge graph from the intelligent middle platform, and generating knowledge-graph triplet training data according to the query sub-questions and the industrial knowledge graph;
step S13: taking the knowledge-graph triplet training data as a control modality, and generating query data from the query question corresponding to the knowledge-graph triplet data.
In order to better implement the present invention, further, the step S12 specifically includes the following steps:
step S121: performing association search on the query sub-questions, and retrieving the entities and relations related to the query question from the knowledge graph;
step S122: obtaining a knowledge-graph subgraph related to the query question from the entities and relations;
step S123: decomposing the knowledge-graph subgraph into atomic-level single-logic units, and constructing knowledge-graph triplet training data from the atomic-level single-logic units and the query question according to a set format.
In order to better implement the present invention, further, the step S2 specifically includes the following steps:
step S21: reasoning over the query question with the LLM large language model and predicting a reasoning result;
step S22: invoking a graph neural network to extract features from the knowledge-graph triplet training data;
step S23: adding a convolution layer initialized to 0 that is residually connected with the frozen parameters of the LLM large language model, and splicing the reasoning result with the features of the knowledge-graph triplet training data to obtain mixed features;
step S24: constructing a fine-tuning training model from the mixed features, and calling a low-rank decomposition matrix combined with channel and spatial attention mechanisms to train the fine-tuning training model.
In order to better implement the present invention, further, the step S3 specifically includes the following steps:
step S31: converting the query question in the triplet training data into word feature vectors;
step S32: fusing the word feature vectors with the acquired position information to obtain word feature vectors containing position information;
step S33: reasoning over the word feature vectors containing position information to obtain reasoning feature vectors;
step S34: fusing the reasoning feature vectors with the industrial knowledge graph to obtain feature vectors fused with knowledge-graph information;
step S35: restoring the feature vectors fused with knowledge-graph information to natural language to obtain a traceable answer based on the knowledge-graph logic chain.
Based on the above fine-tuning method based on a large language model, to better realize the invention, a fine-tuning system based on a large language model is further proposed, comprising a preprocessing unit, a construction unit and a prediction unit;
the preprocessing unit is used to acquire an industrial knowledge graph from the intelligent middle platform and to generate knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph;
the construction unit is used to construct a fine-tuning model from the LLM large language model;
and the prediction unit is used to input the triplet training data into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain.
Based on the above fine-tuning method based on a large language model, to better implement the present invention, an electronic device is further proposed, comprising a memory and a processor; the memory stores a computer program; when the computer program is executed on the processor, the above fine-tuning method based on a large language model is implemented.
Based on the above fine-tuning method based on a large language model, to better implement the present invention, a computer-readable storage medium is further proposed, on which computer instructions are stored; when the computer instructions are executed on the electronic device, the above fine-tuning method based on a large language model is implemented.
The invention has the following beneficial effects:
(1) The invention uses the knowledge graph as a control modality for fine tuning and generates a high-quality LLM fine-tuning dataset from knowledge-graph triplet training data decomposed along logic chains, so that the content and relations of the knowledge graph can be fully exploited, the problem of imbalanced distribution of fine-tuning data is alleviated, in-context learning based on the knowledge graph is realized, various industrial knowledge graphs can serve as input to the large language model, and the reliability and certainty of the large language model's output are enhanced.
(2) The knowledge-graph triplet training data based on knowledge-graph logic-chain decomposition effectively improve the large language model's understanding of, and reasoning over, the logical relationships between knowledge-graph entities, enhance its zero-shot reasoning ability, and improve its general reasoning ability on knowledge graphs.
(3) The invention connects the attention-controllable fine-tuning module ACNet and the low-rank attention decomposition fine-tuning module LRA to the frozen parameters of the large language model through residual connections, thereby strengthening the large language model's understanding of and reasoning over the knowledge graph while retaining its original general reasoning ability to the greatest extent.
Drawings
Fig. 1 is a schematic diagram of a fine tuning process according to the present invention.
Fig. 2 is a schematic diagram of knowledge graph input provided by the invention.
Fig. 3 is a schematic diagram of a fine tuning architecture according to the present invention.
Detailed Description
To more clearly illustrate the technical solutions of the embodiments of the present invention, they are described below clearly and completely with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention and should therefore not be regarded as limiting the scope of protection. All other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that, unless explicitly stated and limited otherwise, the terms "disposed", "connected" and "coupled" are to be construed broadly; for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of the above terms in the present invention will be understood by those of ordinary skill in the art according to the specific circumstances.
Example 1:
This embodiment provides a fine-tuning method based on a large language model: an industrial knowledge graph is first acquired from an intelligent middle platform, knowledge-graph triplet training data are generated according to the query question to be input and the industrial knowledge graph, a fine-tuning model is then constructed from the LLM large language model, and finally the triplet training data are input into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain.
Further, the fine tuning method based on the large language model specifically comprises the following steps:
Step S1: acquiring an industrial knowledge graph from the intelligent middle platform, and generating knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph.
Further, the step S1 specifically includes the following steps:
Step S11: invoking a logic-chain decomposition method to decompose the query question to obtain query sub-questions.
Step S12: acquiring an industrial knowledge graph from the intelligent middle platform, and generating knowledge-graph triplet training data according to the query sub-questions and the industrial knowledge graph.
Further, the step S12 specifically includes the following steps:
Step S121: performing association search on the query sub-questions, and retrieving the entities and relations related to the query question from the knowledge graph.
Step S122: obtaining a knowledge-graph subgraph related to the query question from the entities and relations.
Step S123: decomposing the knowledge-graph subgraph into atomic-level single-logic units, and constructing knowledge-graph triplet training data from the atomic-level single-logic units and the query question according to a set format.
Step S13: taking the knowledge-graph triplet training data as a control modality, and generating query data from the query question corresponding to the knowledge-graph triplet data.
Step S2: constructing a fine-tuning training model from the LLM large language model, and calling an attention-controllable fine-tuning network to train the fine-tuning training model.
Further, the step S2 specifically includes the following steps:
Step S21: reasoning over the query question with the LLM large language model and predicting a reasoning result;
Step S22: invoking a graph neural network to extract features from the knowledge-graph triplet training data;
Step S23: adding a convolution layer initialized to 0 that is residually connected with the frozen parameters of the LLM large language model, and splicing the reasoning result with the features of the knowledge-graph triplet training data to obtain mixed features;
Step S24: constructing a fine-tuning training model from the mixed features, and calling a low-rank decomposition matrix combined with channel and spatial attention mechanisms to train the fine-tuning training model.
Step S3: inputting the knowledge-graph triplet training data into the fine-tuning training model to obtain a traceable answer based on the knowledge-graph logic chain.
Further, the step S3 specifically includes the following steps (a minimal illustrative sketch of these steps is given after the list):
Step S31: converting the query question in the triplet training data into word feature vectors;
Step S32: fusing the word feature vectors with the acquired position information to obtain word feature vectors containing position information;
Step S33: reasoning over the word feature vectors containing position information to obtain reasoning feature vectors;
Step S34: fusing the reasoning feature vectors with the industrial knowledge graph to obtain feature vectors fused with knowledge-graph information;
Step S35: restoring the feature vectors fused with knowledge-graph information to natural language to obtain a traceable answer based on the knowledge-graph logic chain.
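The following is a minimal, toy-dimension sketch (in PyTorch-style Python) of steps S31 to S35. It is illustrative only: the real model is ChatGLM-6B with rotary position encoding and 28 GLMBlock modules, whereas the learned positional embedding, the single stand-in block and all dimensions below are simplifying assumptions, not the actual implementation.

    import torch
    import torch.nn as nn

    vocab, dim, seq = 50, 16, 6
    embed = nn.Embedding(vocab, dim)                       # S31: query words -> word feature vectors
    pos = nn.Embedding(seq, dim)                           # S32: position information (stand-in for rotary encoding)
    block = nn.Sequential(nn.Linear(dim, dim), nn.GELU())  # S33: stand-in for the GLMBlock reasoning stack
    head = nn.Linear(dim, vocab)                           # S35: project features back onto the vocabulary

    query_ids = torch.randint(0, vocab, (seq,))            # tokenized query question
    kg_features = torch.randn(seq, dim)                    # S34: hypothetical control features from the knowledge graph

    h = embed(query_ids) + pos(torch.arange(seq))          # S31 + S32: word features with position information
    h = block(h)                                           # S33: reasoning feature vectors
    h = h + kg_features                                    # S34: fuse knowledge-graph information
    answer_ids = head(h).argmax(dim=-1)                    # S35: decode back to (toy) answer tokens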
Working principle: this embodiment first acquires an industrial knowledge graph from the intelligent middle platform and generates knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph; it then constructs a fine-tuning model from the LLM large language model, and finally inputs the triplet training data into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain. By using the knowledge graph as a control modality for fine tuning and generating a high-quality LLM fine-tuning dataset from knowledge-graph triplet training data decomposed along logic chains, the content and relations of the knowledge graph can be fully exploited, the problem of imbalanced distribution of fine-tuning data is alleviated, in-context learning based on the knowledge graph is realized, various industrial knowledge graphs can serve as input to the large language model, and the reliability and certainty of the large language model's output are enhanced.
Example 2:
this embodiment is described in detail with reference to one specific embodiment, as shown in fig. 1, 2 and 3, based on embodiment 1.
The overall flow of the efficient fine-tuning method provided in this embodiment is shown in fig. 1 and specifically includes the following steps.
Step S1: relying on the logic-chain decomposition method built on the knowledge graph in the JTian intelligent middle platform, a complex question is decomposed into single-step queries, and knowledge-graph triplet data are generated in combination with a high-quality industrial knowledge graph. The constructed knowledge-graph triplet data are used as a control modality to strengthen the knowledge boundary of the large language model and enhance the certainty of the generated answers, while the query questions corresponding to the triplet data are used as the normal query data.
Step S2: the knowledge-graph triplet training data obtained in step S1 are input into the LLM fine-tuning training framework, and fine-tuning training is performed on the LLM in combination with the attention-controllable fine-tuning network (Attention Control Net, ACNet). This strengthens the large language model's understanding and learning of the industrial knowledge graph, enhances its ability to decompose complex problems along logic chains and to reason step by step from facts, and alleviates the LLM's hallucination.
Step S3: according to the query question and the input related knowledge-graph triplet data, the fine-tuned LLM can automatically associate the factual knowledge in the queried knowledge graph, reason and answer step by step, and finally generate a trusted answer based on a logic chain, realizing traceability of the results generated by the large language model and enhancing their accuracy and reliability.
The knowledge-graph triplet input data construction method provided by the embodiment is shown in fig. 2.
When fine-tuning an LLM, the fine-tuning training dataset is far smaller in scale and volume than the pre-training dataset, so to avoid losing the general capability of the large language model as much as possible there are higher requirements on the quality of the fine-tuning dataset. At the same time, because LLMs have limited ability to understand and reason about complex problems, chain-of-thought (CoT) prompting is usually introduced to decompose a complex problem into multi-step simple reasoning queries.
S101: relying on the JTian intelligent middle platform, the query question is decomposed along a logic chain; association search is performed on the decomposed atomic questions, and the entities and relations related to the query question are retrieved from the knowledge graph to obtain a knowledge-graph subgraph for the query. For example, as shown in fig. 2, to know the latest state of charge of the battery, the JTian intelligent middle platform retrieves from the knowledge graph the entities and attributes related to the battery, including the battery's charge level, the charging state of the power supply and the power consumption of the sensors, and constructs a knowledge-graph subgraph related to the battery.
S102: the queried knowledge-graph subgraph is decomposed into atomic-level single-logic units, and the decomposed triplet data, together with the query question and the determined answer, are assembled into knowledge-graph triplet training data according to a preset format, as sketched below.
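A minimal sketch of this data construction is given below in Python; the toy knowledge graph, the battery example values, the field names (query, sub_questions, kg_control, answer) and the JSON serialization are illustrative assumptions, not the patent's prescribed format.

    import json

    knowledge_graph = [
        ("battery", "has_attribute", "charge_level: 78%"),
        ("battery", "charged_by", "power_supply"),
        ("power_supply", "status", "charging"),
        ("sensor", "draws_power_from", "battery"),
    ]

    def retrieve_subgraph(entities, kg):
        # association search: keep every triple that touches a queried entity
        return [t for t in kg if t[0] in entities or t[2] in entities]

    query = "What is the latest charge level of the battery?"
    sub_questions = ["Which entities and attributes relate to the battery?",
                     "What is the battery's current charge level?"]      # logic-chain decomposition
    triples = retrieve_subgraph({"battery"}, knowledge_graph)            # knowledge-graph subgraph

    sample = {
        "query": query,
        "sub_questions": sub_questions,
        "kg_control": [" | ".join(t) for t in triples],                  # atomic single-logic triples
        "answer": "The latest battery charge level is 78%.",
    }
    print(json.dumps(sample, ensure_ascii=False, indent=2))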
The method for training fine tuning of a large language model according to the embodiment is shown in fig. 3.
The efficient fine-tuning method for the large language model provided by this embodiment comprises a frozen-parameter large language model body (S201), a knowledge-graph control-modality input (S202), an attention-controllable fine-tuning network ACNet (S203), and a low-rank attention decomposition fine-tuning network LRA (S204). In this embodiment ChatGLM-6B is taken as the example large language model; other large language models can also adopt the efficient fine-tuning method provided by this patent. S203 and S204 are the core parts of the fine-tuning model.
S201: this part is the original structure of the large language model; it reasons over the input query question, none of the original model's parameters participate in training, and the general reasoning ability of the large language model is preserved to the greatest extent. Taking ChatGLM-6B as an example, it comprises an embedding layer that encodes the input, a rotary position encoding layer, and 28 forward-propagating GLMBlock modules, each containing RMSNorm, SelfAttention and an MLP layer, finally outputting the predicted reasoning result.
S202: the fine-tuning training method takes the knowledge graph as the control-modality input. The input content is as shown in S102: the retrieved and filled knowledge-graph triplet training data are input, features are extracted through the graph neural network GCN, and the result is fed into the fine-tuning model for fine-tuning training; a minimal sketch of such a feature extractor follows.
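The patent only states that a GCN compresses the triplet data into a feature space, so the two-layer architecture, symmetric normalization, mean pooling and all dimensions in the sketch below are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class TinyGCN(nn.Module):
        def __init__(self, in_dim=16, hid_dim=32, out_dim=64):
            super().__init__()
            self.w1 = nn.Linear(in_dim, hid_dim)
            self.w2 = nn.Linear(hid_dim, out_dim)

        def forward(self, x, adj):
            # x: (num_nodes, in_dim) node features; adj: (num_nodes, num_nodes) adjacency of the triplet subgraph
            a_hat = adj + torch.eye(adj.size(0))              # add self-loops
            d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
            a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt          # symmetric normalization
            h = torch.relu(self.w1(a_norm @ x))               # first graph convolution
            h = self.w2(a_norm @ h)                           # second graph convolution
            return h.mean(dim=0)                              # pool nodes into one control feature vector

    gcn = TinyGCN()
    nodes = torch.randn(4, 16)                                # e.g. battery, power supply, sensor, charge level
    adjacency = torch.tensor([[0., 1., 1., 1.],
                              [1., 0., 0., 0.],
                              [1., 0., 0., 0.],
                              [1., 0., 0., 0.]])
    control_vec = gcn(nodes, adjacency)                       # (64,) knowledge-graph control features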
S203: this part is the attention-controllable fine-tuning network ACNet. It realizes fine tuning of the large language model's modules as a whole by adding convolution layers initialized to 0, and preserves the general reasoning ability of the large language model as far as possible through residual connections to the model's frozen parameters.
The ACNet module receives the features extracted by the GCN (S202) from the input knowledge-graph triplet training data; after being restored to the feature dimension through a convolution layer initialized to 0, they are spliced with the query encoding features extracted by the Embedding layer of the ChatGLM-6B model (S201) to obtain the mixed input features.
The copied GLMBlock modules then perform forward reasoning on the input features. Each copied GLMBlock fully replicates the weights and parameters of the frozen large language model GLMBlock, and each block to be tuned is connected before and after to a convolution layer initialized to 0, realizing mapping adjustment of the module's input and output; a minimal sketch of this structure is given below.
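The following is a minimal, hypothetical sketch of one such ACNet block in the ControlNet style described above. The zero-initialized mappings are implemented here as linear layers rather than convolutions, and the wrapped block is a toy stand-in rather than the actual ChatGLM-6B GLMBlock; both simplifications are assumptions for illustration.

    import copy
    import torch
    import torch.nn as nn

    def zero_proj(dim):
        # zero-initialized mapping: at the start of training this branch contributes nothing,
        # so the frozen model's original behaviour is preserved
        lin = nn.Linear(dim, dim)
        nn.init.zeros_(lin.weight)
        nn.init.zeros_(lin.bias)
        return lin

    class ACNetBlock(nn.Module):
        # one frozen block plus a trainable copy gated by zero-initialized mappings
        def __init__(self, block, dim):
            super().__init__()
            self.copy = copy.deepcopy(block)      # trainable replica of the block to be tuned
            self.frozen = block
            for p in self.frozen.parameters():
                p.requires_grad_(False)           # base weights do not participate in training
            self.zero_in = zero_proj(dim)         # mapping adjustment before the copied block
            self.zero_out = zero_proj(dim)        # mapping adjustment after the copied block

        def forward(self, hidden, control):
            # hidden: (seq, dim) query features; control: (seq, dim) KG features from the GCN
            base = self.frozen(hidden)
            tuned = self.copy(hidden + self.zero_in(control))
            return base + self.zero_out(tuned)    # residual fusion of frozen and tuned outputs

    dim = 16
    toy_block = nn.Sequential(nn.Linear(dim, dim), nn.GELU())   # stand-in for a GLMBlock
    acb = ACNetBlock(toy_block, dim)
    out = acb(torch.randn(8, dim), torch.randn(8, dim))         # (8, dim)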
S204: this part is the low-rank attention decomposition fine-tuning network LRA. To reduce the computation required by the large language model's billions of parameters and the resources needed for reasoning, a low-rank decomposition matrix combined with channel and spatial attention mechanisms is used to equivalently replace, and to enhance the training of, the linear-layer part of SelfAttention in the GLMBlock module. The module operates as follows (a minimal sketch follows this list):
the input of SelfAttention is used as the input of LRA, and the multi-head attention features produced after the linear layer Ar are spliced along the attention dimension;
channel attention weights are obtained with global max pooling (MaxPool) and global average pooling (AvgPool) along the single-attention direction;
the resulting features are spliced and passed through a multi-layer perceptron (MLP) to obtain feature attention weights with the same number of channels as the output of the Ar layer;
in the channel attention fusion module CAF, the feature attention weights obtained by the MLP are multiplied element-wise with the features of the Ar layer to obtain the fused channel attention features;
for the features output by the CAF module, MaxPool and AvgPool are applied per layer along the attention-channel direction to obtain the spatial attention of each layer;
the spatial attention is reduced by a 1×1 convolution to a single attention map of the same dimension;
in the spatial attention fusion module SAF, the reduced features are combined by dot product with the fused channel attention features to obtain the fused attention features;
after a ReLU activation layer, the fused attention features are residually linked with the features from Ar and then restored by the linear layer Br to the output dimension of SelfAttention in the GLMBlock;
the features from Br are sent as the input of RMSNorm for model reasoning and training.
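A minimal sketch of such an LRA module is given below; the rank, the dimensions and the exact pooling shapes are illustrative assumptions, and the sketch follows a CBAM-like channel-then-spatial reading of the steps above rather than the exact ChatGLM-6B wiring.

    import torch
    import torch.nn as nn

    class LRA(nn.Module):
        # low-rank replacement for a SelfAttention linear layer with channel (CAF)
        # and spatial (SAF) attention fusion applied in the low-rank space
        def __init__(self, d_in, d_out, rank=8):
            super().__init__()
            self.Ar = nn.Linear(d_in, rank)                 # low-rank down-projection
            self.Br = nn.Linear(rank, d_out)                # restores the SelfAttention output dimension
            self.mlp = nn.Sequential(                       # channel attention from pooled statistics
                nn.Linear(2 * rank, rank), nn.ReLU(), nn.Linear(rank, rank))
            self.spatial = nn.Conv1d(2, 1, kernel_size=1)   # 1x1 convolution producing a single attention map

        def forward(self, x):
            # x: (seq, d_in), the input of SelfAttention
            h = self.Ar(x)                                                        # (seq, rank)
            # CAF: global max/avg pooling over the sequence, splice, MLP, gate the rank channels
            pooled = torch.cat([h.max(dim=0).values, h.mean(dim=0)], dim=-1)      # (2*rank,)
            ch_w = torch.sigmoid(self.mlp(pooled))                                # (rank,)
            h_ch = h * ch_w
            # SAF: max/avg pooling over the channels, 1x1 conv, gate the positions
            sp = torch.stack([h_ch.max(dim=-1).values, h_ch.mean(dim=-1)], dim=0) # (2, seq)
            sp_w = torch.sigmoid(self.spatial(sp.unsqueeze(0))).reshape(-1)       # (seq,)
            h_sp = h_ch * sp_w.unsqueeze(-1)
            fused = torch.relu(h_sp) + h                                          # residual link with the Ar features
            return self.Br(fused)                                                 # (seq, d_out)

    lra = LRA(d_in=32, d_out=32, rank=8)
    y = lra(torch.randn(10, 32))                                                  # (10, 32)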
The whole training process comprises the following steps (a minimal sketch of the training step is given after this list):
in S201, the input is the query question (Language Input) in the knowledge-graph triplet training data, which is converted into word feature vectors by the embedding layer (Embedding);
the word feature vectors are fused with position information by the rotary position encoding layer (RotaryEncoding) and become feature vectors containing position information;
the feature vectors containing position information enter both the reasoning module GLMBlock and the attention-controllable fine-tuning network ACNet of S203; in addition, the fine-tuning module also receives the input of the graph neural network GCN of S202;
inside the reasoning module, the feature vectors containing position information pass through RMSNorm normalization, the self-attention layer SelfAttention, another RMSNorm normalization and the multi-layer perceptron MLP, which reason over them;
the output features of the reasoning module are added to the features of the attention-controllable fine-tuning network ACNet to obtain feature vectors fused with knowledge-graph information;
reasoning proceeds sequentially through the 28 reasoning modules GLMBlock; each reasoning module shares its input with the attention-controllable fine-tuning network ACNet and adds in the fine-tuned feature vectors output by ACNet, yielding reasoning feature vectors for the question grounded in knowledge-graph information;
the fused and added reasoning feature vectors are normalized by a final RMSNorm layer and output, and the feature vectors are restored to natural language to obtain a trusted answer based on the knowledge graph;
in S203, the initial input of the attention-controllable fine-tuning network ACNet is the triplet data (KG Control Input) in the knowledge-graph triplet training data; the knowledge-graph triplet training data are compressed into the feature space by the graph neural network GCN and mapped into knowledge-graph triplet feature vectors;
the knowledge-graph triplet feature vectors are input into the attention-controllable fine-tuning network ACNet, their feature space is adjusted by a convolution layer (Zero Conv) initialized to 0, and they are spliced with the feature vectors containing position information input in S201 to obtain the knowledge-graph triplet training data feature vectors;
thereafter, each reasoning module GLMBlock corresponds to one attention-controllable fine-tuning network ACNet;
the input of each attention-controllable fine-tuning network ACNet is a feature vector based on knowledge-graph information that has been adjusted and mapped by a convolution layer (Zero Conv) initialized to 0; it passes in turn through RMSNorm normalization, the self-attention layer SelfAttention residually connected with the LRA of the S204 low-rank attention decomposition fine-tuning network, another RMSNorm normalization and the multi-layer perceptron MLP for reasoning; each module then performs mapping adjustment on the reasoned feature vector through a convolution layer (Zero Conv) initialized to 0, and the adjusted feature vector is shared as input to the reasoning module GLMBlock of S201 and to S203;
the self-attention layer SelfAttention in each attention-controllable fine-tuning network ACNet is residually connected with the LRA of the S204 low-rank attention decomposition fine-tuning network;
the Answer and Conclusion defined in the knowledge-graph triplet training data are converted into result feature vectors through the same embedding layer; together with the feature vectors obtained after ChatGLM-6B reasoning, they are used as the input of the ChatGLM-6B loss function, and the model is updated by back-propagation training; all parameters in S201 are frozen and do not participate in training.
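The parameter freezing and loss wiring described above can be sketched as follows; the toy model, the dimensions and the plain cross-entropy over answer tokens are illustrative assumptions standing in for ChatGLM-6B and its actual loss function.

    import torch
    import torch.nn as nn

    vocab, dim, seq = 100, 32, 8
    base = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))  # toy stand-in for the frozen ChatGLM-6B
    adapter = nn.Linear(vocab, vocab)                                      # hypothetical trainable ACNet/LRA branch

    for p in base.parameters():
        p.requires_grad_(False)                    # S201: all base-model parameters are frozen

    optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)   # only the tuning branch is optimized

    query_ids = torch.randint(0, vocab, (seq,))    # Language Input tokens from the triplet training data
    answer_ids = torch.randint(0, vocab, (seq,))   # tokens of the defined Answer / Conclusion

    logits = adapter(base(query_ids))              # frozen forward pass followed by the trainable branch
    loss = nn.functional.cross_entropy(logits, answer_ids)
    loss.backward()                                # gradients reach only the adapter parameters
    optimizer.step()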
In this embodiment, using the industrial knowledge graph as the control-modality input of the large language model strengthens the large language model's understanding of the knowledge graph and its ability to answer complex queries based on knowledge-graph logic-chain decomposition; defining knowledge-graph triplet training data based on logic-chain decomposition and suited to the large language model weakens the model's sensitivity to prompt words and strengthens its alignment with human intent; and using knowledge-graph logic-chain reasoning as the output form of the large language model realizes traceability of the generated results and enhances the reliability and accuracy of the large language model's output.
This embodiment provides an efficient fine-tuning training framework for large language models: it uses a GCN to fuse the knowledge graph as control input, and uses an attention-controllable mechanism and a low-rank attention mechanism to realize efficient, controllable, incremental fine tuning of the large language model, enhancing its question answering and query capability based on the content and relations of the knowledge graph;
This embodiment provides the attention-controllable neural network module ACNet: the modules that need fine tuning in the large language model are copied, the input and output features are adjusted by convolution layers initialized to 0 connected before and after, and the features output by the fine-tuning module are aligned with and added to the output features of the frozen module as the new output. The residual connection fuses the fine-tuning module with the frozen module, which effectively strengthens the large language model's understanding of new knowledge while preserving its original general reasoning ability as much as possible.
The low-rank attention decomposition fine-tuning network module LRA provided by this embodiment decomposes the linear layer in the large language model's SelfAttention into low-rank matrices, and at the same time introduces the channel attention fusion module CAF and the spatial attention fusion module SAF in the low-rank space, strengthening the extraction and recognition of key features, reducing the large language model's training parameters, enhancing semantic understanding of the control modality, and improving the fine-tuning effect.
This embodiment provides a knowledge-graph triplet training data construction method based on logic-chain decomposition, taking the JTian intelligent middle platform as an example; it realizes knowledge-graph-based in-context learning (ICL), and various industrial knowledge graphs can be used directly as input to the large language model. By using the knowledge graph as a control modality for fine tuning and generating a high-quality LLM fine-tuning dataset from knowledge-graph triplet training data based on logic-chain decomposition, the content and relations of the knowledge graph can be fully exploited and the problem of imbalanced distribution of fine-tuning data is alleviated.
The knowledge-graph triplet training data based on knowledge-graph logic-chain decomposition can effectively improve the large language model's understanding of, and reasoning over, the logical relationships between knowledge-graph entities, enhance its zero-shot reasoning ability, and improve its general reasoning ability on knowledge graphs.
The efficient fine-tuning method for large language models provided by this embodiment connects the attention-controllable fine-tuning module ACNet and the low-rank attention decomposition fine-tuning module LRA to the frozen parameters of the large language model through residual connections, strengthening the large language model's understanding of and reasoning over the knowledge graph while preserving its original general reasoning ability to the greatest extent.
With the efficient fine-tuning method and the knowledge-graph triplet training data based on knowledge-graph logic chains provided by this embodiment, fine-tuning training enables the large language model to output traceable answers based on knowledge-graph logic-chain decomposition, enhancing the reliability and certainty of the large language model's output.
This embodiment is applied to the field of industrial automation. Multi-modal data existing in the industrial field, such as archival texts, fault records and knowledge graphs, are used to fine-tune the large language model, strengthening its dialogue and question-answering ability in vertical subdivided fields, mitigating the hallucination problem of large language models, enabling deterministic answers to questions about the current factory production environment, product quality and equipment condition, and providing constructive improvement schemes and optimization suggestions based on existing methods.
In this embodiment, taking the JTian-GLM intelligent middle platform as an example, the model uses a local knowledge vector base built by the intelligent middle platform from multi-modal industrial datasets including archival texts, fault records and knowledge graphs, automatically constructs knowledge-graph triplet training data, uses retrieval-augmented query questions, and performs reasoning with the fine-tuned large language model to obtain reliable, deterministic answers and suggestions grounded in the enterprise's local knowledge base.
The fine-tuning method for the large language model includes constructing enhanced knowledge-graph triplet training data by fusing the enterprise knowledge-graph logic chain and causal graph for in-context learning (ICL) and chain-of-thought (CoT) prompting; it includes the use of an efficient, controllable fine-tuning neural network based on attention mechanisms; and it realizes efficient, controllable fine tuning of existing large language models on professional knowledge and in vertical fields.
The fine-tuning training method provided by this embodiment takes the knowledge graph as a control modality and is insensitive to prompt words, which strengthens the robustness and applicability of the LLM and improves the user experience; the knowledge graph can be input into the large language model as a new modality, realizing efficient use of the knowledge graphs already existing in industry; complex questions are decomposed along logic chains by relying on the JTian-GLM intelligent middle platform, and a knowledge-graph triplet training dataset is built in combination with industrial knowledge graphs, improving the large language model's understanding of and reasoning over knowledge graphs; the fusion of knowledge-graph reasoning capability with the LLM's general capability is realized, strengthening the large language model's use of knowledge graphs, enabling zero-shot reasoning on new knowledge graphs, and effectively avoiding hallucination; a new attention network is fused for fine tuning while keeping the original LLM weights, realizing the fusion of professional knowledge in vertical fields while preserving the general capability of the large language model to the greatest extent; and the LLM's fusion with the knowledge-graph logic chain is realized, so that the reasoning process can be output in the form of a knowledge-graph logic chain, the reasoning result and process can be traced, and the reliability of the LLM's reasoning results is improved.
Other portions of this embodiment are the same as those of embodiment 1 described above, and thus will not be described again.
Example 3:
This embodiment proposes, on the basis of either of the above embodiments 1 to 2, a fine-tuning system based on a large language model, comprising a preprocessing unit, a construction unit and a prediction unit;
the preprocessing unit is used to acquire an industrial knowledge graph from the intelligent middle platform and to generate knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph;
the construction unit is used to construct a fine-tuning model from the LLM large language model;
and the prediction unit is used to input the triplet training data into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain.
This embodiment also proposes an electronic device comprising a memory and a processor; the memory stores a computer program; when the computer program is executed on the processor, the above fine-tuning method based on a large language model is implemented.
This embodiment also proposes a computer-readable storage medium storing computer instructions; when the computer instructions are executed on the electronic device, the above fine-tuning method based on a large language model is implemented.
Other portions of this embodiment are the same as any of embodiments 1 to 2, and thus will not be described again.
The foregoing description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any simple modification, equivalent variation, or the like made to the above embodiment in accordance with the technical substance of the present invention falls within the scope of the present invention.

Claims (9)

1. A fine-tuning method based on a large language model, characterized in that an industrial knowledge graph is first acquired from an intelligent middle platform, knowledge-graph triplet training data are generated according to the query question to be input and the industrial knowledge graph, a fine-tuning model is then constructed from the LLM large language model, and finally the triplet training data are input into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain.
2. The fine-tuning method based on a large language model according to claim 1, wherein the fine-tuning method based on a large language model specifically comprises the following steps:
step S1: acquiring an industrial knowledge graph from the intelligent middle platform, and generating knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph;
step S2: constructing a fine-tuning training model from the LLM large language model, and calling an attention-controllable fine-tuning network to train the fine-tuning training model;
step S3: inputting the knowledge-graph triplet training data into the fine-tuning training model to obtain a traceable answer based on the knowledge-graph logic chain.
3. The fine-tuning method based on a large language model according to claim 2, wherein the step S1 specifically comprises the following steps:
step S11: invoking a logic-chain decomposition method to decompose the query question to obtain query sub-questions;
step S12: acquiring an industrial knowledge graph from the intelligent middle platform, and generating knowledge-graph triplet training data according to the query sub-questions and the industrial knowledge graph;
step S13: taking the knowledge-graph triplet training data as a control modality, and generating query data from the query question corresponding to the knowledge-graph triplet data.
4. The fine-tuning method based on a large language model according to claim 3, wherein the step S12 specifically comprises the following steps:
step S121: performing association search on the query sub-questions, and retrieving the entities and relations related to the query question from the knowledge graph;
step S122: obtaining a knowledge-graph subgraph related to the query question from the entities and relations;
step S123: decomposing the knowledge-graph subgraph into atomic-level single-logic units, and constructing knowledge-graph triplet training data from the atomic-level single-logic units and the query question according to a set format.
5. The fine-tuning method based on a large language model according to claim 4, wherein the step S2 specifically comprises the following steps:
step S21: reasoning over the query question with the LLM large language model and predicting a reasoning result;
step S22: invoking a graph neural network to extract features from the knowledge-graph triplet training data;
step S23: adding a convolution layer initialized to 0 that is residually connected with the frozen parameters of the LLM large language model, and splicing the reasoning result with the features of the knowledge-graph triplet training data to obtain mixed features;
step S24: constructing a fine-tuning training model from the mixed features, and calling a low-rank decomposition matrix combined with channel and spatial attention mechanisms to train the fine-tuning training model.
6. The fine-tuning method based on a large language model according to claim 5, wherein the step S3 specifically comprises the following steps:
step S31: converting the query question in the triplet training data into word feature vectors;
step S32: fusing the word feature vectors with the acquired position information to obtain word feature vectors containing position information;
step S33: reasoning over the word feature vectors containing position information to obtain reasoning feature vectors;
step S34: fusing the reasoning feature vectors with the industrial knowledge graph to obtain feature vectors fused with knowledge-graph information;
step S35: restoring the feature vectors fused with knowledge-graph information to natural language to obtain a traceable answer based on the knowledge-graph logic chain.
7. A fine-tuning system based on a large language model, characterized by comprising a preprocessing unit, a construction unit and a prediction unit; the preprocessing unit is used to acquire an industrial knowledge graph from the intelligent middle platform and to generate knowledge-graph triplet training data according to the query question to be input and the industrial knowledge graph;
the construction unit is used to construct a fine-tuning model from the LLM large language model;
and the prediction unit is used to input the triplet training data into the fine-tuning model to obtain a traceable answer based on the knowledge-graph logic chain.
8. An electronic device, comprising a memory and a processor; the memory stores a computer program; when the computer program is executed on the processor, the fine-tuning method based on a large language model according to any one of claims 1-6 is implemented.
9. A computer-readable storage medium having computer instructions stored thereon; when the computer instructions are executed on the electronic device according to claim 8, the fine-tuning method based on a large language model according to any one of claims 1-6 is implemented.
CN202311261455.4A 2023-09-27 2023-09-27 Fine tuning method, system, equipment and medium based on large language model Pending CN117290480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311261455.4A CN117290480A (en) 2023-09-27 2023-09-27 Fine tuning method, system, equipment and medium based on large language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311261455.4A CN117290480A (en) 2023-09-27 2023-09-27 Fine tuning method, system, equipment and medium based on large language model

Publications (1)

Publication Number Publication Date
CN117290480A true CN117290480A (en) 2023-12-26

Family

ID=89256825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311261455.4A Pending CN117290480A (en) 2023-09-27 2023-09-27 Fine tuning method, system, equipment and medium based on large language model

Country Status (1)

Country Link
CN (1) CN117290480A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556059A (en) * 2024-01-12 2024-02-13 天津滨电电力工程有限公司 Detection and correction method based on knowledge fusion and reasoning charging station data
CN117556059B (en) * 2024-01-12 2024-05-31 天津滨电电力工程有限公司 Detection and correction method based on knowledge fusion and reasoning charging station data
CN117743315A (en) * 2024-02-20 2024-03-22 浪潮软件科技有限公司 Method for providing high-quality data for multi-mode large model system
CN117743315B (en) * 2024-02-20 2024-05-14 浪潮软件科技有限公司 Method for providing high-quality data for multi-mode large model system
CN117876651A (en) * 2024-03-13 2024-04-12 浪潮电子信息产业股份有限公司 Visual positioning method, device, equipment and medium
CN117876651B (en) * 2024-03-13 2024-05-24 浪潮电子信息产业股份有限公司 Visual positioning method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN117290480A (en) Fine tuning method, system, equipment and medium based on large language model
CN111368545B (en) Named entity recognition method and device based on multitask learning
CN116881428B (en) Language model training method and device
US20220004547A1 (en) Method, apparatus, system, device, and storage medium for answering knowledge questions
CN115017178A (en) Training method and device for data-to-text generation model
CN117194986A (en) Information recommendation model training method and device, storage medium and electronic equipment
Gordon-Hall et al. Show us the way: Learning to manage dialog from demonstrations
Jiang et al. Large Language Model Enhanced Multi-Agent Systems for 6G Communications
Li et al. Memory efficient optimizers with 4-bit states
CN117332852A (en) Knowledge graph-based large model training deployment method and system
Roussopoulos et al. An adaptable methodology for database design
Gu et al. A knowledge-intensive method for conversational CBR
EP4060579A1 (en) Method and system for evaluating performance of developers using artificial intelligence (ai)
Banik et al. User-controlled, robust natural language generation from an evolving knowledge base
Mok et al. Scaling understanding up to mental spaces
Karapantelakis et al. Using Large Language Models to Understand Telecom Standards
CN117573842B (en) Document retrieval method and automatic question-answering method
Zou et al. GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
JP7012811B1 (en) Search device, search method, and program
Beg et al. PNL-enhanced restricted domain question answering system
Brezillon Contextualized explanations
WO2023273237A1 (en) Model compression method and system, electronic device, and storage medium
Feng et al. A unified implicit dialog framework for conversational search
WO2024041350A1 (en) Intention recognition method and apparatus, electronic device, and storage medium
Abane et al. An Adaptable AI Assistant for Network Management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination