CN117592563A - Power large model training and adjusting method with field knowledge enhancement - Google Patents

Power large model training and adjusting method with field knowledge enhancement

Info

Publication number
CN117592563A
CN117592563A
Authority
CN
China
Prior art keywords
model
power
knowledge
vector
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311425644.0A
Other languages
Chinese (zh)
Inventor
马永
雷霆
张靖
周明
郭洋
路宇
许冬
赵煜阳
王俊
董夏磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd filed Critical Information and Telecommunication Branch of State Grid Anhui Electric Power Co Ltd
Priority to CN202311425644.0A priority Critical patent/CN117592563A/en
Publication of CN117592563A publication Critical patent/CN117592563A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/041Abduction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/20Administration of product repair or maintenance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of artificial intelligence within computing and discloses a domain-knowledge-enhanced training and tuning method for a large electric power base model, comprising the following steps: step 1, continuing training on the basis of a general base model to obtain a large power base model; step 2, designing a supervised instruction-learning method for the power field and performing full-parameter fine-tuning of the large power base model with a power-field instruction dataset to obtain a large power instruction model; step 3, using vector retrieval and knowledge retrieval technology to produce vectorized representations of the power knowledge-base texts; and step 4, designing a large fault-diagnosis model that takes power signals and power instruction text as input and outputs fault descriptions. The invention improves the system's prediction accuracy and fault-diagnosis accuracy, improves the efficiency and quality of information transmission, reduces the likelihood of hallucinations, improves the efficiency of maintenance personnel, and shortens system recovery time.

Description

Power large model training and adjusting method with field knowledge enhancement
Technical Field
The invention belongs to the field of artificial intelligence in the field of computers, and particularly relates to a training and adjusting method for an electric power large model with enhanced field knowledge.
Background
In the traditional power field, improving capabilities such as professional-knowledge retrieval question answering and system fault diagnosis has always been a key challenge. In recent years, large-model technology has made remarkable progress in the general domain; however, applying it to electric power systems faces a series of unique problems. General large models often fail to fully account for the professional knowledge and special data characteristics of the power field, so they perform poorly on power tasks. Meanwhile, the power data used for many problems such as fault diagnosis is not natural-language text: it has special formats and a huge scale, and is difficult to apply directly to the training of a large text model.
Data in the power field typically comes from various sensors and monitoring devices, but it presents challenges in both quality and quantity. Incomplete, noisy, or inaccurate data can degrade model performance. Furthermore, obtaining sufficient labeled training data can also be a problem, especially for specific power tasks. While large models can be trained to understand natural language in the power field, integrating domain expertise into the model remains a challenge; this requires close collaboration between domain experts and data scientists to ensure that the model can understand and apply domain-specific knowledge correctly. Stability and reliability of the power system are critical, but large deep-learning models often lack interpretability, meaning the reasoning and logic behind a model's decisions are difficult to understand. In the power field, this uncertainty cannot be ignored. The power system is critical infrastructure facing potential security threats; when artificial intelligence is applied to a power system, the security and privacy protection of models must be strengthened to prevent potential attacks and data leakage.
Therefore, although artificial intelligence has achieved some remarkable research results in the power field, challenges in data quality, domain-knowledge integration, model interpretability, and safety still need to be overcome to better exploit its potential and improve the reliability and efficiency of power systems. Moreover, close collaboration with power-domain experts will be a key factor in success.
Therefore, in order to fully exploit the potential of large model technology in the power domain, a new training technology is needed that can effectively fuse domain knowledge, process large-scale power data, and achieve targeted fine tuning.
Disclosure of Invention
To solve the above technical problems, the invention provides a domain-knowledge-enhanced training and tuning method for a large power model, which aims to realize a domain-knowledge-enhanced training and fine-tuning technique for large power models so as to improve the intelligence level of power systems.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
the invention relates to a training and adjusting method for a large electric power model with enhanced domain knowledge, which is characterized by comprising the following steps of: the power large model training and adjusting method comprises the following steps:
step 1, continuing training on the basis of a general base model, targeting the characteristics and data features of the power field, to achieve rapid domain adaptation of the base model and obtain a large power base model;
step 2, designing a supervised instruction-learning method for the power field, and performing full-parameter fine-tuning of the large power base model obtained in step 1 with a power-field instruction dataset, so that the large power base model follows instructions to execute the corresponding tasks, obtaining a large power instruction model;
and step 3, using vector retrieval and knowledge retrieval technology to produce vectorized representations of power knowledge-base texts, such as power textbook texts, fault-diagnosis texts, and operation-manual texts, and matching candidate text blocks in the knowledge base by vector retrieval to improve the knowledge question-answer effect, specifically comprising the following steps:
step 3.1, constructing a knowledge vector database:
step 3.2, designing a vector retrieval and large model question-answering algorithm;
and step 4, designing a large fault-diagnosis model that fuses multiple signal features to perform fault diagnosis, taking power signals and power instruction text as input and outputting fault descriptions.
The invention further improves that: the step 1 specifically comprises the following steps:
step 1.1, selecting a general pre-training model as a base model;
step 1.2, collecting large-scale data in the electric power field to form a large-scale data set, introducing the large-scale data set in the electric power field into a base model, and continuing training the base model by utilizing the idea of transfer learning, wherein the large-scale data set in the electric power field comprises news, knowledge, blogs, running logs, equipment parameters and engineering reports of an electric power system.
The invention further improves that: the step 1 carries out continuous training aiming at the characteristics and the data characteristics of the electric power field, and the continuous training is specifically as follows:
the vocabulary of the universal base model is expanded by using Sentence piece, the Sentence piece regards input text as a series of Unicode characters, language-independent logic, and a token is trained from a large-scale dataset, the vocabulary of the knowledge model in the electric power domain is expanded lightweight and rapidly, and in the training process, the vocabulary of the knowledge model in the electric power domain is aimed at an input sequence U= { U 1 ,u 2 ,…u T Using an extended Tokenizer to convert the input sequence U into a continuous vector representation x= { X 1 ,x 2 ,…x N And added to the position codes to capture information at different positions in the input sequence, and then using a self-attention mechanism to calculate context information for each position of the successive vector representation X, resulting in an attention score:
Q=W Q X,K=W K X,V=W V X
information is processed in parallel through a plurality of attention heads to capture different semantic information, and the calculation of the multiple attention heads is as follows:
MultiHead(Q,K,V)=Concat(head 1 ,…,head h )W O
wherein head i =ATTN(QW i Q ,KW i K ,VW i V ) Further transformation and nonlinear processing of the representation of each position is performed by a feed-forward neural network, and finally, linear transformation is performed at the output of the model, and then a Softmax function is applied to calculate the probability distribution of each word element.
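A minimal sketch of the multi-head attention computation described above, using NumPy. All weight matrices here are random stand-ins with illustrative dimensions, not trained parameters:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # ATTN(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(X, W_Q, W_K, W_V, W_O, h):
    # Q = W_Q X, K = W_K X, V = W_V X (written as X @ W for row vectors here)
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d = Q.shape[-1] // h  # per-head width
    heads = [attention(Q[:, i*d:(i+1)*d], K[:, i*d:(i+1)*d], V[:, i*d:(i+1)*d])
             for i in range(h)]
    # MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W_O
    return np.concatenate(heads, axis=-1) @ W_O

rng = np.random.default_rng(0)
N, d_model, h = 4, 8, 2          # 4 positions, model width 8, 2 heads
X = rng.normal(size=(N, d_model))
W_Q, W_K, W_V, W_O = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head(X, W_Q, W_K, W_V, W_O, h)  # one context vector per position
```

Each head attends over all positions of X in its own subspace, matching the head_i formula; concatenation and the output projection W_O combine the heads.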
The invention further improves that: the step 2 specifically comprises the following steps:
step 2.1, collecting instruction data in the power field to form an instruction dataset, preparing <instruction; input; output> triplets, where the input may be empty, and where the power-field instruction data includes questions and answers about power knowledge, control of power equipment, operation steps, and operation guidance;
step 2.2, inviting an expert to check and correct incorrect answers of the instruction data set, expanding the form of the instruction by a method of rewriting instruction sentences, and obtaining labeling data corresponding to the power instruction text and the answer text;
step 2.3, using a supervised learning method, taking the obtained power instruction text as input and taking the expected answer text as output;
and step 2.4, using conditional-probability maximum-likelihood estimation to measure the difference between the output of step 2.3 and the true answer, so as to guide the large power instruction model to learn the correct operation mapping.
The invention further improves that: the step 2.4 specifically comprises the following steps:
assuming the instruction is I, the input is X, the output is Y, where Y is T in length, then the penalty function is:
where i is the location of the answer text and y represents the lemma in the answer text.
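A minimal sketch of this conditional maximum-likelihood loss over the answer tokens. The per-token probabilities below are toy illustrative values, not outputs of a real model:

```python
import math

def instruction_loss(token_probs):
    """Negative log-likelihood over the answer tokens:
    L = -sum_i log P(y_i | I, X, y_<i).
    token_probs[i] stands for the model's probability of the i-th answer
    token, conditioned on the instruction I, the input X, and earlier tokens."""
    return -sum(math.log(p) for p in token_probs)

# Toy probabilities for a 4-token answer (illustrative only).
loss = instruction_loss([0.9, 0.8, 0.95, 0.7])
```

Minimizing this quantity over the labeled instruction dataset pushes the model to assign high probability to the expert-corrected answer tokens.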
The invention further improves that: the step 3.1 of constructing a knowledge vector database specifically comprises the following steps:
step 3.1.1, loading an externally hung knowledge text, segmenting the knowledge text according to the length, and segmenting the long knowledge text into shorter segments;
step 3.1.2, vectorizing the shorter paragraphs produced in step 3.1.1 using the Embedding operation of the large power base model, generating a corresponding vector representation for each paragraph;
and 3.1.3, selecting local storage or cloud storage to realize the storage of the vector database.
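A minimal sketch of steps 3.1.1 to 3.1.3, assuming local in-memory storage. The `embed` function here is a deterministic hash-based stand-in for the power base model's Embedding operation, not a real semantic embedding:

```python
import hashlib
import math

def chunk_text(text, max_len=200):
    # Step 3.1.1: split a long knowledge text into shorter segments by length.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

def embed(text, dim=16):
    # Stand-in for the base model's Embedding operation (step 3.1.2):
    # a unit-normalized hash-derived vector, NOT a semantic embedding.
    h = hashlib.sha256(text.encode("utf-8")).digest()
    v = [b / 255.0 for b in h[:dim]]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def build_vector_db(documents):
    # Step 3.1.3: a local store of (chunk, vector) pairs.
    db = []
    for doc in documents:
        for chunk in chunk_text(doc):
            db.append((chunk, embed(chunk)))
    return db

# A hypothetical 760-character knowledge text split into 4 chunks of <=200 chars.
db = build_vector_db(["Transformer oil temperature limits ..." * 20])
```

In a real deployment the same loop would call the large power base model's embedding layer and persist the vectors to local disk or cloud storage.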
The invention further improves that: the step 3.2 of designing vector retrieval and large model question-answering algorithm specifically comprises the following steps:
step 3.2.1, semantically vectorizing the user's question (query), thereby obtaining a vector representing the question;
step 3.2.2, matching the k paragraphs most similar to the question vector in the vector database by computing cosine-similarity scores between the question vector from step 3.2.1 and the candidate text-block vectors in the vector database; these paragraphs become the context for the subsequent question-answer process and are added, together with the user's question, into the prompt-word template of the large power instruction model to form a prompt with context information;
and step 3.2.3, submitting the prompt from step 3.2.2 to the large power instruction model, which uses the context information contained in the prompt to answer accurately and provide more valuable replies to the user.
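A minimal sketch of steps 3.2.1 to 3.2.3. The prompt template and the two-dimensional toy vectors are hypothetical, and the embedding and large-model calls are elided:

```python
import math

def cosine(a, b):
    # Cosine-similarity score between two vectors (step 3.2.2).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(query_vec, db, k=3):
    # Top-k candidate text blocks by cosine similarity to the question vector.
    ranked = sorted(db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, contexts):
    # A hypothetical prompt-word template; the actual template used by the
    # power instruction model is not specified in the patent.
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{ctx}\n\nQuestion: {question}\nAnswer:"

# Toy vector database of (text block, vector) pairs.
db = [("Breaker trip causes", [1.0, 0.0]), ("Oil temperature limits", [0.0, 1.0])]
prompt = build_prompt("Why did the breaker trip?", retrieve([0.9, 0.1], db, k=1))
```

The resulting prompt, with the matched context blocks prepended, would then be submitted to the large power instruction model for the final answer (step 3.2.3).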
The invention further improves that: the step 4 specifically comprises the following steps:
step 4.1, data acquisition and preprocessing: collecting real-time operation data, sensor information, and monitoring reports from the power system, where the data include voltage, current, temperature, and frequency, and preprocessing the collected data, including data cleaning, outlier detection, and data alignment, to ensure the quality and consistency of the input data;
step 4.2, feature extraction and representation learning: the large power instruction model is used for feature extraction and representation learning. An input power instruction text X_q is fed directly into the large fault-diagnosis model. For a power signal, the signal is first segmented to form feature vectors X_d with the same dimensionality as the large fault-diagnosis model's token embeddings; the power-signal feature vectors X_d are then multiplied by a learned matrix W and mapped into the text vector space, H_d = X_d W, where the dimensions of the matrix W depend on the dimensions of the two vector spaces. For the power-signal features, the raw data is thus converted into a form the large fault-diagnosis model can understand, and fault-related features are extracted from it; these features come from the text and signal characteristics corresponding to the fault-description analysis;
step 4.3, fault analysis and description generation: passing the features obtained in step 4.2 to the large fault-diagnosis model, which is trained for fault analysis and description generation and classifies the fault type; after the fault type is determined, the large fault-diagnosis model analyzes root causes based on the input features and the base model's knowledge;
and step 4.4, based on the fault-classification result of step 4.3, the large fault-diagnosis model exercises its generative capability to produce detailed fault descriptions and reports, including the fault type, possible root causes, affected scope, and suggested repair measures.
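The signal-to-text mapping of step 4.2 (H_d = X_d W) can be sketched as follows, with illustrative segment and embedding dimensions; the matrix W below is a random stand-in for the learned projection:

```python
import numpy as np

def project_signal(signal, seg_len, W):
    # Step 4.2: segment the raw power signal into feature vectors X_d,
    # then map them into the text vector space: H_d = X_d W.
    n_seg = len(signal) // seg_len
    X_d = np.asarray(signal[: n_seg * seg_len]).reshape(n_seg, seg_len)
    return X_d @ W

rng = np.random.default_rng(1)
seg_len, d_model = 8, 16                 # illustrative dimensions
W = rng.normal(size=(seg_len, d_model))  # stand-in for the learned matrix W
signal = rng.normal(size=100).tolist()   # e.g. a sampled voltage waveform
H_d = project_signal(signal, seg_len, W)  # one row per signal segment
```

Each row of H_d then plays the role of a token embedding, so power-signal segments and instruction-text tokens can be fed to the same large fault-diagnosis model.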
The beneficial effects of the invention are as follows:
1. Improved accuracy: traditional models have limited performance on power-system problems, and general large models lack guidance from domain expertise. By continuing to train the model on power-field data and combining power-field supervised instruction learning with professional-knowledge question answering, the invention enables the model to understand and process power tasks more accurately, improving the system's prediction accuracy and fault-diagnosis accuracy.
2. Rapid adaptation: the introduction of a general base model gives the model good basic performance. By continuing training on this basis, the invention enables the model to adapt rapidly to the complex characteristics of the power field. This rapid adaptation makes the technique more viable and efficient in practical applications than training a model from scratch.
3. Expert-knowledge transfer: through professional-knowledge question answering based on vector retrieval, the invention can deliver power-field expertise to users in an intelligent way. Users can quickly obtain detailed and comprehensive solutions related to the power system, improving the efficiency and quality of information transmission and reducing the likelihood of hallucinations.
4. Improved fault-diagnosis efficiency: the enhancement-based model fine-tuning technique has notable advantages in system fault diagnosis. Because the model incorporates power-field fault cases during training, it can perceive potential fault signals more sharply and give more targeted diagnosis suggestions. This increases the efficiency of maintenance personnel and shortens system recovery time.
Drawings
FIG. 1 is a schematic diagram of the training method of the present invention.
Fig. 2 is a schematic diagram of the vector search in step 3 of the present invention.
Fig. 3 is a schematic diagram of the fault diagnosis of step 4 of the present invention.
Detailed Description
Embodiments of the invention are disclosed in the drawings, and for purposes of explanation, numerous practical details are set forth in the following description. However, it should be understood that these practical details are not to be taken as limiting the invention. That is, in some embodiments of the invention, these practical details are unnecessary.
As shown in FIG. 1, the invention relates to a domain-knowledge-enhanced training and fine-tuning method for a large electric power model. First, the general base model is continuously trained on power-field data to adapt it to the domain. Second, power-field supervised instruction learning is adopted, introducing the guidance of domain experts so that the model can better understand and execute power-system instructions and produce the corresponding output. For professional-knowledge question answering, based on the previously trained large power-domain model, domain expertise is structured into vector representations by building a power-field knowledge base, realizing accurate knowledge question answering based on vector retrieval. Most importantly, for power-system fault diagnosis, the invention adopts a model fine-tuning technique based on power-feature enhancement, integrating power-field fault modes and historical cases into the fine-tuning process, so that the model can more accurately capture potential fault signs, generate textual fault descriptions, and provide more targeted diagnosis suggestions for maintenance personnel.
The electric power large model training and fine tuning method comprises the following steps:
Step 1, training the general base model with power-field data so that the model learns domain-specific characteristics and knowledge, obtaining a large power base model suited to the power field: first, a general pre-trained model permitting commercial licensing, such as Baichuan or LLaMA 2, is selected as the base model; such a model has been fully pre-trained on large-scale general data by predicting the next token from the preceding ones, has a degree of language understanding and generation capability, and performs well in other fields. Then, large-scale power-field data is collected, including news, knowledge, blogs, operation logs, equipment parameters, and engineering reports of the power system; this data covers the technical details, terminology, and practical application scenarios of the power field. The large-scale power-field dataset is introduced and, using the idea of transfer learning, training of the base model is continued. During training, the model weights are adjusted step by step to adapt to the specific data characteristics and task requirements of the power system. The continued training is specifically as follows:
the vocabulary of the universal base model is expanded by using Sentence piece, the Sentence piece regards input text as a series of Unicode characters, language-independent logic, and a token is trained from a large-scale dataset, the vocabulary of the knowledge model in the electric power domain is expanded lightweight and rapidly, and in the training process, the vocabulary of the knowledge model in the electric power domain is aimed at an input sequence U= { U 1 ,u 2 ,…u T Using an extended Tokenizer to convert the input sequence U into a continuous vector representation x= { X 1 ,x 2 ,…x N And added to the position codes to capture information at different positions in the input sequence, and then using a self-attention mechanism to calculate context information for each position of the successive vector representation X, resulting in an attention score:
Q=W Q X,K=W K X,V=W V X
information is processed in parallel through a plurality of attention heads to capture different semantic information, and the calculation of the multiple attention heads is as follows:
MultiHead(Q,K,V)=Concat(head 1 ,…,head h )W O
wherein head i =ATTN(QW i Q ,KW i K ,VW i V ) Further transformation and nonlinear processing of the representation of each position is performed by a feed-forward neural network, and finally, linear transformation is performed at the output of the model, and then a Softmax function is applied to calculate the probability distribution of each word element.
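The final output step described above (a linear transformation followed by Softmax over the vocabulary) can be sketched with NumPy; all dimensions here are illustrative stand-ins:

```python
import numpy as np

def output_distribution(hidden, W_out):
    # Final linear transformation over each position's hidden state,
    # then Softmax over the vocabulary, giving per-position token probabilities.
    logits = hidden @ W_out
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
d_model, vocab = 8, 50                    # illustrative sizes
hidden = rng.normal(size=(3, d_model))    # hidden states for 3 positions
W_out = rng.normal(size=(d_model, vocab)) # stand-in output projection
probs = output_distribution(hidden, W_out)
```

Each row of `probs` is a valid probability distribution over the (toy) vocabulary, which is what the maximum-likelihood training objective scores against the true next token.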
The optimization objective remains maximum likelihood estimation, using the negative log-likelihood (NLL) as the loss function:
L = -Σ_{t=1}^{T} log P(u_t | u_1, …, u_{t-1})
after the field adaptation training on the field data set, the base model has knowledge of the electric power field, and is updated into a large electric power base model. The model can output subsequent text to perform a series of tasks such as power knowledge questions and answers, power text document summaries, term of art translations, and the like, by inputting natural language text. The large electric power base model can provide support for a plurality of application scenes such as knowledge enhancement, text generation and task processing in the electric power field, and therefore the aims of the large electric power model training and fine adjustment technology with the enhanced knowledge in the field are achieved. By means of the method, the universal base model is successfully combined with the professional knowledge of the electric power field, the large electric power base model with enhanced field knowledge is creatively realized, and powerful support and tools are provided for technical progress and application of the electric power field.
And 2, designing a method for supervising and learning the power field instruction, and performing full-scale fine adjustment on the power base large model obtained in the step 1 by using a power field instruction data set, so that the power base large model follows instructions to execute corresponding tasks, and a power instruction large model is obtained.
Step 1 produces a large power base model, but it cannot yet perform operations such as direct question-answer dialogue or fault analysis. To enable the model to accurately understand power-system question-answer instructions or operation instructions, power-field expert knowledge is introduced. Experts provide commonly used instructions and answers to form a supervision dataset. During model training, this supervision data is used in a supervised-learning fashion so that the model learns the correct output corresponding to each question or instruction, improving the model's accuracy in handling power-system instructions.
The step 2 specifically comprises the following steps:
step 2.1, collecting instruction data in the electric power field, forming an instruction data set, and preparing an instruction; inputting; the output > triples, wherein the input can be null, and the power domain instruction data comprise questions and answers of power knowledge, control of power equipment, operation steps and operation guidance, wherein the data come from a plurality of channels such as operation manuals, training materials, actual operation records and the like, so that the model can learn a wide and real operation scene.
The essence of the instruction data set is to prepare <instruction; input; output> triples, where the input may be empty, e.g. <"What is the basic procedure for switching operations of high-voltage distribution equipment?"; empty; "The basic procedure for switching operation of the high-voltage distribution equipment is …">. To construct a high-quality and diverse instruction data set, experts are invited to check the data set and correct incorrect answers, and the form of the instructions is expanded by rewriting instruction sentences. After the instruction data are corrected and expanded by the experts, labeled data consisting of power instruction texts and the corresponding answer texts are obtained.
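As a concrete illustration, the <instruction; input; output> triples can be stored as simple JSON records. The following is a minimal sketch; the field names and JSON Lines serialization are illustrative assumptions, not formats prescribed by the patent:

```python
import json

def make_example(instruction, inp, output):
    """Package one supervision record; `inp` may be empty for knowledge
    questions that need no extra context."""
    return {"instruction": instruction, "input": inp, "output": output}

dataset = [
    make_example(
        "What is the basic procedure for switching operations of "
        "high-voltage distribution equipment?",
        "",  # empty input, as in the example triple above
        "The basic procedure for switching operation of the "
        "high-voltage distribution equipment is ...",
    ),
]

# Serialize as JSON Lines, one triple per line, for the fine-tuning stage.
jsonl = "\n".join(json.dumps(ex, ensure_ascii=False) for ex in dataset)
```

After expert correction and instruction rewriting, each corrected or rewritten variant would simply be appended to `dataset` as another record.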
In the model training stage, a supervised-learning method is used, taking the obtained power instruction text as input and the expected answer text as output; the difference between the model's output and the true answer is measured by conditional-probability maximum-likelihood estimation, so as to guide the power instruction large model to learn the correct operation mapping.
Assuming the instruction is I, the input is X, and the output is Y of length T, the loss function is:

L = −Σ_{i=1}^{T} log P(y_i | I, X, y_{<i})

where i is the position in the answer text and y_i denotes the token at that position.
The power base large model is trained by supervised learning on the labeled power instruction data set; after the model converges, the power instruction large model is obtained. Its structure is identical to that of the base model, but its parameters now encode knowledge of power instructions and answers, giving the model the ability to reply according to instructions.
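The conditional maximum-likelihood loss above can be illustrated numerically. In this sketch, `probs[i][tok]` stands in for the model's predicted probability P(y_i | I, X, y_<i); the three-token answer and its probabilities are invented toy values:

```python
import math

def instruction_tuning_loss(answer_tokens, probs):
    """L = -sum_i log P(y_i | I, X, y_<i), summed over answer positions only
    (instruction and input tokens contribute no loss terms)."""
    return -sum(math.log(probs[i][tok]) for i, tok in enumerate(answer_tokens))

# Toy example: a 3-token answer whose tokens the model predicts with
# probabilities 0.5, 0.8 and 0.9 respectively.
answer = [7, 2, 5]
probs = [{7: 0.5}, {2: 0.8}, {5: 0.9}]
loss = instruction_tuning_loss(answer, probs)
```

Minimizing this quantity over the labeled instruction data drives the model toward assigning high probability to the expert-provided answers.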
Step 3: to realize accurate professional knowledge question answering, a power knowledge base text is first constructed, comprising power teaching materials, fault-diagnosis texts, and operation manuals. The document blocks containing knowledge elements of the power system, such as key concepts, principles, problems, and solutions, are stored in the knowledge graph as vector representations. During question answering, the user's question is likewise converted into a vector representation, and vector retrieval is used to search the knowledge graph for the document blocks containing the most relevant knowledge elements, which serve as answer candidates. These candidates are then input, together with the question, into the instruction model obtained in step 2 for reading comprehension to produce the final answer. In this way, the model can quickly and accurately answer the user's professional questions, as shown in fig. 2.
The method specifically comprises the following steps:
step 3.1, constructing a knowledge vector database:
Step 3.1.1: load the external knowledge texts and segment them by length, cutting long knowledge texts into shorter passages; this improves vectorization efficiency, makes database storage and retrieval more convenient, and alleviates the limited context-window length of large models.
Step 3.1.2: vectorize the shorter passages cut in step 3.1.1 using the Embedding operation of the power base large model; this process generates a corresponding vector representation for each passage.
Step 3.1.3: to store the vector database, mature tools such as Annoy or FAISS can be selected, with either local or cloud storage. These tools can efficiently handle large-scale vector data and provide fast similarity search.
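The length-based segmentation of step 3.1.1 can be sketched as a simple character-window splitter; the chunk size and overlap below are illustrative choices, not values from the patent:

```python
def split_text(text, chunk_size=200, overlap=20):
    """Cut a long knowledge text into shorter, slightly overlapping
    character windows so each piece fits the model's context window."""
    assert chunk_size > overlap
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

The small overlap keeps sentences that straddle a boundary recoverable from at least one chunk; each chunk would then be vectorized as in step 3.1.2 and inserted into the store of step 3.1.3.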
Step 3.2, designing a vector retrieval and large model question-answering algorithm;
Step 3.2, designing the vector retrieval and large model question-answering algorithm, specifically comprises the following steps:
Step 3.2.1: semantically vectorize the user's question (query) to obtain a vector representing the question; with this vector, the semantic similarity of the question can be measured more reliably, enabling more accurate text matching.
Step 3.2.2: by computing cosine similarity scores between the question vector from step 3.2.1 and the candidate text-block vectors in the vector database, match the k passages most similar to the question vector. These passages become the context of the subsequent question answering and are added, together with the user's question, to the prompt-word template of the power instruction large model to form a prompt with contextual information. In this way, a prompt containing factual knowledge related to the question is formed, helping the large model better understand the user's intent.
Step 3.2.3: submit the prompt from step 3.2.2 to the power instruction large model. Using the prompt words containing the context information, the model can answer more accurately and provide a more valuable reply to the user.
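Putting steps 3.2.1-3.2.3 together, the following sketch uses a plain NumPy array in place of the Annoy/FAISS store of step 3.1.3 and a toy `embed` function in place of the base model's Embedding operation; the passages, dimensions, and prompt template are invented for illustration:

```python
import numpy as np

vocab_vectors = {}

def embed(text, dim=64):
    """Toy sentence embedding: average of per-word random-but-stable vectors,
    normalised so that a dot product equals cosine similarity."""
    vecs = []
    for w in text.lower().split():
        if w not in vocab_vectors:
            seed = abs(hash(w)) % (2 ** 32)
            vocab_vectors[w] = np.random.default_rng(seed).normal(size=dim)
        vecs.append(vocab_vectors[w])
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

passages = [
    "switching operation procedure for high-voltage distribution equipment",
    "transformer oil temperature alarm thresholds",
    "relay protection setting calculation method",
]
index = np.stack([embed(p) for p in passages])  # step 3.1.2: passage vectors

def retrieve(question, k=2):
    """Steps 3.2.1-3.2.2: vectorize the question, rank passages by cosine."""
    scores = index @ embed(question)
    top = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in top]

def build_prompt(question, k=2):
    """Step 3.2.2: splice the retrieved context and the user question into a
    prompt-word template for the power instruction large model."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In the real pipeline, the output of `build_prompt` would be submitted to the power instruction large model (step 3.2.3) to generate the final answer.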
Since the power base large model and the power instruction large model are both plain-text large models, how to input other, non-text characteristic signals into the large model and fully exploit its modeling and analysis capability to perform fault analysis is an important and challenging problem.
For the fault-diagnosis problem of power systems, a model fine-tuning technique based on non-text power-feature enhancement is adopted. A fault-case library is constructed by introducing power-domain fault modes and historical cases. During fine-tuning, fault cases are fused into the training data so that the model can more accurately capture potential fault signs for classification, while outputting fault descriptions as text, improving diagnostic accuracy. When a system anomaly occurs, the model can refer to similar cases and give more targeted fault-diagnosis suggestions, helping maintenance personnel locate the problem quickly. As shown in fig. 3, a fault-diagnosis large model fusing multiple signal features is designed: it takes a power signal and a power instruction text as input and outputs a fault description. The steps are as follows:
Step 4.1, data acquisition and preprocessing: collect real-time operation data, sensor information, and monitoring reports from the power system, the data including voltage, current, temperature, and frequency, and preprocess the collected data (data cleaning, outlier detection, and data alignment) to ensure the quality and consistency of the input data;
Step 4.2, feature extraction and representation learning: the power instruction large model is used for feature extraction and representation learning. The input power instruction text X_q is fed directly into the fault-diagnosis large model. For the power signal, the signal is first divided into segments to form feature vectors X_d matching the token dimension of the fault-diagnosis large model; then, as shown in fig. 3, because the power signal and the text are not in the same vector space, the invention multiplies the power-signal feature vectors X_d by a learned matrix W to map them into the text vector space, H_d = X_d W, where the dimensions of the matrix W depend on the dimensions of the two vector spaces. In this way the raw power-signal data are converted into a form the fault-diagnosis large model can understand, from which fault-related features are extracted; these features come from both the text and the signal corresponding to the fault-description analysis;
Step 4.3, fault analysis and description generation: the features obtained in step 4.2 are passed to the fault-diagnosis large model, which is trained to generate fault analysis and descriptions. The system considers multiple possibilities and generates inferences and explanations about the root cause of the fault. Finally, drawing on its generative capability and the fault-classification result, the fault-diagnosis large model produces detailed fault descriptions and reports, including the fault type, possible root causes, the scope of impact, and suggested repair measures.
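The feature mapping of step 4.2, H_d = X_d W, can be sketched as follows; the segment count, feature choices, and dimensions are illustrative assumptions, and in the patent W is learned during fine-tuning rather than sampled randomly:

```python
import numpy as np

rng = np.random.default_rng(42)

signal = rng.normal(size=1024)      # stand-in for a sampled power waveform
segments = signal.reshape(8, 128)   # divide the signal into 8 segments

def segment_features(seg):
    """Toy per-segment features; a real system would learn richer ones."""
    return np.array([seg.mean(), seg.std(), seg.max(), seg.min()])

X_d = np.stack([segment_features(s) for s in segments])   # shape (8, 4)

d_signal, d_model = X_d.shape[1], 32   # 32 = hypothetical token-embedding size
W = rng.normal(size=(d_signal, d_model)) * 0.02  # learned jointly in fine-tuning
H_d = X_d @ W   # (8, 32): signal "tokens" mapped into the text vector space

# H_d can now be concatenated with the embeddings of the instruction text X_q
# and fed to the fault-diagnosis large model.
```

The key design point is that W bridges the two vector spaces: its row dimension matches the signal-feature dimension and its column dimension matches the model's token-embedding dimension.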
Experimental results
Two generation tasks, professional knowledge question answering and fault-analysis description generation, were evaluated, with a test set of 500 examples prepared for each. Because the large models all generate answers in a generative manner, the invention adopts BLEU metrics for evaluation.
(1) Expertise answer generation

Model \ Index                    BLEU     BLEU-1    BLEU-4
Baichuan2-7B-Chat                3.22     9.87      1.05
Power instruction large model    9.32     17.53     4.26
(2) Fault analysis description generation

Model \ Index                    BLEU     BLEU-1    BLEU-4
Baichuan2-7B-Chat                13.67    25.33     5.72
Fault diagnosis large model      24.12    36.89     9.13
The invention is compared with the open-source model Baichuan2-7B-Chat; the experimental results show that the technique of the present scheme outperforms the general-purpose large model on both tasks.
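For reference, the BLEU metrics used above can be computed for a single hypothesis/reference pair with a minimal implementation of clipped n-gram precision and a brevity penalty; BLEU-1 and BLEU-4 correspond to `max_n = 1` and `max_n = 4`. This is an illustrative sketch, and a production evaluation would use a standard toolkit such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Single-pair BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_avg)
```

Corpus-level BLEU, as reported in the tables, aggregates the n-gram counts over all 500 test examples before taking the geometric mean, rather than averaging per-sentence scores.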
In summary, with domain knowledge enhancement at its core, the invention effectively solves the technical problems faced by large-model applications in the electric power field through key modules such as continued training, instruction supervised learning, vector retrieval, and system fault diagnosis. The technique makes full use of the model's electric-power domain expertise, adapts quickly to the complex characteristics of power systems, and improves the accuracy and efficiency of power tasks. Meanwhile, intelligent expertise question answering and accurate system fault diagnosis provide strong support for the operation and development of the power industry, pushing power systems toward intelligence. The invention has broad application prospects, is expected to have a profound effect in the field of power systems, and promotes innovation and progress in the field.
The foregoing description is merely illustrative of the invention and is not to be construed as limiting it. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (8)

1. A domain-knowledge-enhanced power large model training and tuning method, characterized in that the power large model training and tuning method comprises the following steps:
step 1, continuously training aiming at the characteristics and data characteristics of the power field on the basis of a general base model to realize the field rapid adaptation of the base model and obtain a power base large model;
step 2, designing a method for supervising and learning the power field instruction, and performing full-scale fine adjustment on the power base large model obtained in the step 1 by using a power field instruction data set so that the power base large model follows the instruction to execute a corresponding task to obtain a power instruction large model;
step 3, vectorizing the power knowledge base text by means of vector retrieval and knowledge retrieval techniques, and matching candidate text blocks in the knowledge base by vector retrieval to improve the knowledge question-answering effect, specifically comprising the following steps:
step 3.1, constructing a knowledge vector database:
step 3.2, designing a vector retrieval and large model question-answering algorithm;
step 4, designing a fault-diagnosis large model that fuses multiple signal features to perform fault diagnosis, taking a power signal and a power instruction text as input, the fault-diagnosis large model outputting a fault description.
2. The domain-knowledge-enhanced power large model training method according to claim 1, characterized in that step 1 specifically comprises the following steps:
step 1.1, selecting a general pre-training model as a base model;
step 1.2, collecting large-scale data of the electric power field to form a large-scale data set, introducing the power-domain large-scale data set into the base model, and continuing to train the base model using the idea of transfer learning, wherein the power-domain large-scale data set comprises news, knowledge, blogs, running logs, equipment parameters, and engineering reports of the power system.
3. The domain-knowledge-enhanced power large model training method according to claim 1 or 2, characterized in that the continued training of step 1 targets the characteristics and data features of the electric power field, specifically as follows:
the vocabulary of the universal base model is expanded by using Sentence piece, the Sentence piece regards input text as a series of Unicode characters, language-independent logic, and a token is trained from a large-scale dataset, the vocabulary of the knowledge model in the electric power domain is expanded lightweight and rapidly, and in the training process, the vocabulary of the knowledge model in the electric power domain is aimed at an input sequence U= { U 1 ,u 2 ,…u T Using an extended Tokenizer to convert the input sequence U into a continuous vector representation x= { X 1 ,x 2 ,…x N And added to the position codes to capture information at different positions in the input sequence, and then using a self-attention mechanism to calculate context information for each position of the successive vector representation X, resulting in an attention score:
Q=W Q X,K=W K X,V=W V X
information is processed in parallel through a plurality of attention heads to capture different semantic information, and the calculation of the multiple attention heads is as follows:
MultiHead(Q,K,V)=Concat(head 1 ,…,head h )W O
wherein,further transformation and nonlinear processing of the representation of each position is performed by a feed-forward neural network, and finally, linear transformation is performed at the output of the model, and then a Softmax function is applied to calculate the probability distribution of each word element.
4. The domain-knowledge-enhanced power large model training method according to claim 1, characterized in that step 2 specifically comprises the following steps:
step 2.1, collecting instruction data of the electric power field to form an instruction data set, preparing <instruction; input; output> triples, wherein the input may be empty, and the power-domain instruction data include questions and answers on power knowledge, control of power equipment, operation steps, and operation guidance;
step 2.2, inviting experts to check the instruction data set and correct incorrect answers, expanding the form of the instructions by rewriting instruction sentences, and obtaining labeled data consisting of power instruction texts and the corresponding answer texts;
step 2.3, using a supervised learning method, taking the obtained power instruction text as input and taking the expected answer text as output;
and 2.4, measuring the difference between the output of the step 3 and the real answer by adopting the conditional probability maximum likelihood estimation so as to guide the power command large model to learn the correct operation mapping.
5. The domain knowledge enhancement power big model training method according to claim 4, wherein: the step 2.4 specifically comprises the following steps:
assuming the instruction is I, the input is X, and the output is Y of length T, the loss function is:

L = −Σ_{i=1}^{T} log P(y_i | I, X, y_{<i})

where i is the position in the answer text and y_i denotes the token at that position.
6. The domain-knowledge-enhanced power large model training method according to claim 5, characterized in that the step 3.1 of constructing a knowledge vector database specifically comprises the following steps:
step 3.1.1, loading an externally hung knowledge text, segmenting the knowledge text according to the length, and segmenting the long knowledge text into shorter segments;
step 3.1.2, vectorizing the shorter passages cut in step 3.1.1 using the Embedding operation of the power base large model, and generating a corresponding vector representation for each passage;
and 3.1.3, selecting local storage or cloud storage to realize the storage of the vector database.
7. The domain knowledge enhancement power big model training method according to claim 1 or 6, wherein: the step 3.2 of designing vector retrieval and large model question-answering algorithm specifically comprises the following steps:
step 3.2.1, carrying out semantic vectorization on a problem (query) of a user, thereby obtaining a vector representing the problem;
step 3.2.2, matching the k passages most similar to the question vector in the vector database by computing cosine similarity scores between the question vector of step 3.2.1 and the candidate text-block vectors in the vector database, these passages becoming the context of the subsequent question answering and being added, together with the user's question, into the prompt-word template of the power instruction large model to form a prompt with context information;
step 3.2.3, submitting the prompt of step 3.2.2 to the power instruction large model, which uses the prompt words containing the context information to answer accurately and provide a more valuable reply to the user.
8. The domain-knowledge-enhanced power large model training method according to claim 1, characterized in that step 4 specifically comprises the following steps:
step 4.1, data acquisition and pretreatment: collecting real-time operation data, sensor information and monitoring reports from the power system, wherein the data comprise voltage, current, temperature and frequency, and preprocessing the collected data, including data cleaning, abnormal value detection and data alignment, so as to ensure the quality and consistency of input data;
step 4.2, feature extraction and representation learning: using the power instruction large model for feature extraction and representation learning, the input power instruction text X_q is fed directly into the fault-diagnosis large model; for the power signal, the signal is first divided to form feature vectors X_d matching the token dimension of the fault-diagnosis large model, and the power-signal feature vectors X_d are then multiplied by a learned matrix W to map them into the text vector space, H_d = X_d W, where the dimensions of the matrix W depend on the dimensions of the two vector spaces; for the power-signal features, the raw data are thus converted into a form the fault-diagnosis large model can understand, from which fault-related features are extracted, the features coming from both the text and the signal corresponding to the fault-description analysis;
step 4.3, fault analysis and description generation: passing the features obtained in step 4.2 to the fault-diagnosis large model and training it for fault analysis and description generation, including classifying the fault type; after the fault type is determined, the fault-diagnosis large model analyzes the root cause according to the input features and the base-model knowledge;
step 4.4: according to the fault-classification result of step 4.3, the fault-diagnosis large model exerts its generative capability to produce detailed fault descriptions and reports, including the fault type, possible root causes, the scope of impact, and suggested repair measures.
CN202311425644.0A 2023-10-31 2023-10-31 Power large model training and adjusting method with field knowledge enhancement Pending CN117592563A (en)

Publications (1)

Publication Number Publication Date
CN117592563A true CN117592563A (en) 2024-02-23

Family

ID=89919165

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873909A (en) * 2024-03-13 2024-04-12 上海爱可生信息技术股份有限公司 Fault diagnosis execution method, fault diagnosis execution system, electronic device, and storage medium
CN117873909B (en) * 2024-03-13 2024-05-28 上海爱可生信息技术股份有限公司 Fault diagnosis execution method, fault diagnosis execution system, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination