WO2024046316A1 - Power domain model pre-training method, fine-tuning method, apparatus, device, storage medium and computer program product (电力领域模型预训练方法、精调方法、装置、设备、存储介质和计算机程序产品) - Google Patents

Power domain model pre-training method, fine-tuning method, apparatus, device, storage medium and computer program product

Info

Publication number
WO2024046316A1
WO2024046316A1 (PCT/CN2023/115522)
Authority
WO
WIPO (PCT)
Prior art keywords
training
model
electric power
power domain
corpus
Prior art date
Application number
PCT/CN2023/115522
Other languages
English (en)
French (fr)
Inventor
宋博川
张强
周飞
刘同阳
范晓宣
贾全烨
Original Assignee
国网智能电网研究院有限公司
Priority date
Filing date
Publication date
Application filed by 国网智能电网研究院有限公司 filed Critical 国网智能电网研究院有限公司
Publication of WO2024046316A1 publication Critical patent/WO2024046316A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Definitions

  • This application relates to, but is not limited to, the field of artificial intelligence technology, and in particular to a power domain model pre-training method, fine-tuning method, apparatus, device, storage medium and computer program product.
  • Existing natural language processing (NLP) models can contain millions of parameters, so training an NLP model with good performance requires a large amount of training samples and label data, and manually annotating that data is costly. Against this background, the pre-training plus fine-tuning paradigm is widely used in NLP model training: a pre-trained model is first trained on low-cost, easily obtained training data.
  • In this way, the pre-trained model can learn general linguistic knowledge. For each downstream task, the related label data can then be used to fine-tune the relevant parameters so that the trained NLP model performs well.
  • However, in the pre-training stage of a natural language processing model, the model is trained not for the downstream tasks but for the pre-training tasks (such as predicting masked words). As a result, the pre-trained model has weak transfer ability: when it is fine-tuned to obtain a model for a downstream task, the resulting model adapts poorly and has low prediction accuracy.
  • In view of this, embodiments of the present application provide a power domain model pre-training method, fine-tuning method, apparatus, device, storage medium and computer program product.
  • According to a first aspect, an embodiment of the present application provides a power domain model pre-training method, the method including:
  • obtaining original power corpus data;
  • processing the original power corpus data, the processing including at least word segmentation;
  • constructing pre-training corpus for the power domain model from the processed power corpus data using whole-word masking;
  • building a power domain model that includes an attention matrix introducing relative position encoding between words; and
  • pre-training the power domain model with the pre-training corpus.
  • According to a second aspect, an embodiment of the present application provides a fine-tuning method for a power domain model, including:
  • building a training data set for a downstream task;
  • taking the network structures of a power domain pre-training model other than its output layer as an underlying encoder, constructing an output layer network structure according to the downstream task, and connecting the output layer network structure after the underlying encoder to obtain a power domain model for the downstream task, where the pre-training corpus of the power domain pre-training model is obtained by performing word segmentation on original power corpus data and then applying whole-word masking, and the power domain pre-training model includes an attention matrix that introduces relative position encoding between words; and
  • training the power domain model for the downstream task with the training data set.
  • According to a third aspect, an embodiment of the present application provides a power domain model pre-training apparatus, including:
  • an acquisition module configured to obtain original power corpus data;
  • a processing module configured to process the original power corpus data, the processing including at least word segmentation;
  • a first building module configured to construct pre-training corpus for the power domain model from the processed power corpus data using whole-word masking;
  • a second building module configured to build a power domain model that includes an attention matrix introducing relative position encoding between words; and
  • a pre-training module configured to pre-train the power domain model with the pre-training corpus.
  • According to a fourth aspect, an embodiment of the present application provides a fine-tuning apparatus for a power domain model, including:
  • a third building module configured to build a training data set for a downstream task;
  • a fourth building module configured to take the network structures of a power domain pre-training model other than its output layer as an underlying encoder, construct an output layer network structure according to the downstream task, and connect the output layer network structure after the underlying encoder to obtain a power domain model for the downstream task, where the pre-training corpus of the power domain pre-training model is obtained by performing word segmentation on original power corpus data and then applying whole-word masking, and the power domain pre-training model includes an attention matrix that introduces relative position encoding between words; and
  • a training module configured to train the power domain model for the downstream task with the training data set.
  • According to a fifth aspect, an embodiment of the present application provides an electronic device, including:
  • a memory and a processor that are communicatively connected to each other, the memory being used to store a computer program which, when executed by the processor, implements the power domain model pre-training method of the first aspect or the fine-tuning method for the power domain model of the second aspect.
  • According to a sixth aspect, embodiments of the present application provide a computer-readable storage medium configured to store a computer program which, when executed by a processor, implements the power domain model pre-training method of the first aspect or the fine-tuning method for the power domain model of the second aspect.
  • According to a seventh aspect, embodiments of the present application provide a computer program product including computer instructions which, when run on a computer device, cause the computer device to execute the power domain model pre-training method of the first aspect or the fine-tuning method for the power domain model of the second aspect.
  • In the embodiments of the present application, the pre-training corpus of the power domain model is constructed by whole-word masking. This avoids the problem that, when character-level masking is used to build the pre-training corpus, the model can easily guess a masked word while ignoring the semantic information between the word and the whole sentence, and it therefore improves the transfer ability of the pre-trained model.
  • In addition, the embodiments of the present application introduce relative position modeling between words into the built pre-training model, i.e. the power domain model, for example by adding an attention matrix that introduces relative position encoding between words. This makes the model pay more attention to, and be more sensitive to, the relative positions of words, so that the pre-trained power domain model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to transfer to downstream tasks.
  • Figure 1 is a schematic flow chart of a power field model pre-training method provided by an embodiment of the present application
  • Figure 2 is a schematic diagram of the process of processing original power corpus data in the embodiment of the present application.
  • Figure 3 is a schematic flowchart of a method for fine-tuning a power domain model provided by an embodiment of the present application
  • Figure 4 is a schematic structural diagram of a power field model pre-training device provided by an embodiment of the present application.
  • Figure 5 is a schematic structural diagram of a fine-tuning device for a power domain model provided by an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Referring to Figure 1, an embodiment of the present application provides a power domain model pre-training method. The method includes:
  • S101: obtain original power corpus data;
  • S102: process the original power corpus data, the processing including at least word segmentation;
  • S103: construct pre-training corpus for the power domain model from the processed power corpus data using whole-word masking;
  • S104: build a power domain model that includes an attention matrix introducing relative position encoding between words;
  • S105: pre-train the power domain model with the pre-training corpus.
  • For example, the power domain model may be a large power domain model, that is, a large-scale model for the power domain, and the original power corpus data may be a large amount of power-related data.
  • The processing may also include cleaning, which can be performed before the word segmentation operation and implemented, for example, with regular-expression matching, BeautifulSoup and other toolkits. Cleaning filters out special symbols in the original power corpus data, such as garbled characters and HTML markup, so as to obtain relatively clean corpus data.
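  • As an illustrative sketch only (the text above does not fix a particular implementation), the cleaning step could be realized with regular expressions and BeautifulSoup roughly as follows; which symbols count as noise is an assumption:

```python
import re
from bs4 import BeautifulSoup

def clean_corpus(raw_text: str) -> str:
    """Strip HTML markup and garbled symbols from one raw power-corpus document."""
    text = BeautifulSoup(raw_text, "html.parser").get_text()   # drop HTML tags and entities
    # keep CJK characters, ASCII letters/digits and common punctuation; drop everything else
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9，。、；：？！（）《》%.,\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()                    # collapse whitespace
```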
  • When the power domain model is trained, it is used to predict the masked words in the pre-training corpus constructed with whole-word masking; the predictions are compared with the words before masking, and the parameters of the power domain model are adjusted according to the comparison results.
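  • A minimal sketch of this training loop is given below, assuming a model that returns token-level logits and a boolean mask marking the characters of the words selected for masking; the function and tensor names are illustrative, not the implementation prescribed above:

```python
import torch
import torch.nn.functional as F

def pretrain_step(model, optimizer, input_ids, original_ids, masked_positions):
    """One pre-training step: predict the masked characters and update the model.

    input_ids:        corpus after whole-word masking, shape (batch, seq_len)
    original_ids:     characters before masking, shape (batch, seq_len)
    masked_positions: bool tensor marking every character of the selected words
    """
    logits = model(input_ids)                                 # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits[masked_positions],          # predictions ...
                           original_ids[masked_positions])    # ... compared with the pre-mask words
    optimizer.zero_grad()
    loss.backward()                                           # adjust parameters from the comparison
    optimizer.step()
    return loss.item()
```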
  • In the embodiments of the present application, the pre-training corpus of the power domain model is constructed by whole-word masking. This avoids the problem that, when character-level masking is used to build the pre-training corpus, the model can easily guess a masked word while ignoring the semantic information between the word and the whole sentence, and it therefore improves the transfer ability of the pre-trained model.
  • In addition, the embodiments of the present application introduce relative position modeling between words into the built pre-training model, i.e. the power domain model, for example by adding an attention matrix that introduces relative position encoding between words, which makes the model more sensitive to the relative positions of words, so that the pre-trained power domain model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to transfer to downstream tasks.
  • In some implementations, the attention matrix that introduces relative position encoding between words is computed as:
  • Attention_rel(Q, K, V) = Attention(Q, K, V) + rel
  • where Attention(Q, K, V) is the attention matrix without the relative position encoding, computed for one attention head, and rel is a parameter related to the relative positions between words; for each input sample (i.e. one piece of pre-training corpus), rel is a scalar corresponding to one attention head.
  • For example, Q, K and V represent the Query, Key and Value respectively: V is a vector representing the input features, and Q and K are feature vectors used to compute the attention weights; all of them are derived from the input features. Attention(Q, K, V) multiplies V by weights reflecting the degree of attention: the similarity between the current Query and all Keys is computed, the similarity values are passed through a Softmax layer to obtain a set of weights, and the products of these weights with the corresponding Values are summed to obtain the Value under attention, i.e. Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of K.
  • Q, K and V are obtained by transforming the input vector X with the matrices W_Q, W_K and W_V, which are three trainable parameter matrices.
  • In the embodiments of the present application, the relative position encoding adopts the T5 encoding scheme to introduce a position bias into the attention matrix; that is, a relative position bias rel is added on top of the attention matrix.
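  • The sketch below shows one plausible reading of this formula for a single attention head, in which the T5-style bias is added to the scaled dot-product scores before the Softmax; that placement, and the tensor shapes, are assumptions rather than details fixed by the text above:

```python
import math
import torch
import torch.nn.functional as F

def attention_rel(X, W_Q, W_K, W_V, rel):
    """Single-head attention with an additive relative-position bias.

    X:              input features, shape (batch, seq_len, d_model)
    W_Q, W_K, W_V:  trainable projection matrices, shape (d_model, d_k)
    rel:            relative-position bias, broadcastable to (seq_len, seq_len)
                    (the text above describes it as a per-head scalar per sample)
    """
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # attention logits QK^T / sqrt(d_k)
    scores = scores + rel                               # the "+ rel" term of Attention_rel
    return F.softmax(scores, dim=-1) @ V                # weighted sum over the Values
```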
  • In some implementations, processing the original power corpus data includes: performing word segmentation on the original power corpus data with a BERT-CRF model and a power domain dictionary, the BERT-CRF model being trained on power word-segmentation corpus.
  • The BERT-CRF model trained on power word-segmentation corpus is a word segmentation tool for the power domain. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model commonly used in natural language processing, and CRF stands for Conditional Random Field, a traditional machine learning method.
  • The BERT-CRF model uses the "BMES" tagging scheme, in which "B" means the current character is the first character of a multi-character word, "M" means it is a middle character of a multi-character word, "E" means it is the last character of a multi-character word, and "S" means the current character is a single-character word.
  • For example, "变压器的检修规范" ("maintenance specifications of transformers") is tagged "B, M, E, S, B, E, B, E", giving the word segmentation result "变压器 / 的 / 检修 / 规范" ("transformer / of / maintenance / specifications").
  • The power domain dictionary is also referred to simply as the power dictionary. In the embodiments of the present application, the BERT-CRF model is first used to segment the original power corpus data, and the power dictionary is then used to merge power terms that were split apart, yielding the final segmentation result.
  • The original power corpus data targeted by this word segmentation may be power corpus data that has already been cleaned. Referring to Figure 2, the output of word segmentation is a word sequence composed of a series of words.
  • Using a BERT-CRF model trained on power word-segmentation corpus together with the power domain dictionary to segment the original power corpus data allows entities of the power domain to be segmented as whole units, ensuring to the greatest extent that power-specific terms are not split apart.
  • In other optional implementations, other power-domain word segmentation tools may also be used in combination with the power domain dictionary to segment the original power corpus data.
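  • The text above does not spell out how the dictionary merge is performed; a simple greedy longest-match pass over the segmenter output, as sketched below, is one way it could work (the BERT-CRF segmenter itself is assumed to already exist):

```python
def merge_power_terms(tokens, power_dict, max_span=5):
    """Merge adjacent segments whose concatenation is a known power-domain term.

    tokens:     word list produced by the BERT-CRF segmenter (BMES decoding)
    power_dict: set of power-domain terms (the "power dictionary")
    max_span:   longest number of consecutive segments to try merging
    """
    merged, i = [], 0
    while i < len(tokens):
        match = None
        # greedy longest match against the domain dictionary
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in power_dict:
                match = (candidate, span)
                break
        if match:
            merged.append(match[0])
            i += match[1]
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# e.g. merge_power_terms(["变压", "器", "的", "检修", "规范"], {"变压器"})
#      -> ["变压器", "的", "检修", "规范"]
```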
  • In some implementations, using whole-word masking on the processed power corpus data to construct the pre-training corpus of the power domain model includes: performing random whole-word masking on the processed power corpus data with a preset probability, replacing one part of the characters of the words selected for masking with random characters, another part with masking symbols, and leaving the remaining part unchanged.
  • For example, words in the word sequence obtained after segmentation can be randomly selected for masking with a probability of 0.15, and the characters of all selected words are then processed as follows: 80% are replaced with the masking symbol (e.g. [MASK]), 10% are replaced with random characters, and 10% keep their original characters.
  • In addition, the power domain model may be built on the basis of the BERT model. Therefore, to keep model training consistent, when the pre-training corpus is constructed with whole-word masking, the special symbol [CLS] is added at the beginning and the special symbol [SEP] at the end of every sentence that has undergone whole-word masking.
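  • A minimal sketch of this corpus-construction step is shown below; whether the 80/10/10 split is applied per selected word (as here) or across all selected characters is not specified above, and the random-character pool is a placeholder for the real vocabulary:

```python
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"
VOCAB = list("电力变压器的检修规范运行线路")   # placeholder character vocabulary

def whole_word_mask(words, mask_prob=0.15):
    """Build one pre-training sentence with whole-word masking.

    words: segmented word sequence, e.g. ["变压器", "的", "检修", "规范"]
    Returns (masked character sequence, original character sequence).
    """
    inputs, labels = [CLS], [CLS]
    for word in words:
        chars = list(word)
        labels.extend(chars)
        if random.random() < mask_prob:                    # select the whole word, never part of it
            r = random.random()
            if r < 0.8:                                    # 80%: every character -> [MASK]
                inputs.extend([MASK] * len(chars))
            elif r < 0.9:                                  # 10%: every character -> a random character
                inputs.extend(random.choice(VOCAB) for _ in chars)
            else:                                          # 10%: keep the original characters
                inputs.extend(chars)
        else:
            inputs.extend(chars)
    return inputs + [SEP], labels + [SEP]
```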
  • Referring to Figure 3, an embodiment of the present application further provides a fine-tuning method for the power domain model, including:
  • S301: build a training data set for the downstream task;
  • S302: take the network structures of the power domain pre-training model other than its output layer (i.e. the encoding layers of the power domain pre-training model) as the underlying encoder, construct an output layer network structure according to the downstream task, and connect the output layer network structure after the underlying encoder to obtain a power domain model for the downstream task, where the pre-training corpus of the power domain pre-training model is obtained by performing word segmentation on the original power corpus data and then applying whole-word masking, and the power domain pre-training model includes an attention matrix that introduces relative position encoding between words;
  • S303: train the power domain model for the downstream task with the training data set.
  • For example, the power domain pre-training model may be obtained by pre-training with any of the power domain model pre-training methods described in the above embodiments.
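  • How the encoder is detached from the pre-trained model depends on how that model is implemented; as a sketch under the assumption that its non-output layers are exposed as a single `encoder` sub-module, step S302 could be assembled like this:

```python
import torch.nn as nn

class DownstreamPowerModel(nn.Module):
    """Pre-trained encoder (everything except the MLM output layer) plus a task head."""
    def __init__(self, pretrained_model, task_head):
        super().__init__()
        self.encoder = pretrained_model.encoder   # assumed attribute holding the non-output layers
        self.head = task_head                     # e.g. a classification head or a CRF head

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(input_ids, attention_mask)   # underlying encoder
        return self.head(hidden_states, attention_mask)           # task-specific output layer
```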
  • In the embodiments of the present application, the pre-training corpus of the power domain model is constructed by whole-word masking. This avoids the problem that, when character-level masking is used to build the pre-training corpus, the model can easily guess a masked word while ignoring the semantic information between the word and the whole sentence, and it therefore improves the transfer ability of the pre-trained model.
  • In addition, the embodiments of the present application introduce relative position modeling between words into the built pre-training model, i.e. the power domain model, for example by adding an attention matrix that introduces relative position encoding between words, which makes the model more sensitive to the relative positions of words, so that the pre-trained power domain model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to transfer to downstream tasks.
  • In the fine-tuning stage of the power domain model, different output layer network structures need to be designed for different downstream tasks; common natural language processing tasks are described below as examples.
  • In some implementations, the downstream task is a classification task, the output layer network structure is a fully connected network, and a first network structure is further included between the underlying encoder and the fully connected network.
  • The first network structure is used to extract the encoding vectors of the first layer and the last layer of the underlying encoder and average them to obtain a first encoding vector for each word, and then to average the first encoding vectors of the words to obtain the encoding vector of the underlying encoder.
  • The fully connected network is used to output the confidence of each class based on the encoding vector of the underlying encoder.
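  • A sketch of such a head is given below, assuming the encoder exposes its per-layer outputs as a list of (batch, seq_len, hidden) tensors; exactly which tensor counts as the "first layer" depends on the encoder implementation and is an assumption here:

```python
import torch.nn as nn

class ClassificationHead(nn.Module):
    """First/last-layer averaging followed by a fully connected classifier."""
    def __init__(self, hidden_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, layer_outputs, attention_mask):
        first, last = layer_outputs[0], layer_outputs[-1]
        token_vec = (first + last) / 2                        # first encoding vector of each word
        mask = attention_mask.unsqueeze(-1).float()
        sent_vec = (token_vec * mask).sum(1) / mask.sum(1)    # average over words -> encoder vector
        return self.fc(sent_vec)                              # confidence (logits) for each class
```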
  • In other implementations, the downstream task is a sequence labeling task, the output layer network structure is a conditional random field (CRF), and a Dropout layer and a mapping layer are further included between the underlying encoder and the CRF layer.
  • The output of the underlying encoder is a tensor of shape (batch_size, time_steps, hidden_size), where batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden unit size of the underlying encoder.
  • The output of the underlying encoder is converted through the Dropout layer and the mapping layer into a tensor of shape (batch_size, time_steps, num_classes), where num_classes is the number of target classes.
  • The conditional random field layer is used to obtain the label of each element of the whole sequence from the tensor of shape (batch_size, time_steps, num_classes); here the whole sequence is the sequence to be labeled that is input to the power domain model for the sequence labeling task.
  • The conditional random field serves as the labeling structure of the sequence labeling task. The Dropout layer sets elements of the (batch_size, time_steps, hidden_size) tensor output by the underlying encoder to zero with a certain probability, which increases the robustness of the model, and the tensor that has passed through Dropout is converted by the mapping layer into a tensor of shape (batch_size, time_steps, num_classes).
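  • A sketch of this output structure is shown below; it relies on the third-party pytorch-crf package for the CRF layer, which is an implementation choice of this sketch rather than something the text above prescribes:

```python
import torch.nn as nn
from torchcrf import CRF   # assumption: the pytorch-crf package provides the CRF layer

class SequenceLabelingHead(nn.Module):
    """Dropout -> mapping layer -> conditional random field on top of the encoder."""
    def __init__(self, hidden_size, num_classes, dropout_prob=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)              # zero elements with a certain probability
        self.mapping = nn.Linear(hidden_size, num_classes)   # (B, T, hidden_size) -> (B, T, num_classes)
        self.crf = CRF(num_classes, batch_first=True)

    def forward(self, encoder_output, tags=None, mask=None):
        emissions = self.mapping(self.dropout(encoder_output))
        if tags is not None:
            return -self.crf(emissions, tags, mask=mask)     # training loss: negative log-likelihood
        return self.crf.decode(emissions, mask=mask)         # inference: label of each sequence element
```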
  • Correspondingly, referring to Figure 4, an embodiment of the present application provides a power domain model pre-training apparatus, which includes:
  • the acquisition module 401 is configured to acquire original power corpus data
  • the processing module 402 is configured to process the original power corpus data, where the processing at least includes word segmentation processing;
  • the first building module 403 is configured to use the whole-word masking method on the processed electric power corpus data to construct pre-training corpus for the electric power field model;
  • the second building module 404 is configured to build a power domain model, where the power domain model includes an attention matrix that introduces relative position encoding between words;
  • the pre-training module 405 is configured to pre-train the electric power domain model using the pre-training corpus.
  • In the embodiments of the present application, the pre-training corpus of the power domain model is constructed by whole-word masking. This avoids the problem that, when character-level masking is used to build the pre-training corpus, the model can easily guess a masked word while ignoring the semantic information between the word and the whole sentence, and it therefore improves the transfer ability of the pre-trained model.
  • In addition, the embodiments of the present application introduce relative position modeling between words into the built pre-training model, i.e. the power domain model, for example by adding an attention matrix that introduces relative position encoding between words, which makes the model more sensitive to the relative positions of words, so that the pre-trained power domain model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to transfer to downstream tasks.
  • In some implementations, the attention matrix that introduces relative position encoding between words is computed as Attention_rel(Q, K, V) = Attention(Q, K, V) + rel, where Attention(Q, K, V) is the attention matrix without the relative position encoding and rel is a parameter related to the relative positions between words.
  • In some implementations, the processing module 402 is configured to perform word segmentation on the original power corpus data with a BERT-CRF model and a power domain dictionary, the BERT-CRF model being trained on power word-segmentation corpus.
  • In some implementations, the first building module 403 includes:
  • a masking unit configured to perform random whole-word masking on the processed power corpus data with a preset probability, replacing one part of the characters of the words selected for masking with random characters, another part with masking symbols, and leaving the remaining characters unchanged.
  • This embodiment of the present application is an apparatus embodiment based on the same inventive concept as the above power domain model pre-training method embodiment; for specific technical details and the corresponding technical effects, refer to the above method embodiment, which are not repeated here.
  • Correspondingly, referring to Figure 5, an embodiment of the present application provides a fine-tuning apparatus for the power domain model. The apparatus includes:
  • a third building module 501 configured to build a training data set for the downstream task;
  • a fourth building module 502 configured to take the network structures of the power domain pre-training model other than its output layer as the underlying encoder, construct an output layer network structure according to the downstream task, and connect the output layer network structure after the underlying encoder to obtain a power domain model for the downstream task, where the pre-training corpus of the power domain pre-training model is obtained by performing word segmentation on the original power corpus data and then applying whole-word masking, and the power domain pre-training model includes an attention matrix that introduces relative position encoding between words;
  • a training module 503 configured to train the power domain model for the downstream task with the training data set.
  • In the embodiments of the present application, the pre-training corpus of the power domain model is constructed by whole-word masking. This avoids the problem that, when character-level masking is used to build the pre-training corpus, the model can easily guess a masked word while ignoring the semantic information between the word and the whole sentence, and it therefore improves the transfer ability of the pre-trained model.
  • In addition, the embodiments of the present application introduce relative position modeling between words into the built pre-training model, i.e. the power domain model, for example by adding an attention matrix that introduces relative position encoding between words, which makes the model more sensitive to the relative positions of words, so that the pre-trained power domain model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to transfer to downstream tasks.
  • In some implementations, the downstream task is a classification task, the output layer network structure is a fully connected network, and a first network structure is further included between the underlying encoder and the fully connected network.
  • The first network structure is used to extract the encoding vectors of the first layer and the last layer of the underlying encoder and average them to obtain a first encoding vector for each word, and then to average the first encoding vectors of the words to obtain the encoding vector of the underlying encoder.
  • The fully connected network is used to output the confidence of each class based on the encoding vector of the underlying encoder.
  • In some implementations, the downstream task is a sequence labeling task, the output layer network structure is a conditional random field, and a Dropout layer and a mapping layer are further included between the underlying encoder and the conditional random field layer.
  • The output of the underlying encoder is a tensor of shape (batch_size, time_steps, hidden_size), where batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden unit size of the underlying encoder; this output is converted through the Dropout layer and the mapping layer into a tensor of shape (batch_size, time_steps, num_classes), where num_classes is the number of target classes.
  • The conditional random field layer is used to obtain the label of each element of the whole sequence from the tensor of shape (batch_size, time_steps, num_classes).
  • This embodiment of the present application is an apparatus embodiment based on the same inventive concept as the above fine-tuning method embodiment for the power domain model; for specific technical details and the corresponding technical effects, refer to the above fine-tuning method embodiment, which are not repeated here.
  • An embodiment of the present application further provides an electronic device. As shown in Figure 6, the electronic device may include a processor 61 and a memory 62, where the processor 61 and the memory 62 may be communicatively connected to each other through a bus or by other means; connection through a bus is taken as an example in Figure 6.
  • The processor 61 may be a central processing unit (CPU). The processor 61 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component or another chip, or a combination of the above types of chips.
  • As a non-transitory computer-readable storage medium, the memory 62 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the power domain model pre-training method in the embodiments of the present application (for example, the acquisition module 401, the processing module 402, the first building module 403, the second building module 404 and the pre-training module 405 shown in Figure 4) or the program instructions/modules corresponding to the fine-tuning method for the power domain model in the embodiments of the present application (for example, the third building module 501, the fourth building module 502 and the training module 503 shown in Figure 5).
  • The processor 61 executes the various functional applications and data processing of the processor by running the non-transitory software programs, instructions and modules stored in the memory 62, that is, implements the power domain model pre-training method or the power domain model fine-tuning method in the above method embodiments.
  • The memory 62 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the processor 61 and the like.
  • In addition, the memory 62 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device.
  • In some embodiments, the memory 62 optionally includes memory located remotely from the processor 61, and such remote memory may be connected to the processor 61 through a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • The one or more modules are stored in the memory 62 and, when executed by the processor 61, perform the power domain model pre-training method or the power domain model fine-tuning method in the above method embodiments.
  • Correspondingly, embodiments of the present application also provide a computer-readable storage medium used to store a computer program which, when executed by a processor, implements each process of the above power domain model pre-training method embodiments or each process of the above power domain model fine-tuning method embodiments, and can achieve the same technical effects; to avoid repetition, they are not described again here.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a power domain model pre-training method, fine-tuning method, apparatus and device. The pre-training method includes: obtaining original power corpus data; processing the original power corpus data, the processing including at least word segmentation; constructing pre-training corpus for a power domain model from the processed power corpus data using whole-word masking; building the power domain model, the power domain model including an attention matrix that introduces relative position encoding between words; and pre-training the power domain model with the pre-training corpus. The technical solution provided by this application can improve the transfer ability of the pre-trained model.

Description

电力领域模型预训练方法、精调方法、装置、设备、存储介质和计算机程序产品
相关申请的交叉引用
本申请实施例基于申请号为202211060951.9、申请日为2022年09月01日、申请名称为“电力领域模型预训练方法、精调方法、装置及设备”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请涉及但不限于人工智能技术领域,尤其涉及一种电力领域模型预训练方法、精调方法、装置、设备、存储介质和计算机程序产品。
背景技术
现有的自然语言处理(Natural Language Processing,NLP)模型包含的参数可以达到上百万。因此,训练出具有良好性能的NLP模型需要大量的训练样本和标签数据。通常,采用人工对训练样本进行标注。因此,获取大量的标签数据,需要较高的人工成本。
在此背景下,预训练加精调的模式广泛应用于NLP模型训练。首先利用成本较低且容易获取的训练数据训练一个预训练模型。通过这种方式,预训练模型可以学习到语言学的通用知识。因此,针对不同的下游任务,可以利用其相关的标签数据对其相关的参数进行精调,使得训练的NLP模型具有良好性能。
但是,在自然语言处理模型的预训练阶段,由于并非是针对下游任务进行训练的,而是针对预训练阶段的任务(例如预测遮蔽的词语)进行训练的,因此会导致预训练出的模型的迁移能力弱,即在对预训练模型进行精调得到针对下游任务的模型时,模型的适应性差,预测精度低。
发明内容
有鉴于此,本申请实施例提供了一种电力领域模型预训练方法、精调方法、装置、设备、存储介质和计算机程序产品。
根据第一方面,本申请实施例提供了一种电力领域模型预训练方法,所述方法包括:
获取原始电力语料数据;
对所述原始电力语料数据进行处理,所述处理至少包括分词处理;
对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料;
构建电力领域模型,所述电力领域模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
利用所述预训练语料,对所述电力领域模型进行预训练。
根据第二方面,本申请实施例提供了一种电力领域模型的精调方法,包括:
针对下游任务构建训练用数据集;
将电力领域预训练模型中除输出层以外的其他网络结构作为底层编码器,并根据所述下游任务构建输出层网络结构,将所述输出层网络结构连接至所述底层编码器之后,得到针对下游任务的电力领域模型,所述电力领域预训练模型的预训练语料是通过对原始电力语料数据进行分词处理之后采用全词遮蔽得到的,且所述电力领域预训练模型包括注意力矩阵,所 述注意力矩阵引入了词与词之间的相对位置编码;
利用所述训练用数据集对所述针对下游任务的电力领域模型进行训练。
根据第三方面,本申请实施例提供了一种电力领域模型预训练装置,包括:
获取模块,被配置为获取原始电力语料数据;
处理模块,被配置为对所述原始电力语料数据进行处理,所述处理至少包括分词处理;
第一构建模块,被配置为对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料;
第二构建模块,被配置为构建电力领域模型,所述电力领域模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
预训练模块,被配置为利用所述预训练语料,对所述电力领域模型进行预训练。
根据第四方面,本申请实施例提供了一种电力领域模型的精调装置,包括:
第三构建模块,被配置为针对下游任务构建训练用数据集;
第四构建模块,被配置为将电力领域预训练模型中除输出层以外的其他网络结构作为底层编码器,并根据所述下游任务构建输出层网络结构,将所述输出层网络结构连接至所述底层编码器之后,得到针对下游任务的电力领域模型,所述电力领域预训练模型的预训练语料是通过对原始电力语料数据进行分词处理之后采用全词遮蔽得到的,且所述电力领域预训练模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
训练模块,被配置为利用所述训练用数据集对所述针对下游任务的电力领域模型进行训练。
根据第五方面,本申请实施例提供了一种电子设备,包括:
存储器和处理器,所述存储器和所述处理器之间互相通信连接,所述存储器用于存储计算机程序,所述计算机程序被所述处理器执行时,实现上述第一方面所述的电力领域模型预训练方法,或实现上述第二方面所述的电力领域模型的精调方法。
根据第六方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质被配置为存储计算机程序,所述计算机程序被处理器执行时,实现上述第一方面所述的电力领域模型预训练方法,或实现上述第二方面所述的电力领域模型的精调方法。
根据第七方面,本申请实施例提供了一种计算机程序产品,所述计算机程序产品包括计算机指令,在所述计算机指令在计算机设备上运行的情况下,使得所述计算机设备执行上述第一方面所述的电力领域模型预训练方法,或执行上述第二方面所述的电力领域模型的精调方法。
本申请实施例中,通过全词遮蔽的方式构建电力领域模型的预训练语料,避免了使用字符遮蔽方式构建电力领域模型的预训练语料时,模型能轻易猜出遮蔽的词语,而忽略了词语和整个句子之间的语义信息的问题,可以提升预训练模型的迁移能力。另外,本申请实施例还在构建的预训练模型,即电力领域模型中引入了词与词之间的相对位置建模,例如增加了引入词与词之间的相对位置编码的注意力矩阵,从而可以使得模型更加关注词与词之间的相对位置,进而对词与词之间的相对位置更加敏感,从而使得预训练的电力领域模型不仅适用于预训练阶段的遮蔽词语预测任务,而且更容易迁移至下游任务。
附图说明
通过参考附图会更加清楚的理解本申请的特征和优点,附图是示意性的而不应理解为对本申请进行任何限制,在附图中:
图1为本申请实施例提供的一种电力领域模型预训练方法的流程示意图;
图2为本申请实施例中的对原始电力语料数据进行处理的过程示意图;
图3为本申请实施例提供的一种电力领域模型的精调方法的流程示意图;
图4为本申请实施例提供的一种电力领域模型预训练装置的结构示意图;
图5为本申请实施例提供的一种电力领域模型的精调装置的结构示意图;
图6为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。此外,术语“第一”、“第二”等仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。在以下各实施例的描述中,“多个”的含义是两个以上,除非另有明确具体的限定。
请参阅图1,本申请实施例提供一种电力领域模型预训练方法,所述方法包括:
S101:获取原始电力语料数据;
S102:对所述原始电力语料数据进行处理,所述处理至少包括分词处理;
S103:对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料;
S104:构建电力领域模型,所述电力领域模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
S105:利用所述预训练语料,对所述电力领域模型进行预训练。
示例性地,电力领域模型可以是电力领域大模型,即电力领域大规模模型。原始电力语料数据可以是大量的电力数据,所述处理还可以包括清洗,清洗处理可以在分词操作之前,例如可以采用正则匹配、BeautifulSoup等工具包实现,清洗处理用于过滤掉原始电力语料数据中的一些特殊符号,包括乱码、html符号等,得到较为干净的语料数据。
在对所述电力领域模型进行训练时,利用所述电力领域模型对采用全词遮蔽方法构建的预训练语料中被遮蔽的词语进行预测,并将预测结果与被遮蔽前的词语进行比较,根据比较结果调整电力领域模型的参数。
本申请实施例中,通过全词遮蔽的方式构建电力领域模型的预训练语料,避免了使用字符遮蔽方式构建电力领域模型的预训练语料时,模型能轻易猜出遮蔽的词语,而忽略了词语和整个句子之间的语义信息的问题,可以提升预训练模型的迁移能力。另外,本申请实施例还在构建的预训练模型,即电力领域模型中引入了词与词之间的相对位置建模,例如增加了引入词与词之间的相对位置编码的注意力矩阵,从而可以使得模型更加关注词与词之间的相对位置,进而对词与词之间的相对位置更加敏感,从而使得预训练的电力领域模型不仅适用于预训练阶段的遮蔽词语预测任务,而且更容易迁移至下游任务。
一些实施方式中,引入了词与词之间的相对位置编码的所述注意力矩阵的算法公式如公式(1)所示:
Attention_rel(Q,K,V)=Attention(Q,K,V)+rel   (1);
其中,Attention(Q,K,V)为未引入所述相对位置编码的注意力矩阵的算法公式,该式计算的是针对一个注意力头的Attention矩阵。rel是与词与词之间的相对位置有关的参数,rel对每一个输入样本(sample,即一条预训练语料)为一个对应于一个注意力头的标量。
示例性地,Q、K和V分别代表Query、Key和Value,V是表示输入特征的向量,Q、K是计算Attention权重的特征向量。它们都是由输入 特征得到的。Attention(Q,K,V)是根据关注程度对V乘以相应权重。Attention机制中的Q,K,V,即是,对当前的Query和所有的Key计算相似度,将这个相似度值通过Softmax层得到一组权重,根据这组权重与对应Value的乘积求和得到Attention下的Value值。Q、K和V是将输入向量X通过矩阵WQ、WK、WV变换得到,WQ、WK、WV是三个可训练的参数矩阵。dk为K的维度大小。
本申请实施例中,相对位置编码采用T5的编码方式,将位置偏置引入到注意力矩阵中。即在注意力矩阵的基础上加入一项相对位置偏置rel。
一些实施方式中,所述对所述原始电力语料数据进行处理,包括:
采用BERT-CRF模型和电力领域词典对所述原始电力语料数据进行分词处理,所述BERT-CRF模型是利用电力分词语料进行训练得到。
其中,利用电力分词语料进行训练得到的所述BERT-CRF模型是一种电力领域分词工具。BERT模型是自然语言处理领域一个常用的预训练语言模型,BERT的全称是Bidirectional Encoder Representation from Transformers;CRF表示条件随机场(Conditional Random Fields),是一种传统的机器学习方法。BERT-CRF模型采用“BMES”的编码模式,其中,“B”代表当前字符为多字词语的开头字符,“M”代表当前字符为多字词语的中间字符,“E”代表当前字符为多字词语的结尾字符,“S”代表当前字符为一个单字词。例如“变压器的检修规范”经过标注后得到“B,M,E,S,B,E,B,E”,对应的分词结果为:“变压器/的/检修/规范”。电力领域词典也即电力词典。本申请实施例中,首先利用BERT-CRF模型对原始电力语料数据进行分词处理,然后利用电力词典,将被切分开的电力词语进行合并,得到最终的分词结果。这里分词处理所针对的原始电力语料数据可以是已经经过清洗处理的电力语料数据。请参阅图2,分词处理后得到的是一系列的词语组成的词语序列。
本申请实施例中,采用利用电力分词语料进行训练得到的BERT-CRF模型和电力领域词典对所述原始电力语料数据进行分词处理,能够将电力领域的实体作为一个整体进行切分,最大限度地保证电力专有名词不被分开。
在其他的可选实施方式中,还可以采用其他的电力领域分词工具并结合所述电力领域词典对所述原始电力语料数据进行分词处理。
传统的模型预训练阶段,采用非全词遮蔽的字符遮蔽方法,可能会导致词语在处理过程中出现部分遮蔽的问题。例如“变压器的检修规范”经过字符遮蔽,可能变成:“变”“[MASK]”“器”“的”“检”“修”“规”“范”。其中,“变压器”的“压”字被单独遮蔽了。这种情况可能会使模型更关注局部的词语信息。上例中,模型从“变”和“器”字就能猜出“压”字,进而忽略了词语和整个句子之间的语义信息。而全词遮蔽则会对整个电力名词进行遮蔽,上例经过全词遮蔽后,变为:“[MASK]”“[MASK]”“[MASK]”“的”“检”“修”“规”“范”。模型为了预测被遮蔽的电力名词“变压器”,需要从整个句子中挖掘被遮蔽词的语义信息,进而使模型建立起电力名词和整个句子之间的语义联系。
本申请的一些实施方式中,所述对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料,包括:
对所述处理后得到的电力语料数据采用预设概率进行随机全词遮蔽,将所有需要遮蔽的词语对应的字符中的一部分替换为随机字符、另一部分替换为遮蔽符号、剩余部分保留原来的字符不变。
举例来说,可以对分词处理后得到的词语序列采用0.15的概率进行随机全词遮蔽,对所有需要遮蔽的词语对应的字符按照下述方法处理:按照10%替换为随机字符、80%替换为遮蔽符号(例如上文所述的[MASK]),10%保留原来的字符不变的方法进行处理。
另外,本申请实施例中,所述电力领域模型可以是基于BERT模型构建的,因此,为保持模型训练的一致性,在采用全词遮蔽的方法构建电力领域模型的预训练语料时,对每个已进行全词遮蔽处理的句子,在句首加入特殊符号[CLS]、在句末加入特殊符号[SEP]。
请参阅图3,本申请实施例还提供一种电力领域模型的精调方法,包括:
S301:针对下游任务构建训练用数据集;
S302:将电力领域预训练模型中除输出层以外的其他网络结构(即电力领域预训练模型的编码层)作为底层编码器,并根据所述下游任务构建输出层网络结构,将所述输出层网络结构连接至所述底层编码器之后,得到针对下游任务的电力领域模型,所述电力领域预训练模型的预训练语料是通过对原始电力语料数据进行分词处理之后采用全词遮蔽得到的,且所述电力领域预训练模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
S303:利用所述训练用数据集对所述针对下游任务的电力领域模型进行训练。
示例性地,所述电力领域预训练模型可以是利用上述实施例所述的任一种电力领域模型预训练方法预训练得到。
本申请实施例中,通过全词遮蔽的方式构建电力领域模型的预训练语料,避免了使用字符遮蔽方式构建电力领域模型的预训练语料时,模型能轻易猜出遮蔽的词语,而忽略了词语和整个句子之间的语义信息的问题,可以提升预训练模型的迁移能力。另外,本申请实施例还在构建的预训练模型,即电力领域模型中引入了词与词之间的相对位置建模,例如增加了引入词与词之间的相对位置编码的注意力矩阵,从而可以使得模型更加关注词与词之间的相对位置,进而对词与词之间的相对位置更加敏感,从而使得预训练的电力领域模型不仅适用于预训练阶段的遮蔽词语预测任务,而且更容易迁移至下游任务。
本申请实施例中,在电力领域模型的精调阶段,需要根据不同的下游任务设计不同的输出层网络结构。下面针对自然语言处理任务中的常见任务进行举例说明。
一些实施方式中,所述下游任务为分类任务,所述输出层网络结构为全连接网络;且所述底层编码器与所述全连接网络之间还包括第一网络结构;
所述第一网络结构用于抽取所述底层编码器中的第一层和最后一层的编码向量并求平均,得到第一编码向量,再对各个词的所述第一编码向量取平均得到所述底层编码器的编码向量;
所述全连接网络用于基于所述底层编码器的编码向量输出每个类别对应的置信度。
另一些实施方式中,所述下游任务为序列标注任务,所述输出层网络结构为条件随机场(CRF),且所述底层编码器与条件随机场层之间还包括Dropout层和映射层;
所述底层编码器的输出为(batch_size,time_steps,hidden_size)形状的张量,其中,batch_size为批大小、time_steps为序列长度、hidden_size为所述底层编码器的隐层单元大小;
所述底层编码器的输出经过所述Dropout层和所述映射层转换为(batch_size,time_steps,num_classes)形状的张量,其中,num_classes为目标类的数量;
所述条件随机场层用于基于所述(batch_size,time_steps,num_classes)形状的张量得到整个序列中每个元素的标签。该整个序列是指输入至针对序列标注任务的电力领域模型、待进行标注的序列。
其中,条件随机场作为序列标注任务的标注结构。所述Dropout层用于对所述底层编码器输出的(batch_size,time_steps,hidden_size)形状的张量中的元素以一定概率置零,可以增加模型的鲁棒性。经过Dropout的张量通过所述映射层转换为(batch_size,time_steps,num_classes)形状的张量。
相应地,请参考图4,本申请实施例提供一种电力领域模型预训练装置,该装置包括:
获取模块401,被配置为获取原始电力语料数据;
处理模块402,被配置为对所述原始电力语料数据进行处理,所述处理至少包括分词处理;
第一构建模块403,被配置为对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料;
第二构建模块404,被配置为构建电力领域模型,所述电力领域模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
预训练模块405,被配置为利用所述预训练语料,对所述电力领域模型进行预训练。
本申请实施例中,通过全词遮蔽的方式构建电力领域模型的预训练语料,避免了使用字 符遮蔽方式构建电力领域模型的预训练语料时,模型能轻易猜出遮蔽的词语,而忽略了词语和整个句子之间的语义信息的问题,可以提升预训练模型的迁移能力。另外,本申请实施例还在构建的预训练模型,即电力领域模型中引入了词与词之间的相对位置建模,例如增加了引入词与词之间的相对位置编码的注意力矩阵,从而可以使得模型更加关注词与词之间的相对位置,进而对词与词之间的相对位置更加敏感,从而使得预训练的电力领域模型不仅适用于预训练阶段的遮蔽词语预测任务,而且更容易迁移至下游任务。
一些实施方式中,引入了词与词之间的相对位置编码的所述注意力矩阵的算法公式如公式(2)所示:
Attention_rel(Q,K,V)=Attention(Q,K,V)+rel   (2);
其中,Attention(Q,K,V)为未引入所述相对位置编码的注意力矩阵的算法公式,rel是与词与词之间的相对位置有关的参数。
一些实施方式中,所述处理模块402被配置为采用BERT-CRF模型和电力领域词典对所述原始电力语料数据进行分词处理,所述BERT-CRF模型是利用电力分词语料进行训练得到。
一些实施方式中,所述第一构建模块403包括:
遮蔽单元,被配置为对所述处理后得到的电力语料数据采用预设概率进行随机全词遮蔽,将所有需要遮蔽的词语对应的字符中的一部分替换为随机字符、另一部分替换为遮蔽符号、剩余部分保留原来的字符不变。
本申请实施例是与上述电力领域模型预训练方法实施例基于相同的发明构思的装置实施例,因此具体的技术细节和对应的技术效果请参阅上述电力领域模型预训练方法实施例,此处不再赘述。
相应地,请参考图5,本申请实施例提供一种电力领域模型的精调装置,该装置包括:
第三构建模块501,被配置为针对下游任务构建训练用数据集;
第四构建模块502,被配置为将电力领域预训练模型中除输出层以外的其他网络结构作为底层编码器,并根据所述下游任务构建输出层网络结构,将所述输出层网络结构连接至所述底层编码器之后,得到针对下游任务的电力领域模型,所述电力领域预训练模型的预训练语料是通过对原始电力语料数据进行分词处理之后采用全词遮蔽得到的,且所述电力领域预训练模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
训练模块503,被配置为利用所述训练用数据集对所述针对下游任务的电力领域模型进行训练。
本申请实施例中,通过全词遮蔽的方式构建电力领域模型的预训练语料,避免了使用字符遮蔽方式构建电力领域模型的预训练语料时,模型能轻易猜出遮蔽的词语,而忽略了词语和整个句子之间的语义信息的问题,可以提升预训练模型的迁移能力。另外,本申请实施例还在构建的预训练模型,即电力领域模型中引入了词与词之间的相对位置建模,例如增加了引入词与词之间的相对位置编码的注意力矩阵,从而可以使得模型更加关注词与词之间的相对位置,进而对词与词之间的相对位置更加敏感,从而使得预训练的电力领域模型不仅适用于预训练阶段的遮蔽词语预测任务,而且更容易迁移至下游任务。
一些实施方式中,所述下游任务为分类任务,所述输出层网络结构为全连接网络;且所述底层编码器与所述全连接网络之间还包括第一网络结构;
所述第一网络结构用于抽取所述底层编码器中的第一层和最后一层的编码向量并求平均,得到第一编码向量,再对各个词的所述第一编码向量取平均得到所述底层编码器的编码向量;
所述全连接网络用于基于所述底层编码器的编码向量输出每个类别对应的置信度。
一些实施方式中,所述下游任务为序列标注任务,所述输出层网络结构为条件随机场,且所述底层编码器与条件随机场层之间还包括Dropout层和映射层;
所述底层编码器的输出为(batch_size,time_steps,hidden_size)形状的张量,其中,batch_size为批大小、time_steps为序列长度、hidden_size为所述底层编码器的隐层单元大小;
所述底层编码器的输出经过所述Dropout层和所述映射层转换为(batch_size,time_steps,num_classes)形状的张量,其中,num_classes为目标类的数量;
所述条件随机场层用于基于所述(batch_size,time_steps,num_classes)形状的张量得到整个序列中每个元素的标签。
本申请实施例是与上述电力领域模型的精调方法实施例基于相同的发明构思的装置实施例,因此具体的技术细节和对应的技术效果请参阅上述电力领域模型的精调方法实施例,此处不再赘述。
本申请实施例还提供了一种电子设备,如图6所示,该电子设备可以包括处理器61和存储器62,其中处理器61和存储器62可以通过总线或者其他方式互相通信连接,图6中以通过总线连接为例。
处理器61可以为中央处理器(Central Processing Unit,CPU)。处理器61还可以为其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等芯片,或者上述各类芯片的组合。
存储器62作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序、非暂态计算机可执行程序以及模块,如本申请实施例中的电力领域模型预训练方法对应的程序指令/模块(例如,图4所示的获取模块401、处理模块402、第一构建模块403、第二构建模块404和预训练模块405)或本申请实施例中的电力领域模型的精调方法对应的程序指令/模块(例如,图5所示的第三构建模块501、第四构建模块502和训练模块503)。处理器61通过运行存储在存储器62中的非暂态软件程序、指令以及模块,从而执行处理器的各种功能应用以及数据处理,即实现上述方法实施例中的电力领域模型预训练方法或电力领域模型的精调方法。
存储器62可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储处理器61所创建的数据等。此外,存储器62可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其他非暂态固态存储器件。在一些实施例中,存储器62可选包括相对于处理器61远程设置的存储器,这些远程存储器可以通过网络连接至处理器61。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
所述一个或者多个模块存储在所述存储器62中,当被所述处理器61执行时,执行上述方法实施例中的电力领域模型预训练方法或电力领域模型的精调方法。
上述电子设备具体细节可以对应参阅上文中的方法实施例中对应的相关描述和效果进行理解,此处不再赘述。
相应地,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质用于存储计算机程序,所述计算机程序被处理器执行时,实现上述电力领域模型预训练方法实施例的各个过程或者实现上述电力领域模型的精调方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(Phase-Change Random Access Memory,PRAM)、静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、其他类型的随机存取存储器(Random Access Memory,RAM)、只读存储器(Read-Only Memory,ROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、数字多功能光盘(Digital Versatile Disc,DVD)或其他光学存储、磁盒式磁带,磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数 据信号和载波。
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (12)

  1. 一种电力领域模型预训练方法,所述方法包括:
    获取原始电力语料数据;
    对所述原始电力语料数据进行处理,所述处理至少包括分词处理;
    对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料;
    构建电力领域模型,所述电力领域模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
    利用所述预训练语料,对所述电力领域模型进行预训练。
  2. 根据权利要求1所述的方法,其中,引入了词与词之间的相对位置编码的所述注意力矩阵的算法公式为:
    Attention_rel(Q,K,V)=Attention(Q,K,V)+rel
    其中,Attention(Q,K,V)为未引入所述相对位置编码的注意力矩阵的算法公式,V是输入特征的向量,Q、K是计算Attention权重的特征向量,rel是词与词之间的相对位置有关的参数。
  3. 根据权利要求1或2所述的方法,其中,所述对所述原始电力语料数据进行处理,包括:
    采用BERT-CRF模型和电力领域词典对所述原始电力语料数据进行分词处理,所述BERT-CRF模型是利用电力分词语料进行训练得到。
  4. 根据权利要求1或2所述的方法,其中,所述对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料,包括:
    对所述处理后得到的电力语料数据采用预设概率进行随机全词遮蔽,将所有需要遮蔽的词语对应的字符中的一部分替换为随机字符、另一部分替换为遮蔽符号、剩余部分保留原来的字符不变。
  5. 一种电力领域模型的精调方法,所述方法包括:
    针对下游任务构建训练用数据集;
    将电力领域预训练模型中除输出层以外的其他网络结构作为底层编码器,并根据所述下游任务构建输出层网络结构,将所述输出层网络结构连接至所述底层编码器之后,得到针对下游任务的电力领域模型,所述电力领域预训练模型的预训练语料是通过对原始电力语料数据进行分词处理之后采用全词遮蔽得到的,且所述电力领域预训练模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
    利用所述训练用数据集对所述针对下游任务的电力领域模型进行训练。
  6. 根据权利要求5所述的方法,其中,所述下游任务为分类任务,所述输出层网络结构为全连接网络;且所述底层编码器与所述全连接网络之间还包括第一网络结构;
    所述第一网络结构用于抽取所述底层编码器中的第一层和最后一层的编码向量并求平均,得到第一编码向量,再对各个词的所述第一编码向量取平均得到所述底层编码器的编码向量;
    所述全连接网络用于基于所述底层编码器的编码向量输出每个类别对应的置信度。
  7. 根据权利要求5所述的方法,其中,所述下游任务为序列标注任务,所述输出层网络结构为条件随机场,且所述底层编码器与条件随机场层之间还包括Dropout层和映射层;
    所述底层编码器的输出为batch_size,time_steps,hidden_size形状的张量,其中,batch_size为批大小、time_steps为序列长度、hidden_size为所述底层编码器的隐层单元大小;
    所述底层编码器的输出经过所述Dropout层和所述映射层转换为batch_size,time_steps,num_classes形状的张量,其中,num_classes为目标类的数量;
    所述条件随机场层用于基于所述batch_size,time_steps,num_classes形状的张量得到整个序列中每个元素的标签。
  8. 一种电力领域模型预训练装置,所述装置包括:
    获取模块,被配置为获取原始电力语料数据;
    处理模块,被配置为对所述原始电力语料数据进行处理,所述处理至少包括分词处理;
    第一构建模块,被配置为对处理后得到的电力语料数据,采用全词遮蔽的方法,构建电力领域模型的预训练语料;
    第二构建模块,被配置为构建电力领域模型,所述电力领域模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
    预训练模块,被配置为利用所述预训练语料,对所述电力领域模型进行预训练。
  9. 一种电力领域模型的精调装置,所述装置包括:
    第三构建模块,被配置为针对下游任务构建训练用数据集;
    第四构建模块,被配置为将电力领域预训练模型中除输出层以外的其他网络结构作为底层编码器,并根据所述下游任务构建输出层网络结构,将所述输出层网络结构连接至所述底层编码器之后,得到针对下游任务的电力领域模型,所述电力领域预训练模型的预训练语料是通过对原始电力语料数据进行分词处理之后采用全词遮蔽得到的,且所述电力领域预训练模型包括注意力矩阵,所述注意力矩阵引入了词与词之间的相对位置编码;
    训练模块,被配置为利用所述训练用数据集对所述针对下游任务的电力领域模型进行训练。
  10. 一种电子设备,包括:
    存储器和处理器,所述存储器和所述处理器之间互相通信连接,所述存储器用于存储计算机程序,所述计算机程序被所述处理器执行时,实现权利要求1至4中任一项所述的电力领域模型预训练方法,或实现权利要求5至7中任一项所述的电力领域模型的精调方法。
  11. 一种计算机可读存储介质,所述计算机可读存储介质用于存储计算机程序,所述计算机程序被处理器执行时,实现权利要求1至4中任一项所述的电力领域模型预训练方法,或实现权利要求5至7中任一项所述的电力领域模型的精调方法。
  12. 一种计算机程序产品,所述计算机程序产品包括计算机指令,在所述计算机指令在计算机设备上运行的情况下,使得所述计算机设备执行权利要求1至4中任一项所述的电力领域模型预训练方法,或执行权利要求5至7中任一项所述的电力领域模型的精调方法。
PCT/CN2023/115522 2022-09-01 2023-08-29 电力领域模型预训练方法、精调方法、装置、设备、存储介质和计算机程序产品 WO2024046316A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211060951.9A CN115129826B (zh) 2022-09-01 2022-09-01 电力领域模型预训练方法、精调方法、装置及设备
CN202211060951.9 2022-09-01

Publications (1)

Publication Number Publication Date
WO2024046316A1 true WO2024046316A1 (zh) 2024-03-07

Family

ID=83387399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/115522 WO2024046316A1 (zh) 2022-09-01 2023-08-29 电力领域模型预训练方法、精调方法、装置、设备、存储介质和计算机程序产品

Country Status (2)

Country Link
CN (1) CN115129826B (zh)
WO (1) WO2024046316A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129826B (zh) * 2022-09-01 2022-11-22 国网智能电网研究院有限公司 电力领域模型预训练方法、精调方法、装置及设备

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487145A (zh) * 2020-12-01 2021-03-12 重庆邮电大学 一种o2o商铺食品安全监测方法
CN113239700A (zh) * 2021-04-27 2021-08-10 哈尔滨理工大学 改进bert的文本语义匹配设备、系统、方法及存储介质
US20210334475A1 (en) * 2020-04-24 2021-10-28 Microsoft Technology Licensing, Llc Efficient transformer language models with disentangled attention and multi-step decoding
CN113642330A (zh) * 2021-07-19 2021-11-12 西安理工大学 基于目录主题分类的轨道交通规范实体识别方法
CN114386410A (zh) * 2022-01-11 2022-04-22 腾讯科技(深圳)有限公司 预训练模型的训练方法和文本处理方法
CN114579695A (zh) * 2022-01-20 2022-06-03 杭州量知数据科技有限公司 一种事件抽取方法、装置、设备及存储介质
CN114722208A (zh) * 2022-06-08 2022-07-08 成都健康医联信息产业有限公司 一种健康医疗文本自动分类和安全等级自动分级方法
CN115129826A (zh) * 2022-09-01 2022-09-30 国网智能电网研究院有限公司 电力领域模型预训练方法、精调方法、装置及设备

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632252B (zh) * 2020-12-25 2021-09-17 中电金信软件有限公司 对话应答方法、装置、计算机设备和存储介质
CN112632972B (zh) * 2020-12-25 2024-03-15 浙江国际海运职业技术学院 一种电网设备故障报告内故障信息的快速提取方法
CN112612881B (zh) * 2020-12-28 2022-03-25 电子科技大学 基于Transformer的中文智能对话方法
JP2024503518A (ja) * 2021-01-20 2024-01-25 オラクル・インターナショナル・コーポレイション 固有表現認識モデルを用いたコンテキストタグ統合
CN114647715A (zh) * 2022-04-07 2022-06-21 杭州电子科技大学 一种基于预训练语言模型的实体识别方法

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210334475A1 (en) * 2020-04-24 2021-10-28 Microsoft Technology Licensing, Llc Efficient transformer language models with disentangled attention and multi-step decoding
CN112487145A (zh) * 2020-12-01 2021-03-12 重庆邮电大学 一种o2o商铺食品安全监测方法
CN113239700A (zh) * 2021-04-27 2021-08-10 哈尔滨理工大学 改进bert的文本语义匹配设备、系统、方法及存储介质
CN113642330A (zh) * 2021-07-19 2021-11-12 西安理工大学 基于目录主题分类的轨道交通规范实体识别方法
CN114386410A (zh) * 2022-01-11 2022-04-22 腾讯科技(深圳)有限公司 预训练模型的训练方法和文本处理方法
CN114579695A (zh) * 2022-01-20 2022-06-03 杭州量知数据科技有限公司 一种事件抽取方法、装置、设备及存储介质
CN114722208A (zh) * 2022-06-08 2022-07-08 成都健康医联信息产业有限公司 一种健康医疗文本自动分类和安全等级自动分级方法
CN115129826A (zh) * 2022-09-01 2022-09-30 国网智能电网研究院有限公司 电力领域模型预训练方法、精调方法、装置及设备

Also Published As

Publication number Publication date
CN115129826B (zh) 2022-11-22
CN115129826A (zh) 2022-09-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23859346

Country of ref document: EP

Kind code of ref document: A1